Adaptive time scale modification of speech for graceful degrading voice quality in congested networks

Prof. H. Gokhan ILK
Ankara University, Faculty of Engineering, Electrical & Electronics Eng. Dept.

Contact Details & Background
Address: Ankara University, Faculty of Engineering, Electrical & Electronics Engineering Department, 06100 Ankara, Turkey
ilk@ieee.org
Ph.D. on DCT Based Prototype Interpolation Speech Coding, University of Manchester, UK
Vienna, 27-28 November 2012, ETSI Workshop on selected items on telecommunication quality matters

How do we design a speech codec, today?
Speech coding chain:
Analog Speech => ADC => Digital Speech => Source encoding
Analog Speech <= DAC <= Digital Speech <= Source decoding
Channel coding sits between source encoding and source decoding.
Conventional vocoders (voice coders) encode speech (both source and channel coding) at the source, transmit it (possibly over IP), and then decode it at the destination. The bit rate is almost always fixed.

A little bit of theory and literature
[Figure: time-scale compression and expansion of speech; waveform panels labelled original speech, input speech and compressed speech]
Speech signals exhibit both short-term and long-term correlation, and LPC analysis removes most of the short-term correlation. We can, however, also remove the long-term (pitch-related) correlation, i.e. get rid of the long-term redundancy. The key is not to disturb the pitch and formant frequencies. A detailed investigation of these parameters can be found in: W. Verhelst, "Overlap-add methods for time-scaling of speech", Speech Commun. 30 (2000) 207-221.

Earlier work
If the compression algorithm does not disturb the pitch and formant frequencies, then one can compress speech (before coding) with a compression rate of beta and expand the decoded speech at the receiver side with an expansion factor of 1/beta. If, for example, beta = 0.5, then one can have a full duplex channel at a half duplex bandwidth. Why? Because the same signal is represented in half the duration with minimum distortion.
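The beta = 0.5 case can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the 10 s utterance and 2400 b/s codec rate are illustrative numbers, not measurements from the slides.

```python
def compressed_duration(duration_s: float, beta: float) -> float:
    """Duration of the speech after time compression by beta (0 < beta <= 1)."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return duration_s * beta


def expanded_duration(duration_s: float, beta: float) -> float:
    """Duration after the receiver expands by 1/beta."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return duration_s / beta


def effective_bit_rate(codec_rate_bps: float, beta: float) -> float:
    """Average bits per second of ORIGINAL speech when a fixed-rate codec
    only has to encode the compressed signal."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return codec_rate_bps * beta


if __name__ == "__main__":
    beta = 0.5                                   # compression rate from the slide
    sent = compressed_duration(10.0, beta)       # seconds actually encoded
    restored = expanded_duration(sent, beta)     # seconds played at the receiver
    rate = effective_bit_rate(2400.0, beta)      # illustrative codec rate
    print(sent, restored, rate)
```

With beta = 0.5 the codec is busy for only half of the original duration, which is where the freed half of the channel comes from.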

Waveform Similarity Overlap and Add (WSOLA)
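For readers who have not met the algorithm, a minimal pure-Python sketch of textbook fixed-rate WSOLA follows. It is not the adaptive variant proposed in this talk. The function name `wsola`, the Hann window, the 50% overlap, the default frame and search sizes, and the unnormalised cross-correlation used as the similarity measure are all assumptions chosen for brevity.

```python
import math


def wsola(x, beta, frame=256, tol=64):
    """Time-scale x so that the output has about beta * len(x) samples.

    beta < 1 compresses, beta > 1 expands. frame is the window length and
    tol the search range around the nominal read position, both in samples.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    hop = frame // 2                      # synthesis hop (50% overlap)
    # Periodic Hann window: copies shifted by `hop` sum to one.
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame) for n in range(frame)]
    out_len = int(len(x) * beta)
    y = [0.0] * (out_len + frame)         # overlap-add accumulator
    norm = [0.0] * (out_len + frame)      # accumulated window weight
    pad = frame + tol                     # zero padding keeps every read in range
    xp = [0.0] * pad + list(x) + [0.0] * (pad + frame)

    prev = None                           # start of the previously copied segment
    k = 0
    while k * hop < out_len:
        nominal = pad + int(round(k * hop / beta))   # where time scaling says to read
        best = nominal
        if prev is not None:
            # Template: the natural continuation of the last copied segment.
            template = xp[prev + hop:prev + hop + frame]
            best_score = -math.inf
            for d in range(-tol, tol + 1):
                start = nominal + d
                score = sum(a * b for a, b in zip(template, xp[start:start + frame]))
                if score > best_score:
                    best, best_score = start, score
        # Overlap-add the chosen segment at the regular synthesis position.
        for n in range(frame):
            y[k * hop + n] += win[n] * xp[best + n]
            norm[k * hop + n] += win[n]
        prev = best
        k += 1
    # Normalise by the window weight so the edges are not attenuated.
    return [y[n] / norm[n] if norm[n] > 1e-9 else 0.0 for n in range(out_len)]


if __name__ == "__main__":
    tone = [math.sin(2.0 * math.pi * n / 50.0 + 0.3) for n in range(4000)]
    half = wsola(tone, 0.5)
    print(len(tone), len(half))
```

Because each segment is chosen to line up with the waveform already written, the output keeps the period of the input (the pitch) while its duration changes. Plain decimation would halve the duration but also double the pitch.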

Is that all?
We have tried this approach with many different algorithms operating in the time and frequency domains. Our experiments with the new NATO standard, STANAG 4591, MELP (mixed excitation linear prediction vocoder), indeed proved that WSOLA in conjunction with MELP produces high quality output and is computationally efficient at half the bit rate. Details can be found in: H.G. Ilk, S. Tugac, "Channel and source considerations of a bit rate reduction technique for a possible wireless communications system's performance enhancement", IEEE Trans. Wireless Commun., vol. 4(1), January 2005, pp. 93-99.
But what if we would like to make the most of our bandwidth? Then the system should be adaptive, which means WSOLA should operate at different time compression factors. This is an engineer's dream come true. You don't operate at constant or multi-rate bit rates; you operate at flexible bit rates. That is, YOU tell me how much bandwidth you have got and I give you the best quality possible. Not the other way around!
A new approach in speech coding
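The "you tell me the bandwidth" idea can be sketched as a small rule that picks the compression factor from the bandwidth currently on offer. This is an illustration only: `choose_beta`, its parameters, and the floor `beta_min` (a guess at the point below which quality becomes unacceptable) are assumptions, not the rule used in the published algorithm.

```python
def choose_beta(available_bps: float, codec_rate_bps: float,
                beta_min: float = 0.5):
    """Pick a time compression factor for a fixed-rate codec.

    Returns (beta, fits). beta is the mildest compression that lets the
    codec output fit into available_bps, but never below beta_min.
    fits is False when even beta_min does not fit the available bandwidth.
    """
    if available_bps <= 0 or codec_rate_bps <= 0:
        raise ValueError("bit rates must be positive")
    if not 0.0 < beta_min <= 1.0:
        raise ValueError("beta_min must be in (0, 1]")
    wanted = min(1.0, available_bps / codec_rate_bps)  # no compression if it already fits
    beta = max(wanted, beta_min)                       # assumed quality floor
    fits = codec_rate_bps * beta <= available_bps
    return beta, fits


if __name__ == "__main__":
    # Illustrative numbers: a 2400 b/s codec on a link whose capacity shrinks.
    for available in (2400.0, 1800.0, 600.0):
        print(available, choose_beta(available, 2400.0))
```

Calling this once per frame or per talk spurt gives a beta that changes over time, which is exactly what a fixed-factor WSOLA cannot follow.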

What is our contribution, then?
We need a different beta as we proceed in time, but WSOLA (or any other time scale modification algorithm) is unable to provide that.

Our contribution is the use of half symmetric windows and the modification of the synthesis formula
Half symmetric windows are used in order to go back to the original time scale (expansion).
Modification of the WSOLA algorithm: synthesis formula

Finally!
There is not much time and space for the mathematical derivations, but details may be found in: H.G. Ilk, S. Guler, "Adaptive Time Scale Modification of Speech for Graceful Degrading Voice Quality in Congested Networks for VoIP Applications", Signal Processing, vol. 86, pp. 127-139, January 2006 (cited 12 times, 2006-2011).
This approach is very useful because the proposed algorithm can be applied to any commercial system as a pre- and post-process, without requiring any modification to the codec's internal design.
TURKCELL, 2008 Best Academic Work award

Last but not least!
This approach is particularly useful in VoIP (Voice over IP) applications in dynamic networks, because the load may change abruptly and is not symmetric in the two directions. It is equally valuable in congested voice networks, because today's networks either allow multiple rates (2.4, 4.8 or 16.0 kb/s) or drop your call. In addition, it can be used for speech and/or audio storage.
As far as the author knows, no voice network can adaptively accommodate new subscribers, as they join, with a graceful degradation in voice quality. One day all coders will be designed this way.

Samples
Male, "Steve wore a bright red cashmere sweater": 128 kb/s PCM, 2.4 kb/s, 1.0 kb/s
Female, "Before Thursday's exam, review every formula": 128 kb/s PCM, 2.4 kb/s, 1.0 kb/s

Thank you very much for listening. Any questions?