Adaptive time scale modification of speech for graceful degrading voice quality in congested networks


Adaptive time scale modification of speech for graceful degrading voice quality in congested networks
Prof. H. Gokhan ILK
Ankara University, Faculty of Engineering, Electrical & Electronics Eng. Dept

Contact Details & Background
Address: Ankara University, Faculty of Engineering, Electrical & Electronics Engineering Department, 06100 Ankara, Turkey
E-mail: ilk@ieee.org
Ph.D. on DCT Based Prototype Interpolation Speech Coding, University of Manchester, UK
Vienna, 27-28 November 2012, ETSI Workshop on selected items on telecommunication quality matters

How do we design a speech codec, today?
Transmitter: Analog Speech => ADC => Digital Speech => Source encoding => Channel Coding
Receiver: Analog Speech <= DAC <= Digital Speech <= Source decoding
Conventional vocoders (voice coders) encode speech at the source (both source and channel coding), transmit it (or send it over IP), and then decode it at the destination. The bit rate is almost always fixed.

A little bit of theory and literature
[Figure: compression and expansion; panels: original (input) speech, compressed speech]
Speech signals exhibit both short-term and long-term correlation, and LPC analysis removes most of the short-term correlation. We can, however, also remove the long-term, pitch-related correlation, i.e. get rid of the long-term redundancy. The key is not to disturb the pitch and formant frequencies. A detailed investigation of these parameters can be found in: W. Verhelst, Overlap-add methods for time-scaling of speech, Speech Commun. 30 (2000) 207-221.
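The long-term (pitch-related) correlation mentioned above can be made visible with a normalized autocorrelation. The following is only a minimal sketch, not taken from the talk: the function name `autocorr_pitch_lag`, the 8 kHz sampling rate, the lag search range and the synthetic 100 Hz test signal are all assumptions chosen for illustration.

```python
import math

def autocorr_pitch_lag(x, min_lag, max_lag):
    """Return (lag, score): the lag in [min_lag, max_lag] whose normalized
    autocorrelation is highest. A score near 1 means the signal repeats
    almost exactly after `lag` samples (long-term redundancy)."""
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a = x[:-lag]          # signal
        b = x[lag:]           # same signal delayed by `lag` samples
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

# Hypothetical test signal: a 100 Hz "voiced" tone with one harmonic, 8 kHz sampling.
fs, f0 = 8000, 100
x = [math.sin(2 * math.pi * f0 * n / fs) + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
     for n in range(640)]

# Search lags 20..147 samples, i.e. roughly 400 Hz down to 54 Hz at 8 kHz.
lag, score = autocorr_pitch_lag(x, 20, 147)
pitch_hz = fs / lag
```

For this artificial, perfectly periodic input the strongest lag equals one pitch period (fs / f0 samples). Real speech is only quasi-periodic, so the peak is lower and moves from frame to frame.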

Earlier work
If the compression algorithm does not disturb the pitch and formant frequencies, one can compress speech in time (before coding) by a compression factor beta and then expand the decoded speech at the receiver side by an expansion factor of 1/beta. If, for example, beta = 0.5, one can have a full duplex channel at a half duplex bandwidth. Why? Because the same signal is represented in half the duration with minimum distortion.
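The arithmetic behind this claim can be written down in a few lines. This is an illustrative sketch under stated assumptions: the helper names `effective_bit_rate` and `expansion_factor` are hypothetical, and the 2400 b/s codec rate is used only as an example value.

```python
def effective_bit_rate(codec_rate_bps, beta):
    """Average bit rate per second of ORIGINAL speech when the signal is
    time-compressed to beta times its duration before a fixed-rate codec."""
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    return codec_rate_bps * beta

def expansion_factor(beta):
    """Factor the receiver applies to return to the original time scale."""
    return 1.0 / beta

# Example values (assumed): a 2400 b/s codec and beta = 0.5.
rate = effective_bit_rate(2400, 0.5)   # bits per second of original speech
expand = expansion_factor(0.5)         # receiver stretches the decoded speech by this factor
```

With beta = 0.5 the codec only has to code half as many seconds of signal, so the channel carries half the bits for the same utterance.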

Waveform Similarity Overlap and Add (WSOLA)
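As a rough illustration of the general waveform-similarity overlap-add idea, here is a simplified sketch. It is not the implementation used in this work: the function name `wsola`, the frame length, the search tolerance `tol`, the Hann window and the unnormalized cross-correlation similarity measure are all assumptions. It handles only a constant beta, in plain Python.

```python
import math

def wsola(x, beta, frame=160, tol=40):
    """Time-scale x so the output is about beta times as long (beta < 1 compresses).
    Each output frame is taken from x near its nominal position, shifted by up to
    +/- tol samples so that it best continues the previously copied segment."""
    hop = frame // 2                                   # 50% overlap on the output side
    # Periodic Hann window: overlapping halves add up to 1.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    out_len = int(round(len(x) * beta))
    n_frames = max(1, out_len // hop)
    xp = list(x) + [0.0] * (frame + hop + tol)         # zero padding so slices never run out
    y = [0.0] * (n_frames * hop + frame)
    prev = 0                                           # start of the last segment copied from x
    for k in range(n_frames):
        start = 0
        if k > 0:
            nominal = int(round(k * hop / beta))       # where a plain time-scaling would read
            target = xp[prev + hop: prev + hop + frame]  # natural continuation of the last segment
            hi = min(len(x) - 1, nominal + tol)
            lo = max(0, min(nominal - tol, hi))
            start, best = lo, float("-inf")
            for cand in range(lo, hi + 1):
                seg = xp[cand:cand + frame]
                score = sum(a * b for a, b in zip(seg, target))  # waveform similarity
                if score > best:
                    start, best = cand, score
        for i in range(frame):                         # windowed overlap-add
            y[k * hop + i] += win[i] * xp[start + i]
        prev = start
    return y[:out_len]

# Hypothetical input: 0.2 s of a 100 Hz tone at 8 kHz, compressed to half its duration.
x = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(1600)]
y = wsola(x, 0.5)
```

Because each segment is chosen to line up with the waveform already written, the output keeps the 80-sample pitch period of the input while lasting half as long. The first half-frame fades in, since no earlier frame overlaps it.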

Is that all?
We have tried this approach with many different algorithms operating in the time and frequency domains. Our experiments with the new NATO standard, STANAG 4591 MELP (mixed excitation linear predictive vocoder), indeed showed that WSOLA in conjunction with MELP produces high quality output at half the bit rate and is computationally efficient. Details can be found in: H.G. Ilk, S. Tugac, Channel and source considerations of a bit rate reduction technique for a possible wireless communications system's performance enhancement, IEEE Trans. Wireless Commun., vol. 4(1), January 2005, pp. 93-99.
But what if we would like to make the most of our bandwidth? Then the system should be adaptive, which means WSOLA should operate at different time compression factors. This is an engineer's dream come true: you don't operate at constant or multi-rate bit rates, you operate at flexible bit rates. That is, YOU tell me how much bandwidth you have and I give you the best quality possible. Not the other way around!
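One way to picture "you tell me the bandwidth, I give you the best quality" is a rule that maps the currently available bit rate to a compression factor. The sketch below is an assumption for illustration only: the function name `choose_beta`, the lower limit `beta_min` = 0.5 and the example bandwidth values are hypothetical, not figures from the talk.

```python
def choose_beta(available_bps, codec_rate_bps, beta_min=0.5):
    """Pick the largest compression factor beta (least compression) whose
    effective rate codec_rate_bps * beta fits into the available bandwidth.
    beta is clamped to [beta_min, 1.0]; below beta_min the call cannot be
    carried at this codec rate without exceeding the assumed quality floor."""
    beta = available_bps / codec_rate_bps
    return max(beta_min, min(1.0, beta))

# Assumed example: a 2400 b/s codec on a link whose free capacity changes over time.
bandwidth_trace_bps = [2400, 1800, 1200, 600, 4800]
betas = [choose_beta(bw, 2400) for bw in bandwidth_trace_bps]
```

A beta of 1.0 means no time compression. Smaller values trade some quality for bandwidth, which is the graceful degradation the title refers to.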

What is our contribution then?
We need a different beta as we proceed in time, but WSOLA (or any other time scale modification algorithm) is unable to provide that.

Our contribution is the use of half symmetric windows and the modification of the synthesis formula
Half symmetric windows, in order to go back to the original time scale (expansion)
Modification of the WSOLA algorithm's synthesis formula

Finally!
There is not much time or space for the mathematical derivations, but details may be found in: H.G. Ilk, S. Guler, "Adaptive Time Scale Modification of Speech for Graceful Degrading Voice Quality in Congested Networks for VoIP Applications", Signal Processing, vol. 86, pp. 127-139, January 2006 (cited 12 times, 2006-2011).
This approach is very useful because the proposed algorithm can be applied to any commercial system as a pre- and post-process, without requiring any modification to the codec's internal design.
TURKCELL, 2008 Best Academic Work award

Last but not least!
This approach is particularly useful in VoIP (Voice over IP) applications in dynamic networks, because the load may change abruptly and is not symmetric in each direction. It is equally valuable in congested voice networks, because today's networks either allow multi-rates (2.4, 4.8 or 16.0 kb/s) or drop your call. In addition, it can be used for speech and/or audio storage.
As far as the author knows, no voice network can adaptively accommodate new subscribers, as they join, with a graceful degradation in voice quality.
One day all coders will be designed this way.

Samples
Male: "Steve wore a bright red cashmere sweater" (128 kb/s PCM, 2.4 kb/s, 1.0 kb/s)
Female: "Before Thursday's exam, review every formula" (128 kb/s PCM, 2.4 kb/s, 1.0 kb/s)

Thank you very much for listening. Any questions?