Voice and Audio Compression for Wireless Communications

by L. Hanzo, F.C.A. Somerville, J.P. Woodard, H-T. How
School of Electronics and Computer Science, University of Southampton, UK

Contents

Preface and Motivation
Acknowledgements

I Speech Signals and Waveform Coding

1 Speech Signals and Coding
Motivation of Speech Compression; Basic Characterisation of Speech Signals; Classification of Speech Codecs; Waveform Coding; Time-domain Waveform Coding; Frequency-domain Waveform Coding; Vocoders; Hybrid Coding; Waveform Coding; Digitisation of Speech; Quantisation Characteristics; Quantisation Noise and Rate-Distortion Theory; Non-uniform Quantisation for a Known PDF: Companding; PDF-independent Quantisation using Logarithmic Compression; The µ-law Compander; The A-law Compander; Optimum Non-uniform Quantisation; Chapter Summary

2 Predictive Coding
Forward Predictive Coding; DPCM Codec Schematic; Predictor Design

Problem Formulation; Covariance Coefficient Computation; Predictor Coefficient Computation; Adaptive One-word-memory Quantization; DPCM Performance; Backward-Adaptive Prediction; Background; Stochastic Model Processes; The 32 kbps G.721 ADPCM Codec; Functional Description of the G.721 Codec; Adaptive Quantiser; G.721 Quantiser Scale Factor Adaptation; G.721 Adaptation Speed Control; G.721 Adaptive Prediction and Signal Reconstruction; Speech Quality Evaluation; G.726 and G.727 ADPCM Coding; Motivation; Embedded G.727 ADPCM coding; Performance of the Embedded G.727 ADPCM Codec; Rate-Distortion in Predictive Coding; Chapter Summary

II Analysis by Synthesis Coding

3 Analysis-by-synthesis Principles
Motivation; Analysis-by-synthesis Codec Structure; The Short-term Synthesis Filter; Long-Term Prediction; Open-loop Optimisation of LTP parameters; Closed-loop Optimisation of LTP parameters; Excitation Models; Adaptive Post-filtering; Lattice-based Linear Prediction; Chapter Summary

4 Speech Spectral Quantization
Log-area Ratios; Line Spectral Frequencies; Derivation of the Line Spectral Frequencies; Computation of the Line Spectral Frequencies; Chebyshev-description of Line Spectral Frequencies; Spectral Vector Quantization; Background; Speaker-adaptive Vector Quantisation of LSFs

Stochastic VQ of LPC Parameters; Background; The Stochastic VQ Algorithm; Robust Vector Quantisation Schemes for LSFs; LSF Vector-quantisers in Standard Codecs; Spectral Quantizers for Wideband Speech Coding; Introduction to Wideband Spectral Quantisation; Statistical Properties of Wideband LSFs; Speech Codec Specifications; Wideband LSF Vector Quantizers; Memoryless Vector Quantization; Predictive Vector Quantization; Multimode Vector Quantization; Simulation Results and Subjective Evaluations; Conclusions on Wideband Spectral Quantisation; Chapter Summary

5 RPE Coding
Theoretical Background; The 13 kbps RPE-LTP GSM Speech encoder; Pre-processing; STP analysis filtering; LTP analysis filtering; Regular Excitation Pulse Computation; The 13 kbps RPE-LTP GSM Speech Decoder; Bit-sensitivity of the GSM Codec; A Tool-box Based Speech Transceiver; Chapter Summary

6 Forward-Adaptive CELP Coding
Background; The Original CELP Approach; Fixed Codebook Search; CELP Excitation Models; Binary Pulse Excitation; Transformed Binary Pulse Excitation; Excitation Generation; TBPE Bit Sensitivity; Dual-rate Algebraic CELP Coding; ACELP Codebook Structure; Dual-rate ACELP Bitallocation; Dual-rate ACELP Codec Performance; CELP Optimization; Introduction; Calculation of the Excitation Parameters; Full Codebook Search Theory

Sequential Search Procedure; Full Search Procedure; Sub-Optimal Search Procedures; Quantization of the Codebook Gains; Calculation of the Synthesis Filter Parameters; Bandwidth Expansion; Least Squares Techniques; Optimization via Powell's Method; Simulated Annealing and the Effects of Quantization; CELP Error-sensitivity; Introduction; Improving the Spectral Information Error Sensitivity; LSF Ordering Policies; The Effect of FEC on the Spectral Parameters; The Effect of Interpolation; Improving the Error Sensitivity of the Excitation Parameters; The Fixed Codebook Index; The Fixed Codebook Gain; Adaptive Codebook Delay; Adaptive Codebook Gain; Matching Channel Codecs to the Speech Codec; Error Resilience Conclusions; Dual-mode Speech Transceiver; The Transceiver Scheme; Re-configurable Modulation; Source-matched Error Protection; Low-quality 3.1 kbd Mode; High-quality 3.1 kbd Mode; Packet Reservation Multiple Access; kbd System Performance; kbd System Summary; Multi-slot PRMA Transceiver; Background and Motivation; PRMA-assisted Multi-slot Adaptive Modulation; Adaptive GSM-like Schemes; Adaptive DECT-like Schemes; Summary of Adaptive Multi-slot PRMA; Chapter Summary

7 Standard Speech Codecs
Background; The US DoD FS kbits/s CELP Codec; Introduction; LPC Analysis and Quantization; The Adaptive Codebook; The Fixed Codebook

Error Concealment Techniques; Decoder Post-Filtering; Conclusion; The IS-54 DAMPS speech codec; The JDC speech codec; The Qualcomm Variable Rate CELP Codec; Introduction; Codec Schematic and Bit Allocation; Codec Rate Selection; LPC Analysis and Quantization; The Pitch Filter; The Fixed Codebook; Rate 1/8 Filter Excitation; Decoder Post-Filtering; Error Protection and Concealment Techniques; Conclusion; Japanese Half-Rate Speech Codec; Introduction; Codec Schematic and Bit Allocation; Encoder Pre-Processing; LPC Analysis and Quantization; The Weighting Filter; Excitation Vector; Excitation Vector; Channel Coding; Decoder Post Processing; The half-rate GSM codec; Half-rate GSM codec outline; Half-rate GSM Codec's Spectral Quantisation; Error protection; The 8 kbits/s G.729 Codec; Introduction; Codec Schematic and Bit Allocation; Encoder Pre-Processing; LPC Analysis and Quantization; The Weighting Filter; The Adaptive Codebook; The Fixed Algebraic Codebook; Quantization of the Gains; Decoder Post Processing; G.729 Error Concealment Techniques; G.729 Bit-sensitivity; Turbo-coded OFDM G.729 Speech Transceiver; Background; System Overview; Turbo Channel Encoding

OFDM in the FRAMES Speech/Data Sub Burst; Channel model; Turbo-coded G.729 OFDM Parameters; Turbo-coded G.729 OFDM Performance; Turbo-coded G.729 OFDM Summary; G.729 Summary; The Reduced Complexity G.729 Annex A Codec; Introduction; The Perceptual Weighting Filter; The Open Loop Pitch Search; The Closed Loop Pitch Search; The Algebraic Codebook Search; The Decoder Post Processing; Conclusions; The Enhanced Full-rate GSM codec; Codec Outline; Operation of the EFR-GSM Encoder; Spectral Quantisation in the EFR-GSM Codec; Adaptive Codebook Search; Fixed Codebook Search; The IS-136 Speech Codec; IS-136 codec outline; IS-136 Bitallocation scheme; Fixed Codebook Search; IS-136 Channel Coding; The ITU G Dual-Rate Codec; Introduction; G Encoding Principle; Vector-Quantisation of the LSPs; Formant-based Weighting Filter; The 6.3 kbps High-rate G Excitation; The 5.3 kbps low-rate G excitation; G Bitallocation; G Error Sensitivity; Advanced Multi-rate JD-CDMA Transceiver; Multi-rate codecs and systems; System Overview; The Adaptive Multi-Rate Speech Codec; AMR Codec Overview; Linear Prediction Analysis; LSF Quantization; Pitch Analysis; Fixed Codebook With Algebraic Structure; Post-Processing; The AMR Codec's Bit Allocation; Codec Mode Switching Philosophy

The AMR Speech Codec's Error Sensitivity; Redundant Residue Number System Based Channel Coding; Redundant Residue Number System Overview; Source-Matched Error Protection; Joint Detection Code Division Multiple Access Overview; Joint Detection Based Adaptive Code Division Multiple Access; System Performance; Subjective Testing; Conclusions; Chapter Summary

8 Backward-Adaptive CELP Coding
Introduction; Motivation and Background; Backward-Adaptive G728 Schematic; Backward-Adaptive G728 Coding; G728 Error Weighting; G728 Windowing; Codebook Gain Adaption; G728 Codebook Search; G728 Excitation Vector Quantization; G728 Adaptive Postfiltering; Adaptive Long-term Postfiltering; G728 Adaptive Short-term Postfiltering; Complexity and Performance of the G728 Codec; Reduced-Rate 16-8 kbps G728-Like Codec I; The Effects of Long Term Prediction; Closed-Loop Codebook Training; Reduced-Rate 16-8 kbps G728-Like Codec II; Programmable-Rate 8-4 kbps CELP Codecs; Motivation; kbps Codec Improvements; kbps Codecs - Forward Adaption of the STP Synthesis Filter; kbps Codecs - Forward Adaption of the LTP; Initial Experiments; Quantization of Jointly Optimized Gains; kbps Codecs - Voiced/Unvoiced Codebooks; Low Delay Codecs at 4-8 kbits/s; Low Delay ACELP Codec; Backward-adaptive Error Sensitivity Issues; The Error Sensitivity of the G728 Codec; The Error Sensitivity of Our 4-8 kbits/s Low Delay Codecs; The Error Sensitivity of Our Low Delay ACELP Codec; A Low-Delay Multimode Speech Transceiver

Background; kbps Codec Performance; Transmission Issues; Higher-quality Mode; Lower-quality Mode; Speech Transceiver Performance; Chapter Summary

III Wideband Coding and Transmission

9 Wideband Speech Coding
Subband-ADPCM Wideband Coding; Introduction and Specifications; G722 Codec Outline; Principles of Subband Coding; Quadrature Mirror Filtering; Analysis Filtering; Synthesis Filtering; Practical QMF Design Constraints; G722 Adaptive Quantisation and Prediction; G722 Coding Performance; Wideband Transform-Coding at 32 kbps; Background; Transform-Coding Algorithm; Subband-Split Wideband CELP Codecs; Background; Subband-based Wideband CELP coding; Motivation; Low-band Coding; Highband Coding; Bit allocation Scheme; Fullband Wideband ACELP Coding; Wideband ACELP Excitation; Wideband 32 kbps ACELP Coding; Wideband 9.6 kbps ACELP Coding; Turbo-coded Wideband Speech Transceiver; Background and Motivation; System Overview; System Parameters; Constant Throughput Adaptive Modulation; Adaptive Wideband Transceiver Performance; Multi-mode Transceiver Adaptation; Transceiver Mode Switching; The Wideband G Codec; Audio Codec Overview

Detailed Description of the Audio Codec; Wideband Adaptive System Performance; Audio Frame Error Results; Audio Segmental SNR Performance and Discussions; G Audio Transceiver Summary and Conclusions; Turbo-Detected IRCC AMR-WB Transceivers; Introduction; The AMR-WB Codec's Error Sensitivity; System Model; Design of Irregular Convolutional Codes; An Example Irregular Convolutional Code; UEP AMR IRCC Performance Results; UEP AMR Conclusions; The AMR-WB+ Audio Codec; Introduction; Audio requirements in mobile multimedia applications; Summary of audio-visual services; Bit rates supported by the radio network; Overview of the AMR-WB+ codec; Encoding the high frequencies; Stereo encoding; Complexity of AMR-WB; Transport and file format of AMR-WB; Performance of AMR-WB; Summary of the AMR-WB+ codec; Chapter Summary

10 Advanced Multi-Rate Speech Transceivers
Introduction; The Adaptive Multi-Rate Speech Codec; Overview; Linear Prediction Analysis; LSF Quantization; Pitch Analysis; Fixed Codebook With Algebraic Structure; Post-Processing; The AMR Codec's Bit Allocation; Codec Mode Switching Philosophy; Speech Codec's Error Sensitivity; System Background; System Overview; Redundant Residue Number System (RRNS) Channel Coding; Overview; Source-Matched Error Protection; Joint Detection Code Division Multiple Access Overview

Joint Detection Based Adaptive Code Division Multiple Access; System Performance; Subjective Testing; A Turbo-Detected Irregular Convolutional Coded AMR Transceiver; Motivation; The AMR-WB Codec's Error Sensitivity; System Model; Design of Irregular Convolutional Codes; An Example Irregular Convolutional Code; UEP AMR IRCC Performance Results; UEP AMR Conclusions; Chapter Summary

11 MPEG-4 Audio Compression and Transmission
Overview of MPEG-4 Audio; General Audio Coding; Advanced Audio Coding; Gain Control Tool; Psychoacoustic Model; Temporal Noise Shaping; Stereophonic Coding; AAC Quantization and Coding; Noiseless Huffman Coding; Bit-Sliced Arithmetic Coding; Transform-domain Weighted Interleaved Vector Quantization; Parametric Audio Coding; Speech Coding in MPEG-4 Audio; Harmonic Vector Excitation Coding; CELP Coding in MPEG; LPC Analysis and Quantization; Multi Pulse and Regular Pulse Excitation; MPEG-4 Codec Performance; MPEG-4 Space-Time Block Coded OFDM Audio Transceiver; System Overview; System parameters; Frame Dropping Procedure; Space-Time Coding; Adaptive Modulation; System Performance; Turbo-Detected STTC Aided MPEG-4 Audio Transceivers; Motivation and Background; Audio Turbo Transceiver Overview; The Turbo Transceiver; Turbo Transceiver Performance Results; MPEG-4 Turbo Transceiver Summary; Turbo-Detected STTC Aided MPEG-4 Versus AMR-WB Transceivers

Motivation and Background; The AMR-WB Codec's Error Sensitivity; The MPEG-4 TwinVQ Codec's Error Sensitivity; The Turbo Transceiver; Performance Results; AMR-WB and MPEG-4 TwinVQ Turbo Transceiver Summary; Chapter Summary

IV Very Low Rate Coding and Transmission

12 Overview of Low-rate Speech Coding
Low Bit Rate Speech Coding; Analysis-by-Synthesis Coding; Speech Coding at 2.4kbps; Background to 2.4kbps Speech Coding; Frequency Selective Harmonic Coder; Sinusoidal Transform Coder; Multiband Excitation Coders; Subband Linear Prediction Coder; Mixed Excitation Linear Prediction Coder; Waveform Interpolation Coder; Speech Coding Below 2.4kbps; Linear Predictive Coding model; Short Term Prediction; Long Term Prediction; Final Analysis-by-Synthesis Model; Speech Quality Measurements; Objective Speech Quality Measures; Subjective Speech Quality Measures; kbps Selection Process; Speech Database; Chapter Summary

13 Linear Predictive Vocoder
Overview of a Linear Predictive Vocoder; Line Spectrum Frequencies Quantization; Line Spectrum Frequencies Scalar Quantization; Line Spectrum Frequencies Vector Quantization; Pitch Detection; Voiced-Unvoiced Decision; Oversampled Pitch Detector; Pitch Tracking; Computational Complexity; Integer Pitch Detector; Unvoiced Frames

13.5 Voiced Frames; Placement of Excitation Pulses; Pulse Energy; Adaptive Postfilter; Pulse Dispersion Filter; Pulse Dispersion Principles; Pitch Independent Glottal Pulse Shaping Filter; Pitch Dependent Glottal Pulse Shaping Filter; Results for Linear Predictive Vocoder; Chapter Summary

14 Wavelets and Pitch Detection
Conceptual Introduction to Wavelets; Fourier Theory; Wavelet Theory; Detecting Discontinuities with Wavelets; Introduction to Wavelet Mathematics; Multiresolution Analysis; Polynomial Spline Wavelets; Pyramidal Algorithm; Boundary Effects; Preprocessing the Wavelet Transform Signal; Spurious Pulses; Normalization; Candidate Glottal Pulses; Voiced-Unvoiced Decision; Wavelet Based Pitch Detector; Dynamic Programming; Autocorrelation Simplification; Chapter Summary

15 Zinc Function Excitation
Introduction; Overview of Prototype Waveform Interpolation Zinc Function Excitation; Coding Scenarios; U-U-U Encoder Scenario; U-U-V Encoder Scenario; V-U-U Encoder Scenario; U-V-U Encoder Scenario; V-V-V Encoder Scenario; V-U-V Encoder Scenario; U-V-V Encoder Scenario; V-V-U Encoder Scenario; U-V Decoder Scenario; U-U Decoder Scenario; V-U Decoder Scenario

V-V Decoder Scenario; Zinc Function Modelling; Error Minimization; Computational Complexity; Reducing the Complexity of Zinc Function Excitation Optimization; Phases of the Zinc Functions; Pitch Detection; Voiced-Unvoiced Boundaries; Pitch Prototype Selection; Voiced Speech; Energy Scaling; Quantization; Excitation Interpolation Between Prototype Segments; ZFE Interpolation Regions; ZFE Amplitude Parameter Interpolation; ZFE Position Parameter Interpolation; Implicit Signalling of Prototype Zero Crossing; Removal of ZFE Pulse Position Signalling and Interpolation; Pitch Synchronous Interpolation of Line Spectrum Frequencies; ZFE Interpolation Example; Unvoiced Speech; Adaptive Postfilter; Results for Single Zinc Function Excitation; Error Sensitivity of the 1.9kbps PWI-ZFE Coder; Parameter Sensitivity of the 1.9kbps PWI-ZFE coder; Line Spectrum Frequencies; Voiced-Unvoiced Flag; Pitch Period; Excitation Amplitude Parameters; Root Mean Square Energy Parameter; Boundary Shift Parameter; Degradation from Bit Corruption; Error Sensitivity Classes; Multiple Zinc Function Excitation; Encoding Algorithm; Performance of Multiple Zinc Function Excitation; A Sixth-rate, 3.8 kbps GSM-like Speech Transceiver; Motivation; The Turbo-coded Sixth-rate 3.8 kbps GSM-like System; Turbo Channel Coding; The Turbo-coded GMSK Transceiver; System Performance Results; Chapter Summary

16 Mixed-Multiband Excitation
Introduction; Overview of Mixed-Multiband Excitation; Finite Impulse Response Filter; Mixed-Multiband Excitation Encoder; Voicing Strengths; Mixed-Multiband Excitation Decoder; Adaptive Postfilter; Computational Complexity; Performance of the Mixed-Multiband Excitation Coder; Performance of a Mixed-Multiband Excitation Linear Predictive Coder; Performance of a Mixed-Multiband Excitation and Zinc Function Prototype Excitation Coder; A Higher Rate 3.85kbps Mixed-Multiband Excitation Scheme; A 2.35 kbit/s Joint-Detection CDMA Speech Transceiver; Background; The Speech Codec's Bit Allocation; The Speech Codec's Error Sensitivity; Channel Coding; The JD-CDMA Speech System; System performance; Conclusions on the JD-CDMA Speech Transceiver; Chapter Summary

17 Sinusoidal Transform Coding Below 4kbps
Introduction; Sinusoidal Analysis of Speech Signals; Sinusoidal Analysis with Peak Picking; Sinusoidal Analysis using Analysis-by-Synthesis; Sinusoidal Synthesis of Speech Signals; Frequency, Amplitude and Phase Interpolation; Overlap-Add Interpolation; Low Bit Rate Sinusoidal Coders; Increased Frame Length; Incorporating Linear Prediction Analysis; Incorporating Prototype Waveform Interpolation; Encoding the Sinusoidal Frequency Component; Determining the Excitation Components; Peak-Picking of the Residual Spectra; Analysis-by-Synthesis of the Residual Spectrum; Computational Complexity; Reducing the Computational Complexity; Quantizing the Excitation Parameters; Encoding the Sinusoidal Amplitudes; Vector Quantization of the Amplitudes; Interpolation and Decimation

Vector Quantization; Vector Quantization Performance; Scalar Quantization of the Amplitudes; Encoding the Sinusoidal Phases; Vector Quantization of the Phases; Encoding the Phases with a Voiced-Unvoiced Switch; Encoding the Sinusoidal Fourier Coefficients; Equivalent Rectangular Bandwidth Scale; Voiced-Unvoiced Flag; Sinusoidal Transform Decoder; Pitch Synchronous Interpolation; Fourier Coefficient Interpolation; Frequency Interpolation; Computational Complexity; Speech Coder Performance; Chapter Summary

18 Conclusions on Low Rate Coding
Summary; Listening Tests; Summary of Very Low Rate Coding; Further Research

19 Comparison of Speech Transceivers
Background to Speech Quality Evaluation; Objective Speech Quality Measures; Introduction; Signal to Noise Ratios; Articulation Index; Cepstral Distance; Cepstral Example; Logarithmic likelihood ratio; Euclidean Distance; Subjective Measures; Quality Tests; Comparison of Quality Measures; Background; Intelligibility tests; Subjective Speech Quality of Various Codecs; Speech Codec Bit-sensitivity; Transceiver Speech Performance; Chapter Summary

A Constructing the Quadratic Spline Wavelets
B Zinc Function Excitation

C Probability Density Function for Amplitudes
Bibliography
Index
Author Index

Preface and Motivation

The Speech Coding Scene

Despite the emergence of sophisticated high-rate multimedia services, voice communications remain the predominant means of human communications, although the compressed voice signals may be delivered via the Internet. The large-scale, pervasive introduction of wireless Internet services is likely to promote the unified transmission of both voice and data signals using the Voice over Internet Protocol (VoIP) even in the third-generation (3G) wireless systems, despite wasting much of the valuable frequency resources for the transmission of packet headers. Even when the predicted surge of wireless data and Internet services becomes a reality, voice will remain the most natural means of human communications.

This book is dedicated to audio and voice compression issues, although the aspects of error resilience, coding delay, implementational complexity and bitrate are also at the centre of our discussions, characterising many different speech codecs incorporated in source-sensitivity matched wireless transceivers. A unique feature of the book is that it also provides cutting-edge turbo-transceiver-aided research-oriented design examples and a chapter on the VoIP protocol.

Here we attempt a rudimentary comparison of some of the codec schemes treated in the book in terms of their speech quality and bitrate, in order to provide a road map for the reader with reference to Cox's work [1, 2]. The formally evaluated Mean Opinion Score (MOS) values of the various codecs portrayed in the book are shown in Figure 1. Observe in the figure that over the years a range of speech codecs have emerged, which attained the quality of the 64 kbps G.711 PCM speech codec, although at the cost of significantly increased coding delay and implementational complexity.
The 8 kbps G.729 codec is the most recent addition to this range of the International Telecommunications Union's (ITU) standard schemes, which significantly outperforms all previous standard ITU codecs in robustness terms. The performance target of the 4 kbps ITU codec (ITU4) is also to maintain this impressive set of specifications. The family of codecs designed for various mobile radio systems - such as the 13 kbps Regular Pulse Excited (RPE) scheme of the Global System for Mobile communications known as GSM, the 7.95 kbps IS-54 and the IS-95 Pan-American schemes, the 6.7 kbps Japanese Digital Cellular (JDC) and the 3.45 kbps half-rate JDC arrangement (JDC/2) - exhibits slightly lower MOS values than the ITU codecs.

Let us now consider the subjective quality of these schemes in a little more depth. The 2.4 kbps US Department of Defence Federal Standard codec known as FS-1015 is the only vocoder in this group and it has a rather synthetic speech quality, associated with the lowest subjective assessment in the figure. The 64 kbps G.711 PCM codec and the G.726/G.727 Adaptive Differential PCM (ADPCM) schemes are waveform codecs. They exhibit a low implementational complexity associated with a modest bitrate economy. The remaining codecs belong to the so-called hybrid coding family and achieve significant bitrate economies at the cost of increased complexity and delay.

[Figure 1: Subjective speech quality of various codecs [1], © IEEE, 1996. MOS (Poor to Excellent) versus bit rate (kb/s).]

Specifically, the 16 kbps G.728 backward-adaptive scheme maintains a similar speech quality to the 32 and 64 kbps waveform codecs, while also maintaining an impressively low, 2 ms delay. This scheme was standardised during the early nineties. The similar-quality, but significantly more robust 8 kbps G.729 codec was approved in March 1996 by the ITU. Its standardisation overlapped with the G codec developments. The G codec's 6.3 kbps mode maintains a speech quality similar to the G.711, G.726, G.727, G.728 and G.729 codecs, while its 5.3 kbps mode exhibits a speech quality similar to the cellular speech codecs of the late eighties. The standardisation of a 4 kbps ITU scheme, which we refer to here as ITU4, is also a desirable design goal at the time of writing.

In parallel to the ITU's standardisation activities a range of speech coding standards have been proposed for regional cellular mobile systems. The standardisation of the 13 kbps RPE-LTP full-rate GSM (GSM-FR) codec dates back to the second half of the eighties, representing the first standard hybrid codec. Its complexity is significantly lower than that of the more recent Code Excited Linear Predictive (CELP) based codecs. Observe in the figure that there is also a similar-rate Enhanced Full-Rate GSM codec (GSM-EFR), which matches the speech quality of the G.729 and G.728 schemes. The original GSM-FR codec's development was followed a little later by the release of the 7.95 kbps Vector Sum Excited Linear Predictive (VSELP) IS-54 American cellular standard. Due to advances in the field the 7.95 kbps IS-54 codec achieved a similar subjective speech quality to the 13 kbps GSM-FR scheme. The definition of the 6.7 kbps Japanese JDC VSELP codec was almost coincident with that of the IS-54 arrangement. This codec development was also followed by a half-rate standardisation process, leading to the 3.45 kbps Pitch-Synchronous Innovation CELP (PSI-CELP) scheme.

The IS-95 Pan-American CDMA system also has its own standardised CELP-based speech codec, which is a variable-rate scheme, supporting bitrates between 1.2 and 14.4 kbps, depending on the prevalent voice activity. The perceived speech quality of these cellular speech codecs, contrived mainly during the late eighties, was found subjectively similar to each other under the perfect channel conditions of Figure 1. Lastly, the 5.6 kbps half-rate GSM codec (GSM-HR) also met its specification in terms of achieving a similar speech quality to the 13 kbps original GSM-FR arrangements, although at the cost of quadruple complexity and higher latency.

Recently the advantages of intelligent multimode speech terminals (IMT), which can reconfigure themselves in a number of different bitrate, quality and robustness modes, attracted substantial research attention in the community, which led to the standardisation of the High-Speed Downlink Packet Access (HSDPA) mode of the 3G wireless systems. The HSDPA-style transceivers employ both adaptive modulation and adaptive channel coding, which result in a channel-quality dependent bit-rate fluctuation, hence requiring reconfigurable multimode voice and audio codecs, such as the Adaptive Multi-Rate codec referred to as the AMR scheme. Following the standardisation of the narrowband AMR codec, the wideband AMR scheme referred to as the AMR-WB arrangement and encoding the 0-7 kHz band was also developed, which will also be characterised in the book.
Finally, the most recent AMR codec, namely the so-called AMR-WB+ scheme, will also be the subject of our discussions. Recent research on sub-2.4 kbps speech codecs is also covered extensively in the book, where the aspects of auditory masking become more dominant. Moreover, since the classic G.722 subband-ADPCM based wideband codec has become obsolete in the light of exciting new developments in compression, the most recent trend is to consider wideband speech and audio codecs, providing substantially enhanced speech quality. Motivated by early seminal work on transform-domain or frequency-domain based compression by Noll and his colleagues, in this field the wideband G codec - which can be programmed to operate between 10 kbps and 32 kbps and hence lends itself to employment in HSDPA-style near-instantaneously adaptive wireless communicators - is the most attractive candidate. This codec is portrayed in the context of a sophisticated burst-by-burst adaptive wideband turbo-coded Orthogonal Frequency Division Multiplex (OFDM) IMT in the book. This scheme is also capable of transmitting high-quality audio signals, behaving essentially as a high-quality waveform codec.

Milestones in Speech Coding History

Over the years a range of excellent monographs and text books have been published, characterising the state-of-the-art at its various stages of development and constituting significant milestones. The first major development in the history of speech compression can be considered the invention of the vocoder. Delta modulation was contrived in 1952 and later it became well established following Steele's monograph on the topic in 1975 [3]. Pulse Code Modulation (PCM) was first documented in detail in Cattermole's classic contribution in 1969 [4]. However, it was realised in 1967 that predictive coding provides advantages over memory-less coding techniques, such as PCM. Predictive techniques were analysed in depth by Markel and Gray in their 1976 classic treatise [5]. This was shortly followed by the often cited reference [6] by Rabiner and Schafer. Lindblom and Ohman also contributed a book in 1979 on speech communication research [7]. The foundations of auditory theory were laid down as early as 1970 by Tobias [8], but these principles were not exploited to their full potential until the invention of the analysis by synthesis (AbS) codecs, which were heralded by Atal's multi-pulse excited codec in the early eighties [9]. The waveform coding of speech and video signals has been comprehensively documented by Jayant and Noll in their 1984 monograph [10].

During the eighties the speech codec developments were fuelled by the emergence of mobile radio systems, where spectrum was a scarce resource, potentially doubling the number of subscribers and hence the revenue, if the bitrate could be halved. The RPE principle - as a relatively low-complexity analysis by synthesis technique - was proposed by Kroon, Deprettere and Sluyter in 1986 [11], which was followed by further research conducted by Vary [12, 13] and his colleagues at PKI in Germany and IBM in France, leading to the 13 kbps Pan-European GSM codec. This was the first standardised AbS speech codec, which also employed long-term prediction (LTP), recognising the important role the pitch determination plays in efficient speech compression [14, 15]. It was in this era that Atal and Schroeder invented the Code Excited Linear Predictive (CELP) principle [16], leading to perhaps the most productive period in the history of speech coding during the eighties.
Some of these developments were also summarised for example by O'Shaughnessy [17], Papamichalis [18], and Deller, Proakis and Hansen [19]. It was during this era that the importance of speech perception and acoustic phonetics [20] was duly recognised, for example in the monograph by Lieberman and Blumstein. A range of associated speech quality measures were summarised by Quackenbush, Barnwell III and Clements [21]. Nearly concomitantly Furui also published a book related to speech processing [22]. This period witnessed the appearance of many of the speech codecs seen in Figure 1, which found applications in the emerging global mobile radio systems, such as IS-54, JDC, etc. These codecs were typically associated with source-sensitivity matched error protection, where for example Steele, Sundberg and Wong [23-26] have provided early insights on the topic. Further sophisticated solutions were suggested for example by Hagenauer [27].

Both the narrow-band and wide-band AMR codecs, as well as the AMR-WB+ codec [28, 29], are capable of adaptively adjusting their bitrate. This also allows the user to adjust the ratio between the speech bit rate and the channel coding bit rate constituting the error protection oriented redundancy according to the prevalent near-instantaneous channel conditions in HSDPA-style transceivers. When the channel quality is inferior, the speech encoder operates at low bit rates, thus accommodating more powerful forward error control within the total bit rate budget. By contrast, under high-quality channel conditions the speech encoder may benefit from using the total bit rate budget, yielding high speech quality, since in this high-rate case low redundancy error protection is sufficient. Thus, the AMR concept allows the system to operate in an error-resilient mode under poor channel conditions, while benefitting from a better speech quality under good channel conditions.
Hence, the source coding scheme must be designed for seamless switching between rates available without annoying artifacts.
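
The trade-off between the speech bit rate and the error protection redundancy described above can be illustrated by a minimal sketch. It is not the mode-switching algorithm of any standard. The eight bit rates are those of the narrowband AMR codec modes, while the fixed 22.8 kbps gross budget (the rate of a GSM full-rate traffic channel) and the channel-quality switching thresholds are assumptions chosen purely for illustration, as are the names select_mode and redundancy_kbps.

```python
"""Illustrative sketch of AMR-style rate adaptation under a fixed bit budget."""

# Speech bit rates of the eight narrowband AMR codec modes, in kbps.
AMR_MODES_KBPS = (4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2)

# Assumption: total (speech + channel coding) budget of a GSM full-rate channel.
GROSS_BUDGET_KBPS = 22.8

# Assumption: hypothetical channel-quality thresholds in dB. Crossing each
# threshold moves the codec one mode up; real systems tune these values.
SWITCH_THRESHOLDS_DB = (4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0)


def select_mode(channel_quality_db: float) -> float:
    """Return the speech bit rate (kbps) chosen for a channel-quality estimate."""
    # Count how many thresholds the estimate meets or exceeds.
    index = sum(channel_quality_db >= t for t in SWITCH_THRESHOLDS_DB)
    return AMR_MODES_KBPS[index]


def redundancy_kbps(speech_rate_kbps: float) -> float:
    """Return the part of the budget left over for error protection (kbps)."""
    return round(GROSS_BUDGET_KBPS - speech_rate_kbps, 2)


if __name__ == "__main__":
    for quality_db in (2.0, 9.0, 20.0):
        rate = select_mode(quality_db)
        print(f"{quality_db:5.1f} dB: speech {rate} kbps, "
              f"protection {redundancy_kbps(rate)} kbps")
```

Under these assumptions a poor channel selects the 4.75 kbps mode and leaves most of the budget for forward error control, whereas a good channel selects the 12.2 kbps mode with lighter protection. A practical system would additionally apply hysteresis to the thresholds in order to avoid frequent mode toggling.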
