Convention Paper Presented at the 120th Convention 2006 May Paris, France

Audio Engineering Society Convention Paper
Presented at the 120th Convention, 2006 May 20-23, Paris, France

This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

A Novel Integrated Audio Bandwidth Extension Toolkit (ABET)

Deepen Sinha 1, Anibal Ferreira 1, 2, and Harinarayanan E. V. 1
1 ATC Labs, New Jersey, USA
2 University of Porto, Portugal
Correspondence should be addressed to D. Sinha (sinha@atc-labs.com)

ABSTRACT
Bandwidth Extension has emerged as an important tool for the satisfactory performance of low bit rate audio and speech codecs. In this paper we describe the components of a novel integrated audio bandwidth extension toolkit (ABET). The ABET toolkit is a combination of two bandwidth extension tools: (i) the Fractal Self-Similarity Model (FSSM) for signal spectrum; and (ii) Accurate Spectral Replacement (ASR). Combination of these two tools, which are applied directly to a high frequency resolution representation of the signal such as the Modified Discrete Cosine Transform (MDCT), has several benefits for increased accuracy and coding efficiency of the high frequency signal components. At the same time the combination of the two tools entails a number of important algorithmic and perceptual considerations. In this paper we describe the components of the ABET bandwidth extension toolkit in detail. Algorithmic details, audio demonstrations, and ABET configuration details are presented. Additional information and audio samples are available at http://www.atc-labs.com/abet/.

1. INTRODUCTION
Audio coding at low bit rates has many established and emerging applications.
These include Satellite and Terrestrial Digital Audio Broadcasting, audio delivery over the mobile network, high quality audio communication over the IP and mobile network, Internet music download and streaming, solid-state audio playback devices, etc. In many of these applications the demand for higher compression efficiency continues to grow. In fact there appears to be a proliferation of applications demanding CD-like quality stereo at bit rates of 32-48 kbps and high quality FM grade mono audio at bit rates of 16-24 kbps. These in turn continue to spur the demand for newer algorithms for audio bit rate reduction. Audio Bandwidth Extension has emerged as a key technique for achieving higher compression efficiency and hence high subjective quality at low bit rates.

The rapid growth in the field of Perceptual Audio Coding has yielded a number of audio coding technologies based on the principle of Adaptive Transform Coding [1]. These include proprietary schemes such as PAC (Bell Labs, Lucent) [2] and

ATRAC (Sony) [3] as well as standard based codecs such as MPEG-1 Layer 3 (popularly known as MP3) [4], MPEG-2 AAC [5], and Dolby AC-3 [6]. At best these conventional audio coding techniques are capable of producing full fidelity CD quality audio in the range of 96-128 kbps. Furthermore, near-CD quality audio with somewhat lower audio bandwidth (~15 kHz) and limited stereo is achievable in the range of 48-64 kbps. For the viability of these coding schemes for new and emerging applications it is desirable to reduce the bit rate further without sacrificing the audio bandwidth.

A second class of coding schemes is geared primarily towards the coding of voice signals for two way communications [14][7]. At the lowest bit rates these typically employ a variation of the Code Excited Linear Prediction (CELP) technique. These coding schemes typically code a small audio bandwidth (< 4 kHz). For these existing [7] and emerging [14] low bit rate coding schemes it is attractive to improve the audio bandwidth significantly with as little bit overhead as possible.

In order to reduce the bit rate requirement of adaptive transform coding schemes further, or to provide increased audio bandwidth with very low bit rate CELP based codecs, it becomes necessary to rely on a compact parametric description of all or a portion of the audio signal. One such approach that has proven to be particularly effective is the so called Bandwidth Extension approach. In Bandwidth Extension only a low pass filtered version of the signal is directly coded using conventional perceptual coding or another suitable paradigm. The high frequency portion of the signal spectrum is recreated at the decoder by a mapping generated from the low frequency spectrum of the signal. Typically an attempt is made to match the reconstructed high frequency spectrum to the original high frequency spectrum as closely as possible.

In [9] and [10] we introduced two novel bandwidth extension techniques which are applied directly to the high resolution frequency representation of the signal.
The first, described in [9], is based on a Fractal Self-Similarity Model (FSSM) for the MDCT representation of audio signals. It was shown that the model works across a wide class of natural audio and is capable of providing detailed and natural sounding audio reconstruction. The second scheme, Accurate Spectral Replacement (ASR), was introduced in [10]. ASR is capable of an extremely accurate reconstruction of the tonal components and harmonic structures in the synthesized high frequency spectrum of the signal.

Further work with the ASR and FSSM bandwidth extension tools has led to the understanding that the two techniques have several complementary aspects. The two techniques have therefore been combined into an integrated bandwidth extension platform called the Audio Bandwidth Extension Toolkit (ABET). Some of the highlights of ABET are as follows:

ABET works with virtually any baseband coding scheme. It is particularly suitable for use in conjunction with coding schemes that employ a high resolution filterbank for coding. However, ABET has also been used with considerable success in combination with other coding schemes such as low bit rate speech codecs.

ABET makes use of both the FSSM and ASR algorithms in an adaptive framework and also allows for the combination of aspects of FSSM and ASR synthesis. In other words, through ABET, ASR and FSSM may either be used independently or in combination to exploit their complementary nature.

ABET bandwidth extension models (ASR and FSSM) are applied in the domain of a high frequency resolution filterbank such as the Odd-frequency Discrete Fourier Transform (ODFT) or the Modified Discrete Cosine Transform (MDCT) [8].

ABET also incorporates a third essential tool, Multi Band Temporal Amplitude Coding (MBTAC) (also described in [9]). MBTAC may (optionally) be employed when the time resolution of the primary MDCT/ODFT filterbank is too low to allow for suitable temporal shaping of the reconstructed high frequency components. For the computation and application of MBTAC, the signal is analyzed using a secondary Utility Filter Bank (UFB) that has a significantly better time resolution.
The presence of efficient and high quality coding tools for the stereo envelope allows ABET to function as the main building block of a parametric audio coding scheme offering accurate reproduction of stereo envelopes.

These techniques offer the promise of a more accurate reconstruction of the synthesized high

frequency spectrum in comparison to previously reported approaches such as the Spectral Band Replication approach [9]. In this paper we describe the components of ABET and discuss its application to actual audio coding schemes. We have utilized (parts of) ABET in the building of three audio coding products. These include the TeslaPro codec [12], the Audio Communication Codec [13][14], and a new very low bit rate coding technique for mixed content [15].

The organization of the rest of the paper is as follows. In Section 2 we take a closer look at the ABET encoder, followed by the ABET decoder in Section 3. The primary coding tools in ABET (the FSSM, the ASR, and the UFB/MBTAC) are further described in Section 4. Sections 5 and 6 present the functional description of the ABET Encoder and Decoder processing blocks respectively. Coding results and the codecs utilizing the ABET scheme are discussed in Section 7.

2. THE ABET ENCODER
The ABET Encoder is shown in Figure 1. ABET works in conjunction with a baseband coding scheme which is expected to encode the low-pass-filtered signal information. The ABET encoder is configurable using a set of options. These options are used to invoke one or more of the ABET components, control the relative precedence of the ABET components, and control the level of detail in a particular component. In a complete audio coding scheme, selection of the configuration parameters is typically a function of the bit rate and may need to be carefully tuned in conjunction with other codec parameters.

As noted above, the bandwidth extension tools inherent in ABET, i.e. ASR and FSSM, operate on a high resolution frequency representation of the signal. The ABET encoder therefore incorporates an integrated MDCT/ODFT computation module.
If the baseband coding scheme also operates in the MDCT/ODFT domain, the transform information may be shared between the ABET encoder and the encoder of the baseband coding scheme; the ABET encoder therefore makes the low-pass-filtered MDCT/ODFT coefficients available as part of its output. ABET supports window switching; in other words, it is possible to use a conventional window switching algorithm of the type Long-Start-Short-Stop-Long (e.g., as in [2][4]), and the ABET parameters are suitably adapted to the time varying filterbank resolution. However, as noted above, ABET incorporates additional tools for temporal shaping, reducing (and in certain cases eliminating) the need for window switching in a conventional MDCT/ODFT based baseband coding scheme.

Figure 1: ABET Encoder. (Block diagram: L/R input, MDCT & ODFT analysis, low pass filter (baseband), configuration options, harmonic analysis, ASR/FSSM model configuration, FSSM parameter estimation, ASR parameter estimation, UFB analysis, stereo multiband temporal amplitude coding (MBTAC), bitstream formatting, output(s).)
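Because the encoder computes the MDCT and ODFT representations together, it may help to see how closely the two are related. The sketch below is an illustration under stated assumptions, not the ABET implementation: it assumes the common MDCT phase convention n0 = 1/2 + N/4 for a frame of N already-windowed samples, uses unnormalized transforms, and the function names (odft, mdct_direct, mdct_from_odft) are hypothetical. Under that convention, each MDCT coefficient is the ODFT coefficient projected onto a bin-dependent phase angle.

```python
import cmath
import math

def odft(frame):
    """Odd-frequency DFT: bin k sits at (k + 1/2) * 2*pi/N (assumed definition)."""
    n_len = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * (k + 0.5) * n / n_len)
            for n, x in enumerate(frame))
        for k in range(n_len)
    ]

def mdct_direct(frame):
    """Unnormalized MDCT of an already-windowed frame, with n0 = 1/2 + N/4 (assumed)."""
    n_len = len(frame)
    n0 = 0.5 + n_len / 4.0
    return [
        sum(x * math.cos(2.0 * math.pi / n_len * (n + n0) * (k + 0.5))
            for n, x in enumerate(frame))
        for k in range(n_len // 2)
    ]

def mdct_from_odft(frame):
    """Same MDCT coefficients, obtained from the first N/2 ODFT bins."""
    n_len = len(frame)
    spectrum = odft(frame)
    coeffs = []
    for k in range(n_len // 2):
        # Phase angle contributed by the MDCT time offset n0 at bin k.
        theta = math.pi / n_len * (k + 0.5) * (1.0 + n_len / 2.0)
        coeffs.append(spectrum[k].real * math.cos(theta)
                      + spectrum[k].imag * math.sin(theta))
    return coeffs
```

Computing the complex ODFT once and deriving the MDCT from it is one way a shared analysis stage could serve both a baseband coder and tools that need magnitude and phase; whether ABET does it this way is not stated here.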

The ABET Encoder encompasses the following functional areas:
1. Frequency Analysis: MDCT/ODFT analysis as described above.
2. High Resolution Spectrum and Harmonic Analysis: detection of tones and harmonic features in a signal segment.
3. ASR/FSSM Model Configuration: selection of ASR and FSSM coding tools matched to the coding specific signal features. This is driven by the output of the spectrum/harmonic analysis block and the configuration parameters.
4. ASR parameter extraction and coding.
5. FSSM parameter extraction and coding.
6. UFB analysis: a second time-frequency analysis of the signal with a higher time resolution than the primary MDCT/ODFT analysis.
7. Stereo MBTAC Coding: joint encoding of the stereo time-frequency envelope of the signal.
8. Huffman coding: noiseless coding of MBTAC, ASR, and FSSM parameters.
9. Bitstream packing of all the encoded parameters.

3. ABET DECODER STRUCTURE
The ABET Decoder is shown in Figure 2. The primary job of the ABET decoder is to perform signal synthesis using the FSSM and ASR models in the ODFT domain. If the baseband coder utilizes an MDCT representation, then an MDCT to ODFT mapping is utilized. In the cases where both ASR and FSSM models are simultaneously active, additional processing is necessary to ensure that any harmonic pattern synthesized by ASR is not duplicated by the FSSM model. To ensure this, partial ASR synthesis and subtraction in baseband is performed prior to the application of FSSM model synthesis. In the cases where the time resolution of the MDCT/ODFT filterbank is too low to allow for adequate temporal shaping, the MBTAC information is applied in the UFB domain.

Figure 2: ABET Decoder. (Block diagram: ABET bitstream and baseband MDCT input, bitstream parsing and Huffman decoding, MDCT2ODFT, baseband ASR synthesis and subtraction, FSSM high frequency synthesis, ASR high frequency synthesis, IODFT, UFB analysis, stereo MBTAC application, UFB synthesis, L synt / R synt outputs.)

The ABET decoder incorporates the following functional areas:
1. Huffman decoding and de-quantization of ASR, FSSM, and MBTAC information.
2. MDCT to ODFT transformation.
3. FSSM synthesis module.
4. ASR synthesis module, including the baseband

ASR synthesis and removal to ensure harmonious combinations of ASR and FSSM synthesized components.
5. Inverse ODFT/MDCT transformation.
6. UFB analysis.
7. MBTAC application in the UFB domain.

4. PRIMARY CODING TOOLS IN ABET
As noted above, the proposed coding scheme utilizes two bandwidth extension tools. Here we provide a high level description of the two tools, FSSM and ASR. For a detailed description of FSSM the reader is referred to [9]; similarly, a detailed description of ASR may be found in [10]. In this section we describe the essential elements of both these models and also another important aspect of ABET, i.e. the UFB filterbank and MBTAC.

The bandwidth extension paradigm may be formalized as below. It is assumed that in each audio frame, the spectral representation of the signal (such as the MDCT representation) up to a certain frequency $f_c$, denoted as $X_{LP}(f)$, is coded directly using efficient quantization and coding techniques. It may be noted that it is not required that the baseband codec be an MDCT/ODFT domain coding scheme. What is required by ABET is that after decoding the signal is transformed into the MDCT/ODFT domain. The MDCT/ODFT spectrum for frequencies $f > f_c$ is to be reconstructed using a mapping $BE$ such that

$$X_{HP}(f) = BE\big(X_{LP}(f)\big) \qquad (3)$$

where $X_{LP}$ is the quantized baseband and $X_{HP}$ is the reconstructed higher frequencies in the MDCT/ODFT domain.

4.1. The FSSM Bandwidth Extension Model
The FSSM reconstruction takes the nested form

$$X_{HP}(f) = EO_L \circ \big( \cdots \big( EO_1 \circ ( EO_0 \circ X_{LP}(f) ) \big) \cdots \big) \qquad (4)$$

where each expansion operator $EO_i$ is assumed to have the form

$$EO_i \circ X_{LP}(f) = H_i\big[ X_{LP}(\alpha_i f - f_i) \big] \qquad (5)$$

where $\alpha_i$ is a dilation parameter ($\alpha_i \le 1$) and $f_i$ is a frequency translation parameter. $H_i$ is a high pass (brick-wall) filter with a cutoff frequency $f_c^{i} = \alpha_i f_c^{(i-1)} + f_i$, with $f_c^{0} = f_c$. This sequence of nested expansion operators resulting in bandwidth expansion is described further in [9]. The dilation/translation equations suggest a fractal-like model, which is able to reconstruct the high frequency spectral details with a high level of accuracy across a wide range of different audio signals.
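To make the expansion operator of Eq. (5) more tangible, here is a minimal sketch of a single operator acting on a spectrum stored as a list of bins. It is only an illustration under assumptions: frequencies are expressed in bin units, fractional source positions are read with linear interpolation (an assumption, since the interpolation used in FSSM is not described here), and the function name and arguments are hypothetical. The brick-wall high pass is modeled by zeroing every output bin below `cutoff`.

```python
import math

def expansion_operator(spectrum, alpha, shift, cutoff):
    """Apply one dilation/translation mapping: out[k] = X(alpha*k - shift) for k >= cutoff.

    spectrum: list of real spectral values (e.g. MDCT coefficients), baseband filled in
    alpha:    dilation parameter (alpha <= 1 stretches the baseband upward)
    shift:    frequency translation, in bins
    cutoff:   first output bin kept by the brick-wall high pass
    """
    n_bins = len(spectrum)
    out = [0.0] * n_bins
    for k in range(max(cutoff, 0), n_bins):
        src = alpha * k - shift              # position read from the input spectrum
        if src < 0 or src > n_bins - 1:
            continue                         # nothing to read outside the grid
        lo = int(math.floor(src))
        hi = min(lo + 1, n_bins - 1)
        frac = src - lo
        # Linear interpolation between neighbouring bins (assumed, for illustration).
        out[k] = (1.0 - frac) * spectrum[lo] + frac * spectrum[hi]
    return out
```

With `alpha = 1.0` the operator reduces to a pure translation of the baseband; with `alpha < 1.0` and `shift = 0` it stretches the baseband, so that a harmonic spacing in the source appears wider in the reconstructed band.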
The significance of the dilation and translation terms in FSSM is illustrated with the help of coding examples in Figures 3(a), (b), and (c). For example, the translation term improves the accuracy of reconstruction for musical instruments with a pitch structure and also for voiced speech and vocal signals. For these classes of signals the lack of dilation terms results in a discontinuity in the pitch structure. This is illustrated in Figures 3(a) and (b). Figure 3(a) shows the reconstructed spectrum superimposed over the original spectrum using a different bandwidth extension scheme (such as the spectrum replication approach of [11]). This is compared against the reconstruction using the FSSM model as shown in Figure 3(b). The inclusion of the dilation parameter, on the other hand, leads to accurate signal spectrum reconstruction for a different class of audio signals, in particular for cases when the pitch structure is either not present in (part of) the high frequencies or is more diffuse towards the higher frequencies. An example of a signal ("Aria") that benefits from the inclusion of the dilation terms in FSSM is shown in Figure 3(c).

In the FSSM technique, high frequency components of the signal are reconstructed using an iterative sequence of Expansion Operators (EO), as expressed in Eq. (4).

The FSSM model in general is an FSSM + Isolated Tones + Noise model. In a subsequent sub-section we discuss that in the cases where FSSM is used as the primary bandwidth extension model, it is advantageous to encode secondary tonal components (e.g., a secondary harmonic sequence and isolated tones) using the ASR model. In general it may still be necessary to add synthetic noise for part or the entire short term spectrum that does not fit the FSSM (or ASR tonal) model. In practice, however, if the dilation parameter in the FSSM model is suitably estimated, the occurrence of such cases is rather infrequent.

An interesting observation related to the FSSM model is that the temporal envelope of the high frequency components reconstructed using the FSSM model shows a high level of coherence with the temporal envelope of the base band components. This observation is illustrated with the help of a synthetic narrowband noise signal in Figure 4. The figure shows the base band signal (Figure 4a), the constructed high frequency signal (Figure 4b), and the Hilbert envelopes of the two signals superimposed on each other (Figure 4c).

Figure 3(a): Reconstructed signal spectrum (solid line) and original spectrum (dashed line) using a spectrum replication approach.
Figure 3(b): Reconstructed signal spectrum (solid line) and original spectrum (dashed line) with the FSSM model.
Figure 3(c): Example of a signal (short-term PSD) that benefits from the inclusion of the dilation term in the FSSM model.
Figure 4: (a) Base band noise signal, (b) constructed high frequencies, (c) Hilbert envelopes of (a) & (b).

4.2. The ASR Bandwidth Extension Model
The ASR model for bandwidth extension is described in detail in [10]. It takes into account the specificity of the coherent (i.e., sinusoidal) components of an audio signal, as well as the specificity of the incoherent (i.e., noise) components of an audio signal, namely with respect to their different perceptual impact and their different spectral nature and fine spectral structure. At

the heart of ASR is a sinusoidal analysis and synthesis algorithm with sub-bin accuracy.

The ASR model is particularly effective when the audio signal exhibits a well defined harmonic structure of sinusoids. In this case a bandwidth extension technique based on the replication of base band components may not provide satisfactory reconstruction of higher order partials. A replication model in this case, as noted above, has a significant deficiency in the sense that it may break the organization of the harmonics in frequency, which is likely to be noticeable to the human auditory system in the form of a pitch shift or the appearance of several pitches instead of a single one. ASR also allows sufficient and flexible control over the phase of the synthesized higher order partials, which may not be possible in techniques utilizing a mapping based on the lower frequencies (base band).

The most general form of ASR processing consists of the following steps.
1. Normalization of the audio spectrum by a model of the smooth spectral envelope; the noise part of the resulting flattened spectrum is very approximately white.
2. Segmentation of the flattened spectrum into sinusoids and a residual (or noise); this residual results from removing (i.e., subtracting) sinusoids directly from a complex discrete frequency representation of the audio signal, presuming that this representation is able to resolve all existing sinusoidal components.
3. Synthesis and bandwidth extension of sinusoids with sub-bin accuracy and using a reduced set of parameters (frequencies, magnitudes, or phases) describing the original audio sinusoidal components.
4. Synthesis and bandwidth extension of noise with bin accuracy (in the next sub-section we discuss how it may be advantageous to extend the noise component using the FSSM model).
5. Sum of both bandwidth extended components and inverse normalization in order to recover the spectral envelope model of the original spectrum.

The ASR model is highly flexible in terms of controlling the spectral balance of the reconstructed high frequency components.
For example, the spectral tilt affecting the incoherent components and the spectral tilt controlling the sinusoidal components can be shaped and controlled in an independent way. Further details on ASR may be found in [10] and at http://www.atc-labs.com/asr.

In the ASR model the parameters necessary for the synthesis of harmonic partials are suitably reduced. For example, in many cases the phase information may be completely discarded, or in other cases it is transmitted only at the time of harmonic birth and used in conjunction with a synthesis technique that ensures phase continuity from frame to frame.

The primary filterbank domain for ASR processing is the Odd-DFT (ODFT). At the encoder, sinusoidal components are estimated from the ODFT spectrum and removed by direct synthesis of ODFT spectral bins using a model of the frequency response of the sine window [23, 24] and the estimated frequency, magnitude, and phase parameters. It has been concluded that only a small number of frequency bins per sinusoidal component are needed to generate a good quality sinusoid and to effectively remove it from the ODFT spectrum. The sinusoidal components are further analyzed to detect the presence of one or more harmonic patterns (including harmonics with missing fundamentals) as well as isolated (non-harmonic) sinusoidal components. Parameters necessary for the synthesis of high frequency sinusoidal components are then analyzed and suitably reduced (e.g., by discarding the phase components). The reduced parameters are forwarded to the decoder.

In the decoder the high frequency sinusoidal components can be synthesized directly in the ODFT domain, avoiding the TDAC mechanism associated with the MDCT. A sinusoidal continuation algorithm is used to generate a sinusoidal trajectory using only the transmitted frequency and magnitude parameters. In most cases phase information is only needed at the time of harmonic birth. Furthermore, in most cases a reduced level of magnitude information in the form of a smooth spectral envelope is needed for the sinusoidal continuation algorithm.
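The idea of transmitting phase only at harmonic birth and continuing the trajectory from frequency and magnitude alone can be illustrated with a simple oscillator. The sketch below is a time-domain stand-in, not the ODFT-domain synthesis described above, and its function name, arguments, and per-frame constant frequency are assumptions made for illustration. Each frame starts from the phase reached at the end of the previous frame, so no phase needs to be sent after the first frame.

```python
import math

def synthesize_track(frames, frame_len, fs_hz, birth_phase=0.0):
    """Synthesize one sinusoidal trajectory with frame-to-frame phase continuity.

    frames:      list of (frequency_hz, magnitude) pairs, one per frame
    frame_len:   number of output samples per frame
    fs_hz:       sampling frequency in Hz
    birth_phase: phase (radians) supplied only for the first frame
    """
    phase = birth_phase
    samples = []
    for freq_hz, magnitude in frames:
        step = 2.0 * math.pi * freq_hz / fs_hz   # phase advance per sample
        for _ in range(frame_len):
            samples.append(magnitude * math.cos(phase))
            phase += step
        # Carry the running phase into the next frame (wrapped to keep it small).
        phase = math.fmod(phase, 2.0 * math.pi)
    return samples
```

Because the running phase is carried over, a change of frequency between frames bends the trajectory without introducing a jump in the waveform at the frame boundary.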
The accuracy of sinusoidal synthesis using the ASR model is depicted in Figure 5 using a synthetic FM modulated sinusoid. The synthesis accuracy for natural audio is highlighted in [10] with additional illustrations.

Figure 5: Spectrogram of FM Modulated Sinusoid; (a) Original, (b) ASR Synthesized.

4.3. Adaptive Combination of FSSM and ASR Models
As noted above, ABET allows for the flexible application of the FSSM and ASR bandwidth extension models either independently or in combination with each other. Practical combinations of the two models include (but are not limited to) the scenarios described below.
1. In this case the ASR model is utilized for the encoding and synthesis of the dominant harmonic sequence in the signal and also isolated (inharmonic) tones. The secondary harmonic sequence (if one is present) and the non-tonal components are modeled by the FSSM algorithm.
2. In this case all the harmonically related tonal components and inharmonic signal components are coded by the FSSM model. The ASR model is then used to encode and synthesize isolated (inharmonic) tones. In this configuration the FSSM model estimation algorithm emphasizes accurate reconstruction of the dominant harmonic tone sequence in the signal.
3. In this case all the tonal components (up to two harmonic sequences and isolated tones) are coded using the ASR model. The non-tonal noise-floor is then modeled using the FSSM approach.

It may also be noted that ABET allows for the model configuration to change on a frame to frame basis.

In audio frames where both the FSSM model and the ASR sinusoidal synthesis model are active, it is important to ensure that the harmonics synthesized by the FSSM and the ASR models do not interfere with each other. This may happen, for example, in a case when FSSM models the dominant harmonic in the signal and ASR is used to synthesize the secondary harmonic. Unless care is taken, the FSSM synthesis will also create high frequency partials corresponding to the secondary harmonics (albeit with inaccurate frequency location). These will then interfere with partials generated by the ASR model. To eliminate this problem the partials due to the secondary harmonic pattern are subtracted from the baseband before the application of the FSSM model (this process is illustrated in the ABET decoder block diagram, Figure 2).

4.4. Utility Filter Bank (UFB) and Multiband Temporal Amplitude Coding (MBTAC)
Since the frequency resolution of the primary coding and bandwidth extension filter bank is typically quite high, ABET incorporates additional tools for the shaping of the temporal envelopes of the signal in multiple frequency bands, which may be optionally invoked. This aspect is discussed in more detail below.

At the heart of the temporal shaping tools in ABET is the Utility Filter Bank (UFB). The UFB is a complex, over-sampled modulated filter bank [9]. An over-sampling ratio between (and including) 2 and 8 is permitted by ABET. Depending upon configuration parameters (e.g., based on the complexity profile of the decoder and bit rate of operation) the UFB may take one of the following two forms.

A complex modulated filter bank with an over-sampling ratio between 2 and 8 and sub-band filters of the form

$$h_i = h_0 \, e^{\,j \frac{2\pi}{N} i n} \qquad (7)$$

where $h_0$ is an optimized prototype filter. N = 128 and N = 256 are allowed.

A complex non-uniform filter bank; e.g., one with two uniform sections and transition filters to link the two adjacent uniform sections as described in [9]. This filter bank is designed using the technique described in [27]. The sub-bands in the lower section have ½ the bandwidth of the sub-bands at higher frequencies. The higher frequency resolution at lower frequencies is useful, for example, in parametric stereo coding.

MBTAC information to perform the temporal shaping is computed by analyzing the output of the UFB and transmitting a suitable representation as side information. The overhead for this information can be reduced by utilizing the temporal shape that may already exist and by grouping the information in adjacent time and frequency bands. The highlights of the MBTAC algorithm are as follows.

Supports non-uniform time-frequency tiling for the computation of the signal envelope. The initial frequency resolution is configurable into bands which are either full, half, or quarter critical band wide.

Incorporates several tools for the efficient coding of the envelope which look for typical and/or perceptually significant patterns in the time-frequency envelopes. These include techniques for noiseless coding and grouping based on perceptual criteria.

Efficient techniques for the coding of stereo envelopes.

5. FUNCTIONAL DESCRIPTION OF ABET ENCODER PROCESSING BLOCKS
In this section we present additional details regarding several functional blocks of the ABET Encoder.
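As one concrete illustration of an encoder-side building block, the complex modulated sub-band filters of Eq. (7) can be sketched as follows. This is a sketch under assumptions rather than the UFB design itself: the prototype here is a plain Hann-shaped window used as a stand-in for the optimized prototype (which is not specified in this paper), the prototype length is a free choice, and the function names are hypothetical. Each sub-band filter is the prototype multiplied by a complex exponential whose frequency is the band index times 2π/N.

```python
import cmath
import math

def hann_prototype(length):
    """Stand-in low pass prototype h0 (NOT the optimized prototype of the UFB)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 0.5) / length)
            for n in range(length)]

def modulated_filters(prototype, num_bands):
    """Return h_i[n] = h0[n] * exp(j * 2*pi * i * n / N) for i = 0 .. N-1."""
    return [
        [h * cmath.exp(2j * math.pi * i * n / num_bands)
         for n, h in enumerate(prototype)]
        for i in range(num_bands)
    ]
```

For example, `modulated_filters(hann_prototype(256), 128)` yields 128 complex filters; band 0 is the prototype itself, and every other band shares its magnitude envelope while being shifted in frequency.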
High Resolution Frequency Analysis (MDCT/ODFT) is the first block in the ABET encoder. It simultaneously computes the MDCT and ODFT representations (for two channels). The MDCT/ODFT analysis is computed for two frequency resolutions: (i) a Long window, which is typically 2048 samples long (with a 1024-sample overlap between two consecutive windows), and (ii) a Short window, which is typically 256 samples long (with a 128-sample overlap between two consecutive windows). ABET includes its own window state detector. This information needs to be shared and synchronized with the baseband coding scheme in the case where the frequency analysis is common.

Accurate Harmonic Analysis is the next functional block in the encoder. It involves the detection of all the tonal components in the signal using the ODFT representation. The frequencies of the tonal components are accurately estimated using the algorithm described in [23]. The tonal components are further analyzed to determine whether they fit into a harmonic structure (the possibility of missing harmonics up to a 7th order is allowed). The output of the accurate harmonic analysis block is the set of parameters corresponding to one or more detected harmonic patterns, as well as the parameters of isolated (inharmonic) tonal components in the high frequency region.

ASR/FSSM Model Configuration: Based on the user-selected parameters (e.g., ASR/FSSM model order, number of harmonic patterns to be coded, etc.) and the output of the accurate harmonic analysis, a decision is made regarding the frequency structures (harmonics and tones) which are to be coded by FSSM and ASR respectively.

Accurate Spectral Replacement (ASR) model parameter estimation is the next functional block at the encoder. For the harmonic patterns coded by the ASR model, the transmitted information consists of the fundamental frequency as well as the envelope of the high frequency partials computed using a suitable

frequency band structure. This envelope is differentially coded and further compressed using noiseless coding. For stereo signals in which the same harmonic pattern is present in both channels, the parameters are jointly coded for higher efficiency. For isolated tones, the transmitted information consists of the frequency and magnitude of the tone.

The Fractal Self-Similarity Model (FSSM) follows the ASR functional block. The model parameters are estimated using a combination of three criteria: (1) maximization of a self-similarity coherence (SSC) function, defined as

$\Phi(\alpha_i, f_i) = X(f)\, X(\alpha_i f - f_i)$  (6)

(2) a harmonic continuity criterion to ensure the accuracy of the dominant harmonic structure in the signal; and (3) a consistency criterion over time (multiple audio frames) to ensure steady, alias-free reconstruction of steady harmonics. Furthermore, the quality of the estimates improves significantly if the MDCT spectrum is normalized by the coarse envelope prior to the estimation of these parameters.

The Utility Filterbank (UFB), as described above, is a complex modulated filterbank with several times oversampling. It allows for a time resolution as high as 16/Fs (where Fs is the sampling frequency) and a frequency resolution as high as Fs/256. It also optionally supports a non-uniform time-frequency resolution.

Multi Band Temporal Amplitude Coding (MBTAC) involves efficient coding of two-channel (stereo) time-frequency envelopes in multiple frequency bands. The resolution of the MBTAC frequency bands is user selectable. The envelope information is grouped in time and frequency and jointly coded (across two channels) for coding efficiency. Various noiseless coding tools are used to reduce bit demand.

6. FUNCTIONAL DESCRIPTION OF ABET DECODER PROCESSING BLOCKS

The MDCT coefficients from the encoder are mapped to ODFT coefficients using a mapping described in [24]. The low-pass spectrum is analyzed for the presence of lower order partials corresponding to the harmonic structure(s) which are designated to be encoded by the ASR model.
The identified partials are synthesized and subtracted from the input low-pass spectrum to get a flattened spectrum. FSSM reconstruction is applied on the flattened spectrum. On applying the dilation and translation parameters together with the spectral norm values, the high frequency flattened spectrum is approximately reconstructed.

ASR at the decoder involves synthesizing the chosen harmonic structure and high frequency tones from the encoder information. The synthesized sinusoids are added to the full band spectrum to reconstruct the original spectrum.

MBTAC application in the UFB domain ensures that a temporal envelope approaching that of the original signal is maintained after the reconstruction from the bandwidth extension technique. MBTAC application involves suitable smoothing techniques.

7. CODING RESULTS

The ABET toolkit (or its subset) has been employed in three audio codecs developed by ATC Labs. In the first of these products, TeslaPro [12], which is geared towards broadcast applications, ABET is employed in its full strength with an adaptive combination of the FSSM, ASR, and MBTAC tools. Coding results at multiple bit rates (between 20 and 48 kbps) using TeslaPro are available at http://www.atc-labs.com/teslapro. In this codec ABET is used to encode up to 75% of the audio bandwidth.

In a second audio codec, geared towards two-way audio communication, the ASR and FSSM models are used for bandwidth extension. The shorter block length of this codec, called the Audio Communication Codec (ACC) [13][14], obviates the need for additional temporal envelope shaping, hence UFB/MBTAC is not employed. Coding results using ACC are available at http://www.atc-labs.com/acc.

In a third audio coding product, geared towards very low bit rate coding of voice and mixed content, ABET is employed for bit rates as low as 4-6 kbps. Coding results using this recently introduced codec [15] may be found at http://www.atc-labs.com/lbrcodec.

The bit overhead due to ABET is a function of the model configuration parameters and the fraction of bandwidth coded by ABET. The table below summarizes

the overhead for a few preferred configurations. It typically ranged between 2 and 3 kbps per channel.

% BW coded by ABET   ASR/FSSM config                                           MBTAC config          Overhead per channel
50                   1st har - FSSM; 2nd har & iso tones - ASR                 Very Detailed         3.1 kbps
50                   1st har - FSSM; 2nd har & iso tones - ASR                 Moderately Detailed   2.5 kbps
50                   1st har - FSSM; iso tones - ASR                           Moderately Detailed   2.1 kbps
75                   1st har - FSSM; iso tones - ASR                           Moderately Detailed   2.6 kbps
50                   1st har & iso tones - ASR; 2nd har & noise floor - FSSM   Moderately Detailed   3.5 kbps
50                   1st har & iso tones - ASR; 2nd har & noise floor - FSSM   No                    2.5 kbps

8. CONCLUSIONS

We described a novel audio bandwidth extension toolkit with application to low bit rate audio and speech coding. The proposed toolkit, called ABET, allows for flexible combination of the FSSM and ASR bandwidth extension models. It incorporates additional tools for accurate shaping of the time-frequency envelope of the signal. Coding results indicate that ABET allows for a very high quality reconstruction of the high frequency signal components that is significantly more accurate than other similar techniques.

9. REFERENCES

[1] A. Gersho and R. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.
[2] J. D. Johnston, D. Sinha, S. Dorward, and S. R. Quackenbush, "AT&T Perceptual Audio Coding (PAC)," in AES Collected Papers on Digital Audio Bit-Rate Reduction, N. Gilchrist and C. Grewin, Eds., 1996, pp. 73-82.
[3] Kyoya Tsutsui, Hiroshi Suzuki, Mito Sonohara, Osamu Shimoyoshi, Kenzo Akagiri, and Robert M. Heddle, "ATRAC: Adaptive Transform Acoustic Coding for MiniDisc," 93rd Convention of the Audio Engineering Society, October 1992, Preprint no. 3456.
[4] K. Brandenburg, G. Stoll, et al., "The ISO-MPEG-Audio Codec: A Generic Standard for Coding of High Quality Digital Audio," in 92nd AES Convention, 1992, Preprint no. 3336.
[5] Marina Bosi et al., "ISO/IEC MPEG-2 Advanced Audio Coding," 101st Convention of the Audio Engineering Society, November 1996, Preprint no. 4382.
[6] Mark Davis, "The AC-3 Multichannel Coder," 95th Convention of the Audio Engineering Society, October 1993, Preprint no. 3774.
[7] ITU-T Recommendation G.729, "Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)," March 1996.
[8] J. P. Princen, A. W. Johnson, and A. B. Bradley, "Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation," in IEEE International Conference on Acoustics, Speech and Signal Processing, 1987, pp. 2161-2164.
[9] Deepen Sinha, Anibal Ferreira, and Deep Sen, "A Fractal Self-Similarity Model for the Spectral Representation of Audio Signals," 118th Convention of the Audio Engineering Society, May 2005, Paper 6467.

[10] Anibal J. S. Ferreira and Deepen Sinha, "Accurate Spectral Replacement," 118th Convention of the Audio Engineering Society, May 2005, Paper 6383.
[11] M. Dietz, L. Liljeryd, K. Kjorling, and O. Kunz, "Spectral Band Replication, a novel approach in audio coding," 112th Convention of the Audio Engineering Society, May 2002, Paper 5553.
[12] Deepen Sinha and Anibal Ferreira, "A New Broadcast Quality Low Bit Rate Audio Coding Scheme Utilizing Novel Bandwidth Extension Tools," 119th Convention of the Audio Engineering Society, October 2005, Paper 6588.
[13] Anibal J. S. Ferreira and Deepen Sinha, "A New Low-Delay Codec for Two-way High-Quality Audio Communication," 119th Convention of the Audio Engineering Society, October 2005, Paper 6572.
[14] Anibal J. S. Ferreira and Deepen Sinha, "Audio Communication Coder," in the preprints of the 120th Convention of the Audio Engineering Society, May 2006.
[15] Raghuram A., Anibal J. S. Ferreira, and Deepen Sinha, "A New Low Bit Rate Speech Coding Scheme for Mixed Content," in the preprints of the 120th Convention of the Audio Engineering Society, May 2006.
[16] Joseph L. Hall, "Auditory Psychophysics for Coding Applications," Section IX, Chapter 39, The Digital Signal Processing Handbook, CRC Press, Editors: Vijay K. Madisetti and Douglas B. Williams, 1998.
[17] B. C. J. Moore, An Introduction to the Psychology of Hearing, 5th Ed., Academic Press, San Diego (2003).
[18] Eberhard Zwicker and Hugo Fastl, Psychoacoustics: Facts and Models, Springer Series in Information Sciences (Paperback), Second updated edition.
[19] Anibal J. S. Ferreira, Spectral Coding and Post-Processing of High Quality Audio, Ph.D. thesis, Faculdade de Engenharia da Universidade do Porto, Portugal, 1998, http://telecom.inescn.pt/doc/phd_en.html.
[20] D. Sinha, Low bit rate transparent audio compression using adapted wavelets, Ph.D. thesis, University of Minnesota, 1993.
[21] J. W. Hall, J. H. Grose, and L. Mendoza, "Across-channel processes in masking," in Hearing, B. C. J. Moore, Ed., Academic Press, San Diego, 1995, pp. 243-266.
[22] Jesko L. Verhey, Torsten Dau, and Birger Kollmeier, "Within-channel cues in comodulation masking release (CMR): Experiments and model predictions using a modulation filter bank model," Journal of the Acoustical Society of America, 106(5), pp. 2733-2745.
[23] Anibal J. S. Ferreira and Deepen Sinha, "Accurate and Robust Frequency Estimation in ODFT Domain," in the proceedings of the 2005 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 16-19, 2005.
[24] Anibal J. S. Ferreira, "Combined Spectral Normalization and Subtraction of Sinusoidal Components in the ODFT and MDCT Frequency Domains," in 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 21-24, 2001, pp. 51-54.
[25] Anibal J. S. Ferreira, "Accurate Estimation in the ODFT Domain of the Frequency, Phase and Magnitude of Stationary Sinusoids," in 2001 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 21-24, 2001, pp. 47-50.
[26] Anibal J. S. Ferreira, "Perceptual Coding Using Sinusoidal Modeling in the MDCT Domain," 112th Convention of the Audio Engineering Society, May 2002, Paper 5569.
[27] Z. Cvetkovic and J. D. Johnston, "Nonuniform Oversampled Filter Banks for Audio Signal Processing," IEEE Transactions on Speech and Audio Processing, Vol. 11, No. 5, September 2003.
[28] Nikil Jayant, James Johnston, and Robert Safranek, "Signal Compression Based on Models of Human Perception," Proceedings of the IEEE, vol. 81, no. 10, pp. 1385-1422, October 1993.
[29] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, Prentice-Hall, 1975.