IDIAP RESEARCH REPORT

Spectral Noise Shaping: Improvements in Speech/Audio Codec Based on Linear Prediction in Spectral Domain

Sriram Ganapathy (a,b), Petr Motlicek (a), Hynek Hermansky (a,b), Harinath Garudadri (c)

IDIAP RR 08-16, June 2008, published in Interspeech 2008

(a) IDIAP Research Institute, Martigny, Switzerland
(b) Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
(c) Qualcomm Inc., San Diego, CA, USA

IDIAP Research Institute, Av. des Prés Beudin 20, P.O. Box 592, 1920 Martigny, Switzerland
Tel: +41 27 721 77 11, Fax: +41 27 721 77 12, Email: info@idiap.ch, www.idiap.ch

IDIAP Research Report 08-16

Spectral Noise Shaping: Improvements in Speech/Audio Codec Based on Linear Prediction in Spectral Domain

Sriram Ganapathy, Petr Motlicek, Hynek Hermansky, Harinath Garudadri

June 2008, published in Interspeech 2008

Abstract. Audio coding based on Frequency Domain Linear Prediction (FDLP) uses autoregressive models to approximate Hilbert envelopes in frequency sub-bands. Although the basic technique achieves good coding efficiency, there is a need to improve the reconstructed signal quality for tonal signals with impulsive spectral content. For such signals, the quantization noise in the FDLP codec appears as frequency components not present in the input signal. In this paper, we propose a technique of Spectral Noise Shaping (SNS) that improves the quality of tonal signals by applying a Time Domain Linear Prediction (TDLP) filter prior to the FDLP processing. The inverse TDLP filter at the decoder shapes the quantization noise so as to reduce these artifacts. Applying the SNS technique to the FDLP codec improves the quality of tonal signals without affecting the bit-rate. Performance is evaluated with Perceptual Evaluation of Audio Quality (PEAQ) scores and with subjective listening tests.

1 Introduction

A new speech/audio coding technique based on modeling the temporal evolution of the spectral dynamics was proposed in [1, 2]. The approach represents the Amplitude Modulating (AM) signal by a Hilbert envelope estimate and the Frequency Modulating (FM) signal by the Hilbert carrier. Speech/audio signals are analyzed in time using a non-uniform Quadrature Mirror Filter (QMF) bank to decompose the signal into frequency sub-bands. For each sub-band signal, Hilbert envelopes are estimated using Frequency Domain Linear Prediction (FDLP), an efficient technique for auto-regressive (AR) modelling of the temporal envelopes of a signal [3]. The parameters of the AR model are transmitted along with a few spectral components of the residual. At the decoder, these steps are inverted to reconstruct the signal. The FDLP codec achieves good compression efficiency for speech/audio signals. However, there is a need to improve the quality of the reconstructed signal for inputs with tonal components. The FDLP technique fails to model these signals because of their impulsive spectral content, so most of the important signal information is present in the residual. For such signals, the quantization error in the FDLP codec spreads across all the frequencies around the tone, which results in a significant degradation of the reconstructed signal quality.

In conventional codecs such as MP3 [4], the dual problem arises in encoding transients in the time domain; it is efficiently solved by Temporal Noise Shaping (TNS) [5]. Specifically, coding artifacts arise mainly in handling transient signals (like castanets) and pitched signals. Using a spectral signal decomposition for quantization and encoding implies that a quantization error introduced in this domain spreads out in time after reconstruction by the synthesis filter bank. TNS overcomes this problem by shaping the quantization noise in the time domain according to the input transient, and is widely used in modern audio codecs such as Enhanced aacPlus [6].

In this paper, we propose a technique of Spectral Noise Shaping (SNS) to overcome the problem of encoding tonal signals in the FDLP-based speech/audio codec. The technique is motivated by the fact that tonal signals are highly predictable in the time domain. If a sub-band signal is found to be tonal, it is analyzed using Time Domain Linear Prediction (TDLP) [7] and the residual of this operation is processed with the FDLP codec. At the decoder, the output of the FDLP codec is filtered by the inverse TDLP filter. Since the inverse TDLP filter follows the spectral peaks of tonal signals, it shapes the quantization noise according to the input signal. Applying the SNS technique to the FDLP codec improves the reconstruction quality for these signals without affecting the bit-rate.

The rest of the paper is organized as follows. Sec. 2 describes the FDLP technique for AR modelling of Hilbert envelopes. The basic structure of the FDLP codec is described in Sec. 3. Sec. 4 explains the technique of SNS in detail. The objective and subjective evaluations are reported in Sec. 5.

2 Frequency Domain Linear Prediction

Typically, auto-regressive (AR) models have been used in speech applications to represent the envelope of the power spectrum of the signal by performing TDLP [7]. This paper instead uses AR models to obtain smoothed, minimum-phase, parametric models of temporal rather than spectral envelopes.
The duality between the time and frequency domains means that AR modeling can be applied equally well to discrete spectral representations of the signal instead of time-domain signal samples. The block schematic showing the steps involved in deriving the AR model of the Hilbert envelope is shown in Fig. 1.

Figure 1: Steps involved in the FDLP technique for AR modelling of Hilbert envelopes: input signal -> compute analytic signal -> Hilbert envelope -> Fourier transform -> spectral autocorrelation -> linear prediction -> AR model of the Hilbert envelope.
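For illustration, this chain can be sketched in a few lines of NumPy/SciPy. This is a sketch rather than the codec implementation: it uses the DCT-based formulation of FDLP associated with [3] (standard linear prediction run on a spectral transform of the segment), and the segment length, normalisation and example parameters below are arbitrary.

import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz


def fdlp_envelope(x, order=20):
    """AR model of the temporal (Hilbert) envelope of x, obtained by
    running linear prediction on a spectral representation of the
    signal -- the dual of ordinary TDLP."""
    n = len(x)
    X = dct(x, type=2, norm='ortho')                   # spectral-domain coefficients
    # Autocorrelation of the spectral coefficients (lags 0..order); by the
    # time/frequency duality of Sec. 2 this carries the Hilbert-envelope information.
    r = np.correlate(X, X, mode='full')[n - 1:n + order]
    # Yule-Walker equations, solved exactly as in TDLP.
    w = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    a = np.concatenate(([1.0], -w))                    # all-pole denominator A(z)
    # The squared magnitude response of 1/A(z), sampled on n points,
    # approximates the Hilbert envelope over the segment (up to a gain factor).
    freqs = np.linspace(0.0, np.pi, n, endpoint=False)
    A = np.exp(-1j * np.outer(freqs, np.arange(order + 1))) @ a
    return 1.0 / np.abs(A) ** 2, a


# Toy usage: envelope of a decaying 440 Hz tone burst.
fs = 16000
t = np.arange(int(0.2 * fs)) / fs
env, a = fdlp_envelope(np.exp(-20.0 * t) * np.sin(2 * np.pi * 440.0 * t))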

Figure 2: Linear prediction in the time and frequency domains for a portion of a speech signal: (a) input signal, (b) power spectrum and LP spectrum, and (c) Hilbert envelope and FDLP envelope.

The first step is to compute the analytic signal of the input signal; for a discrete-time signal, the analytic signal can be obtained using the Fourier transform [8]. The Hilbert envelope (the squared magnitude of the analytic signal) and the spectral auto-correlation function form a Fourier transform pair [5]. This relation is used to derive the auto-correlation of the spectral components of the signal, which is then used to derive the FDLP model (in a manner similar to the computation of TDLP models from temporal autocorrelations [7]). In the FDLP technique, the squared magnitude response of the all-pole filter approximates the Hilbert envelope of the signal. This is in exact duality to the approximation of the power spectrum of the signal by TDLP, as shown in Fig. 2.

3 Speech/Audio codec based on FDLP

Long temporal segments (1000 ms) of the input speech/audio signal are decomposed into 32 non-uniform QMF sub-bands which approximate the critical-band decomposition of the auditory system. In each sub-band, FDLP is applied and the Line Spectral Frequencies (LSFs) approximating the sub-band temporal envelope are quantized using Vector Quantization (VQ). The residual signals are processed in the spectral domain, with the magnitude spectral parameters quantized using VQ; the phase spectral components are scalar quantized (SQ), as they were found to have a uniform distribution. A graphical scheme of the encoder is given in Fig. 3. In the decoder, shown in Fig. 4, the quantized spectral components of the residual signals are reconstructed and transformed back to the time domain using the inverse Discrete Fourier Transform (DFT). The reconstructed envelopes (from the LSF parameters) are used to modulate the corresponding sub-band residual signals. Finally, sub-band synthesis is applied to reconstruct the full-band signal.

4 Improvements in the FDLP codec

To improve the reconstruction quality of tonal signals, we add a tonality detector and an SNS module to the FDLP codec.
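To fix ideas before describing the two new modules, the per-sub-band processing of Secs. 2-3 can be caricatured as follows. This is a sketch only: the QMF bank, the 1000 ms framing and all quantizers (LSF VQ, magnitude VQ, phase SQ) are omitted, and fdlp_envelope is the illustrative function from the previous listing, not the codec's actual envelope estimator.

import numpy as np


def ar_envelope(a, n):
    """Squared magnitude response of 1/A(z) on n points, i.e. the temporal
    envelope reconstructed from the AR coefficients."""
    freqs = np.linspace(0.0, np.pi, n, endpoint=False)
    A = np.exp(-1j * np.outer(freqs, np.arange(len(a)))) @ a
    return 1.0 / np.abs(A) ** 2


def encode_subband(sub, order=20):
    """Envelope -> residual (carrier) -> DFT of the residual.  In the codec
    the envelope is sent as VQ'd LSFs and the residual as VQ'd magnitudes
    plus scalar-quantized phases; here everything stays unquantized."""
    env, a = fdlp_envelope(sub, order)            # sketch from the previous listing
    residual = sub / np.sqrt(env + 1e-12)         # flatten the temporal envelope
    spec = np.fft.rfft(residual)
    return a, np.abs(spec), np.angle(spec)


def decode_subband(a, mag, phase, n):
    """Invert the encoder: residual via inverse DFT, then re-modulate with
    the envelope rebuilt from the AR model."""
    residual = np.fft.irfft(mag * np.exp(1j * phase), n)
    return residual * np.sqrt(ar_envelope(a, n))

Round-tripping these two functions without quantization reconstructs a sub-band up to the envelope-model error; the artifacts addressed below come from quantizing the residual spectrum of highly tonal sub-bands.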

Figure 3: Scheme of the FDLP encoder: QMF analysis into 32 sub-bands; for each sub-band, the FDLP envelope LSFs and the DFT magnitude and phase of the residual are quantized (Q).

Figure 4: Scheme of the FDLP decoder: inverse quantization of the envelope LSFs and of the residual magnitude and phase, IDFT of the residual, envelope modulation, and QMF synthesis of the 32 sub-bands into the output signal.

4.1 Tonality Detector

The task of the tonality detector is to identify the QMF sub-band signals that have strong tonal components. Since the FDLP codec efficiently encodes non-tonal and partly tonal signals, only highly tonal signals are processed using SNS. For this purpose, a global and a local tonality measure are computed, and the tonality decision is based on both. The global tonality measure is the Spectral Flatness Measure (SFM, defined as the ratio of the geometric mean to the arithmetic mean of the spectral magnitudes) of the full-band signal. If the SFM is below a threshold, all the sub-bands of that input frame are checked for tonality locally. The local tonality measure is determined from the spectral auto-correlation of the sub-band signal (the same auto-correlation used for the estimation of FDLP envelopes in Fig. 1): it is the ratio of the maximum spectral auto-correlation at higher lags (from lag 1 up to the model order) to the zeroth-lag value. If the sub-band signal is highly tonal, its spectrum is impulsive and therefore its spectral auto-correlation is impulsive as well, so this ratio is small. If, on the other hand, the higher lags of the spectral auto-correlation (within the model order) contain a significant percentage of the energy relative to the zeroth lag, the spectrum of the signal is predictable and the base-line FDLP codec (without SNS) is able to model this signal structure.
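A minimal sketch of the two measures follows. The threshold values and the use of the DCT as the spectral representation are illustrative assumptions; the paper does not specify them.

import numpy as np
from scipy.fft import dct


def spectral_flatness(x):
    """Global measure: geometric mean / arithmetic mean of the magnitude
    spectrum of the full-band frame (near 0 = tonal, near 1 = noise-like)."""
    mag = np.abs(np.fft.rfft(x)) + 1e-12
    return np.exp(np.mean(np.log(mag))) / np.mean(mag)


def local_tonality(sub, order=20):
    """Local measure: largest spectral autocorrelation at lags 1..order,
    normalised by the zero-lag value (absolute value taken here for
    robustness).  Reuses the spectral autocorrelation already needed for
    FDLP (Fig. 1)."""
    X = dct(sub, type=2, norm='ortho')
    r = np.correlate(X, X, mode='full')[len(X) - 1:]
    return np.max(np.abs(r[1:order + 1])) / r[0]


def is_tonal(fullband, sub, sfm_thresh=0.1, ratio_thresh=0.2, order=20):
    """Both thresholds are made-up placeholders.  A small local ratio means
    the spectral autocorrelation is concentrated at lag 0 (impulsive
    spectrum), i.e. the sub-band is too tonal for the base-line FDLP model."""
    if spectral_flatness(fullband) >= sfm_thresh:
        return False                       # frame not tonal enough globally
    return local_tonality(sub, order) < ratio_thresh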

4.2 Spectral Noise Shaping

As explained earlier, the tonal sub-band signals are passed through a TDLP filtering block before FDLP encoding. For tonal signals, the TDLP and FDLP model orders are both set to 20, compared to an FDLP model order of 40 for non-tonal signals; hence the inclusion of SNS does not increase the bit-rate. At the decoder, inverse TDLP filtering applied to the decoded signal gives the sub-band signal back.

The technique of SNS is motivated by a fundamental property of linear prediction: for AR signals, the inverse TDLP filter has a magnitude response similar to the Power Spectral Density (PSD) of the input signal [7]. As an example, Fig. 5 shows the power spectrum of a tonal sub-band signal and the frequency response of the inverse TDLP filter for that sub-band. Since the quantization noise passes through the inverse TDLP filter, it is shaped in the frequency domain according to the PSD of the input signal, hence the name spectral noise shaping. Fig. 6 shows the block schematic of the FDLP codec with SNS. The only additional side information is the signalling of the tonality decision to the decoder (32 bps).

Figure 5: Inverse TDLP filter used for spectral noise shaping: (a) power spectrum of a tonal sub-band signal, and (b) magnitude response of the inverse TDLP filter in SNS.

Figure 6: FDLP codec with SNS: the tonality detector routes tonal sub-bands through TDLP filtering before the FDLP analysis/encoder and through inverse TDLP filtering after the synthesis/decoder; non-tonal sub-bands bypass the SNS blocks.
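A sketch of the SNS analysis/synthesis pair around the FDLP codec (model order 20 as stated above; the quantization and transmission of the TDLP coefficients are omitted, and the function names are ours):

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def tdlp_coeffs(x, order=20):
    """Ordinary time-domain LP (autocorrelation method); order 20 is the
    value used for tonal sub-bands in the paper."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    w = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    return np.concatenate(([1.0], -w))        # A(z) = 1 - sum_k w_k z^-k


def sns_analysis(sub, order=20):
    """Encoder side: whiten the tonal sub-band with A(z); the TDLP residual
    is what the FDLP codec then encodes."""
    a = tdlp_coeffs(sub, order)
    return a, lfilter(a, [1.0], sub)


def sns_synthesis(a, decoded_residual):
    """Decoder side: inverse TDLP filter 1/A(z).  Quantization noise in
    decoded_residual is thereby shaped by |1/A(e^jw)|^2, which for tonal
    signals tracks the PSD of the input (Fig. 5), hence 'spectral noise
    shaping'."""
    return lfilter([1.0], a, decoded_residual)

In the codec itself the TDLP coefficients are quantized and transmitted, with the FDLP model order reduced for tonal sub-bands so that the total parameter budget is unchanged; the only extra side information is the tonality signalling (32 bps).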

5 Results

The subjective and objective evaluations of the proposed audio codec are performed using the challenging audio signals (sampled at 48 kHz) in the framework for the exploration of speech and audio coding [9], which comprises speech, music and speech-over-music recordings. For a more detailed evaluation, additional tonal signals from [10] are also used.

Figure 7: Improvements in reconstructed signal quality with SNS: a portion of the power spectrum of (a) a tonal input signal, (b) the reconstructed signal using the base-line codec without SNS, and (c) the reconstructed signal using the codec with SNS.

Input        Base-line codec   With SNS
Flute 1      -0.49             -0.41
Flute 2      -1.86             -1.66
Violin       -0.43             -0.32
Organ        -2.82             -1.12
Alto-flute   -2.10             -2.04
Avg.         -1.54             -1.10

Table 1: PEAQ scores for tonal files with and without SNS.

system            FDLP    LAME    AAC
bit-rate [kbps]   66      64      64
Avg. PEAQ score   -1.11   -1.61   -0.77

Table 2: Average objective quality test results provided by PEAQ for 27 files.

5.1 Objective Evaluations

The objective measure employed is the Perceptual Evaluation of Audio Quality (PEAQ) distortion measure [11]: the perceptual degradation of the test signal with respect to the reference signal is measured according to the ITU-R BS.1387 (PEAQ) standard. The output combines a number of model output variables (MOVs) into a single measure, the Objective Difference Grade (ODG), an impairment scale that indicates the measured audio quality of the signal under test on a continuous scale from -4 (very annoying impairment) to 0 (imperceptible impairment). Table 1 compares the PEAQ scores for several tonal files with and without SNS: the objective quality score (average PEAQ score) is improved by the application of SNS, on average by about 0.4, without affecting the bit-rate. The improvement provided by SNS for tonal signals is also illustrated in Fig. 7, which shows a portion of the power spectrum of (a) the input signal, (b) the reconstructed signal using the base-line codec, and (c) the reconstructed signal using the codec with SNS; the figure illustrates the ability of the proposed technique to reduce the artifacts present in tonal signals.

Figure 8: MUSHRA results for 8 audio files with 22 listeners, using three coded versions (FDLP, AAC and LAME), the hidden reference (original) and two anchors (7 kHz low-pass filtered and 3.5 kHz low-pass filtered).

For comparison with state-of-the-art codecs, the following three codecs are considered:

1. The FDLP codec with SNS at 66 kbps, denoted as FDLP.
2. LAME MP3 (MPEG-1, Layer 3) [4] at 64 kbps, denoted as LAME.
3. High Efficiency Advanced Audio Coding (AAC+ v1) with Spectral Band Replication (SBR) [6, 13] at 64 kbps, denoted as AAC.

For the 27 speech/audio files from [9], the results of the objective quality evaluations are shown in Table 2.

5.2 Subjective Evaluations

MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) is a methodology for the subjective evaluation of audio quality, defined by ITU-R recommendation BS.1534 [12]. We performed MUSHRA tests on 8 audio samples from the database with 22 listeners; the results are shown in Figure 8. The proposed version of the FDLP codec, with SNS, is found to be competitive with state-of-the-art codecs at similar bit-rates.

6 Conclusions

We identify the problem of encoding tonal signals in codecs based on modeling the spectral dynamics in sub-bands, and propose the technique of spectral noise shaping to overcome it. The technique relies on the fact that tonal signals are temporally predictable, so the residual of the prediction can be efficiently processed by the FDLP codec. Without increasing the bit-rate, the quantization noise at the receiver is shaped in the frequency domain according to the spectral characteristics of the input signal. For some audio samples, such as Alto-flute and Trumpet, the current version of the SNS module does not significantly improve performance, as the inverse TDLP filter is unable to completely capture the signal dynamics. Further refinement of the SNS module for these signals forms part of future work.

References

[1] P. Motlicek, H. Hermansky, S. Ganapathy and H. Garudadri, "Non-Uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes," Proc. of TSD, LNCS/LNAI series, Springer-Verlag, Berlin, pp. 350-357, September 2007.
[2] S. Ganapathy, P. Motlicek, H. Hermansky and H. Garudadri, "Autoregressive Modelling of Hilbert Envelopes for Wide-band Audio Coding," Audio Engineering Society, 124th Convention, Amsterdam, Netherlands, May 2008.
[3] M. Athineos and D. Ellis, "Autoregressive Modeling of Temporal Envelopes," IEEE Trans. on Signal Processing, Vol. 55, No. 11, pp. 5237-5245, Nov. 2007.
[4] LAME MP3 codec: http://lame.sourceforge.net
[5] J. Herre and J. D. Johnston, "Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS)," Audio Engineering Society, 101st Convention, Los Angeles, USA, November 1996.
[6] 3GPP TS 26.401: Enhanced aacPlus general audio codec; General Description, 2004.
[7] J. Makhoul, "Linear Prediction: A Tutorial Review," Proc. of the IEEE, Vol. 63, No. 4, pp. 561-580, 1975.
[8] L. S. Marple, "Computing the Discrete-Time Analytic Signal via FFT," IEEE Trans. on Signal Processing, Vol. 47, No. 9, pp. 2600-2603, 1999.
[9] ISO/IEC JTC1/SC29/WG11: Framework for Exploration of Speech and Audio Coding, MPEG2007/N9254, Lausanne, Switzerland, July 2007.
[10] Musical Instrument Samples, http://theremin.music.uiowa.edu/mis.html
[11] ITU-R Recommendation BS.1387: Method for objective measurements of perceived audio quality, December 1998.
[12] ITU-R Recommendation BS.1534: Method for the subjective assessment of intermediate audio quality, June 2001.
[13] M. Dietz, L. Liljeryd, K. Kjorling and O. Kunz, "Spectral Band Replication, a Novel Approach in Audio Coding," Audio Engineering Society, 112th Convention, Munich, Germany, May 2002.