Evaluation of Audio Compression Artifacts M. Herrera Martinez


This paper deals with subjective evaluation of audio-coding systems. From this evaluation, it is found that, depending on the type of signal and the algorithm of the audio-coding system, different types of audible errors arise. These errors are called coding artifacts. Although three kinds of artifacts are perceivable in the auditory domain, the author proposes that in the coding domain there is only one common cause for the appearance of the artifact: inefficient tracking of transient-stochastic signals. To track such signals, state-of-the-art audio coding systems use a wide range of signal processing techniques, including application of the wavelet transform, which is described here.

Keywords: audio coding, artifacts, wavelet transform, psychoacoustics, orthonormal transforms.

This text was part of the International Conference POSTER 2006, which was held at the Faculty of Electrical Engineering, CTU in Prague.

1 Introduction

Information technology has seen big advances in the field of audio data storage and transmission. In 1981, the CD (Compact Disc) was developed by Philips Corporation in the Netherlands, implementing a storage solution based on optical laser and digital representation. However, limits on transmission capacity have demanded lower bit rates, and therefore compression algorithms that reduce the data stream without significantly distorting the signal. In 1987, the Fraunhofer Institute released a compression algorithm standard based on perceptual models of the hearing system, using models of masking phenomena. The quantization noise introduced by these coding systems, especially when coding at low bit rates, gave rise to audible distortion errors, known as artifacts. Subjective evaluation led to a blurred classification of these artifacts. One type of artifact, preecho, is dealt with in this paper.
Preecho cancellation is discussed, and then a wavelet transform technique for this purpose is presented, together with some mathematical considerations about the transform. Hybrid coders, which use the FFT or DCT for the quasi-periodic parts of the signal and the DWT for its transient attacks, seem to be, in the author's opinion, the right direction for further research.

2 Two psychometric methods for evaluating coding systems

DBTS and SR are two psychometric methods that have been tested for subjective evaluation of audio codecs. Results from these tests have been published in [1][2], together with a description of the tests, excerpts and results. The DBTS method is a psychometric method which introduces the reference signal: the listener compares the coded signal with the reference, while in SR the reference is not introduced. The ANOVA tables below show that the DBTS method is stricter than SR.

Table 1: ANOVA results for DBTS methodology

Source of variation   Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Factor A                               5         109.8867       21.9773              59.0312   p<0.05
Factor B                               6          12.0919        2.0153               5.4131   p<0.05
Factor A x B                          30          47.5625        1.5854               4.2584   p<0.05
Error                                840         312.7176        0.3723
Total                                881         482.2587

Table 2: ANOVA results for SR methodology

Source of variation   Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)   Probability
Factor A                               5           7.1488        1.4298               5.7146   p<0.05
Factor B                               6           9.6904        1.6151               6.4552   p<0.05
Factor A x B                          30           7.4262        0.2475               1.12     p<0.05
Error                                966         241.6617        0.2502
Total                               1007         265.9271

Czech Technical University Publishing House http://ctn.cvut.cz/ap/
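The derived columns of Table 1 can be re-computed from its sums of squares and degrees of freedom. The following is a minimal sketch, assuming the standard two-way ANOVA relations (mean square = sum of squares / degrees of freedom, F = mean square / error mean square); the names `TABLE_1` and `anova_ratios` are hypothetical and are not taken from the author's analysis.

```python
# Sums of squares (SS) and degrees of freedom (df) copied from Table 1 (DBTS).
TABLE_1 = {
    "Factor A": (109.8867, 5),
    "Factor B": (12.0919, 6),
    "Factor A x B": (47.5625, 30),
    "Error": (312.7176, 840),
}


def anova_ratios(rows, error_key="Error"):
    """Return {source: (mean_square, f_ratio)}; f_ratio is None for the error row."""
    ss_error, df_error = rows[error_key]
    ms_error = ss_error / df_error
    result = {}
    for source, (ss, df) in rows.items():
        ms = ss / df  # mean square of this source of variation
        f_ratio = None if source == error_key else ms / ms_error
        result[source] = (ms, f_ratio)
    return result


ratios = anova_ratios(TABLE_1)
total_ss = sum(ss for ss, _ in TABLE_1.values())
total_df = sum(df for _, df in TABLE_1.values())

for source, (ms, f_ratio) in ratios.items():
    f_text = "" if f_ratio is None else f"F = {f_ratio:.2f}"
    print(f"{source:<14} MS = {ms:.4f}  {f_text}")
print(f"Total          SS = {total_ss:.4f}, df = {total_df}")
```

The recomputed F ratios agree with Table 1 to two decimal places; differences in later digits come from the table dividing already rounded mean squares.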

3 Artifacts from audio compression

Subjective tests performed on coded audio signals show that individual codecs vary considerably in performance (this is validated by the ANOVA method), and that their performance also depends on the type of signal used for the test. Coding signals with a strongly aperiodic character, called attack signals or signals with transient behaviour, leads to an artifact known as preecho. Similarly, speech signal coding introduces into the signal an artifact known as reverberation. Sometimes, when coding at low bit rates, variations in the masking threshold from one frame to the next may lead to different bit assignments, and as a result some groups of spectral coefficients can appear or disappear [3]. Preecho is analyzed here, and some techniques for cancelling it are described.

Fig. 1: A pre-echo artifact in a castanet excerpt [3]

Plosive phonemes are stochastic speech signals with a noisy character arising from turbulent air flow in the formation of some consonants. When coding these signals, which of course occur together with vowel sounds of quasi-periodic character, reverberation is perceived. When coding a signal which consists not only of the parts explained above, but which also has a frequency representation that gives strong variations of the masking threshold from one frame to the next, the birdies artifact is perceived.

3.1 Origin of compression artifacts

The general structure of an audio coder is given in [4]. There are three types of audio-coding systems, which differ according to the way they feed the input signal into the psychoacoustic model. The first type are transform coders, where samples from the input signal are transformed to the frequency domain. The second type are subband coders, where the transformation is performed and the masking thresholds are then calculated for each subband. The third type are so-called parametric coders, in which a specific parametrization of the signal is used.

When describing artifact generation, researchers explain that a particular artifact originates because of incorrect bit assignment from frame to frame, due to dispersion of the signal energy, which spreads out to neighbouring frames and even subbands. The relations between these dispersion lengths give rise to the various perceptual artifacts. In the time domain, it is signals with a transient-stochastic character that are affected. Percussive signals such as castanets, cymbals, clicks, claps, drums, etc. give rise to preecho when coded. Some authors observe that subband coders give better results when tracking transient signals, but the fixed window length that they apply does not track these signals accurately. To address this, a wide range of techniques have been implemented, as will be described below.

4 Audio critical material selection

During this work, the author designed a program in the Matlab environment to describe the energy of the signal in each of the subbands that subband coders use. Signals with a transient character show dispersion of their energy through the neighbouring subbands. Therefore, for selecting audio material suitable for the subjective assessment of audio codecs, the program provides an estimate of which signals will behave critically and which will not. Further research needs to be done to determine the relations between the subband representation of a particular signal and the artifact produced when it is compressed with a given algorithm for transient tracking. Stating the relations between the power spectrum levels inside each subband should give a cue for further research. Figure 2 shows the energy allocation of the power spectral density of the castanets signal.

Fig. 2: Power spectral density of a castanet audio signal

5 Current state-of-the-art for transient audio signal detection

Digital signal processing clearly has some potential for transient detection.
This includes modifications of the Discrete Cosine Transform (DCT) with variable block lengths, which track transient signals more accurately. The discrete wavelet transform (DWT) is also a powerful tool for transient tracking. Some implementations use a hybrid DWT/DCT. Other approaches combine non-linear transform coding and structured approximation techniques, together with hybrid modeling of the signal class under consideration. Techniques with non-uniform lapped transforms are also used; here, a non-uniform filter bank is obtained by joining uniform cosine-modulated filter banks using a transition filter. Audio watermarking, in which a watermark signal modifies the statistical characteristics of audio signals, in particular their stationarity, is also used [5].

5.1 Application of the wavelet transform while tracking transients

The representation of the signal in the frequency domain in earlier coders, such as MPEG-1 Layer III, Ogg Vorbis and others, was based on the FFT or DCT. Nowadays, applications aimed at transient tracking use hybrid DCT/DWT schemes, among others. Discarding the noise, an audio signal can be represented in the following way [6]:

$$x_{ton} = \sum_{m \in \Delta} \alpha_m w_m, \qquad x_{tran} = \sum_{n \in \Lambda} \beta_n \psi_n, \qquad (1)$$

where $\{\psi_n,\ n = 0, \ldots, N-1\}$ is a wavelet basis, and $\{w_m,\ m = 0, \ldots, N-1\}$ is an MDCT basis. The resulting signal is

$$x = x_{tran} + x_{ton} + r. \qquad (2)$$

Daudet et al. [6] describe $\Lambda$ and $\Delta$ as subsets of the index sets, termed significance maps. The residual signal $r$ is not sparse with respect to the two bases considered here.

The main idea is that the DCT, FFT and the other algorithms usually implemented in audio compression are very suitable for analysing and tracking the sinusoids or the quasi-stationary parts of the signal. Transient tracking is more convenient with the DWT. The ability of the DWT to localize sharp attacks in time follows from the Fourier-Plancherel transform and the uncertainty principle. Further work is being done to apply these algorithms in improving codec performance.

Fig. 3: Signal decomposition used in state-of-the-art codecs (the signal is split into quasi-stationary, transient and noisy parts)

5.2 Demonstration of the wavelet transform when solving a transient signal

When castanets, one of the critical material excerpts, are processed by an FFT or DCT with fixed window length, the spectrum disperses in such a manner that the bit assignment derived from the psychoacoustic model is inefficient, and therefore an audible artifact known as preecho originates. Fig. 4 shows the original castanet signal.

Fig. 4: Original castanet signal, critical material excerpt

The DWT, FFT, DCT and the other orthonormal transforms decompose the signal onto a decomposition basis. In the case of the FFT, the orthogonal decomposition basis is the set of all functions

$$e_m(t) = \frac{1}{\sqrt{N}}\, e^{2\pi j m t / N}, \qquad t = 0, 1, 2, \ldots, N-1, \qquad m = 0, 1, 2, \ldots, N-1. \qquad (3)$$

In the Fourier basis, frequency localization is precise, but time localization is poor. The Euclidean orthonormal basis, which has the form

$$(1, 0, 0, \ldots, 0),\ (0, 1, 0, \ldots, 0),\ \ldots,\ (0, 0, \ldots, 0, 1), \qquad (4)$$

unlike the Fourier basis, gives precise localization in time, but poor localization in frequency. The STFT represented a possible solution to the problem. It windows the signal, and therefore makes it possible to separate the signal into frames and obtain the frequency representation of each frame separately. However, because of its fixed window length, transient attack signals were still tracked inefficiently. The DWT represents a compromise between these two limit representations, and gives good localization in both frequency and time.

Signal decomposition into a particular basis can be viewed as a scalar product of the signal with the corresponding element of the basis. Mathematically,

$$(f, g) = \int f(x)\, \overline{g(x)}\, dx, \qquad (5)$$

where the bar denotes complex conjugation (it has no effect for real-valued $g$). This product represents how similar the function $f$ is to the corresponding element $g$ of the orthonormal basis. Signal decomposition, mathematically expressed, is a set mapping from the set of complex numbers to the set where the decomposition is described,

$$\mathbb{C}^n \ni z = (z(0), z(1), \ldots, z(n-1)). \qquad (6)$$

Let us perform a one-step decomposition of a castanet signal with the DWT. After the one-step decomposition we obtain two signal parts, depicted in Fig. 5.

Fig. 5: 1-step decomposition of the signal using the wavelet transform

Let us reconstruct the signal from the coefficients that arose after the one-step decomposition. Fig. 6 gives the reconstructed signal. Comparing Figures 4 and 6, we see that the reconstruction was successfully performed.

Fig. 6: Inverting the direct decomposition of the signal using the coefficients

Higher levels of signal decomposition will, of course, give more accurate representations of the audio signal, in the same way that higher frequency resolution improves the accuracy of the frequency representation of the signal in the FFT. The DWT thus has a hierarchical structure: the higher the decomposition level, the deeper the hierarchical DWT tree.

Now, let us perform a 3-step decomposition. A finite set of coefficients is obtained. Coefficient extraction is then performed, and this is presented in Fig. 7.

Fig. 7: Detail coefficients at levels 1, 2 and 3 from the wavelet decomposition structure: original signal, cA3, cD3, cD2 and cD1

Finally, we reconstruct an approximation at level 3 from the wavelet decomposition structure, and we reconstruct the detail coefficients at levels 1, 2 and 3 (Fig. 8).

Fig. 8: Reconstructed detail coefficients at levels 1, 2 and 3 from the wavelet decomposition structure. The upper panel is the original signal, followed by the reconstructed signal and then the coefficients.

Fig. 9: Original and reconstructed signal
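The decomposition and reconstruction steps described above can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not say which wavelet was used, so the orthonormal Haar wavelet is assumed here; the signal is a synthetic decaying burst standing in for the castanet excerpt; and the helper names `haar_step`, `haar_inverse_step`, `wavedec` and `waverec` are local, hypothetical functions written in plain Python (no toolbox is called).

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One-step orthonormal Haar DWT: returns (approximation, detail)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert haar_step by rebuilding each pair of neighbouring samples."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x


def wavedec(x, levels):
    """Multi-level decomposition, returned as [cA_n, cD_n, ..., cD_1]."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def waverec(coeffs):
    """Rebuild the signal from [cA_n, cD_n, ..., cD_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_inverse_step(approx, detail)
    return approx


# Synthetic attack (assumption): silence, then a decaying alternating burst.
N, ONSET = 64, 40
signal = [0.0] * N
for t in range(ONSET, N):
    signal[t] = (-1) ** (t - ONSET) * math.exp(-(t - ONSET) / 6.0)

coeffs = wavedec(signal, 3)           # 3-step decomposition
cA3, cD3, cD2, cD1 = coeffs
reconstructed = waverec(coeffs)       # reconstruction from all coefficients

max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
first_active = next(k for k, d in enumerate(cD1) if abs(d) > 1e-12)
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for c in coeffs for v in c)

print("coefficient counts:", [len(c) for c in coeffs])
print("first non-zero cD1 index:", first_active, "(onset sample", ONSET, ")")
print("max reconstruction error:", max_error)
```

In this sketch the level-1 detail coefficients are exactly zero before the attack, because each one depends on only two neighbouring samples; this is the time localization that a fixed-length FFT or DCT block lacks. The reconstruction error is at the level of floating-point rounding, and the coefficient energy equals the signal energy because the transform is orthonormal.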

The last step is signal reconstruction from the wavelet decomposition structure (Fig. 9). The reconstruction of the transient signal shows that the DWT is a suitable method for decomposing transient signals, even with just a 3-level decomposition. This result suggests that a hybrid codec that uses the FFT to extract and process quasi-stationary signals and the DWT to extract and process transient signals is a more suitable algorithm for sound coding than earlier codecs, which tracked signals with fixed-window-length DCT or FFT transforms.

6 Conclusions

Psychometric methods were used to evaluate audio-coding systems. DBTS and SR were the methods chosen to perform the evaluation. The ANOVA validation of the results of these tests shows that not only the codec performance but also the characteristics of the signal have a strong impact on the evaluation. Signals with a percussive character, such as castanets, cymbals, claps and others, when coded by algorithms which use the DCT and FFT for the frequency representation of the signal, show preecho as an auditory artifact produced by compression.

The two other artifacts, while appearing to differ from preecho in the auditory domain, have, in the author's opinion, the same origin: incorrect bit allocation of the masking coefficients. This is because the critical signal has a power spectrum which spreads out not only to the two neighbouring frames, but also to the neighbouring bands. The signal criticality can be checked by the program. Finally, some state-of-the-art techniques for efficiently tracking these critical audio signals are discussed, with special attention to the wavelet transform.

Acknowledgments

This work has been supported by research project MSM 6840770014 "Research in the Area of Prospective Information and Communication Technologies" and by National Science Foundation grant No. 102/05/2054 "Qualitative aspects of Audiovisual Information Processing in Multimedia Systems".
References

[1] Herrera, M.: Summary of the subjective evaluation of audio-coding testing at the CVUT during the period 2003-2005. In: XI. International Symposium of Audio and Video, Krakow (Poland), 2005.
[2] Husnik, L., Herrera, M.: Comparison of Two Methods Used for the Subjective Evaluation of Compressed Sound Signals. In: Forum Acusticum. Budapest, 2005.
[3] AES: Tutorial CD-ROM, Perceptual Audio Coders, What to listen for. New York, 2002.
[4] Herrera, M., Dolejsi, P.: Subjective Evaluation of Audio-Coding Systems. In: INTERNOISE 2004. Prague, 2004.
[5] Larbi, S., Jaidane, M.: Audio Watermarking: A Way to Stationnarize Audio Signals. In: IEEE Transactions on Signal Processing, Vol. 53 (2005), No. 2, February 2005.
[6] Daudet, L., Molla, S., Torresani, B.: Towards a Hybrid Audio Coder. In: Proceedings of the International Conference on Wavelet Analysis and Applications. February 2004.
[7] http://www.mathworks.com/access/helpdesk/help/toolbox/wavelet/wavelet.htm

Marcelo Herrera Martinez
e-mail: herrerm@feld.cvut.cz
Department of Radioelectronics
Czech Technical University in Prague
Technická 2
166 27 Prague, Czech Republic