Attack restoration in low bit-rate audio coding, using an algebraic detector for attack localization


Imen Samaali, Monia Turki-Hadj Alouane, Gaël Mahé

To cite this version: Imen Samaali, Monia Turki-Hadj Alouane, Gaël Mahé. Attack restoration in low bit-rate audio coding, using an algebraic detector for attack localization. International Symposium on I/V Communications and Mobile Networks, Sep 2, Rabat, Morocco. <hal-686323>

HAL Id: hal-686323
https://hal.archives-ouvertes.fr/hal-686323
Submitted on Apr 22

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Imen Samaali, Monia Turki-Hadj Alouane
Unité Signaux et Systèmes (U2S), Ecole Nationale d'Ingénieurs de Tunis, Tunisie
Email: imen.samaali@mi.parisdescartes.fr, m.turki@enit.rnu.tn

Gaël Mahé
LIPADE, Université Paris Descartes, France
Email: Gael.Mahe@mi.parisdescartes.fr

Abstract: This paper deals with pre-echo reduction in low bit-rate audio compression. An attack restoration method based on the correction of the temporal envelope of the decoded signal was proposed in [1]; a small set of coefficients is transmitted through a limited bit-rate auxiliary channel. However, that method also required the transmission of the transient position computed on the original audio signal. In this paper, we use a new attack localization method based on differential algebra, which guarantees successful detection on the decoded audio signal. The algebraic method also has a lower complexity than the stationarity index detector used in [1]. The proposed approach is evaluated for single audio coding-decoding, using objective perceptual measures. The experimental results for MP3 coding exhibit an efficient restoration of the attacks and a significant improvement of the audio quality.

Index Terms: Temporal envelope, ARMA modeling, audio coding, sound attack, algebraic detector, attack restoration.

I. INTRODUCTION

Transient waveforms, window length and psychoacoustic bit allocation interact to produce pre-echo in low bit-rate audio coding (see Figure 1). When a transient occurs, the perceptual model allocates few bits to the quantization of the frame parameters. At the decoder, the quantization noise, which is supposed to be fully masked, may spread over the entire block. This noise therefore precedes the transient in the time domain and produces a potentially audible artefact known as pre-echo [2].
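The spreading of quantization noise over a whole transform block can be reproduced with a toy experiment. The sketch below is only an illustration and not the MP3 codec: it assumes a single 512-sample block, an orthonormal DCT-II in place of the MDCT filterbank, and a uniform quantizer whose step size is chosen arbitrarily. The function names (`dct_matrix`, `code_block`, `pre_echo_demo`) and all parameter values are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (each row is one basis function)."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def code_block(block, step):
    """Transform the block, quantize uniformly with the given step, invert."""
    c = dct_matrix(len(block))
    coeffs = c @ block
    quantized = step * np.round(coeffs / step)
    return c.T @ quantized

def pre_echo_demo(n=512, onset=384, step=0.5, seed=0):
    """Return the coding error energy before and after the attack."""
    rng = np.random.default_rng(seed)
    block = np.zeros(n)
    # Silence followed by a decaying noise burst (a crude percussive attack).
    length = n - onset
    block[onset:] = rng.standard_normal(length) * np.exp(-np.arange(length) / 40.0)
    error = code_block(block, step) - block
    return np.sum(error[:onset] ** 2), np.sum(error[onset:] ** 2)

before, after = pre_echo_demo()
print(f"error energy before the attack: {before:.4f}")
print(f"error energy after the attack:  {after:.4f}")
```

In this toy block the samples before the onset are exactly zero, so any error energy reported before the attack is quantization noise that the block transform has spread ahead of the transient.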
In addition, in a low bit-rate context, the attacks may be smoothed by coding, which reduces the percussive quality of sounds.

Many methods have been proposed to tackle the problem of pre-echo in transform audio coding, especially for modified discrete cosine transform (MDCT) coding. The most popular approach is to make the filterbank signal-adaptive, using window switching controlled by transient detection [2] or by a closed-loop decision. Window switching usually implies extra delay and complexity compared with a non-adaptive filterbank. Another popular approach is temporal noise shaping (TNS) [3], which allows the encoder to control the temporal fine structure of the quantization noise.

A method aiming at reducing pre-echo artifacts after transform decoding was proposed in [1]. The principle is to restore the temporal envelope of the signal. The time envelope computation is based on linear prediction in the frequency domain. Note that the restoration method requires the transmission, as side information, of the coefficients of the ARMA model describing the temporal envelope and of the transient position. We suppose that an auxiliary channel with a reduced bit-rate ( 5 bps, for example a watermark) is available to convey this side information.

Fig. 1. Illustration of the pre-echo artefact on a castanet signal coded at 56 kbps with an MP3 coder.

In order to reduce the complexity of pre-echo reduction and to allocate all the available bit-rate to the transmission of the temporal envelope parameters, we propose a new method, based on an algebraic detector, to localize transient positions on the decoded audio signal. There is therefore no longer any need to transmit the transient position.

The remainder of this paper is structured as follows. Section II presents the algebraic detection algorithm used to estimate transient positions. Section III develops the new approach to attack restoration.
Section IV presents a performance evaluation of the proposed algorithm for single MP3 encoding.

II. TRANSIENT LOCALIZATION BY ALGEBRAIC DETECTOR

The transient localization proposed in [1] is based on a distance measurement between successive time-frequency representations of the signal, which is quite complex. Moreover, since the localization may be inaccurate on the decoded signal, two localizations are performed at the encoder, before and after coding-decoding, so that the anticipated localization error can be transmitted through the side channel.

In order to reduce the complexity of transient localization, we propose to use the change-point detection method described in [4]. This method is based on algebraic manipulations of a piecewise polynomial signal. The input signal, $x(t)$, can be represented as a piecewise polynomial with at most one discontinuity on any time interval $I_\tau = (\tau - T, \tau)$ of length $T$. Let $x_\tau(t) = x(t + \tau - T)$, $t \in [0, T]$, denote the restriction of the signal to $I_\tau$, and redefine the discontinuity point, say $t_\tau$, relative to $I_\tau$:

$$ t_\tau = 0 \ \text{if } x_\tau \text{ is smooth}, \qquad 0 < t_\tau \le T \ \text{otherwise}. \qquad (1) $$

In the sense of distribution theory, the $N$th-order derivative of the input signal can be written:

$$ \frac{d^N}{dt^N} x_\tau(t) = [x_\tau^{(N)}](t) + \sum_{k=0}^{N-1} \mu_k \, \delta^{(N-1-k)}(t - t_\tau), \qquad (2) $$

where $\mu_k = x_\tau^{(k)}(t_\tau^+) - x_\tau^{(k)}(t_\tau^-)$ is the jump of the $k$th-order derivative at the point $t_\tau$, $\delta$ is the Dirac distribution, and $[x_\tau^{(N)}]$ represents the regular part of the $N$th-order derivative of the signal. If $t_\tau = 0$, there is no spike in the given interval. If $0 < t_\tau \le T$, there is a spike in the given interval at location $t_\tau$.

The authors of [4] propose a detector based on an algebraic determination of $t_\tau$, computed through simple filtering of $x$. The detector $D(t)$ relies on a decision function which must exceed some threshold if a change point exists in the interval.

To illustrate the ability of the method to detect different change points in the same signal, the algorithm described above is implemented using only a third-order derivative. The test signal is a drum signal composed of 4 attacks (Figure 2 (a)). The results in Figures 2 (b) and (c) show that all the change points are correctly detected: each change-point position matches a corresponding attack position.

Fig. 2. Drum signal (a), decision function (b) and change-point detector (c).

In the next experiment, we compare the accuracy of the algebraic detector with that of the stationarity index detector [6]. Figure 3 compares the detection errors of the algebraic detector (AD) and the stationarity index detector (SI) on the original and MP3-decoded versions of a triangle-castanet audio signal. The detection error is the difference between the actual and estimated transient positions. The algebraic detector has high accuracy and closely approximates all the considered attack positions.

Fig. 3. Drum signal (top) and corresponding change-point detector (bottom). Plotted: detection error versus attack index, for AD and SI on the original and decoded signals.

III. PRE-ECHO REDUCTION SYSTEM

As detailed in [1], the restored audio signal, $\hat{x}(n)$, is given at time $n$ by:

$$ \hat{x}(n) = x_d(n) \, \frac{e_o(n)}{e_d(n)}, \qquad (3) $$

where $x_d$ is the decoded audio signal, $e_o$ is the temporal envelope of the original signal, and $e_d$ is the temporal envelope of the coded-decoded signal. The correction is a post-processing step performed at the decoder. The parameters needed to estimate the temporal envelope of the original signal are extracted by the encoder and transmitted to the decoder through an auxiliary channel. Figures 4 and 5 show the block diagrams of the processing at the encoder and at the decoder.

Fig. 4. Block diagram of the processing at the encoder (input audio signal, frame type characterization, transient localization, temporal envelope parameters, parameter coding, auxiliary channel).

Fig. 5. Block diagram of the processing at the decoder (decoded audio signal, auxiliary channel, frame type characterization, transient localization, parameter decoding, temporal envelope computation, audio signal correction, restored audio signal).

The audio signal is first fed into a frame characterization stage, which checks whether the frame is transient or not. To detect transient frames, we use the technique described in [5]. For transient frames, the attack time positions are localized using the algebraic detector presented in Section II. Non-transient frames are divided into two equal sub-frames. An estimate of the temporal envelope is computed using frequency-domain linear prediction (FDLP) based on an ARMA model, whose parameters are transmitted over a very low bit-rate auxiliary channel.

At the decoder, the received bitstream is decoded to extract the ARMA coefficients, from which the estimate of the original time envelope is computed. In parallel, frame type characterization and transient localization are performed on the decoded signal. Estimates of the temporal envelopes of both the original and decoded signals are computed. Finally, the restored audio signal, $\hat{x}(n)$, is obtained according to (3).

A. Temporal envelope ARMA modeling

The temporal envelope is estimated using frequency-domain linear prediction (FDLP). In the same way that TDLP (time-domain linear prediction) estimates the power spectrum, FDLP estimates the temporal envelope of the signal, specifically the square of its Hilbert envelope [7]:

$$ e(n) = |x_a(n)|^2 = \mathcal{F}^{-1}\{ R_{X_a} \}(n), \qquad (4) $$

where $x_a$ is the analytic signal of $x$ and $R_{X_a}$ is the autocorrelation of its spectrum $X_a$, i.e. the envelope is the inverse Fourier transform of the autocorrelation of the single-sided (positive-frequency) spectrum. The block diagram of the temporal envelope estimation is depicted in Figure 6. To get an approximation of the Hilbert envelope, the discrete cosine transform (DCT) is first applied to a given audio segment. Next, linear prediction is applied to the DCT-transformed signal in order to get an ARMA model. An estimate of the temporal envelope of $x$ is therefore given by:

$$ \hat{e}(n) = \left| \frac{B(e^{j\theta_n})}{A(e^{j\theta_n})} \right|^2, \qquad (5) $$

where

$$ A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}, \qquad B(z) = \sum_{k=0}^{q} b_k z^{-k}, \qquad (6) $$

$\theta_n$ is the angle associated with the time index $n$ within the segment, and $a_k$ and $b_k$ are the ARMA coefficients.

Fig. 6. Block diagram of the temporal envelope estimation (DCT followed by ARMA(p,q) modeling).

The selection of the FDLP model order is guided by the temporal structure of the signal, in the same way as the TDLP model order is dictated by the formant structure. To illustrate the importance of the model order, Figure 7 shows a violin segment sampled at 44.1 kHz and the corresponding time envelope estimates obtained with a lower-order model and with an ARMA(7,3) model. As expected, the lower-order model gives only a smooth version of the time envelope, whereas with ARMA(7,3) the envelope almost fits the pitch pulses. An evaluation of the FDLP model order based on objective measurements of the audio quality is presented in Section IV.

B. Parameter coding

Once the ARMA parameters are estimated, they must be coded and transmitted through the auxiliary communication channel. The ARMA coefficients are characterized by a large dynamic range and would require many bits per coefficient for accurate coding. For this reason, the ARMA coefficients are transformed into reflection coefficients (RC). For each sub-frame, a vector grouping the RC is coded using a classical vector quantization technique. The codebook C is obtained by training on a database of various kinds of audio signals. It can be computed by the Lloyd-Max algorithm [8]. The size of the codebook and the corresponding bit-rate are discussed in Section IV.

Fig. 7. Original violin segment and temporal envelopes obtained with the lower-order model and the ARMA(7,3) model, respectively.
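The localization principle of Section II can be illustrated on a synthetic signal. The sketch below is a deliberately simplified stand-in and not the algebraic detector of [4]: it replaces the algebraic estimator with a third-order finite difference, relying only on the property that the regular part of the third derivative vanishes for a piecewise polynomial of degree at most two, so that only the singular part at the change point survives. The decision function, the relative threshold, the clustering rule and all names are assumptions made for illustration.

```python
import numpy as np

def decision_function(x, smooth=5):
    """Magnitude of the third-order difference, smoothed by a moving average."""
    d3 = np.diff(x, n=3)                       # discrete third derivative
    kernel = np.ones(smooth) / smooth
    return np.convolve(np.abs(d3), kernel, mode="same")

def detect_change_points(x, rel_threshold=0.5, min_gap=20):
    """Return one sample index per cluster where the decision function
    exceeds a threshold set relative to its maximum (hypothetical rule)."""
    d = decision_function(x)
    above = np.flatnonzero(d > rel_threshold * d.max())
    if above.size == 0:
        return []
    # Split the candidates into clusters separated by more than min_gap samples.
    clusters = np.split(above, np.flatnonzero(np.diff(above) > min_gap) + 1)
    # Report the strongest sample of each cluster, shifted by 2 samples to
    # compensate for the 4-sample stencil of the third-order difference.
    return [int(c[np.argmax(d[c])]) + 2 for c in clusters]

# Piecewise quadratic test signal with discontinuities at samples 200 and 400.
n = 600
idx = np.arange(n)
t = idx / n
x = np.where(idx < 200, 0.5 * t**2,
             np.where(idx < 400, 1 + t - t**2, -1 + 2 * t**2))
print(detect_change_points(x))
```

On real audio the signal is not exactly piecewise polynomial, so the regular part does not vanish and the threshold has to be tuned; the filtering used in [4] is designed to be robust in that situation, which this finite-difference sketch is not.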

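The envelope estimation of Section III-A and the correction step of Section III can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits an all-pole (AR) model instead of the ARMA(p,q) model of the paper, uses the autocorrelation method with a Levinson-Durbin recursion, maps the time index n to the angle π(n + 0.5)/N, and simulates coding noise by adding white noise. Because `fdlp_envelope` returns an estimate of the squared envelope, `restore` applies the square root of the envelope ratio as a gain, which is also an assumption. All function names and the model order are hypothetical. The recursion yields the reflection coefficients as a by-product, which are the quantities Section III-B proposes to quantize.

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II of a 1-D signal."""
    n = len(x)
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    return np.cos(np.pi * (t + 0.5) * k / n) @ x

def levinson(r, order):
    """Levinson-Durbin recursion: AR polynomial, reflection coeffs, error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    refl = np.zeros(order)
    err = r[0]
    for m in range(1, order + 1):
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        prev = a.copy()
        a[1:m] = prev[1:m] + k * prev[m - 1:0:-1]
        a[m] = k
        refl[m - 1] = k
        err *= 1.0 - k * k
    return a, refl, err

def fdlp_envelope(x, order=20):
    """All-pole estimate of the squared temporal envelope of x."""
    n = len(x)
    c = dct2(x)
    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[:n - m], c[m:]) for m in range(order + 1)])
    a, refl, err = levinson(r, order)
    # Evaluate A(e^{j theta}) on the grid of angles assigned to the time samples.
    theta = np.pi * (np.arange(n) + 0.5) / n
    A = np.exp(-1j * np.outer(theta, np.arange(order + 1))) @ a
    return err / np.abs(A) ** 2, refl

def restore(decoded, env_original, env_decoded, floor=1e-12):
    """Rescale the decoded signal by the square root of the envelope ratio."""
    gain = np.sqrt(env_original / np.maximum(env_decoded, floor))
    return decoded * gain

# Synthetic attack: near silence, then a decaying noise burst at sample 600.
rng = np.random.default_rng(1)
n, onset = 1024, 600
original = 1e-3 * rng.standard_normal(n)
original[onset:] += rng.standard_normal(n - onset) * np.exp(-np.arange(n - onset) / 60.0)
decoded = original + 0.1 * rng.standard_normal(n)   # crude stand-in for coding noise

env_o, refl = fdlp_envelope(original)
env_d, _ = fdlp_envelope(decoded)
restored = restore(decoded, env_o, env_d)
print("pre-attack energy, decoded :", np.sum(decoded[:onset - 50] ** 2))
print("pre-attack energy, restored:", np.sum(restored[:onset - 50] ** 2))
```

In the system described above, only the parameters of the original envelope would be sent over the auxiliary channel, while the decoder would compute the envelope of the decoded signal itself; here both envelopes are computed side by side for simplicity.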
IV. EXPERIMENTAL EVALUATION OF THE PROPOSED APPROACH

The experiments aim at validating our approach by comparing the restored audio signal with the original one. We use the PEMO-Q software described in [9] as an objective measurement tool and compare the Objective Difference Grade (ODG) and the instantaneous Perceptual Similarity Measure (PSMt). The ODG is a perceptual audio quality measure which rates the difference between test and reference signals on a scale from 0 (imperceptible) to -4 (very annoying). The values of PSMt lie in the interval [0,1], with 1 indicating the best similarity between the reference and test signals. These perceptual measures are correlated with the Subjective Difference Grade (SDG) for audio quality.

As a first evaluation of the proposed approach, the castanet signal coded/decoded at 56 kbps and its corrected version are shown in Figure 8. It can be seen that the MP3 coder introduces pre-echo and smooths the attacks. In the reconstructed signal, the pre-echo is considerably reduced and the attack is restored.

Given that the bit-rate offered by the auxiliary channel is limited to 2 bits per frame (434 bps), we study the influence of the ARMA order on the audio quality. Figure 9 compares the variations, over bit-rates, of the ODG and PSMt for the coded/decoded signal and its restored version. For comparison, we evaluate the system with two ARMA model orders coded at two different bit-rates: a lower-order model coded at 6 bits per frame (347 bps), as used in [1], and ARMA(5,3) coded at 2 bits per frame (434 bps). Recall that, in addition to the envelope parameters, 4 bits are allocated to coding the transient position in [1]. As illustrated in Figure 9, the proposed correction using ARMA(5,3) provides a significant enhancement of the PSMt and ODG.
With the lower-order model, the improvement is smaller.

Fig. 8. Attack restoration for a castanet signal coded by an MP3 coder at 56 kbps: (a) original signal, (b) coded/decoded signal, showing the pre-echo phenomenon, (c) reconstructed signal.

Fig. 9. Mean Perceptual Similarity Measure (PSMt) and Objective Difference Grade (ODG) as functions of the bit-rate (kbps) for castanet signals using the MP3 coder, with (ARMA) and without (MP3) correction.

V. CONCLUSION

Low bit-rate coding-decoding with a standard MP3 coder smooths attacks in transient signals and increases pre-echo. We have proposed an attack restoration method, based on accurate transient localization and temporal envelope correction, using a small set of information transmitted through an auxiliary channel. Our method significantly enhances the audio quality as measured by ODG and PSMt.

REFERENCES

[1] I. Samaali, M. Turki and G. Mahé, "Temporal envelope correction for restoration of attacks in low bit-rate audio coding," EUSIPCO 2009.
[2] K. K. Iwai, "Pre-echo Detection and Reduction," Master of Science thesis, Massachusetts Institute of Technology, May 1994.
[3] J. Herre, "Temporal Noise Shaping, Quantization and Coding Methods in Perceptual Audio Coding: A Tutorial Introduction," AES 17th International Conference on High Quality Audio Coding.
[4] M. Mboup, C. Join and M. Fliess, "A delay estimation approach to change-point detection," ICASSP 2008.
[5] 3GPP TS 26.403, "Advanced Audio Coding (AAC) part," September 2004.
[6] S. Larbi and M. Jaidane, "Audio Watermarking: A Way To Stationarize Audio Signals," IEEE Trans. Signal Processing, vol. 53, no. 2, pp. 816-823, 2005.
[7] M. Athineos and D. P. W. Ellis, "Frequency-Domain Linear Prediction for Temporal Features," ASRU 2003.
[8] V. S. Jayanthi, K. S. Marothi, T. M. Ishaq, M. Abbas and A. Shanmugam, "Performance Analysis of Vector Quantizer using Modified Generalized Lloyd Algorithm," IJISE, vol., pp. -5, Jan 2007.
[9] R. Huber and B. Kollmeier, "PEMO-Q: A New Method for Objective Audio Quality Assessment Using a Model of Auditory Perception," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 6, November 2006.