COMPRESSION OF HIGH QUALITY AUDIO SIGNALS

1. Description of the problem

Fairlight Instruments, who brought the problem to the MISG, have developed a high quality "Computer Musical Instrument" (CMI) which is used for creating and manipulating music and sounds (mostly of musical instruments) for use in music recording, editing, mixing and composition for the music, film and advertising industries. These industries require sound reproduction of at least the quality of the sound stored on compact discs. [On a compact disc, sounds are sampled at 44.1 kHz and digitised to 16 to 20 bits.] The CMI stores most of its input and output on multi-track magnetic tape, but there would be great advantages in speed and flexibility if the sounds could be stored on random access read/write discs. The high sampling rates required for the sound quality make this prohibitively expensive unless a method of substantially compressing the data can be developed. The compression of high-quality audio signals also has other applications; significant effort is going into the development of methods for transmitting high fidelity music along ISDN (telephone) channels.

General methods of data compression are widely available, but the compression of music has two special features. Firstly, the compression and decompression must occur in real time. To produce the quality of sound from compact discs, 44,100 samples of 16-20 bits must be compressed per second. [Decompression will generally be a faster operation than compression.] This implies that the compression will be done using a program on a digital signal processing (DSP) or customised VLSI chip, and it imposes restrictions on the complexity of the compression algorithm used. Secondly, in many circumstances it will be acceptable for the result of compressing and then decompressing a signal to be merely perceptually identical to the original. Perceptually identical signals may be substantially different.
Shortly before the study group, papers were presented at a conference in the USA (85th Convention of the Audio Engineering Society) which described in general terms an algorithm which compressed music from a compact disc in real time, from 16 bits per sample to 1.45 bits per sample (Brandenburg et al, 1988). The main components of this algorithm were:

(1) A transformation of the data to the frequency domain by the discrete cosine transform.
(2) Modification of the coefficients by a "psycho-acoustic weighting function".

(3) Quantisation of the coefficients.
(4) Compression using an entropy (Huffman) coder.

The Huffman coder and compression algorithms like it are reversible - the original signal can be recovered identically from the compressed signal. Because of the quantisation of the coefficients of the discrete cosine transform, the algorithm of Brandenburg is irreversible - the decompressed signal may be significantly degraded even though perceptually identical to the original signal. There is a third possibility - a near-reversible algorithm - which introduces a very slight degradation but from which the original signal may be very nearly recovered. Fairlight expressed an interest in all types of compression algorithm: reversible, near-reversible and irreversible. It seemed, however, that matters such as quantisation and psycho-acoustic weighting would be determined by experimental methods and were beyond the scope of the one-week study group. We therefore decided to look at reversible and near-reversible algorithms.

2. Compression using autoregressions

A general method for compressing data is to use a parametric model to predict observations. The parameters are estimated from the data and, instead of saving the raw data, the estimates of the parameters together with the residuals from the model are saved. If the model is a good one, the residuals have significantly smaller variance than the raw data and so can be stored (or transmitted) using fewer bits. The principal method of compression considered was the modelling of the data series using an autoregression, a method also known as linear predictive coding (LPC). In this method, the sequence of observations, x(n), is supposed to come from a model

    x(n) + b_1 x(n-1) + ... + b_p x(n-p) = e(n)

where the e(n)'s are assumed to be white noise (i.e. independently and identically distributed random variables with mean zero and constant variance).
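The predict-and-store-residuals scheme described above can be sketched in Python. This is a minimal illustration, not Fairlight's or the study group's implementation: the least-squares fit, the helper names and the synthetic AR(2) "signal" standing in for piano samples are all assumptions made for the example.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares fit of x(n) + b_1 x(n-1) + ... + b_p x(n-p) = e(n).
    Returns the coefficient estimates b and the residuals e(p), ..., e(N-1)."""
    X = np.column_stack([x[p - k:len(x) - k] for k in range(1, p + 1)])
    y = x[p:]
    b, *_ = np.linalg.lstsq(X, -y, rcond=None)   # note the sign convention
    e = y + X @ b                                # residuals from the model
    return b, e

def reconstruct(b, start, e):
    """Decoder: recover the block exactly from b, the p 'start up'
    values and the residuals, so the scheme is reversible."""
    x = list(start)
    for en in e:
        x.append(en - sum(bk * x[-k] for k, bk in enumerate(b, start=1)))
    return np.array(x)

# Synthetic resonant AR(2) process standing in for a block of audio.
rng = np.random.default_rng(0)
x = np.zeros(2000)
for i in range(2, len(x)):
    x[i] = 1.8 * x[i - 1] - 0.9 * x[i - 2] + rng.standard_normal()

b, e = fit_ar(x, p=2)
x_hat = reconstruct(b, x[:2], e)
print("signal variance:   ", x.var())
print("residual variance: ", e.var())   # far smaller, so fewer bits needed
print("max reconstruction error:", np.abs(x_hat - x).max())
```

Because the residual variance is a small fraction of the signal variance, the residuals can be coded with fewer bits per sample, and storing b, the start-up values and the residuals loses nothing; this is what makes the approach a candidate for a reversible coder.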
For a basic implementation, an autoregression is fitted to a block of data and, as described above, the estimates of the b_i's, the 'start up' values x(1), ..., x(p) and the residuals e(n) are stored. For more sophisticated implementations, the final values from one block can be used as the starting values for the next. The use of the autoregressive model is attractive because there are fast algorithms for estimating the parameters of the autoregression. Such algorithms operate in real time when carefully programmed on a DSP chip for reasonably useful values of n and p. These algorithms are recursive in the order of the model: given the parameter estimates for an autoregression of order p, it is easy to compute quickly the parameter estimates for an autoregression of order p + 1.

The issues that remain to be considered are:

(i) the size of the block of data to be used;
(ii) the order of the autoregression to be fitted.

The two issues are related; as the size of the block of data becomes bigger, so it becomes more likely that the parameters of the model best describing the data will vary over the block - indeed, the generating process itself may change. One approach is to decide upon a fixed block length and then, for each block, to fit various models (autoregressions of different orders) and to choose the best one. What is 'best' and the method of determining which model is best will depend on the final aim. For data compression, Rissanen (1978) showed that if a sequence of autoregressions of increasing order is fitted, then the order yielding the encoded data of minimum length is that which minimizes

    BIC(p) = ln s_p^2 + p ln T / T

where T is the length of the data and s_p^2 is the residual mean square after fitting the autoregression of order p. To test the applicability of these ideas, a small number of experiments were conducted on digitised piano music provided by Bruce Tulloch of Fairlight Instruments.

3. Data analysis

A set of 8000 data points from some piano music were analysed by fitting autoregressions. Figure 1 shows a plot of 1000 observations (0.02 seconds of music) and their periodogram. To investigate the importance of block size, autoregressions were fitted with the data divided into 16 blocks of length 500, then 8 blocks of length 1000 and finally 1 block of 8000.
In each case the order of the autoregression was determined using the BIC approach, with the maximum allowable order being 2√T. In Table 1, the orders of the selected models and the corresponding residual variances are given for each block and each block size.
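The order-recursive estimation and BIC-based order selection described above can be sketched as follows. The recursion is the standard Levinson-Durbin algorithm on sample autocorrelations, offered as an illustration of how the order p + 1 fit follows cheaply from the order p fit; it is not the study group's actual procedure, and the synthetic test signal is an assumption.

```python
import numpy as np

def levinson_durbin(r, pmax):
    """Order-recursive AR estimation from autocorrelations r[0..pmax].
    Returns the residual variance s2[p] for every order p = 0..pmax."""
    a = np.zeros(pmax + 1); a[0] = 1.0       # prediction-error filter
    s2 = np.zeros(pmax + 1); s2[0] = r[0]
    for p in range(1, pmax + 1):
        # Reflection coefficient taking the order-(p-1) fit to order p.
        k = -(r[p] + a[1:p] @ r[p - 1:0:-1]) / s2[p - 1]
        a[1:p] = a[1:p] + k * a[p - 1:0:-1]
        a[p] = k
        s2[p] = s2[p - 1] * (1.0 - k * k)
    return s2

def best_order_bic(x, pmax):
    """Choose the order minimising BIC(p) = ln s_p^2 + p ln T / T."""
    T = len(x)
    r = np.array([x[:T - k] @ x[k:] / T for k in range(pmax + 1)])
    s2 = levinson_durbin(r, pmax)
    bic = np.log(s2) + np.arange(pmax + 1) * np.log(T) / T
    return int(np.argmin(bic))

# Illustrative check on a synthetic resonant AR(2) signal.
rng = np.random.default_rng(1)
x = np.zeros(4000)
for i in range(2, len(x)):
    x[i] = 1.8 * x[i - 1] - 0.9 * x[i - 2] + rng.standard_normal()
print("BIC-selected order:", best_order_bic(x, pmax=20))
```

The whole sweep of orders 1, ..., pmax costs O(pmax^2) operations, which is what makes fitting autoregressions of several different orders to every block plausible in real time on a DSP chip.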

[Figure 1: (a) Plot of 1000 observations (0.02 secs) of piano music. (b) The periodogram (squared amplitudes of the Fourier coefficients of the data plotted in (a)).]

Table 1. Results of autoregression fitting

    Block size = 500         Block size = 1000        Block size = 8000
    Block   AR     NSR*      Block   AR     NSR       Block   AR     NSR
    var     order            var     order            var     order
    53.3    34    0.016      47.6    58    0.008      19.8   110   0.00175
    42.0    33    0.019      25.0    59    0.012
    28.1    32    0.025      19.1    59    0.008
    21.9     7    0.036      15.7    58    0.010
    20.0     8    0.038      16.6    58    0.008
    18.3     6    0.035      14.0    57    0.007
    15.0    22    0.024      11.6    60    0.006
    16.3    15    0.031       9.3    58    0.003
    17.1    20    0.020
    16.2    21    0.012
    14.3    22    0.012
    13.8    23    0.016
    12.5    15    0.020
    10.7    19    0.009
    10.2    16    0.002
     8.4    19    0.008

    *NSR = ratio of residual variance to signal (block) variance.

From this table we can see, for example, that to fit a model to each of the first two blocks of 500 observations a total of 67 coefficients would be required, whereas the entire block of 1000 requires a model of only 58 parameters, and this model has smaller residual variance than either of the models fitted to the smaller blocks. In fact the autoregression fitted to all 8000 observations requires only four parameters more than are required to fit autoregressions to each of the first four blocks of 500. The residual variance after fitting the model to all 8000 observations corresponds to a compression of about 4.5 bits per observation. There remains substantial scope for quantisation of the data (roughly, removing the least significant bits from each observation) and thus compressing the data further.

A further possibility would be to develop a 'vocabulary' of autoregressions. Then that member of the vocabulary which best fitted the data would be used to compute residuals. The benefit of this would be that the parameters of the autoregressions would not need to be stored each time, but rather would be stored in a dictionary. As a result, there need not be the same penalty for storing coefficients, and so smaller blocks of data could be used for the same size of model.
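The "about 4.5 bits per observation" figure quoted above can be recovered from the last column of Table 1, if one assumes (our assumption, not stated in the report) the usual rule that each factor-of-four reduction in the variance of a roughly Gaussian sequence saves one bit per sample:

```python
import math

nsr = 0.00175   # residual-to-signal variance ratio, 8000-sample block

# Bits saved per observation: (1/2) * log2(signal var / residual var).
bits_saved = 0.5 * math.log2(1.0 / nsr)
print(f"approximately {bits_saved:.2f} bits per observation")  # ~4.58
```

This agrees with the report's rounded figure of about 4.5 bits.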

4. Compression of Fourier coefficients

Further data compression is possible. Firstly, the residuals from the selected autoregression are not uncorrelated, because the process generating the data is varying with time, and so a second autoregression could be fitted to the residuals from the first. A second approach is to transform to the frequency domain and perform data compression there. This may be particularly useful after fitting the autoregression, for the effect of that fitting is to make the amplitudes of the Fourier coefficients more nearly constant (that is, to flatten or whiten the spectrum). One method is to divide the frequency range into bands, to compute the mean amplitude within each band and then, instead of storing the amplitudes, to store the mean amplitude and the results of dividing each of the amplitudes by the mean amplitude. For the present data, the effect of this approach was to reduce the range of the amplitudes by a factor of 100. Clearly the effectiveness of this compression of Fourier coefficients depends on a good choice of bands. Cameron (1987) and Hannan and Rissanen (1988) provide methods for choosing the bands.

After these compressions in the time and frequency domains, a Huffman coder would then be used to provide a further compression of the signal. Thus a complete compression algorithm might consist of:

1. Transform to the frequency domain.
2. Apply psychoacoustic weighting.
3. Fit autoregressions and select the one providing the greatest compression.
4. Use a spectrum smoothing technique to reduce the dynamic range of the Fourier coefficients.
5. Apply an entropy coder.

The effectiveness of each of these steps and the interactions between them must be determined in part by experimentation. This was beyond the scope of the Study Group, but the results of the preliminary work done here suggest that autoregression fitting would provide a significant improvement in the compression of audio signals.
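The band-mean smoothing of the Fourier amplitudes (step 4 above) can be sketched as below. The function names, the equal-width bands and the synthetic test signal are illustrative assumptions; Cameron (1987) and Hannan and Rissanen (1988) give principled ways of choosing the bands.

```python
import numpy as np

def band_smooth(amps, n_bands):
    """Store, per band, the mean amplitude plus each amplitude divided
    by that mean; the division shrinks the dynamic range to be coded."""
    bands = np.array_split(amps, n_bands)
    means = np.array([b.mean() for b in bands])
    normalised = np.concatenate([b / m for b, m in zip(bands, means)])
    return means, normalised

def band_unsmooth(means, normalised, n_bands):
    """Invert band_smooth exactly, so the step itself is reversible."""
    bands = np.array_split(normalised, n_bands)
    return np.concatenate([b * m for b, m in zip(bands, means)])

# Spectrally coloured test signal (resonant AR(2) process).
rng = np.random.default_rng(2)
x = np.zeros(2048)
for i in range(2, len(x)):
    x[i] = 1.8 * x[i - 1] - 0.9 * x[i - 2] + rng.standard_normal()

amps = np.abs(np.fft.rfft(x)) + 1e-9          # avoid exact zeros
means, norm = band_smooth(amps, n_bands=16)
recovered = band_unsmooth(means, norm, n_bands=16)

print("raw amplitude spread:       ", amps.max() / amps.min())
print("normalised amplitude spread:", norm.max() / norm.min())
print("max round-trip error:       ", np.abs(recovered - amps).max())
```

For a coloured spectrum the normalised values typically span a much narrower range than the raw amplitudes (the study group observed a reduction of roughly a factor of 100 on their data), which makes the subsequent quantisation and entropy coding more effective.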

5. References

Brandenburg, K. and Seitzer, D. (1988) OCF: Coding high quality audio with data rates of 64 kbit/sec. Proceedings of the 85th Convention of the Audio Engineering Society.

Cameron, M.A. (1987) An automatic, non-parametric spectrum estimator. Journal of Time Series Analysis, 8, 379-387.

Hannan, E.J. and Rissanen, J. (1988) The width of a spectral window. J. Appl. Prob., 25A, 301-307.

Rissanen, J. (1978) Modelling by shortest data description. Automatica, 14, 465-471.