Speech Coding in the Frequency Domain

Size: px
Start display at page:

Download "Speech Coding in the Frequency Domain"

Transcription

1 Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215

2 Introduction The speech production model can be used to efficiently encode speech signals. Real-life signals often also contain other sounds then single source speech; Background noises in real-life envirnoments Multiple speakers Music and mixed speech/music content in entertainment broadcasts Singing voices We need a generic coding mode for non-speech signals. Audio codecs are based on frequency-domain coding good choice for generic-mode coding.

3 Introduction TCX An early approach (in AMR-WB+) to combining CELP with frequency-domain coding was called transform coded excitation (TCX) Model spectral envelope with linear prediction as in CELP. Take discrete Fourier transform of predictor residual and quantize. From a statistical point of view, this is a valid approach: The excitation is a uncorrelated signal, whereby we can transform it directly and do not need overlap between frames in the same sense as classical audio codecs. Original residual and transform domain signal have the same amount of information Critical sampling and perfect reconstruction.

4 Introduction TCX The main issue with the original approach is that application of a proper objective function is difficult. Recall that the objective function is of the form H(x ˆx) 2. If we now apply a transform F on x, such that y = Fx. The objective function is transformed to HF 1 (y ŷ) 2. The matrix HF 1 is non-trivial whereby all samples of y are multiplied with each other. Direct quantization is inefficient (=does not give best SNR). Optimal quantization would require an exhaustive search.

5 Introduction Modern TCX variants To solve this problem, modern codecs (USAC and EVS) use MDCT as a time-frequency transform. MDCT is a lapped transform, that is, it is based on overlap-add, but has critical sampling. The envelope model is still used to model the shape of the spectrum. The spectral coefficients are encoded with entropy coding. A perceptual model can be applied in the frequency domain, whereby we do not get the problems with the objective function.

6 Introduction Outlook The rest of this lecture is a brief introduction to frequency domain coding. Our objtective is to obtain a method with the following properties: Transitions between windows are perceptually smooth. Critical sampling (frequency domain representation has as many samples as the input signal). Physically well-motivated = allows efficient processing = low leakage between frequency components.

7 Introduction Time-frequency domain A time-frequency representation represents the signal as frequency bands which evolv over time.

8 Windowing and overlap-add A basic component of most speech and audio methods is segmentation of the signal into windows. Processing in fixed-length blocks allows implementation of computationally efficient methods. When the signal can be assumed stationary within a segment, it can be modelled with a single stationary model, whereby output quality of simple methods is high. Audio processing methods generally use overlapping windows to obtain a fade-in/fade-out functionality. Not-overlapping methods would suffer from discontinuities which is perceptually bad. Overlapping windows which go smoothly to zero make sure that signal is continuous also after processing, which is perceptually good. Overlapping windowing is a perceptual tool!

9 Windowing and overlap-add 1 (a) Overlapping half sine windows Magnitude.5 1 x k x k+1 Time (c) Overlapping Kaiser Bessel derived windows Magnitude.5 x k x k+1 Time

10 Windowing and overlap-add For example, half-sine windows are defined as { ( ) sin (k+.5)π 2N, for k < 2N ω k =, otherwise A window of the input signal σ k is then ξ k = ω k σ k. Subsequent windows are obtained by shifting ω k in time.

11 Windowing and overlap-add Magnitude Time Magnitude (db) Frequency bin Frequency bin Frequency bin

12 Windowing and overlap-add Reconstruction After processing, the windowing function is applied once more ω k ˆξk on the processed signal ˆξ k. This makes sure that signal goes to zero at the borders also after processing. The windows of the signal are then added together. If σl,k and σ R,k are the left part of the current window and the right part of previous window, then for perfect reconstruction. we should have σ k = σ L,k + σ R,k = ω L,k ξ L,k + ω R,k ξ R,k = ω 2 L,kσ k + ω 2 R,kσ k = (ω 2 L,k + ω 2 R,k)σ k. If ωl,k 2 + ω2 R,k = 1 then the reconstructed signal is equal to the original signal.

13 Windowing and overlap-add Squared Magnitude Squared Magnitude (b) Overlapping squared half sine windows x k x k+1 Time (d) Overlapping squared Kaiser Bessel derived windows x x k k+1 Time

14 Windowing and overlap-add 1 (a) x -1 1 (b) W R 2 x w R (c) W L 2 x w L 2-1

15 Projections and time-domain aliasing cancellation If we have full overlap between windows, then for every window we get N new samples of data but every window contains 2N samples. The objective of coding is to reduce redundancy, but we have just doubled the amount of data! Overlapping windowing does not provide critical sampling. We need some method to return to critical sampling. MDCT is based on a projection known as time-domain aliasing cancellation (TDAC), which gives a representation with critical sampling.

16 Projections and time-domain aliasing cancellation The projection used in MDCT is based on splitting the signal into a symmetric and antisymmetric part. then x R,k = P R x k = [ J I ] x k, x L,k = P L x k = [ I J ] x k, P T L x L,k + P T R x R,k = (P T L P L + P T R P R)x k = x k since P T L P L + P T R P R = I. Moreover, P T R P Rx k is symmetric and P T L P Lx k is antisymmetric.

17 Projections and time-domain aliasing cancellation 1 (a) x -1 1 (b) P R H P R x -1 1 (c) P L H P L x -1

18 Projections and time-domain aliasing cancellation If x k is of length N, then the projected signals x R,k = P R x k and x L,k = P L x k are of length N/2. Each part holds exactly half the signal. Critical sampling! We can use these projections to remove redundancy at the overlap.

19 Combination of projection and windowing Perfect reconstruction works as long as the Princen-Bradley condition holds P T L P L + P T R P R = I. If we combine projection into symmetric and antisymmetric parts as P L = P L W L and P R = P R W R, where W L and W R are the windowing functions, then Princen-Bradley still holds. This is a special case windowing cannot be combined with all projections, but for the symmetric and antisymmetric parts it does work. We can use P L and P R as a projection into two parts. Smooth transitions between windows. Critical sampling.

20 Projections and time-domain aliasing cancellation 1 (a) x -1 1 (b) W R H P R H P R W R x -1 1 (c) W L H P L H P L W L x -1

21 Time-frequency transforms We have above obtained a critically sampled representation x of the signal such that transitions between windows are smooth. Next we want to transform the representation to the frequency domain y = Dx with a transform D. For a real-valued spectral representation we could use the ordinary discrete cosine transform of type II, that is, DCT-II. A benefit of real-valued transforms are that they are simpler to process than complex values. The signal is the represented as a weighted sum of basis functions y k. The reconstruction of individual basis functions is [ ] T P x k = L D P 1 y k. R

22 Time-frequency transforms DCT-II Basis functions of DCT-II and their reconstructions. (a) (b) -1

23 Time-frequency transforms DCT-II The reconstructed basis functions have strange shapes. some have discontinuities and some have odd corners. Clearly the combination of DCT-II and TDAC does not work too well. Extensions/reconstructions of basis functions do not corresponds to physical frequency-elements. Let s try DCT-IV instead!

24 Time-frequency transforms DCT-IV Basis functions of DCT-IV and their (windowed) reconstructions. (a) (b) -1 1 (c) -1

25 Time-frequency transforms MDCT The extended/reconstructed basis functions have the following properties The left and right parts are symmetric and antisymmetric Each extended basis function corresponds perfectly to a sinusoid. The extension is actually a discrete cosine transform, which we wall call the modified discrete cosine transform (MDCT) X k = 2N 1 n= [ ( π x n cos n + 1 N 2 + N ) ( k + 1 )]. 2 2 Note! The MDCT takes 2N samples as input and gives N frequencies as output, but since each window has N samples in common with the previous window, for every N new samples, we get N frequencies Critical sampling.

26 Time-frequency transforms MDCT Magnitude Time Magnitude (db) Frequency bin Frequency bin Frequency bin

27 Time-frequency transforms MDCT The MDCT thus has all the desired properties; smooth transitions, critical sampling and well-defined frequency components. The MDCT is the most commonly used time-frequency transform in audio coding. Used in AAC, USAC, EVS etc. Prof. Edler was a central developer of MDCT. The only notable drawback with MDCT is that it is a real-valued transform. If a signal perfectly aligns with a basis function in one frame, then it will be perfectly orthogonal (=) to the basis function in the next frame. Amplitudes of MDCT components have much larger variance than the original signal. Physical interpretation of amplitudes is difficult/inefficient.

28 Time-frequency transforms We now have a frequency-representation of a window of the signal. Next step is to quantize and code the signal. To obtain perceptually uniform quantization noise, we scale the spectral components X k with the perceptual envelope W k. We thus quantize X k /W k. The quantized signal is then multiplied with W k to return to the original domain.

29 Time-frequency transforms MDCT Magnitude (db) Magnitude (db) Magnitude (db) (a) (b) (c) Frequency bin X k W k X k /W k X' k /W k X k X' k

30 Basics of entropy coding We now have a quantized spectrum and our objective is to transmit that spectrum with the lowest number of bits. This is known as lossless coding. Compression such that the original (quantized) spectrum can be exactly recovered. Each quantization level can be interpreted as a symbol. We have an alphabet of symbols. We need unique identifiers for each symbol in terms of bit-strings.

31 Basics of entropy coding Consider a three-symbol alphabet with symbols a, b and c. We can assign them unique binary strings, 1 and 11. The symbols can be then transmitted with 2 bits/symbol. However, this is already inefficient because in theory, for three symbols we would need only log bits/symbol on average. When constrained to fixed-length strings of bits we cannot do better.

32 Basics of entropy coding If we know from before the occurence-probabilities of each symbol, then we can do better. Consider the following probabilities and bit-strings Symbol Probability Code Bits a.5 1 b c Clearly each symbol has a unique bit-string. From a bit-string 111, we can easily decode bca. The average bit-rate is = 1.5, that is, on average we use 1.5 bits/symbol. This is known as Huffman coding and it works optimally when the probabilities are powers of.5.

33 Arithmetic coding In the general case, when probabilities are arbitrary numbers, we can use arithmetic coding. Consider the following alphabet and the corresponding probabilities Symbol Probability Interval a b c d e Here each symbol is mapped to a unique interval in [, 1].

34 Arithmetic coding a b c d e

35 Arithmetic coding If we then want to encode a sequence of symbols, for example, adc. The first symbol then lands into the interval....41, which we call the remaining range. The intervals for the second symbol are then mapped into the remaining range. The second symbol then lands into the interval The intervals for the third symbol are then mapped into the remaining range. The last symbol then lands into the interval To transmit the sequence of symbols adc is then equivalent with transmitting a binary code which uniquely identifies the interval

36 Arithmetic coding a b c d e a b c d e a b c d e

37 Arithmetic coding The average bit-consumption is then b = p(s) log 2 p(s). s Symbols In the above example the bit-consumption is then 2.12 bits/symbol.

38 Arithmetic coding To encode an interval we use a binary representation of decimal numbers. Let correspond to the interval....5 and 1 to The string 1 then corresponds to the interval and 11 to The string 1 then corresponds to the interval and 11 to etc. When we have a bit-string whose range is inside the desired range, then we are finished.

39 Arithmetic coding Encoding of the range : Final bit-string is 111. Decoder performs then decodes the intervals from the bit-string and maps them to symbols

40 Arithmetic coding Arithmetic coding is thus a form of entropy coding which takes an alphabet (quantization levels) and their occurence-probabilities, and encodes a sequence of symbols with the optimally low number of bits. In a practical system, we do not want to use a fixed alphabet-probability combination, but model the signal. We can then either use preceeding coefficients (known as the context) to predict the probability model of the current coefficient (USAC and EVS), or model the probabilities with, say, a Laplacian distribution and deduce the variance of samples from the spectral envelope shape (EVS).

41 Integration with CELP Above we have presented principles of frequency domain coding for speech codecs. For integration to a practical codec we need methods for switching between time- and frequency-domain coding. The windowing paradigm based on MDCT fits poorly with the filter-based windows of CELP. Must use engineering solutions aka hacks, to switch windowing concept. The characteristic distortions of time- and frequency-domain codecs are very different. Switching in the middle of a phoneme can become easily audible, because the character of artifacts change, even if absolute perceptual quality remains constant. Allow switching only at phoneme borders (requires advanced signal analysis).

42 Summary of frequency domain coding Frequency domain coding is effective for stationary signals such as music, background noises, and stationary segments of speech. Transform coded excitation (TCX) is a family of frequency-domain coding methods which use linear prediction as a model of the spectral envelope. Modern frequency-domain codecs are based on MDCT, which provides smooth transitions between windows, critical sampling and a physically well-defined transform. The spectrum is weighted with perceptual model to limit perceptual effect of quantization noise. Frequency components are encoded with an entropy codec to reduce bit-rate.

Compression. Encryption. Decryption. Decompression. Presentation of Information to client site

Compression. Encryption. Decryption. Decompression. Presentation of Information to client site DOCUMENT Anup Basu Audio Image Video Data Graphics Objectives Compression Encryption Network Communications Decryption Decompression Client site Presentation of Information to client site Multimedia -

More information

Enhanced Waveform Interpolative Coding at 4 kbps

Enhanced Waveform Interpolative Coding at 4 kbps Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression

More information

Advanced Digital Signal Processing Part 2: Digital Processing of Continuous-Time Signals

Advanced Digital Signal Processing Part 2: Digital Processing of Continuous-Time Signals Advanced Digital Signal Processing Part 2: Digital Processing of Continuous-Time Signals Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical Engineering

More information

Signal Characteristics

Signal Characteristics Data Transmission The successful transmission of data depends upon two factors:» The quality of the transmission signal» The characteristics of the transmission medium Some type of transmission medium

More information

ECE 556 BASICS OF DIGITAL SPEECH PROCESSING. Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2

ECE 556 BASICS OF DIGITAL SPEECH PROCESSING. Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2 ECE 556 BASICS OF DIGITAL SPEECH PROCESSING Assıst.Prof.Dr. Selma ÖZAYDIN Spring Term-2017 Lecture 2 Analog Sound to Digital Sound Characteristics of Sound Amplitude Wavelength (w) Frequency ( ) Timbre

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information

Chapter 9 Image Compression Standards

Chapter 9 Image Compression Standards Chapter 9 Image Compression Standards 9.1 The JPEG Standard 9.2 The JPEG2000 Standard 9.3 The JPEG-LS Standard 1IT342 Image Compression Standards The image standard specifies the codec, which defines how

More information

Copyright S. K. Mitra

Copyright S. K. Mitra 1 In many applications, a discrete-time signal x[n] is split into a number of subband signals by means of an analysis filter bank The subband signals are then processed Finally, the processed subband signals

More information

ITM 1010 Computer and Communication Technologies

ITM 1010 Computer and Communication Technologies ITM 1010 Computer and Communication Technologies Lecture #20 Review: Communication Technologies 2003 香港中文大學, 電子工程學系 (Prof. H.K.Tsang) ITM 1010 計算機與通訊技術 1 Review of Communication Technologies! Information

More information

Module 6 STILL IMAGE COMPRESSION STANDARDS

Module 6 STILL IMAGE COMPRESSION STANDARDS Module 6 STILL IMAGE COMPRESSION STANDARDS Lesson 16 Still Image Compression Standards: JBIG and JPEG Instructional Objectives At the end of this lesson, the students should be able to: 1. Explain the

More information

Images with (a) coding redundancy; (b) spatial redundancy; (c) irrelevant information

Images with (a) coding redundancy; (b) spatial redundancy; (c) irrelevant information Images with (a) coding redundancy; (b) spatial redundancy; (c) irrelevant information 1992 2008 R. C. Gonzalez & R. E. Woods For the image in Fig. 8.1(a): 1992 2008 R. C. Gonzalez & R. E. Woods Measuring

More information

Problem Sheet 1 Probability, random processes, and noise

Problem Sheet 1 Probability, random processes, and noise Problem Sheet 1 Probability, random processes, and noise 1. If F X (x) is the distribution function of a random variable X and x 1 x 2, show that F X (x 1 ) F X (x 2 ). 2. Use the definition of the cumulative

More information

Communications IB Paper 6 Handout 3: Digitisation and Digital Signals

Communications IB Paper 6 Handout 3: Digitisation and Digital Signals Communications IB Paper 6 Handout 3: Digitisation and Digital Signals Jossy Sayir Signal Processing and Communications Lab Department of Engineering University of Cambridge jossy.sayir@eng.cam.ac.uk Lent

More information

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A

EC 6501 DIGITAL COMMUNICATION UNIT - II PART A EC 6501 DIGITAL COMMUNICATION 1.What is the need of prediction filtering? UNIT - II PART A [N/D-16] Prediction filtering is used mostly in audio signal processing and speech processing for representing

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 Speech and telephone speech Based on a voice production model Parametric representation

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 14 Quiz 04 Review 14/04/07 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Chapter IV THEORY OF CELP CODING

Chapter IV THEORY OF CELP CODING Chapter IV THEORY OF CELP CODING CHAPTER IV THEORY OF CELP CODING 4.1 Introduction Wavefonn coders fail to produce high quality speech at bit rate lower than 16 kbps. Source coders, such as LPC vocoders,

More information

Chapter 3 Data Transmission COSC 3213 Summer 2003

Chapter 3 Data Transmission COSC 3213 Summer 2003 Chapter 3 Data Transmission COSC 3213 Summer 2003 Courtesy of Prof. Amir Asif Definitions 1. Recall that the lowest layer in OSI is the physical layer. The physical layer deals with the transfer of raw

More information

Digital Speech Processing and Coding

Digital Speech Processing and Coding ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/

More information

Audio Coding based on Integer Transforms

Audio Coding based on Integer Transforms Audio Coding based on Integer Transforms Ralf Geiger, Thomas Sporer, Jürgen Koller, Karlheinz Brandenburg / Fraunhofer Institut für Integrierte Schaltungen, Arbeitsgruppe für Elektronische Medientechnologie

More information

Introduction to Source Coding

Introduction to Source Coding Comm. 52: Communication Theory Lecture 7 Introduction to Source Coding - Requirements of source codes - Huffman Code Length Fixed Length Variable Length Source Code Properties Uniquely Decodable allow

More information

United Codec. 1. Motivation/Background. 2. Overview. Mofei Zhu, Hugo Guo, Deepak Music 422 Winter 09 Stanford University.

United Codec. 1. Motivation/Background. 2. Overview. Mofei Zhu, Hugo Guo, Deepak Music 422 Winter 09 Stanford University. United Codec Mofei Zhu, Hugo Guo, Deepak Music 422 Winter 09 Stanford University March 13, 2009 1. Motivation/Background The goal of this project is to build a perceptual audio coder for reducing the data

More information

Filter Banks I. Prof. Dr. Gerald Schuller. Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany. Fraunhofer IDMT

Filter Banks I. Prof. Dr. Gerald Schuller. Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany. Fraunhofer IDMT Filter Banks I Prof. Dr. Gerald Schuller Fraunhofer IDMT & Ilmenau University of Technology Ilmenau, Germany 1 Structure of perceptual Audio Coders Encoder Decoder 2 Filter Banks essential element of most

More information

Entropy, Coding and Data Compression

Entropy, Coding and Data Compression Entropy, Coding and Data Compression Data vs. Information yes, not, yes, yes, not not In ASCII, each item is 3 8 = 24 bits of data But if the only possible answers are yes and not, there is only one bit

More information

Evaluation of Audio Compression Artifacts M. Herrera Martinez

Evaluation of Audio Compression Artifacts M. Herrera Martinez Evaluation of Audio Compression Artifacts M. Herrera Martinez This paper deals with subjective evaluation of audio-coding systems. From this evaluation, it is found that, depending on the type of signal

More information

The Scientist and Engineer's Guide to Digital Signal Processing By Steven W. Smith, Ph.D.

The Scientist and Engineer's Guide to Digital Signal Processing By Steven W. Smith, Ph.D. The Scientist and Engineer's Guide to Digital Signal Processing By Steven W. Smith, Ph.D. Home The Book by Chapters About the Book Steven W. Smith Blog Contact Book Search Download this chapter in PDF

More information

ECEn 665: Antennas and Propagation for Wireless Communications 131. s(t) = A c [1 + αm(t)] cos (ω c t) (9.27)

ECEn 665: Antennas and Propagation for Wireless Communications 131. s(t) = A c [1 + αm(t)] cos (ω c t) (9.27) ECEn 665: Antennas and Propagation for Wireless Communications 131 9. Modulation Modulation is a way to vary the amplitude and phase of a sinusoidal carrier waveform in order to transmit information. When

More information

Telecommunication Electronics

Telecommunication Electronics Politecnico di Torino ICT School Telecommunication Electronics C5 - Special A/D converters» Logarithmic conversion» Approximation, A and µ laws» Differential converters» Oversampling, noise shaping Logarithmic

More information

SAMPLING THEORY. Representing continuous signals with discrete numbers

SAMPLING THEORY. Representing continuous signals with discrete numbers SAMPLING THEORY Representing continuous signals with discrete numbers Roger B. Dannenberg Professor of Computer Science, Art, and Music Carnegie Mellon University ICM Week 3 Copyright 2002-2013 by Roger

More information

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals

speech signal S(n). This involves a transformation of S(n) into another signal or a set of signals 16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Fundamentals of Digital Communication

Fundamentals of Digital Communication Fundamentals of Digital Communication Network Infrastructures A.A. 2017/18 Digital communication system Analog Digital Input Signal Analog/ Digital Low Pass Filter Sampler Quantizer Source Encoder Channel

More information

Communication Theory II

Communication Theory II Communication Theory II Lecture 13: Information Theory (cont d) Ahmed Elnakib, PhD Assistant Professor, Mansoura University, Egypt March 22 th, 2015 1 o Source Code Generation Lecture Outlines Source Coding

More information

MULTIMEDIA SYSTEMS

MULTIMEDIA SYSTEMS 1 Department of Computer Engineering, Faculty of Engineering King Mongkut s Institute of Technology Ladkrabang 01076531 MULTIMEDIA SYSTEMS Pk Pakorn Watanachaturaporn, Wt ht Ph.D. PhD pakorn@live.kmitl.ac.th,

More information

Timbral Distortion in Inverse FFT Synthesis

Timbral Distortion in Inverse FFT Synthesis Timbral Distortion in Inverse FFT Synthesis Mark Zadel Introduction Inverse FFT synthesis (FFT ) is a computationally efficient technique for performing additive synthesis []. Instead of summing partials

More information

Golomb-Rice Coding Optimized via LPC for Frequency Domain Audio Coder

Golomb-Rice Coding Optimized via LPC for Frequency Domain Audio Coder Golomb-Rice Coding Optimized via LPC for Frequency Domain Audio Coder Ryosue Sugiura, Yutaa Kamamoto, Noboru Harada, Hiroazu Kameoa and Taehiro Moriya Graduate School of Information Science and Technology,

More information

Chapter 2: Digitization of Sound

Chapter 2: Digitization of Sound Chapter 2: Digitization of Sound Acoustics pressure waves are converted to electrical signals by use of a microphone. The output signal from the microphone is an analog signal, i.e., a continuous-valued

More information

Pulse Code Modulation

Pulse Code Modulation Pulse Code Modulation Modulation is the process of varying one or more parameters of a carrier signal in accordance with the instantaneous values of the message signal. The message signal is the signal

More information

EEE 309 Communication Theory

EEE 309 Communication Theory EEE 309 Communication Theory Semester: January 2016 Dr. Md. Farhad Hossain Associate Professor Department of EEE, BUET Email: mfarhadhossain@eee.buet.ac.bd Office: ECE 331, ECE Building Part 05 Pulse Code

More information

Wavelet-based image compression

Wavelet-based image compression Institut Mines-Telecom Wavelet-based image compression Marco Cagnazzo Multimedia Compression Outline Introduction Discrete wavelet transform and multiresolution analysis Filter banks and DWT Multiresolution

More information

Digital Audio. Lecture-6

Digital Audio. Lecture-6 Digital Audio Lecture-6 Topics today Digitization of sound PCM Lossless predictive coding 2 Sound Sound is a pressure wave, taking continuous values Increase / decrease in pressure can be measured in amplitude,

More information

Chapter 4 SPEECH ENHANCEMENT

Chapter 4 SPEECH ENHANCEMENT 44 Chapter 4 SPEECH ENHANCEMENT 4.1 INTRODUCTION: Enhancement is defined as improvement in the value or Quality of something. Speech enhancement is defined as the improvement in intelligibility and/or

More information

CSCD 433 Network Programming Fall Lecture 5 Physical Layer Continued

CSCD 433 Network Programming Fall Lecture 5 Physical Layer Continued CSCD 433 Network Programming Fall 2016 Lecture 5 Physical Layer Continued 1 Topics Definitions Analog Transmission of Digital Data Digital Transmission of Analog Data Multiplexing 2 Different Types of

More information

Sound Synthesis Methods

Sound Synthesis Methods Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like

More information

10 Speech and Audio Signals

10 Speech and Audio Signals 0 Speech and Audio Signals Introduction Speech and audio signals are normally converted into PCM, which can be stored or transmitted as a PCM code, or compressed to reduce the number of bits used to code

More information

TRANSFORMS / WAVELETS

TRANSFORMS / WAVELETS RANSFORMS / WAVELES ransform Analysis Signal processing using a transform analysis for calculations is a technique used to simplify or accelerate problem solution. For example, instead of dividing two

More information

SNR Scalability, Multiple Descriptions, and Perceptual Distortion Measures

SNR Scalability, Multiple Descriptions, and Perceptual Distortion Measures SNR Scalability, Multiple Descriptions, Perceptual Distortion Measures Jerry D. Gibson Department of Electrical & Computer Engineering University of California, Santa Barbara gibson@mat.ucsb.edu Abstract

More information

Sampling and Pulse Code Modulation Chapter 6

Sampling and Pulse Code Modulation Chapter 6 Sampling and Pulse Code Modulation Chapter 6 Dr. Yun Q. Shi Dept of Electrical & Computer Engineering New Jersey Institute of Technology shi@njit.edu Sampling Theorem A Signal is said to be band-limited

More information

The quality of the transmission signal The characteristics of the transmission medium. Some type of transmission medium is required for transmission:

The quality of the transmission signal The characteristics of the transmission medium. Some type of transmission medium is required for transmission: Data Transmission The successful transmission of data depends upon two factors: The quality of the transmission signal The characteristics of the transmission medium Some type of transmission medium is

More information

Das, Sneha; Bäckström, Tom Postfiltering with Complex Spectral Correlations for Speech and Audio Coding

Das, Sneha; Bäckström, Tom Postfiltering with Complex Spectral Correlations for Speech and Audio Coding Powered by TCPDF (www.tcpdf.org) This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail. Das, Sneha; Bäckström, Tom Postfiltering

More information

Hybrid Coding (JPEG) Image Color Transform Preparation

Hybrid Coding (JPEG) Image Color Transform Preparation Hybrid Coding (JPEG) 5/31/2007 Kompressionsverfahren: JPEG 1 Image Color Transform Preparation Example 4: 2: 2 YUV, 4: 1: 1 YUV, and YUV9 Coding Luminance (Y): brightness sampling frequency 13.5 MHz Chrominance

More information

Audio and Speech Compression Using DCT and DWT Techniques

Audio and Speech Compression Using DCT and DWT Techniques Audio and Speech Compression Using DCT and DWT Techniques M. V. Patil 1, Apoorva Gupta 2, Ankita Varma 3, Shikhar Salil 4 Asst. Professor, Dept.of Elex, Bharati Vidyapeeth Univ.Coll.of Engg, Pune, Maharashtra,

More information

EEE 309 Communication Theory

EEE 309 Communication Theory EEE 309 Communication Theory Semester: January 2017 Dr. Md. Farhad Hossain Associate Professor Department of EEE, BUET Email: mfarhadhossain@eee.buet.ac.bd Office: ECE 331, ECE Building Types of Modulation

More information

Analog-Digital Interface

Analog-Digital Interface Analog-Digital Interface Tuesday 24 November 15 Summary Previous Class Dependability Today: Redundancy Error Correcting Codes Analog-Digital Interface Converters, Sensors / Actuators Sampling DSP Frequency

More information

Audio Signal Performance Analysis using Integer MDCT Algorithm

Audio Signal Performance Analysis using Integer MDCT Algorithm Audio Signal Performance Analysis using Integer MDCT Algorithm M.Davidson Kamala Dhas 1, R.Priyadharsini 2 1 Assistant Professor, Department of Electronics and Communication Engineering, Mepco Schelnk

More information

Laboratory Assignment 4. Fourier Sound Synthesis

Laboratory Assignment 4. Fourier Sound Synthesis Laboratory Assignment 4 Fourier Sound Synthesis PURPOSE This lab investigates how to use a computer to evaluate the Fourier series for periodic signals and to synthesize audio signals from Fourier series

More information

2.1. General Purpose Run Length Encoding Relative Encoding Tokanization or Pattern Substitution

2.1. General Purpose Run Length Encoding Relative Encoding Tokanization or Pattern Substitution 2.1. General Purpose There are many popular general purpose lossless compression techniques, that can be applied to any type of data. 2.1.1. Run Length Encoding Run Length Encoding is a compression technique

More information

APPLICATIONS OF DSP OBJECTIVES

APPLICATIONS OF DSP OBJECTIVES APPLICATIONS OF DSP OBJECTIVES This lecture will discuss the following: Introduce analog and digital waveform coding Introduce Pulse Coded Modulation Consider speech-coding principles Introduce the channel

More information

Communications I (ELCN 306)

Communications I (ELCN 306) Communications I (ELCN 306) c Samy S. Soliman Electronics and Electrical Communications Engineering Department Cairo University, Egypt Email: samy.soliman@cu.edu.eg Website: http://scholar.cu.edu.eg/samysoliman

More information

Analog and Telecommunication Electronics

Analog and Telecommunication Electronics Politecnico di Torino - ICT School Analog and Telecommunication Electronics D5 - Special A/D converters» Differential converters» Oversampling, noise shaping» Logarithmic conversion» Approximation, A and

More information

LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR

LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 1 LECTURE VI: LOSSLESS COMPRESSION ALGORITHMS DR. OUIEM BCHIR 2 STORAGE SPACE Uncompressed graphics, audio, and video data require substantial storage capacity. Storing uncompressed video is not possible

More information

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping

Structure of Speech. Physical acoustics Time-domain representation Frequency domain representation Sound shaping Structure of Speech Physical acoustics Time-domain representation Frequency domain representation Sound shaping Speech acoustics Source-Filter Theory Speech Source characteristics Speech Filter characteristics

More information

QUESTION BANK. SUBJECT CODE / Name: EC2301 DIGITAL COMMUNICATION UNIT 2

QUESTION BANK. SUBJECT CODE / Name: EC2301 DIGITAL COMMUNICATION UNIT 2 QUESTION BANK DEPARTMENT: ECE SEMESTER: V SUBJECT CODE / Name: EC2301 DIGITAL COMMUNICATION UNIT 2 BASEBAND FORMATTING TECHNIQUES 1. Why prefilterring done before sampling [AUC NOV/DEC 2010] The signal

More information

DEPARTMENT OF INFORMATION TECHNOLOGY QUESTION BANK. Subject Name: Information Coding Techniques UNIT I INFORMATION ENTROPY FUNDAMENTALS

DEPARTMENT OF INFORMATION TECHNOLOGY QUESTION BANK. Subject Name: Information Coding Techniques UNIT I INFORMATION ENTROPY FUNDAMENTALS DEPARTMENT OF INFORMATION TECHNOLOGY QUESTION BANK Subject Name: Year /Sem: II / IV UNIT I INFORMATION ENTROPY FUNDAMENTALS PART A (2 MARKS) 1. What is uncertainty? 2. What is prefix coding? 3. State the

More information

Signals A Preliminary Discussion EE442 Analog & Digital Communication Systems Lecture 2

Signals A Preliminary Discussion EE442 Analog & Digital Communication Systems Lecture 2 Signals A Preliminary Discussion EE442 Analog & Digital Communication Systems Lecture 2 The Fourier transform of single pulse is the sinc function. EE 442 Signal Preliminaries 1 Communication Systems and

More information

Digital Signal Processing

Digital Signal Processing Digital Signal Processing Fourth Edition John G. Proakis Department of Electrical and Computer Engineering Northeastern University Boston, Massachusetts Dimitris G. Manolakis MIT Lincoln Laboratory Lexington,

More information

Discrete Fourier Transform (DFT)

Discrete Fourier Transform (DFT) Amplitude Amplitude Discrete Fourier Transform (DFT) DFT transforms the time domain signal samples to the frequency domain components. DFT Signal Spectrum Time Frequency DFT is often used to do frequency

More information

Continuous vs. Discrete signals. Sampling. Analog to Digital Conversion. CMPT 368: Lecture 4 Fundamentals of Digital Audio, Discrete-Time Signals

Continuous vs. Discrete signals. Sampling. Analog to Digital Conversion. CMPT 368: Lecture 4 Fundamentals of Digital Audio, Discrete-Time Signals Continuous vs. Discrete signals CMPT 368: Lecture 4 Fundamentals of Digital Audio, Discrete-Time Signals Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 22,

More information

PART I: The questions in Part I refer to the aliasing portion of the procedure as outlined in the lab manual.

PART I: The questions in Part I refer to the aliasing portion of the procedure as outlined in the lab manual. Lab. #1 Signal Processing & Spectral Analysis Name: Date: Section / Group: NOTE: To help you correctly answer many of the following questions, it may be useful to actually run the cases outlined in the

More information

Review of Lecture 2. Data and Signals - Theoretical Concepts. Review of Lecture 2. Review of Lecture 2. Review of Lecture 2. Review of Lecture 2

Review of Lecture 2. Data and Signals - Theoretical Concepts. Review of Lecture 2. Review of Lecture 2. Review of Lecture 2. Review of Lecture 2 Data and Signals - Theoretical Concepts! What are the major functions of the network access layer? Reference: Chapter 3 - Stallings Chapter 3 - Forouzan Study Guide 3 1 2! What are the major functions

More information

Chapter 2: Signal Representation

Chapter 2: Signal Representation Chapter 2: Signal Representation Aveek Dutta Assistant Professor Department of Electrical and Computer Engineering University at Albany Spring 2018 Images and equations adopted from: Digital Communications

More information

Multimedia Systems Entropy Coding Mahdi Amiri February 2011 Sharif University of Technology

Multimedia Systems Entropy Coding Mahdi Amiri February 2011 Sharif University of Technology Course Presentation Multimedia Systems Entropy Coding Mahdi Amiri February 2011 Sharif University of Technology Data Compression Motivation Data storage and transmission cost money Use fewest number of

More information

SGN Audio and Speech Processing

SGN Audio and Speech Processing Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations

More information

Solutions to Information Theory Exercise Problems 5 8

Solutions to Information Theory Exercise Problems 5 8 Solutions to Information Theory Exercise roblems 5 8 Exercise 5 a) n error-correcting 7/4) Hamming code combines four data bits b 3, b 5, b 6, b 7 with three error-correcting bits: b 1 = b 3 b 5 b 7, b

More information

Non-Uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes

Non-Uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes Non-Uniform Speech/Audio Coding Exploiting Predictability of Temporal Evolution of Spectral Envelopes Petr Motlicek 12, Hynek Hermansky 123, Sriram Ganapathy 13, and Harinath Garudadri 4 1 IDIAP Research

More information

A Brief Introduction to Information Theory and Lossless Coding

A Brief Introduction to Information Theory and Lossless Coding A Brief Introduction to Information Theory and Lossless Coding 1 INTRODUCTION This document is intended as a guide to students studying 4C8 who have had no prior exposure to information theory. All of

More information

CSCD 433 Network Programming Fall Lecture 5 Physical Layer Continued

CSCD 433 Network Programming Fall Lecture 5 Physical Layer Continued CSCD 433 Network Programming Fall 2016 Lecture 5 Physical Layer Continued 1 Topics Definitions Analog Transmission of Digital Data Digital Transmission of Analog Data Multiplexing 2 Different Types of

More information

EE390 Final Exam Fall Term 2002 Friday, December 13, 2002

EE390 Final Exam Fall Term 2002 Friday, December 13, 2002 Name Page 1 of 11 EE390 Final Exam Fall Term 2002 Friday, December 13, 2002 Notes 1. This is a 2 hour exam, starting at 9:00 am and ending at 11:00 am. The exam is worth a total of 50 marks, broken down

More information

Speech Compression Using Wavelet Transform

Speech Compression Using Wavelet Transform IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 19, Issue 3, Ver. VI (May - June 2017), PP 33-41 www.iosrjournals.org Speech Compression Using Wavelet Transform

More information

CT111 Introduction to Communication Systems Lecture 9: Digital Communications

CT111 Introduction to Communication Systems Lecture 9: Digital Communications CT111 Introduction to Communication Systems Lecture 9: Digital Communications Yash M. Vasavada Associate Professor, DA-IICT, Gandhinagar 31st January 2018 Yash M. Vasavada (DA-IICT) CT111: Intro to Comm.

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

DCSP-3: Minimal Length Coding. Jianfeng Feng

DCSP-3: Minimal Length Coding. Jianfeng Feng DCSP-3: Minimal Length Coding Jianfeng Feng Department of Computer Science Warwick Univ., UK Jianfeng.feng@warwick.ac.uk http://www.dcs.warwick.ac.uk/~feng/dcsp.html Automatic Image Caption (better than

More information

Module 8: Video Coding Basics Lecture 40: Need for video coding, Elements of information theory, Lossless coding. The Lecture Contains:

Module 8: Video Coding Basics Lecture 40: Need for video coding, Elements of information theory, Lossless coding. The Lecture Contains: The Lecture Contains: The Need for Video Coding Elements of a Video Coding System Elements of Information Theory Symbol Encoding Run-Length Encoding Entropy Encoding file:///d /...Ganesh%20Rana)/MY%20COURSE_Ganesh%20Rana/Prof.%20Sumana%20Gupta/FINAL%20DVSP/lecture%2040/40_1.htm[12/31/2015

More information

Open Access Improved Frame Error Concealment Algorithm Based on Transform- Domain Mobile Audio Codec

Open Access Improved Frame Error Concealment Algorithm Based on Transform- Domain Mobile Audio Codec Send Orders for Reprints to reprints@benthamscience.ae The Open Electrical & Electronic Engineering Journal, 2014, 8, 527-535 527 Open Access Improved Frame Error Concealment Algorithm Based on Transform-

More information

6.02 Practice Problems: Modulation & Demodulation

6.02 Practice Problems: Modulation & Demodulation 1 of 12 6.02 Practice Problems: Modulation & Demodulation Problem 1. Here's our "standard" modulation-demodulation system diagram: at the transmitter, signal x[n] is modulated by signal mod[n] and the

More information

Lecture5: Lossless Compression Techniques

Lecture5: Lossless Compression Techniques Fixed to fixed mapping: we encoded source symbols of fixed length into fixed length code sequences Fixed to variable mapping: we encoded source symbols of fixed length into variable length code sequences

More information

END-OF-YEAR EXAMINATIONS ELEC321 Communication Systems (D2) Tuesday, 22 November 2005, 9:20 a.m. Three hours plus 10 minutes reading time.

END-OF-YEAR EXAMINATIONS ELEC321 Communication Systems (D2) Tuesday, 22 November 2005, 9:20 a.m. Three hours plus 10 minutes reading time. END-OF-YEAR EXAMINATIONS 2005 Unit: Day and Time: Time Allowed: ELEC321 Communication Systems (D2) Tuesday, 22 November 2005, 9:20 a.m. Three hours plus 10 minutes reading time. Total Number of Questions:

More information

Lecture 3 Review of Signals and Systems: Part 2. EE4900/EE6720 Digital Communications

Lecture 3 Review of Signals and Systems: Part 2. EE4900/EE6720 Digital Communications EE4900/EE6720: Digital Communications 1 Lecture 3 Review of Signals and Systems: Part 2 Block Diagrams of Communication System Digital Communication System 2 Informatio n (sound, video, text, data, ) Transducer

More information

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005

University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis

More information

Pulse Code Modulation (PCM)

Pulse Code Modulation (PCM) Project Title: e-laboratories for Physics and Engineering Education Tempus Project: contract # 517102-TEMPUS-1-2011-1-SE-TEMPUS-JPCR 1. Experiment Category: Electrical Engineering >> Communications 2.

More information

FPGA implementation of DWT for Audio Watermarking Application

FPGA implementation of DWT for Audio Watermarking Application FPGA implementation of DWT for Audio Watermarking Application Naveen.S.Hampannavar 1, Sajeevan Joseph 2, C.B.Bidhul 3, Arunachalam V 4 1, 2, 3 M.Tech VLSI Students, 4 Assistant Professor Selection Grade

More information

Understanding Digital Signal Processing

Understanding Digital Signal Processing Understanding Digital Signal Processing Richard G. Lyons PRENTICE HALL PTR PRENTICE HALL Professional Technical Reference Upper Saddle River, New Jersey 07458 www.photr,com Contents Preface xi 1 DISCRETE

More information

8.3 Basic Parameters for Audio

8.3 Basic Parameters for Audio 8.3 Basic Parameters for Audio Analysis Physical audio signal: simple one-dimensional amplitude = loudness frequency = pitch Psycho-acoustic features: complex A real-life tone arises from a complex superposition

More information

CMPT 318: Lecture 4 Fundamentals of Digital Audio, Discrete-Time Signals

CMPT 318: Lecture 4 Fundamentals of Digital Audio, Discrete-Time Signals CMPT 318: Lecture 4 Fundamentals of Digital Audio, Discrete-Time Signals Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 16, 2006 1 Continuous vs. Discrete

More information

Lecture 3 Concepts for the Data Communications and Computer Interconnection

Lecture 3 Concepts for the Data Communications and Computer Interconnection Lecture 3 Concepts for the Data Communications and Computer Interconnection Aim: overview of existing methods and techniques Terms used: -Data entities conveying meaning (of information) -Signals data

More information

CHAPTER 4. PULSE MODULATION Part 2

CHAPTER 4. PULSE MODULATION Part 2 CHAPTER 4 PULSE MODULATION Part 2 Pulse Modulation Analog pulse modulation: Sampling, i.e., information is transmitted only at discrete time instants. e.g. PAM, PPM and PDM Digital pulse modulation: Sampling

More information

Signal Processing Toolbox

Signal Processing Toolbox Signal Processing Toolbox Perform signal processing, analysis, and algorithm development Signal Processing Toolbox provides industry-standard algorithms for analog and digital signal processing (DSP).

More information

Templates and Image Pyramids

Templates and Image Pyramids Templates and Image Pyramids 09/07/17 Computational Photography Derek Hoiem, University of Illinois Why does a lower resolution image still make sense to us? What do we lose? Image: http://www.flickr.com/photos/igorms/136916757/

More information

CHAPTER. delta-sigma modulators 1.0

CHAPTER. delta-sigma modulators 1.0 CHAPTER 1 CHAPTER Conventional delta-sigma modulators 1.0 This Chapter presents the traditional first- and second-order DSM. The main sources for non-ideal operation are described together with some commonly

More information