Digital Speech Processing and Coding


ENEE408G Spring 2006, Lecture 2
Digital Speech Processing and Coding
Instructor: Shihab Shamma
Electrical & Computer Engineering, University of Maryland, College Park
http://www.ece.umd.edu/class/enee408g/
sas@eng.umd.edu
ENEE408G Capstone -- Multimedia Signal Processing (S'03)

Last Lecture
UMCP ENEE408G Slides (created by M.Wu & R.Liu 2002)
- Course overview and logistics
- Bring multimedia to digital world: sampling & quantization
- Introduction to speech processing
  - Different aspects of speech
- Friday Lab Session
  - Warm-up with Matlab and PocketPC
- Today: speech processing and coding
ENEE408G Capstone -- Multimedia Signal Processing (S'03) Lec1 Introduction 2/4/03

Speech Production

Visualize Speech via Spectrogram
- A window function is applied to the speech signal: $x[n, \tau] = w[n, \tau]\, x[n]$
- The short-time Fourier transform (STFT) of the windowed speech waveform is $X(\omega, \tau) = \sum_{n=-\infty}^{\infty} x[n, \tau] \exp[-j\omega n]$
- The spectrogram is $S(\omega, \tau) = |X(\omega, \tau)|^2$
  - Visualized using different brightness in a 2-D time-frequency plot
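
The STFT and spectrogram definitions on this slide can be sketched in plain Python as follows. The Hann window, the 64-sample frame, the 32-sample hop, and the 1 kHz test tone are assumptions chosen for illustration, not values from the lecture. The time index is taken relative to the start of each frame, which changes only the phase of X and leaves the spectrogram unchanged.

```python
import cmath
import math

def hann(length):
    """Hann window w[n] (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def spectrogram(x, frame_len=64, hop=32):
    """Return S[frame][k] = |X(omega_k, tau)|^2 with omega_k = 2*pi*k/frame_len."""
    w = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        # Windowed segment: w[n] * x[n + tau]
        seg = [w[n] * x[start + n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            # Direct DFT sum: sum_n seg[n] * exp(-j * omega_k * n)
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            row.append(abs(X) ** 2)
        frames.append(row)
    return frames

fs = 8000  # assumed sampling rate in Hz
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
S = spectrogram(tone)
# Frequency bin with the most energy in the first frame
peak_bin = max(range(len(S[0])), key=lambda k: S[0][k])
peak_hz = peak_bin * fs / 64
```

Each row of `S` is one column of the time-frequency plot. A direct DFT sum is used here only to keep the sketch dependency-free; an FFT would normally replace it.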

Example of Speech Spectrogram
[Figure 3 of SPM May 98 Speech Survey]

Play Speech
[Figure 1 of SPM May 98 Speech Survey]
UMCP ENEE408G Slides (created by M.Wu 2003)
- First speaking machine at the 1930s NY World's Fair
  - 14 keys, 1 wristband, 1 pedal
- Modeling speech production as a linear system
  - Sound sources: either voiced or unvoiced
    - Voiced sound: modeled by a generator of pulses
    - Unvoiced sound: modeled by a white noise generator
  - Articulation: modeled by a cascade of single-resonance (pole) digital filters
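
A minimal sketch of this source-filter model, assuming a two-pole resonator for each "single-resonance" filter. The sampling rate, the 100 Hz pulse rate, and the three resonance frequencies and bandwidths are illustrative assumptions, not values from the lecture.

```python
import math
import random

def resonator(x, freq_hz, bandwidth_hz, fs):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)   # pole radius
    theta = 2 * math.pi * freq_hz / fs           # pole angle
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y, y1, y2 = [], 0.0, 0.0
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def excitation(n_samples, fs, voiced, pitch_hz=100.0, seed=0):
    """Pulse train for voiced sound, white Gaussian noise for unvoiced sound."""
    if voiced:
        period = int(round(fs / pitch_hz))
        return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def synthesize(n_samples, fs, voiced, resonances):
    """Pass the excitation through a cascade of resonators."""
    signal = excitation(n_samples, fs, voiced)
    for freq_hz, bandwidth_hz in resonances:
        signal = resonator(signal, freq_hz, bandwidth_hz, fs)
    return signal

fs = 8000
resonances = [(700, 80), (1200, 100), (2600, 150)]  # hypothetical values
voiced_sound = synthesize(800, fs, True, resonances)
unvoiced_sound = synthesize(800, fs, False, resonances)
```

Switching `voiced` changes only the source; the cascade that models articulation stays the same, which is the separation the slide describes.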

Linear Separable Model for Speech Production
UMCP ENEE408G Slides (created by R. Liu & M.Wu 2002)
- Vocal tract is modeled as a linear time-varying system
  - Parameters of the linear system are slowly varying
  - Excited by a time-varying source (voiced or unvoiced)
- Practical models
  - Model each speech frame as linear time-invariant
  - Excited by either a voiced or an unvoiced source
  - Allow overlaps between neighbouring frames
[Figure 3.2 of Furui's book]

Pitch
- Pitch period: time duration of one glottal cycle
- Pitch: reciprocal of the pitch period
  - Also known as the fundamental frequency
- A vowel typically has one to four pitch periods
- Pitch ranges from about 60 Hz to 400 Hz
- The typical pitch range of female speakers is higher than that of male speakers
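
One common way to estimate the pitch period of a voiced frame is to look for the lag that maximizes the frame's autocorrelation. The lecture does not prescribe an estimator, so the sketch below is an assumption: it searches only lags corresponding to the 60 Hz to 400 Hz range quoted above and uses a synthetic 200 Hz test frame.

```python
import math

def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return the pitch in Hz whose lag maximizes the autocorrelation."""
    min_lag = int(fs / fmax)   # shortest pitch period considered, in samples
    max_lag = int(fs / fmin)   # longest pitch period considered, in samples
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag       # pitch is the reciprocal of the pitch period

fs = 8000
# Synthetic "voiced" frame: 200 Hz fundamental plus one harmonic, 50 ms long
frame = [math.sin(2 * math.pi * 200 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 400 * n / fs)
         for n in range(400)]
pitch_hz = estimate_pitch(frame, fs)
```

Because the sum shrinks as the lag grows, this unnormalized form favours the shortest matching period over its multiples. Real speech usually needs extra care, such as a voicing check, which this sketch leaves out.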

Pitch of Male and Female Speakers
- Distribution, mean, and standard deviation of male and female pitch
[Figure 2.12 & 2.11 of Furui's book]

Formant
[Figure 2.5 & 3.5 of Furui's book]
- Formants: the resonance frequencies of the vocal tract
- Vowels typically have three formants: F1, F2, F3
- Formants change when the vocal tract changes to produce a different sound

Formants of Vowels
[Figure 2.6 & 2.7 of Furui's book]

Speech Coding

PCM coding
[Block diagram: input signal I(x,y) -> Sampler -> Quantizer -> Encoder -> transmit; labelled "digitize/capture device"]
- How to encode a digital image into bits?
  - Sample and perform uniform quantization
  - Pulse Code Modulation (PCM)
  - 8 bits per sample ~ good for speech
  - 16 bits ~ needed for high-quality music
- Tradeoff between fidelity and file size
- How to squeeze out redundancy?

Discussion on Improving PCM (1)
- Uniform quantization may give an inconsistent range of relative errors
  - E.g., an error of +/- 2 is 20% at amplitude 10 but only 2% at amplitude 100
  - It is also not optimal for non-uniformly distributed sources
- Non-uniform quantization
  - Assign a smaller quantization step size at small amplitudes to maintain a consistent range of relative quantization errors over the entire dynamic range
  - Can apply a non-linear transform before uniform quantization via companding (compression-expansion)
  - μ-law companding: international standard for 64 kbps speech
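
A sketch of companding followed by uniform quantization, using the continuous μ-law formula with μ = 255. This is an illustration under stated assumptions: deployed telephone codecs use a segmented approximation of the curve, and the mid-rise 8-bit quantizer below is a choice made for this example.

```python
import math

MU = 255.0  # mu-law parameter (assumed here; the slide does not give a value)

def mu_compress(x):
    """Compress x in [-1, 1]: sign(x) * ln(1 + MU*|x|) / ln(1 + MU)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

def quantize_uniform(x, bits=8):
    """Mid-rise uniform quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, int(math.floor((x + 1.0) / step))))
    return -1.0 + (index + 0.5) * step

def quantize_mu_law(x, bits=8):
    """Compress, quantize uniformly, then expand."""
    return mu_expand(quantize_uniform(mu_compress(x), bits))

# Relative error of each scheme at a small and a large amplitude
relative_errors = {
    amplitude: (abs(quantize_uniform(amplitude) - amplitude) / amplitude,
                abs(quantize_mu_law(amplitude) - amplitude) / amplitude)
    for amplitude in (0.01, 0.9)
}
```

Each entry of `relative_errors` holds the uniform and the companded relative error for one amplitude, so the two schemes can be compared at small and large signal levels.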

Discussion on Improving PCM (2)
- Quantized PCM values may not be equally likely
  - Can we do better than encoding each value using the same number of bits?
- Example: P(0) = 0.5, P(1) = 0.25, P(2) = 0.125, P(3) = 0.125
  - If the same number of bits is used for all values: need 2 bits to represent the four possibilities
  - If fewer bits are used for likely values such as 0 ~ Variable Length Codes (VLC)
    - 0 => [0], 1 => [10], 2 => [110], 3 => [111]
    - Use 1.75 bits on average ~ saves 0.25 bit per sample!
- Bring probability into the picture
  - Use the probability distribution to reduce the average number of bits per quantized sample
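
The variable-length code from this example can be checked with a few lines of Python. The code table and probabilities are the ones on the slide; the helper names `encode` and `decode` are hypothetical.

```python
CODE = {0: "0", 1: "10", 2: "110", 3: "111"}
PROB = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}

def encode(symbols):
    """Concatenate the codeword of each symbol."""
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    """Decode a bit string; works because no codeword is a prefix of another."""
    inverse = {word: symbol for symbol, word in CODE.items()}
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            symbols.append(inverse[word])
            word = ""
    return symbols

# Average code length: sum over symbols of P(symbol) * len(codeword)
average_bits = sum(PROB[s] * len(CODE[s]) for s in CODE)
saving_vs_fixed = 2 - average_bits  # compared with 2 bits per value
```

The average works out as 0.5*1 + 0.25*2 + 0.125*3 + 0.125*3 = 1.75 bits, matching the figure quoted above.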

Entropy coding
- Idea: use fewer bits for commonly seen values
- Examples: Huffman coding (used in JPEG and MPEG), Lempel-Ziv coding (used in Unix)
- How many bits are needed?
  - Compressibility depends on the source's characteristics
  - Limit of compression => Entropy
    - Measures the uncertainty, or average amount of information, of a source
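
A small sketch of the two ideas on this slide: the entropy of a source as the compression limit, and Huffman coding as one way to approach it. The four-symbol distribution is reused from the earlier PCM example as an assumption. The Huffman routine returns only codeword lengths, not the codewords themselves.

```python
import heapq
import math

def entropy(probs):
    """Entropy in bits per symbol: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Return the Huffman codeword length for each symbol index."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol inside them
        for symbol in group1 + group2:
            lengths[symbol] += 1
        heapq.heappush(heap, (p1 + p2, counter, group1 + group2))
        counter += 1
    return lengths

probs = [0.5, 0.25, 0.125, 0.125]
source_entropy = entropy(probs)
lengths = huffman_lengths(probs)
average_length = sum(p * n for p, n in zip(probs, lengths))
```

For this distribution every probability is a power of 1/2, so the Huffman average length equals the entropy. In general the average length is at least the entropy and less than one bit above it.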

List of Compression Tools
- Lossless encoding tools
  - Entropy coding: Huffman, Lempel-Ziv, and others (arithmetic coding)
  - [ Run-length coding ]
- Lossy tools for reducing redundancy
  - Quantization and truncation
  - Predictive coding
    - Encode prediction parameters and residues with fewer bits
  - Subband and transform coding
    - Encode different frequency bands
    - Transform into a domain with improved energy compaction

How to Encode Correlated Sequence?
- Consider: high correlation between successive samples
- Predictive coding
  - Basic principle: remove redundancy between successive samples and encode only the residual between the actual and predicted values
  - The residue usually has a much smaller dynamic range
    - Allows fewer quantization levels for the same MSE => get compression
  - Compression efficiency depends on intersample redundancy
- First try
  - Encoder: $e(n) = u(n) - u_P(n)$ with predictor $u_P(n) = u(n-1)$; the quantizer outputs $e_Q(n)$
  - Decoder: $u_Q(n) = e_Q(n) + u_P(n)$ with predictor $u_P(n) = u_Q(n-1)$

Predictive Coding (cont'd)
- Problem with the first try
  - Inputs to the predictor are different at the encoder and the decoder: the decoder doesn't know $u(n)$!
  - Mismatch error could propagate to future reconstructed samples
- Solution: Differential PCM (DPCM)
  - Use the quantized sequence $u_Q(n)$ for prediction at both encoder and decoder
  - Prediction error: $e(n) = u(n) - u_P(n)$
  - Quantized prediction error: $e_Q(n)$
  - Distortion: $d(n) = e(n) - e_Q(n)$
  - Encoder: $u_P(n) = u_Q(n-1)$, $u_Q(n) = u_P(n) + e_Q(n)$
  - Decoder: $u_P(n) = u_Q(n-1)$, $u_Q(n) = u_P(n) + e_Q(n)$
- Think: what predictor to use?
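
The difference between the first try and DPCM can be seen in a short simulation. This is a sketch under assumptions: the predictor is the previous-sample predictor from the slides, while the uniform quantizer, the step size of 1.0, and the slow ramp input are chosen for illustration.

```python
def quantize(value, step):
    """Uniform quantizer: nearest multiple of step."""
    return step * round(value / step)

def open_loop_encode(u, step):
    """First try: predict from the previous ORIGINAL sample u(n-1)."""
    previous, residues = 0.0, []
    for sample in u:
        residues.append(quantize(sample - previous, step))
        previous = sample
    return residues

def dpcm_encode(u, step):
    """DPCM: predict from the previous RECONSTRUCTED sample u_Q(n-1)."""
    previous_q, residues = 0.0, []
    for sample in u:
        e_q = quantize(sample - previous_q, step)
        residues.append(e_q)
        previous_q = previous_q + e_q  # same update the decoder performs
    return residues

def decode(residues):
    """Decoder for both schemes: u_Q(n) = u_Q(n-1) + e_Q(n)."""
    previous_q, output = 0.0, []
    for e_q in residues:
        previous_q = previous_q + e_q
        output.append(previous_q)
    return output

def max_error(u, reconstructed):
    return max(abs(a - b) for a, b in zip(u, reconstructed))

u = [0.4 * n for n in range(21)]  # slow ramp, steps smaller than the quantizer step
step = 1.0
open_loop_error = max_error(u, decode(open_loop_encode(u, step)))
dpcm_error = max_error(u, decode(dpcm_encode(u, step)))
```

With the open-loop encoder every residue of 0.4 quantizes to zero, so the decoder never leaves zero and the error grows with the ramp. With DPCM the reconstruction error equals the distortion $d(n)$ of the current sample only, so it stays within half a quantizer step.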

All-Pole Modeling of Speech
- Auto-regressive (AR) model: all-pole filter
  - $H(z)$ is the overall transfer function, combining glottal flow $G(z)$, vocal tract $V(z)$, radiation $R(z)$, and gain $\beta$
- Synthesis process: $u[n]$ is the vocal tract input, $s[n]$ is the speech output
  - $u[n] \rightarrow H(z) = \dfrac{\beta}{A(z)} \rightarrow s[n]$
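
A sketch of all-pole synthesis and of estimating $A(z)$ back from the output with the autocorrelation method and the Levinson-Durbin recursion. The lecture does not specify an estimation method or a sign convention, so both are assumptions here: $A(z) = 1 + a_1 z^{-1} + \dots + a_p z^{-p}$, and the second-order test filter is hypothetical.

```python
import random

def synthesize(u, a, gain=1.0):
    """All-pole synthesis: s[n] = gain*u[n] - sum_k a[k]*s[n-k], with a[0] = 1."""
    s = []
    for n, x in enumerate(u):
        acc = gain * x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * s[n - k]
        s.append(acc)
    return s

def autocorrelation(s, max_lag):
    """r[k] = sum_n s[n]*s[n-k] for k = 0..max_lag."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; return ([1, a1..ap], residual energy)."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                    # reflection coefficient
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= (1.0 - k * k)
    return a, error

rng = random.Random(0)
true_a = [1.0, -1.3, 0.8]                   # hypothetical stable A(z)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
speech_like = synthesize(noise, true_a)     # unvoiced-style excitation
estimated_a, residual_energy = levinson_durbin(
    autocorrelation(speech_like, 2), 2)
```

With enough samples, `estimated_a` should land close to `true_a`, which is what lets a coder transmit a handful of coefficients per frame instead of the waveform.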

Model-based Coding
- Linear Prediction Coder (LPC)
  - LPC vocoder ("voice coder")
    - Divide speech into frames (several tens of milliseconds) and encode the LPC coefficients of each frame
    - Additional parameters to facilitate synthesis: voiced/unvoiced flag, gain, pitch (for voiced)
  - Line Spectrum Pair (LSP) coding
- Hybrid coding: LPC residual coding
  - Between LPC and waveform coding

Line Spectrum Pair (LSP) Coding
- Pros and cons of the LPC method
  - Good performance at coding rates down to 2.4 kbps
  - Synthesized voice becomes unnatural below 2.4 kbps
  - When the poles are near the unit circle, quantization of the LPC coefficients may result in instability
- LSP parameters
  - LSPs are frequencies extracted from polynomials constructed from the LPC coefficients
  - Frequency-domain features (similar to formants) => produce less distortion due to quantization
[See details in Design Project on Speech]
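
A sketch of how LSP frequencies can be extracted, assuming the standard construction $P(z) = A(z) + z^{-(p+1)} A(z^{-1})$ and $Q(z) = A(z) - z^{-(p+1)} A(z^{-1})$ with $A(z) = 1 + a_1 z^{-1} + \dots + a_p z^{-p}$. The grid search with bisection is a simple choice made for this example, and the second-order coefficients are hypothetical.

```python
import cmath
import math

def lsp_polynomials(a):
    """From a = [1, a1, ..., ap] build P(z) and Q(z), each of degree p + 1."""
    extended = list(a) + [0.0]
    mirrored = extended[::-1]
    p_poly = [x + y for x, y in zip(extended, mirrored)]  # symmetric
    q_poly = [x - y for x, y in zip(extended, mirrored)]  # antisymmetric
    return p_poly, q_poly

def on_circle(poly, omega, symmetric):
    """Real-valued form of poly(e^{j*omega}) after removing its linear phase."""
    m = len(poly) - 1
    value = sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(poly))
    value *= cmath.exp(1j * omega * m / 2)
    return value.real if symmetric else value.imag

def line_spectral_frequencies(a, grid=2000):
    """Return the roots of P and Q in (0, pi), sorted, in radians/sample."""
    roots = []
    for poly, symmetric in zip(lsp_polynomials(a), (True, False)):
        lo = math.pi / grid
        f_lo = on_circle(poly, lo, symmetric)
        for i in range(2, grid):
            hi = math.pi * i / grid
            f_hi = on_circle(poly, hi, symmetric)
            if f_lo * f_hi < 0:          # sign change: refine by bisection
                left, right, f_left = lo, hi, f_lo
                for _ in range(40):
                    mid = 0.5 * (left + right)
                    f_mid = on_circle(poly, mid, symmetric)
                    if f_left * f_mid <= 0:
                        right = mid
                    else:
                        left, f_left = mid, f_mid
                roots.append(0.5 * (left + right))
            lo, f_lo = hi, f_hi
    return sorted(roots)

lpc = [1.0, -1.3, 0.8]                   # hypothetical stable A(z)
lsf = line_spectral_frequencies(lpc)
```

For this example $P$ and $Q$ factor by hand, giving frequencies at arccos(0.75) and arccos(0.55). For a stable $A(z)$ the roots of $P$ and $Q$ interleave on the unit circle, and keeping that ordering after quantization is what keeps the reconstructed filter stable.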

Hybrid Coding
- Hybrid between LPC and waveform coding
- LPC residual coding: encode and slowly update the LPC coefficients, and send the LPC residual (e.g. encoded using vector quantization)
- Advantages
  - Free from quality degradation due to source modeling
    - Low-frequency waveform is exactly reproduced
    - Spectral information of the entire frequency range is preserved
  - No need for pitch period estimation or a voiced/unvoiced decision

Code-Excited Linear Predictive Coding (CELP)
- Multipulse-Excited Linear Predictive Coding (MPC)
  - Does not distinguish voiced/unvoiced sound explicitly
- Code-Excited Linear Predictive Coding (CELP)
  - Replaces the multi-pulses of MPC with vector-quantized sequences based on long-term prediction of periodicity and short-term prediction
[Figure 6.32 of Furui's book]

Speech Coding Methods
[Table 6.1 of Furui's book]
- Waveform coding; hybrid coding; analysis-synthesis coding

Speech Quality vs. Transmission Rate
[Figure 6.2 of Furui's book]

Comparison of Different Speech Coding Tech.
[Table 6.2 of Furui's book]

Put Together: A Digital Telephone System
- 8 kHz sampling and 8 bits per sample for telephone speech => 64 kbps
- Anti-aliasing filter before sampling
- Non-uniform quantization (e.g., through μ-law or A-law companding ~ signal compression-expansion)

Summary
- Speech production and analysis
  - Spectrogram; pitch, formant
  - Linear prediction model
- Speech coding
  - Basic compression tools
- This week's lab session: Design Project #1 on Speech
- Next lecture: speech recognition

Assignments
- "The Past, Present, and Future of Speech Processing", IEEE Signal Processing Magazine, May 1998
  - Hard copy handout -- read Sections I, II, III
- Preparing for Friday Lab
  - Lab manual will be posted online by Wednesday afternoon
  - Go over Parts I and II before coming to lab
    - You already have the background to do Parts I & II
    - Can play with Parts III, IV-1, and V
  - Start working on Part VI (Pocket PC)
- Team up and email the instructor and TA your team information by Thursday 5pm