Communications Theory and Engineering


Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019

Speech and telephone speech

Parametric representation of speech signals, based on a voice production model. [Figure: the source-filter model. An excitation signal (the glottal signal, produced by the vocal folds) drives a filter H(f) that models the vocal tract; the filter output is the speech signal.]

Parametric representation. The idea: the speech signal can be considered as the output of a system excited by an appropriate excitation signal. The questions are: how should the excitation signal be defined, and how should the system be characterized? The answer lies in the structure of the speech signal itself.

Parametric representation. The speech signal is quasi-stationary: it can be considered stationary over short time intervals, typically 10-20 ms. This implies that the model parameters must be updated every 10-20 ms. A segment of duration 10-20 ms is referred to as a FRAME, and this analysis is referred to as short-term analysis.
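The frame segmentation described above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original material: real analysers typically use overlapping, windowed frames, but the principle of short-term analysis is the same.

```python
import numpy as np

def split_into_frames(signal, fs, frame_ms=20):
    """Split a 1-D signal into consecutive non-overlapping frames.

    fs is the sampling frequency in Hz; frame_ms the frame duration.
    """
    frame_len = int(fs * frame_ms / 1000)        # samples per frame
    n_frames = len(signal) // frame_len
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

# 1 s of signal at 8 kHz with 20 ms frames -> 50 frames of 160 samples
x = np.zeros(8000)
frames = split_into_frames(x, fs=8000, frame_ms=20)
```

Each row of `frames` is one analysis frame, for which a fresh set of model parameters would be computed.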

Parametric representation. Source model (excitation signal). Two main categories of sounds are identified. Voiced sounds: the vocal folds vibrate as air flows from the lungs; the corresponding excitation signal is a pulse train with repetition period T, the pitch interval. Voiceless sounds: the vocal folds are open, while the vocal tract narrows at a specific point, causing the air coming from the lungs to create turbulence at the constriction; the corresponding excitation signal is noise.

Parametric representation. Reminder: voiced vs. voiceless sounds. The waveform of a voiced sound is almost periodic, with period T (the pitch interval). The waveform of a voiceless sound is noise-like.

Source model. Voiced sounds: a pulse train generator with period T. Voiceless sounds: a noise generator. A voiced/voiceless detector is thus required in order to select which excitation signal shall be used.
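The two excitation generators can be sketched as follows (an illustrative sketch; the unit pulse amplitudes and Gaussian noise are simplifying assumptions, since real coders also transmit a gain):

```python
import numpy as np

def voiced_excitation(n_samples, pitch_period):
    """Pulse train: one unit pulse every pitch_period samples."""
    e = np.zeros(n_samples)
    e[::pitch_period] = 1.0
    return e

def unvoiced_excitation(n_samples, seed=None):
    """White-noise excitation for voiceless sounds."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(n_samples)

# 80-sample pitch period at 8 kHz corresponds to a 100 Hz pitch
e_v = voiced_excitation(160, pitch_period=80)
e_u = unvoiced_excitation(160, seed=0)
```

A voiced/voiceless detector would simply select which of the two signals feeds the vocal tract filter for the current frame.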

Vocal tract model. The filter H(f) must be characterized by a transfer function that mimics the action performed by the vocal tract on the excitation signal. A LINEAR PREDICTION filter is adopted, whose parameters must be updated every 10-20 ms. But what is linear prediction?

Vocal tract model. The analysis relies on the idea that a sample $s(n)$ of the signal can be predicted from previous signal samples: the prediction of $s(n)$ is a LINEAR combination of the previous samples $s(n-i)$, $i = 1, \dots, p$, that is:

$$\hat{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k)$$

This is the linear prediction of $s(n)$; $p$ is referred to as the PREDICTION ORDER.

Linear prediction. The adopted approach is to determine the coefficients $\alpha_k$ that minimize the difference between the sample $s(n)$ and its prediction $\hat{s}(n)$, i.e. minimize the PREDICTION ERROR

$$e(n) = s(n) - \hat{s}(n), \qquad \text{where} \qquad \hat{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k)$$

But remember that the analysis is short-term.

Linear prediction. In particular, the coefficients $\alpha_k$ can be determined by minimizing the SHORT-TERM quadratic error

$$E_n = \sum_{m=1}^{N} e^2(n-m)$$

for each analysis window $n$, where $N$ is the number of samples per window. Since a window has a typical duration of 10-20 ms, at a 10 kHz sampling frequency the corresponding number of samples is 100-200.

Linear prediction.

$$E_n = \sum_{m=1}^{N} e^2(n-m) = \sum_{m=1}^{N} \big( s(n-m) - \hat{s}(n-m) \big)^2 = \sum_{m=1}^{N} \Big( s(n-m) - \sum_{k=1}^{p} \alpha_k\, s(n-m-k) \Big)^2$$

To find the minimum, we search for the set of $\alpha_k$ such that

$$\frac{\partial E_n}{\partial \alpha_i} = 0 \qquad \text{for } i = 1, \dots, p \quad (p \text{ is the order of prediction})$$

Linear prediction. This leads to the following set of equations, the YULE-WALKER EQUATIONS:

$$\begin{bmatrix} R_n(0) & R_n(1) & \cdots & R_n(p-1) \\ R_n(1) & R_n(0) & \cdots & R_n(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_n(p-1) & R_n(p-2) & \cdots & R_n(0) \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_p \end{bmatrix} = \begin{bmatrix} R_n(1) \\ R_n(2) \\ \vdots \\ R_n(p) \end{bmatrix}$$

where

$$R_n(i) = \sum_{m} s(n-m)\, s(n-m+i)$$

is the SHORT-TERM AUTOCORRELATION FUNCTION.
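The Yule-Walker system above can be solved directly. The following minimal NumPy sketch (not from the original slides) builds the short-term autocorrelation of one frame and solves for the coefficients; dedicated Toeplitz solvers such as the Levinson-Durbin recursion would be used in practice for efficiency.

```python
import numpy as np

def lpc_coefficients(frame, p):
    """Solve the Yule-Walker equations for one analysis frame.

    R[i] is the short-term autocorrelation at lag i; the system
    matrix is symmetric Toeplitz, built here explicitly.
    """
    frame = np.asarray(frame, dtype=float)
    N = len(frame)
    R = np.array([frame[:N - i] @ frame[i:] for i in range(p + 1)])
    Rmat = np.array([[R[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(Rmat, R[1:p + 1])

# sanity check on a synthetic AR(1) signal s(n) = 0.9 s(n-1) + w(n):
# the order-1 predictor should recover a coefficient close to 0.9
rng = np.random.default_rng(0)
s = np.zeros(4000)
w = rng.standard_normal(4000)
for n in range(1, 4000):
    s[n] = 0.9 * s[n - 1] + w[n]
alpha = lpc_coefficients(s, p=1)
```

For an order-1 predictor the solution reduces to $\alpha_1 = R(1)/R(0)$, which matches the matrix form above.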

Linear prediction. It can be observed that the matrix $R_n$ is a Toeplitz matrix, and as such: it is symmetric, and all the elements on a given diagonal have the same value.

Parametric representation. Example: the vowel [a], for several values of the prediction order $p$. [Figure: the input signal (amplitude vs. time) and its short-term spectrum (dB vs. frequency), compared with the LPC spectra for $p$ = 4, 8, 12, 16 and 20.]

Parametric representation. Summary. VOCAL TRACT PARAMETERS: the coefficients $\alpha_k$ and the gain $G$. SOURCE PARAMETERS: the PITCH and the VOICED/VOICELESS decision.

Parametric representation. The complete model is thus as follows. [Figure: a pulse train generator (driven by the pitch) and a random noise generator feed a voiced/voiceless switch; the selected excitation $u(n)$ is multiplied by the gain $G$ and filtered by the time-varying filter (coefficients $\alpha_k$) to produce $s(n)$.] Transmission rate: up to 10,000 bit/s.
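The complete synthesis model can be sketched directly from the figure. This is an illustrative sketch (not from the slides): the all-pole synthesis filter $s(n) = G\,u(n) + \sum_k \alpha_k\, s(n-k)$ is implemented with an explicit loop for clarity.

```python
import numpy as np

def lpc_synthesize(alpha, gain, excitation):
    """All-pole LPC synthesis: s(n) = G*u(n) + sum_k alpha_k * s(n-k)."""
    alpha = np.asarray(alpha, dtype=float)
    p = len(alpha)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        past = sum(alpha[k] * s[n - 1 - k] for k in range(min(p, n)))
        s[n] = gain * excitation[n] + past
    return s

# voiced frame: pulse train with an 80-sample pitch period,
# filtered by a toy order-1 vocal tract (alpha_1 = 0.5)
u = np.zeros(160)
u[::80] = 1.0
s = lpc_synthesize([0.5], gain=1.0, excitation=u)
```

Each pulse of the excitation excites a decaying response shaped by the filter, which is exactly the role the vocal tract plays on the glottal pulses.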

Parametric representation. The predictor order $p$ is typically about 12-14. [Figure: energy (dB) vs. frequency; red: original signal spectrum, blue: LPC spectrum.]

Parametric representation. VOCODER scheme. [Figure: at the TRANSMITTER, $s(n)$ feeds the LPC analysis filter and a pitch detector, whose outputs are coded and sent over the CHANNEL; at the RECEIVER, the decoder drives the LPC synthesis filter to produce $\hat{s}(n)$.]

Mixed systems. Mixed systems are only in part based on speech production models. The best example is the multipulse method, which achieves excellent quality for transmission rates around 10,000 bit/s. In this method the vocal tract is represented by an LPC filter, but the source is determined without relying on specific properties of the speech signal: given a signal $s(n)$, one searches for the LPC filter input $u(n)$ that makes the output $\hat{s}(n)$ as close as possible to $s(n)$. The input $u(n)$ and the coefficients $\alpha_k$ are then sent to the receiver.

Multipulse systems. What is $u(n)$ like? Let us assume a signal window of length 100 samples: it is obvious that, if $u(n)$ had length 100 samples, the synthesis would be perfect. The maximum number of available samples depends, however, on the TRANSMISSION RATE: depending on the transmission rate, $u(n)$ will include the corresponding number of pulses (samples).

Multipulse systems. The optimal positions and amplitudes of the pulses forming the input sequence must then be determined. Example: for a bit rate of about 16 kbit/s, one can transmit ~30 pulses for a signal frame of 128 samples; $u(n)$ will thus contain ~30 samples of NON-ZERO amplitude. Finding the optimal positions would require analyzing ALL possible combinations of positions, at an unacceptable computational cost, so sub-optimal solutions are typically adopted.

Multipulse systems. In the search for the pulse positions of $u(n)$, positions are explored ONE AT A TIME. In the search for the pulse amplitudes, a system of linear equations can be defined.
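The one-at-a-time search can be sketched as a greedy loop. This is an illustrative sketch of the sub-optimal strategy described above, not the exact standardized algorithm: at each step a single scaled copy of the filter's impulse response `h` is placed where it best reduces the remaining error, and its contribution is subtracted.

```python
import numpy as np

def greedy_multipulse(target, h, n_pulses):
    """Greedy one-at-a-time pulse search for a multipulse excitation.

    target: the frame to approximate (output of the LPC filter).
    h: truncated impulse response of the LPC synthesis filter.
    Returns the chosen pulse positions and amplitudes.
    """
    N = len(target)
    residual = np.asarray(target, dtype=float).copy()
    # contribution of a unit pulse at position m: h shifted to start at m
    shifted = np.zeros((N, N))
    for m in range(N):
        L = min(len(h), N - m)
        shifted[m, m:m + L] = h[:L]
    energy = np.sum(shifted ** 2, axis=1)
    positions, amplitudes = [], []
    for _ in range(n_pulses):
        corr = shifted @ residual
        m = int(np.argmax(corr ** 2 / energy))   # best error reduction
        amp = corr[m] / energy[m]                # optimal amplitude at m
        residual -= amp * shifted[m]
        positions.append(m)
        amplitudes.append(float(amp))
    return positions, amplitudes

# toy check: the target is exactly one pulse of amplitude 2 at position 3
h = np.array([1.0, 0.5, 0.25])
target = np.zeros(10)
target[3:6] += 2.0 * h
pos, amp = greedy_multipulse(target, h, n_pulses=1)
```

In a full coder the amplitudes of all chosen pulses would then be jointly re-optimized by solving the linear system mentioned above.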

Multipulse systems. Which information is transmitted? The coefficients, the pulse positions, the pulse amplitudes, and the amplitude quantization step. Note that, in contrast to the VOCODER, here there is no information on the structure of the source signal (neither a voiced/voiceless decision nor pitch extraction).

Mixed methods: the GSM system. This method was standardized for early digital RADIO-MOBILE voice transmissions. STRUCTURE: similar to the one described for the multipulse system, but the search for the optimal positions of $u(n)$ is carried out with a resolution of three samples. The computational cost is much lower than in the multipulse system. Standardized at 13 kbit/s.

MPEG1 Audio Compression. [Block diagram: input, critical-band (sub-band) filtering, bit allocation (quantization), bitstream formatting, output; a psychoacoustic model estimates masking effects and drives the bit allocation.] MPEG1 audio compression works in the frequency domain. It takes advantage of limitations in the human auditory system in order to reduce the bit rate without significant effect on perceived audio quality. MPEG1 audio compression evolved in 3 different layers: Layer 1, Layer 2 and Layer 3 (MPEG1 Layer 3, known as MP3).
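The interaction between the psychoacoustic model and the bit allocation can be illustrated with a toy greedy allocator (a hypothetical sketch, not the MPEG1 algorithm): bits go to the sub-band whose signal-to-mask ratio is currently least well served, on the rough assumption that each extra bit lowers quantization noise by about 6 dB.

```python
import numpy as np

def allocate_bits(band_energy_db, mask_threshold_db, bit_pool):
    """Toy greedy bit allocation driven by signal-to-mask ratios (SMR)."""
    smr = np.asarray(band_energy_db) - np.asarray(mask_threshold_db)
    bits = np.zeros(len(smr), dtype=int)
    for _ in range(bit_pool):
        # remaining SMR after the ~6 dB/bit granted so far
        need = smr - 6.0 * bits
        k = int(np.argmax(need))
        if need[k] <= 0:       # all bands already below their mask
            break
        bits[k] += 1
    return bits

# a loud band, a moderate band, and a band fully masked (energy < mask)
bits = allocate_bits([60.0, 30.0, 10.0], [20.0, 20.0, 20.0], bit_pool=10)
```

Bands whose energy falls below the masking threshold receive no bits at all, which is exactly how masking is exploited to reduce the bit rate.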

Frequency masking: the Bark scale. The perception of a sound at a given frequency reduces the ear's ability to perceive other sounds at nearby frequencies. The higher the intensity of the sound, the stronger the masking effect. The frequency interval affected by a sound is referred to as a critical band; the width of critical bands grows with frequency.
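The Bark scale maps frequency so that one Bark corresponds to one critical band. A common closed-form approximation is Zwicker's, sketched below (illustrative; the MPEG1 psychoacoustic models use tabulated data rather than these formulas):

```python
import numpy as np

def hz_to_bark(f):
    """Zwicker's approximation of the Hz -> Bark mapping."""
    f = np.asarray(f, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_bandwidth(f):
    """Approximate critical bandwidth (Hz) around centre frequency f."""
    f = np.asarray(f, dtype=float)
    return 25.0 + 75.0 * (1.0 + 1.4 * (f / 1000.0) ** 2) ** 0.69
```

Evaluating `critical_bandwidth` at increasing frequencies confirms the statement above: critical bands widen as frequency grows.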

Time masking. A tone at high intensity affects the ability of the human ear to perceive another tone at nearby frequencies even after the perception of the first tone ends.

Overall masking effect The combination of time masking and frequency masking leads to specific frequency intervals that are not audible for specific time intervals

MP3 coding. MP3 coding uses information on time and frequency masking effects to achieve efficient bit allocation for quantization: bands affected by masking effects are coded with a low number of bits (accepting higher quantization noise). [Block diagram: PCM bitstream, filter bank (32 sub-bands), Modified Discrete Cosine Transform, non-uniform quantization, Huffman coding, bitstream creation (with additional control signalling), MP3 coded bitstream; in parallel, a 1024-point FFT feeds the psychoacoustic model, a module that is proprietary.]