Linear Predictive Coding *


OpenStax-CNX module: m45345

By Kiefer Forseth. This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 (http://creativecommons.org/licenses/by/3.0/). Version 1.1: Dec 14, 2012 8:43 am -0600.

1 LPC Implementation

Linear Predictive Coding, or LPC, is usually applied to speech compression, but we will be using its strongly output-dependent form for analysis of our signal. The fundamental principle behind LPC is that one can predict, or approximate, future elements of a signal from a linear combination of the previous signal elements plus, of course, an input excitation signal.

Figure 1: All equations are from: http://cs.haifa.ac.il/ nimrod/compression/speech/s4linearpredictioncoding2009.pdf

The effectiveness of this model stems from the fact that the processes that generate speech in the human body are relatively slow and that the human voice is quite limited in frequency range. The physical model of the vocal tract in LPC is a buzzer, corresponding to the glottis, which produces the baseline buzz of the human voice, at the end of a tube with linear characteristics.
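The prediction principle can be sketched in a few lines. A minimal illustration (our own sketch, not the module's code), assuming a short signal and a hypothetical order-p coefficient list a:

```python
def lpc_predict(s, a):
    """Predict s[n] for n >= len(a) as a linear combination of the
    previous p samples: s_hat[n] = sum_k a[k] * s[n-1-k]."""
    p = len(a)
    return [sum(a[k] * s[n - 1 - k] for k in range(p))
            for n in range(p, len(s))]

# Toy example: a signal obeying s[n] = 0.5 * s[n-1] is predicted
# exactly by the single coefficient a = [0.5].
s = [1.0]
for _ in range(5):
    s.append(0.5 * s[-1])
print(lpc_predict(s, [0.5]))  # reproduces s[1:] exactly
```

With a real speech frame the coefficients are of course unknown; finding them is the subject of the rest of this section.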

Figure 2: Courtesy of https://engineering.purdue.edu/cfdlab/projects/voice.html

In systems terms we clearly see the form of a feedback filter emerging, so to further analyze the system we take its Z-transform and find a transfer function from U[z] to S[z].
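The transfer function itself is not reproduced in this transcription; the following is a reconstruction of the standard all-pole form implied by the surrounding text, with p the model order, a_k the predictor coefficients, and G an excitation gain (G is our notation, not necessarily the module's):

```latex
% Time-domain model: output as a combination of past outputs plus excitation
s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n)
% Z-transform, then solve for the transfer function from U(z) to S(z):
H(z) = \frac{S(z)}{U(z)} = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}}
```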

Figure 3

The result is clearly an all-pole filter, and in the standard application one would feed in the generating signal and get out a compressed version of the output.

Figure 4

The key barrier to implementing this filter is, of course, determining the a values, i.e. the coefficients of our linear-superposition approximation of the output signal. Ultimately, when we form the linear superposition, we want to choose coefficients that yield a compressed signal with the minimum deviation from the original signal; equivalently, we want to minimize the difference (error) between the two signals.

Figure 5

From the form of s(n) we can derive an equivalent condition on the auto-correlation R[j].

Figure 6

Where:

Figure 7

Thus we have p such equations, one for each R(j), and so we can more easily describe our conditions in terms of a matrix equation.
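Forming the autocorrelations and assembling the p equations into a matrix system can be sketched directly. A sketch under the assumption of a windowed frame s and model order p (function names are ours, not from the module):

```python
def autocorr(s, j):
    """Autocorrelation R[j] = sum_n s[n] * s[n + j] over the frame."""
    return sum(s[n] * s[n + j] for n in range(len(s) - j))

def normal_equations(s, p):
    """Build the order-p system R_mat @ a = rhs.
    Row i of R_mat is (R[|i-0|], R[|i-1|], ..., R[|i-(p-1)|]),
    and rhs is (R[1], ..., R[p])."""
    r = [autocorr(s, j) for j in range(p + 1)]
    R_mat = [[r[abs(i - k)] for k in range(p)] for i in range(p)]
    return R_mat, r[1:p + 1]
```

Because each row is a shifted copy of the same autocorrelation sequence, R_mat has the constant-diagonal structure discussed next.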

Figure 8

The matrix we now need to invert and multiply has a unique constant-diagonal structure which classifies it as a Toeplitz matrix. Multiple methods have been developed for solving equations with Toeplitz matrices; one of the most efficient, and the one we used, is the Levinson-Durbin algorithm. This method is a bit involved, but fundamentally it solves the system of equations by solving smaller submatrix equations and iterating up to a recursive solution for all the coefficients.

2 Application

To reapply this method, this filter, toward our goal of speech analysis, we first note that the form of the filter depends primarily on the output rather than the input. The coefficients that we derived using the Levinson-Durbin algorithm use only properties (the auto-correlation) of the output signal rather than the input signal. This means that this filter is, in a way, more natural as a method for going from output to input rather than the reverse; all we need do is take the reciprocal of the transfer function.

Figure 9
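The Levinson-Durbin recursion mentioned above is compact in code. A sketch of the textbook recursion (our own implementation, not the module's), taking the autocorrelation sequence r[0..p] and returning the p predictor coefficients:

```python
def levinson_durbin(r, p):
    """Solve the Toeplitz normal equations for an order-p predictor
    given autocorrelations r[0..p], iterating over model orders."""
    a = []            # coefficients of the current-order predictor
    err = r[0]        # prediction error power at the current order
    for i in range(1, p + 1):
        # Reflection coefficient for stepping from order i-1 to i.
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        # Update all lower-order coefficients, then append the new one.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= (1.0 - k * k)
    return a
```

For an autocorrelation sequence of a first-order process, e.g. r = [1, 0.5, 0.25], the order-2 solution correctly collapses to a single effective coefficient (the second coefficient comes out zero).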

We go from an all-pole filter to an all-zero filter which now takes in a speech signal and returns the generating signal. This transfer function is actually more useful for our purposes because of our method of analyzing speech signals. We are primarily looking to identify the formants in the speech signal, the fundamental components of the phonemes that make up human alphabets. These formants directly correspond to the resonant modes of the vocal tract, so we are effectively trying to achieve a natural-mode decomposition of a complex resonant cavity.

Figure 10: Courtesy of http://hyperphysics.phy-astr.gsu.edu/hbase/music/vocres.html

Therefore these formants are more easily identifiable in the generating signal (since they are inherently a property of the generating cavity). With the filter generated by LPC we can now reconstruct a linear approximation of the generating signal from the speech signals in our soundbank. Our full signal can of course be represented by a spectrogram, and the formants correspond to the local maxima of each time slice of the spectrogram.
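The all-zero filter is just the prediction error computed directly: the estimated generating signal (the residual) is each speech sample minus its LPC prediction. A sketch with illustrative names:

```python
def inverse_filter(s, a):
    """All-zero (FIR) inverse filter: e[n] = s[n] - sum_k a[k] * s[n-1-k].
    Returns the estimated excitation (residual) for n >= len(a)."""
    p = len(a)
    return [s[n] - sum(a[k] * s[n - 1 - k] for k in range(p))
            for n in range(p, len(s))]
```

On a signal the model fits perfectly, e.g. s = [1.0, 0.5, 0.25] with a = [0.5], the residual is all zeros: the filter has fully "explained" the signal, and anything left over is attributed to the source.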

Figure 11: Spectrogram (left); one slice of the spectrogram, with peaks and troughs highlighted (right)

What we chose to extract from these spectrograms were the amplitude and frequency data of the first 4 formants present in the signal, as these are usually the most dominant, as well as the same information about the minima between the peaks. This is the information we will feed into our classifier for emotion classification.
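Extracting the peaks (formant candidates) and the intervening troughs from one spectrogram slice can be done with a simple local-extremum scan. A sketch, assuming the slice is a list of spectral magnitudes indexed by frequency bin (names are illustrative):

```python
def local_extrema(mag):
    """Return (peaks, troughs) as lists of (bin, value) for interior
    bins that are strict local maxima / minima of the slice."""
    peaks, troughs = [], []
    for i in range(1, len(mag) - 1):
        if mag[i] > mag[i - 1] and mag[i] > mag[i + 1]:
            peaks.append((i, mag[i]))
        elif mag[i] < mag[i - 1] and mag[i] < mag[i + 1]:
            troughs.append((i, mag[i]))
    return peaks, troughs
```

Keeping the first 4 peaks in frequency order, plus the troughs between them, yields the amplitude/frequency features described above (bin indices would be converted to Hz using the FFT bin spacing).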