EE482: Digital Signal Processing Applications


Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 14 Quiz 04 Review 14/04/07 http://www.ee.unlv.edu/~b1morris/ee482/

2 Outline Random Processes Autocorrelation, white noise, expectation Adaptive Signal Processing Adaptive filtering, LMS, applications Speech Signal Processing LPC, CELP, noise subtraction, recognition Audio Signal Processing Masking, MDCT, coding systems, equalizers

3 Outline Random Processes Autocorrelation, white noise, expectation Adaptive Signal Processing Adaptive filtering, LMS, applications Speech Signal Processing LPC, CELP, noise subtraction, recognition Audio Signal Processing Masking, MDCT, coding systems, equalizers

4 Autocorrelation
Specifies the statistical relationship of a signal at different time lags (n - k):
r_xx(n, k) = E[x(n) x(k)]
Similarity of observations as a function of the time between them (repeating pattern, time delay, etc.)
We consider wide-sense stationary (WSS) processes:
Statistics do not change with time
Mean independent of time
Autocorrelation depends only on the time lag:
r_xx(k) = E[x(n + k) x(n)]

5 Expected Value
Value of a random variable expected if the random process is repeated an infinite number of times
Weighted average of all possible values
Expectation operator: E[X] = ∫ x f(x) dx, where f(x) is the probability density function of the random variable X
Favorites are the mean and variance:
Mean: m_x = E[x(n)] = ∫ x f(x) dx
Variance: σ_x² = E[(x(n) - m_x)²]

6 White Noise
Very popular random signal and typical noise model: v(n) with zero mean and variance σ_v²
Autocorrelation: r_vv(k) = σ_v² δ(k) — statistically uncorrelated except at zero time lag
Power spectrum: P_vv(ω) = σ_v², |ω| ≤ π — uniformly distributed over the entire frequency range
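The two white-noise properties above are easy to check numerically. A minimal NumPy sketch (the sequence length, seed, and lag values are illustrative) estimates r_vv(k) from samples and confirms it is σ_v² at lag 0 and near zero elsewhere:

```python
import numpy as np

def autocorr(x, k):
    """Biased estimate of r_xx(k) = E[x(n + k) x(n)] for a zero-mean sequence."""
    N = len(x)
    return np.dot(x[k:], x[:N - k]) / N if k > 0 else np.dot(x, x) / N

rng = np.random.default_rng(0)
sigma_v = 1.0
v = rng.normal(0.0, sigma_v, 100_000)  # zero-mean white noise

r0 = autocorr(v, 0)  # approaches sigma_v**2
r5 = autocorr(v, 5)  # approaches 0 (uncorrelated at nonzero lag)
```

With 100,000 samples the estimates settle to within a few percent of the theoretical values.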

7 Outline Random Processes Autocorrelation, white noise, expectation Adaptive Signal Processing Adaptive filtering, LMS, applications Speech Signal Processing LPC, CELP, noise subtraction, recognition Audio Signal Processing Masking, MDCT, coding systems, equalizers

8 General Adaptive Filter
Signal characteristics in practical applications are time-varying and/or unknown
Must modify filter coefficients adaptively in an automated fashion to meet objectives
Two components:
Digital filter, defined by its coefficients
Adaptive algorithm, which automatically updates the filter coefficients (weights)
Adaptation occurs by comparing the filtered signal y(n) with a desired (reference) signal d(n)
Minimize the error e(n) using a cost function (e.g. mean-square error)
Continually lower the error and bring y(n) closer to d(n)

9 FIR Adaptive Filter
y(n) = Σ_{l=0}^{L-1} w_l(n) x(n - l)   (notice the time-varying weights)
In vector form: y(n) = w^T(n) x(n) = x^T(n) w(n), where
x(n) = [x(n), x(n - 1), …, x(n - L + 1)]^T
w(n) = [w_0(n), w_1(n), …, w_{L-1}(n)]^T
Error signal: e(n) = d(n) - y(n) = d(n) - w^T(n) x(n)
Use the mean-square error (MSE) cost function:
ξ(n) = E[e²(n)] = E[d²(n)] - 2 p^T w(n) + w^T(n) R w(n)
p = E[d(n) x(n)] = [r_dx(0), r_dx(1), …, r_dx(L - 1)]^T
R = E[x(n) x^T(n)] — autocorrelation matrix
The error function is a quadratic surface, so gradient descent can be used:
w(n + 1) = w(n) - (μ/2) ∇ξ(n)

10 LMS Algorithm
Practical applications do not have prior knowledge of the statistics of d(n) and x(n)
Cannot directly compute the MSE and its gradient
Stochastic gradient algorithm: use the instantaneous squared error to estimate the MSE, ξ̂(n) = e²(n)
Gradient estimate: ∇ξ̂(n) = 2 e(n) ∇e(n); with e(n) = d(n) - w^T(n) x(n), this gives ∇ξ̂(n) = -2 x(n) e(n)
Steepest-descent update: w(n + 1) = w(n) + μ x(n) e(n)
LMS steps:
1. Set L, μ, and w(0): L filter length, μ step size (small, e.g. 0.01), w(0) initial filter weights
2. Compute the filter output: y(n) = w^T(n) x(n)
3. Compute the error signal: e(n) = d(n) - y(n)
4. Update the weight vector: w_l(n + 1) = w_l(n) + μ x(n - l) e(n), l = 0, 1, …, L - 1
Notice this requires a reference signal
Must choose a small μ for stability
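The four LMS steps above can be sketched in NumPy using a synthetic system-identification setup (the "unknown" system h_unknown, the signal lengths, and the step size are all illustrative choices, not values from the slides):

```python
import numpy as np

# LMS used for system identification: the adaptive FIR filter learns the
# coefficients of an unknown FIR system from its input/output data.
# Names follow the slides: x input, d reference, y output, e error, w weights.
rng = np.random.default_rng(1)
h_unknown = np.array([0.5, -0.3, 0.1])  # hypothetical system to identify
L, mu = 3, 0.01                         # step 1: filter length and step size

x = rng.normal(0.0, 1.0, 20_000)            # white-noise excitation
d = np.convolve(x, h_unknown)[:len(x)]      # desired (reference) signal

w = np.zeros(L)                             # w(0): initial filter weights
for n in range(L, len(x)):
    x_n = x[n:n - L:-1]    # [x(n), x(n-1), ..., x(n-L+1)]
    y_n = w @ x_n          # step 2: filter output y(n)
    e_n = d[n] - y_n       # step 3: error signal e(n)
    w = w + mu * x_n * e_n # step 4: LMS weight update
```

Because the reference here is exactly an FIR filtering of x(n), the weights converge to h_unknown; with noisy references the weights hover around the true values with a misadjustment set by μ.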

11 Practical Applications
Four classes of adaptive filtering applications:
System identification: determine unknown system coefficients
Noise cancellation: remove embedded noise
Prediction: estimate future values
Inverse modeling: estimate the inverse of an unknown system

12 Outline Random Processes Autocorrelation, white noise, expectation Adaptive Signal Processing Adaptive filtering, LMS, applications Speech Signal Processing LPC, CELP, noise subtraction, recognition Audio Signal Processing Masking, MDCT, coding systems, equalizers

13 Linear Predictive Coding (LPC)
Speech production model with excitation input, gain, and vocal-tract filter
Gain represents the amount of air from the lungs and the voice loudness
Unvoiced sounds (e.g. s, sh, f): no vocal-cord vibration; use white noise for the excitation signal
Voiced sounds (e.g. vowels): caused by vibration of the vocal cords, with the rate of vibration the pitch; generate a periodic pulse train at the fundamental (pitch) frequency for the excitation signal
Vocal-tract model: the vocal tract is a pipe from the vocal cords to the oral cavity, modeled as an all-pole filter to match the formants
Most important part of the LPC model (it changes shape to make different sounds)

14 Code-Excited Linear Prediction (CELP)
Algorithms based on the LPC approach using an analysis-by-synthesis scheme
Three main components:
LPC vocal-tract model (1/A(z)): solve using the Levinson-Durbin recursive algorithm with the autocorrelation normal equations
Perceptual-based minimization (W(z)): controls the sensitivity of the error calculation; shapes the noise so it appears in regions where the ear cannot detect it (placed in louder regions of the spectrum)
Voice activity detection: critical for reduced coding
More coefficients give a better match to the speech
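The Levinson-Durbin recursion mentioned above solves the autocorrelation normal equations in O(p²) operations. A minimal sketch (using the sign convention A(z) = 1 + a₁z⁻¹ + … + a_p z⁻ᵖ; this is an illustration, not the book's code):

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations by Levinson-Durbin.

    r: autocorrelation sequence r(0)..r(order)
    Returns (a, E): polynomial coefficients [1, a1, ..., ap] of A(z)
    and the final prediction-error power E.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    E = r[0]
    for i in range(1, order + 1):
        k = -np.dot(a[:i], r[i:0:-1]) / E  # reflection coefficient k_i
        a_prev = a[i - 1::-1].copy()       # previous-order coeffs, reversed
        a[1:i + 1] += k * a_prev           # order-update of the polynomial
        E *= (1.0 - k * k)                 # updated prediction-error power
    return a, E
```

For an AR(1)-like autocorrelation r = [1, 0.5, 0.25], a 2nd-order fit recovers a single predictor tap of 0.5 and a zero second tap, with error power 0.75.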

15 Noise Subtraction
Input is noisy speech: clean speech plus stationary noise
Estimate the noise characteristics during silent periods between utterances with a VAD system
Spectral subtraction is implemented in the frequency domain, based on short-time magnitude spectrum estimation:
Ŝ(k) = H(k) X(k), with H(k) = 1 - E[|V(k)|] / |X(k)|
Subtract the estimated noise magnitude spectrum from the input signal
Reconstruct the enhanced speech signal using the IFFT
Coefficients take the difference magnitude with the original phase
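The magnitude-subtraction-with-original-phase idea can be shown for a single frame (a real system works frame-by-frame with a VAD-driven noise estimate and overlap-add; the half-wave rectification of negative magnitudes is a common practical detail, added here as an assumption):

```python
import numpy as np

def spectral_subtract(x_frame, noise_mag):
    """Subtract an estimated noise magnitude spectrum |V(k)| from one frame."""
    X = np.fft.rfft(x_frame)
    mag, phase = np.abs(X), np.angle(X)
    mag_hat = np.maximum(mag - noise_mag, 0.0)  # clamp negative magnitudes
    # recombine subtracted magnitude with the ORIGINAL phase, then IFFT
    return np.fft.irfft(mag_hat * np.exp(1j * phase), n=len(x_frame))
```

With a zero noise estimate the frame passes through unchanged; with a noise estimate larger than every bin the output is silence.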

16 Speech Recognition
x(n) → feature extraction → classifier (with templates) → text
Feature extraction: represent the speech content with mel-frequency cepstral coefficients (MFCCs),
c(n) = F⁻¹{log |X(e^{jω})|}
Rate of change across spectrum bands; MFCCs use non-linear frequency bands to mimic human perception
Recognizer system: a pattern recognition problem
Must design templates and a method to meaningfully compare speech signals
Big issue: unequal-length data. Two solutions:
Dynamic time warping (DTW): optimal alignment technique for sequences
Hidden Markov model (HMM): probabilistic model of speech with phoneme state transitions
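The DTW solution to the unequal-length problem is a short dynamic program. A textbook sketch on 1-D sequences (real recognizers compare MFCC vectors per frame; the absolute-difference local cost is an illustrative choice):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D sequences,
    using absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of insertion, deletion, and match moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A sequence and a time-stretched copy of it align at zero cost, which is exactly why DTW handles unequal-length utterances.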

17 Outline Random Processes Autocorrelation, white noise, expectation Adaptive Signal Processing Adaptive filtering, LMS, applications Speech Signal Processing LPC, CELP, noise subtraction, recognition Audio Signal Processing Masking, MDCT, coding systems, equalizers

18 Audio Coding
Techniques are required to enable high-quality sound reproduction efficiently
Differences from speech: much wider bandwidth (not just the ~300-3400 Hz telephone band) and multiple channels
Psychoacoustic principles can be utilized for coding:
Do not code frequency components below the hearing threshold
Lossy compression based on noise shaping: noise below the masking threshold is not audible
Entropy coding applied to the large amount of data from the high sampling rate and multiple channels

19 Audio Codec
Codec = coder-decoder. Typical encoder blocks:
Filter-bank transform: convert the full-band signal (all frequencies) into subbands (modified discrete cosine transform, MDCT)
Psychoacoustic model: calculates thresholds according to human masking effects, used to control quantization of the MDCT
Quantization: quantization of the MDCT spectral coefficients
Lossless coding: entropy coding to reduce redundancy of the coded bitstream
Side-information coding: bit-allocation information
Multiplexer: pack all coded bits into the bitstream

20 Auditory Masking Effects
Psychoacoustic principle: a low-level signal (maskee) becomes inaudible when a louder signal (masker) occurs simultaneously
Human hearing does not respond equally to all frequency components
Auditory masking depends on the spectral distribution of the masker and maskee, and these vary in time
The encoder performs noise shaping to exploit these properties of human hearing

21 Quiet Threshold
First step of perceptual coding: shape the coding distortion spectrum
Represents a listener with acute hearing; no signal level below the threshold will be perceived
Quiet (absolute) threshold:
T_q(f) = 3.64 (f/1000)^(-0.8) - 6.5 exp(-0.6 (f/1000 - 3.3)²) + 10⁻³ (f/1000)⁴  [dB SPL]
Most humans cannot sense frequencies outside of 20 Hz - 20 kHz; the range changes in time and narrows with age
[Figure: quiet-threshold curve, Sound Pressure Level (SPL) in dB vs. frequency in Hz]
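The threshold formula above is straightforward to evaluate, and doing so reproduces the familiar shape of the curve: hearing is most sensitive around 3-4 kHz and the threshold rises steeply at both spectral edges. A direct transcription of the slide's formula:

```python
import math

def quiet_threshold_db(f_hz):
    """Absolute threshold of hearing T_q(f) in dB SPL, per the slide's formula."""
    f = f_hz / 1000.0  # formula is written in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)
```

For example, the threshold near 3.5 kHz is several dB below zero, while at 100 Hz and 16 kHz it is tens of dB higher.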

22 Masking Threshold
Threshold determined by the stimuli at a given time, so it is time-varying
Human hearing has a non-linear response to frequency components
Divide the auditory system into 26 critical bands (Barks):
z(f) = 13 arctan(0.00076 f) + 3.5 arctan[(f/7500)²]  [bark]
Higher bandwidth at higher frequencies; frequencies within the same Bark are difficult to distinguish
Simultaneous masking: a dominant frequency masks (overpowers) frequencies in the same critical band, so there is no need to code other frequency components in that Bark
Masking spread: masking effect across adjacent critical bands; use a triangular spread function, +25 dB/Bark toward lower frequencies and -10 dB/Bark toward higher frequencies
[Figure: Bark number vs. frequency in Hz]
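The Hz-to-Bark mapping above can be transcribed directly; evaluating it shows the compressive behavior (roughly one Bark per 100 Hz at low frequencies, far fewer per Hz at high frequencies):

```python
import math

def hz_to_bark(f_hz):
    """Critical-band (Bark) number for a frequency in Hz, per the slide's formula."""
    return (13.0 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))
```

For instance, 1 kHz falls around Bark 8.5, and the mapping is monotonically increasing.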

23 Frequency Domain Coding
Representation of the frequency content of a signal
Modified discrete cosine transform (MDCT) widely used for audio
DCT-like energy compaction (fewer significant coefficients), reduced block effects
MDCT definition:
X(k) = Σ_{n=0}^{N-1} x(n) cos[(2π/N)(n + (N+2)/4)(k + 1/2)],  k = 0, 1, …, N/2 - 1
x(n) = Σ_{k=0}^{N/2-1} X(k) cos[(2π/N)(n + (N+2)/4)(k + 1/2)],  n = 0, 1, …, N - 1
Notice half as many coefficients as samples for each window
Lapped transform (designed with overlapping windows built in)
As with the FFT, windows are used, but they must satisfy more conditions (the Princen-Bradley condition)
Window applied to both the analysis (MDCT) and synthesis (IMDCT) equations
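The lapped-transform property can be demonstrated numerically. The sketch below uses the sine window (which satisfies the Princen-Bradley condition) and a 4/N inverse scaling that matches the unscaled forward sum above; both are common illustrative choices rather than the slide's specific normalization. Overlap-adding 50%-overlapped frames cancels the time-domain aliasing and reconstructs the signal exactly:

```python
import numpy as np

def mdct_pair(N):
    """Windowed MDCT/IMDCT for N-sample frames (N/2 coefficients)."""
    n = np.arange(N)
    k = np.arange(N // 2)
    # cosine kernel cos[(2*pi/N)(n + (N+2)/4)(k + 1/2)]
    C = np.cos((2.0 * np.pi / N) * np.outer(n + 0.5 + N / 4.0, k + 0.5))
    w = np.sin(np.pi / N * (n + 0.5))  # sine window: w(n)^2 + w(n+N/2)^2 = 1
    fwd = lambda frame: (w * frame) @ C         # X(k), k = 0..N/2-1
    inv = lambda X: w * ((4.0 / N) * (C @ X))   # windowed IMDCT frame
    return fwd, inv

N = 8
fwd, inv = mdct_pair(N)
x = np.arange(1.0, 17.0)  # 16 test samples
xp = np.concatenate([np.zeros(N // 2), x, np.zeros(N // 2)])  # pad edges
y = np.zeros_like(xp)
for start in range(0, len(xp) - N + 1, N // 2):  # 50% overlapped frames
    y[start:start + N] += inv(fwd(xp[start:start + N]))
recon = y[N // 2:-(N // 2)]  # interior samples reconstruct exactly
```

Each frame produces only N/2 coefficients, yet the overlap-add output matches the input: this is the time-domain alias cancellation that makes the MDCT attractive for audio coding.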

24 Audio Coding
Entropy (lossless) coding removes redundancy in the coded data without loss in quality
Pure entropy coding (lossless-only): Huffman encoding, a statistical coding
More frequently occurring symbols get shorter code words
Fast method using a lookup table, but cannot achieve very high compression
Extended lossless coding: a lossy coder followed by entropy coding (about 20% compression gain)
MP3: perceptual coding followed by entropy coding
Scalable lossless coding: can achieve perfect reproduction
The input is first encoded and the residual error is entropy coded, resulting in two bit streams
Can choose the lossy low-bit-rate stream alone, or combine both for high-quality lossless audio
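The Huffman idea above — frequent symbols get shorter code words — fits in a few lines with a priority queue (a generic textbook construction, shown for illustration; audio coders use pre-designed tables):

```python
import heapq
from collections import Counter

def huffman_codes(symbol_counts):
    """Return {symbol: bitstring} for a dict of symbol frequencies."""
    # heap entries: [total_count, tiebreak_id, {symbol: partial code}]
    heap = [[cnt, i, {sym: ""}]
            for i, (sym, cnt) in enumerate(symbol_counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate single-symbol alphabet
        return {sym: "0" for sym in symbol_counts}
    while len(heap) > 1:
        lo = heapq.heappop(heap)  # two least-frequent subtrees...
        hi = heapq.heappop(heap)
        for sym in lo[2]:
            lo[2][sym] = "0" + lo[2][sym]  # ...merge under a new bit
        for sym in hi[2]:
            hi[2][sym] = "1" + hi[2][sym]
        heapq.heappush(heap, [lo[0] + hi[0], lo[1], {**lo[2], **hi[2]}])
    return heap[0][2]

codes = huffman_codes(Counter("abracadabra"))  # a:5 b:2 r:2 c:1 d:1
```

The most frequent symbol ('a') receives the shortest code, the code is prefix-free, and the total coded length (23 bits here) is optimal for these frequencies.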

25 Audio Equalizers
Spectral equalization uses filtering techniques to reshape the magnitude spectrum
Useful for recording and reproduction
Example uses: simple filters to adjust bass and treble; correcting the response of microphones, instrument pickups, loudspeakers, and hall acoustics
Parametric equalizers provide better frequency compensation but require more operator knowledge than graphic equalizers

26 Graphic Equalizers
Use several frequency bands to display and adjust the power of audio frequency components
Divide the spectrum using the octave scale (frequency-doubling scale)
Bandpass filters can be realized using IIR filter design techniques
DFT bins of the audio signal X(k) are combined using octave scaling to form the equalizer frequency bands
The input signal is decomposed with a bank of parallel bandpass filters, with a separate gain control for each band
The signal power in each band is estimated and displayed graphically with a bar

27 Example 10.4
Graphic equalizer to adjust a signal. Select the bands using octave scaling:
bandfreqs = {'31.25','62.5','125','250','500', '1k','2k','4k','8k','16k'};
[Figure: 10-band equalizer band gains and resulting spectral difference, magnitude in dB vs. frequency]

28 Parametric Equalizers
Provide a set of filters connected in cascade that are tunable in terms of both spectral shape and filter gain
Not fixed bandwidth and center frequency as in a graphic equalizer
Use 2nd-order IIR filters with parameters:
f_s: sampling rate
f_c: cutoff frequency [center (peak) or midpoint (shelf)]
Q: quality factor [resonance (peak) or slope (shelf)]
Gain: boost in dB (max ±12 dB)

29 Shelf Filters
Low-shelf: boost frequencies below the cutoff and pass higher components
High-shelf: boost frequencies above the cutoff and pass the rest
See book for equations
Ex 10.6: shape of the shelf filter with different gain parameters
[Figures: low-shelf filter (Fc=2000, Q=2, Fs=16000) and high-shelf filter (Fc=6000, Q=1 and Q=2, Fs=16000) magnitude responses for G = ±5 and ±10 dB vs. normalized frequency]

30 Peak Filter
Peak filter: amplify certain narrow frequency bands
Notch filter: attenuate certain narrow frequency bands
E.g. adjust the loudness of a certain frequency
See book for equations
Ex 10.5: shape of the peak filter for different parameters
[Figure: peak/notch filter (Fc=4/16, Q=2, Gain(dB)=10, 5, -5, -10, Fs=16000) magnitude responses vs. normalized frequency]

31 Example 10.7
Implement a parametric equalizer with f_s = 16,000 Hz as a cascade of 3 filters:
Low-shelf filter: f_c = 1000, Gain = 10 dB, Q = 1.0
High-shelf filter: f_c = 4000, Gain = 10 dB, Q = 1.0
Peak filter: f_c = 7000, Gain = 10 dB, Q = 1.0
Play the example file outside of PowerPoint: left channel original signal, right channel filtered
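One common way to design the 2nd-order peak filter in such a cascade is the Robert Bristow-Johnson "Audio EQ Cookbook" formulas — named plainly here because the book's own equations may differ; this is an illustrative design, not the text's implementation:

```python
import math

def peak_biquad(fs, fc, q, gain_db):
    """Return normalized (b, a) coefficients for a 2nd-order peaking filter
    (RBJ Audio EQ Cookbook design)."""
    A = 10.0 ** (gain_db / 40.0)          # sqrt of the linear gain
    w0 = 2.0 * math.pi * fc / fs          # center frequency in rad/sample
    alpha = math.sin(w0) / (2.0 * q)
    b = [1 + alpha * A, -2 * math.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * math.cos(w0), 1 - alpha / A]
    # normalize so a[0] = 1
    return [bi / a[0] for bi in b], [ai / a[0] for ai in a]

# one stage of the cascade in the example: f_c = 7000 Hz, 10 dB, Q = 1.0
b, a = peak_biquad(16000, 7000, 1.0, 10.0)
```

A quick check of the design: evaluating the frequency response at f_c gives exactly the requested boost (10 dB), while the response away from f_c returns toward 0 dB.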

32 Audio (Sound) Effects
Use of filtering techniques to manipulate the audio signal in an artistic manner
We will only mention and give examples of some common effects, not an in-depth look

33 Sound Reverberation
Reverberation is the echo sound from reflected sound waves
The echoes are related to the physical properties of the space: room size, configuration, furniture, etc.
Use an impulse response to measure them
Direct sound: the first sound wave to reach the ear
Reflected sound: the echo waves that arrive after bouncing off a surface
Example 10.8: use a hall impulse response to simulate a reverberated sound (input and output audio)
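Applying a measured impulse response is just convolution. A toy sketch (the impulse response below — a direct path plus two decaying echoes — is invented for illustration; a real one would be measured in the hall, as in Example 10.8):

```python
import numpy as np

fs = 8000
h = np.zeros(fs // 2)  # hypothetical half-second room impulse response
h[0] = 1.0             # direct sound: first wave to reach the ear
h[fs // 8] = 0.5       # first reflection, 1/8 s later
h[fs // 4] = 0.25      # later, weaker reflection

x = np.zeros(fs)
x[0] = 1.0             # dry test signal: a single click
y = np.convolve(x, h)  # wet (reverberated) signal
```

The click in the output is followed by its two echoes at exactly the delays and amplitudes of the impulse response, which is how the room's acoustics get imprinted on any input.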

34 Pitch Shift
Change the speech pitch (fundamental frequency); all frequencies are adjusted over the entire signal
Chipmunk-voice effect
Example 10.9a: adjust the pitch; see audio files
[Figure: spectrograms (frequency vs. time) of the original and pitch-shifted signals]

35 Time Stretch
Change the speed of audio playback without affecting the pitch
Audio editing: adjust audio to fit a specific timeline
Example 10.9b: adjust the play time; see audio files
[Figure: waveforms of the original and time-stretched signals]

36 Tremolo
Amplitude modulation of the audio signal:
y(n) = [1 + A M(n)] x(n)
A: maximum modulation amplitude
M(n) = sin(2π f_r n T): slow modulation oscillator with modulation rate f_r
Example 10.10: A = 1, f_r = 1 Hz, white-noise input at f_s = 8000 Hz
[Figure: input and tremolo-modulated waveforms]
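The tremolo equation maps directly to code. A sketch following the parameters of Example 10.10 (A = 1, f_r = 1 Hz, white noise at f_s = 8000 Hz; the noise amplitude and duration are illustrative):

```python
import numpy as np

fs, fr, A = 8000, 1.0, 1.0
n = np.arange(10 * fs)                       # 10 seconds of samples
M = np.sin(2 * np.pi * fr * n / fs)          # M(n) = sin(2*pi*fr*n*T), T = 1/fs
x = np.random.default_rng(2).normal(0.0, 0.1, len(n))  # white-noise input
y = (1.0 + A * M) * x                        # y(n) = [1 + A*M(n)] x(n)
```

With A = 1 the envelope swings between 0 and 2, so the output is fully silenced once per second (wherever M(n) = -1) and doubled in amplitude half a period later.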

37 Spatial Sounds
Audio source localization is determined by the way sound is perceived by the two ears: time-delay and intensity differences
Sounds in different positions arrive differently at the ears:
Interaural time difference (ITD): delay between the sound reaching each ear, used for localization
Interaural intensity difference (IID): loudness difference between the ears, used for localization
Binaural audio demos (great home fun):
http://www.youtube.com/watch?v=iudtlvagjja
http://www.youtube.com/watch?v=3fwda7twhhc
http://www.qsound.com/demos/binaural-audio.htm
http://www.studio360.org/story/126833-adventures-3dsound/
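The ITD cue can be quantified with a simple spherical-head model. The Woodworth approximation below is a standard textbook formula, used here for illustration (the head radius of 8.75 cm and sound speed of 343 m/s are typical assumed values, not from the slides):

```python
import math

def itd_seconds(theta_rad, a=0.0875, c=343.0):
    """Woodworth-model interaural time difference for a source at
    azimuth theta (radians, 0 = straight ahead), head radius a (m),
    sound speed c (m/s)."""
    return (a / c) * (theta_rad + math.sin(theta_rad))
```

A source straight ahead gives zero ITD; a source at 90 degrees gives roughly 0.65 ms, which is about the largest delay a human head produces — and the kind of cue the binaural demos above exploit.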