Introduction to Artificial Intelligence (V22.0472-001): Speech Recognition and Viterbi Decoding

Introduction to Artificial Intelligence
V22.0472-001 Fall 2009
Lecture 19: Speech Recognition & Viterbi Decoding
Rob Fergus, Dept of Computer Science, Courant Institute, NYU
Slides from John DeNero, Dan Klein, Zheng Chen

Today
HMMs: most likely explanation queries.
Speech recognition: a massive HMM! (Details of this section are not required.)

Speech and Language
Speech technologies: automatic speech recognition (ASR), text-to-speech synthesis (TTS), dialog systems.
Language processing technologies: machine translation, information extraction, web search, question answering, text classification, spam filtering, etc.

HMMs: MLE Queries
HMMs are defined by: states X, observations E, an initial distribution, transition probabilities, and emission probabilities.
Query: the most likely explanation, i.e. the state sequence X_1, X_2, ..., X_T that best explains the observations E_1, ..., E_T. This query is answered by the Viterbi algorithm.

State Path Trellis
A state trellis is a graph of states and transitions over time. Each arc represents some transition and carries a weight. Each path through the trellis is a sequence of states, and the product of the weights on a path is that sequence's probability. The Forward (and now Viterbi) algorithms can be thought of as computing sums over all paths (or finding the best path) in this graph.
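The trellis view above translates directly into code: keep, for each state, the probability of the best path ending there, and backpointers to recover the path. A minimal sketch, with a made-up two-state HMM and invented probabilities purely for illustration:

```python
def viterbi(states, init, trans, emit, observations):
    """Return the most likely hidden-state path and its probability."""
    # best[s]: probability of the best path ending in state s so far
    best = {s: init[s] * emit[s][observations[0]] for s in states}
    back = []  # back[t][s]: best predecessor of state s at step t
    for obs in observations[1:]:
        new_best, ptr = {}, {}
        for s in states:
            # choose the predecessor maximizing the product of path weights
            prev = max(states, key=lambda p: best[p] * trans[p][s])
            new_best[s] = best[prev] * trans[prev][s] * emit[s][obs]
            ptr[s] = prev
        back.append(ptr)
        best = new_best
    # follow backpointers from the best final state to recover the path
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]

# Toy example: two hidden states emitting two observation symbols.
states = ["A", "B"]
init = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
path, prob = viterbi(states, init, trans, emit, ["x", "x", "y"])
print(path, prob)
```

Replacing `max` with `sum` over predecessors (and dropping the backpointers) turns this into the Forward algorithm, exactly the "sums of all paths" versus "best path" contrast the slide draws.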

Viterbi Algorithm Example
[Worked trellis example; photo of Andrew Viterbi.]

Digitizing Speech / Speech in an Hour
Speech input is an acoustic waveform ("s p ee ch l a b").
Sampling rate: ~8 kHz for telephone speech, ~16 kHz for microphone speech (kHz = 1000 cycles/sec).

Spectral Analysis
Frequency gives pitch; amplitude gives volume. The Fourier transform of the wave is displayed as a spectrogram, where darkness indicates the energy at each frequency (e.g. at the "l" to "a" transition in "lab").
Graphs from Simon Arnfield's web tutorial on speech, Sheffield: http://www.psyc.leeds.ac.uk/research/cogn/speech/tutorial/
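The digitize-then-transform pipeline can be sketched in a few lines: sample a waveform at 16 kHz, cut it into short frames, and take the magnitude of each frame's Fourier transform, giving the columns of a spectrogram. The 440 Hz test tone, frame length, and Hanning window are arbitrary illustrative choices, not values from the lecture:

```python
import numpy as np

fs = 16_000                          # mic-quality sampling rate (samples/sec)
t = np.arange(fs) / fs               # one second of sample times
wave = np.sin(2 * np.pi * 440.0 * t)  # a pure 440 Hz tone as stand-in "speech"

frame_len = 512                      # 32 ms analysis window at 16 kHz
frames = wave[: len(wave) // frame_len * frame_len].reshape(-1, frame_len)
# each row of `spec` is one column of the spectrogram: energy per frequency bin
spec = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))

freqs = np.fft.rfftfreq(frame_len, d=1 / fs)
peak_bin = spec[0].argmax()
print(f"dominant frequency in first frame: {freqs[peak_bin]:.1f} Hz")
```

The frequency resolution here is fs / frame_len ≈ 31 Hz per bin, so the detected peak lands on the bin nearest 440 Hz; darkness in a spectrogram plot corresponds to large values in `spec`.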

Adding 100 Hz + 1000 Hz Waves
Summing a 100 Hz wave and a 1000 Hz wave gives a complex waveform. Its spectrum shows the two frequency components (100 and 1000 Hz) on the x-axis, with amplitude on the y-axis.

Part of [ae] from "lab"
Note the complex wave repeating nine times in the figure, plus smaller waves that repeat four times for every large pattern. The large wave has a frequency of 250 Hz (9 repetitions in 0.036 seconds); the small wave is roughly four times that, or roughly 1000 Hz. Two tiny waves ride on top of each peak of the 1000 Hz waves.

Back to Spectra
The spectrum represents these frequency components. It is computed by the Fourier transform, an algorithm which separates out each frequency component of a wave. The x-axis shows frequency; the y-axis shows magnitude (in decibels, a log measure of amplitude). For this vowel, peaks appear at 930 Hz, 1860 Hz, and 3020 Hz.

Resonances of the Vocal Tract
The human vocal tract can be modeled as an open tube, closed at the glottal end and open at the lip end, with length about 17.5 cm. Air in a tube of a given length will tend to vibrate at the resonance frequencies of that tube. Constraint: the pressure differential should be maximal at the (closed) glottal end and minimal at the (open) lip end.
(Figure from W. Barry's Speech Science slides and from Mark Liberman's website.)
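The boundary conditions just stated (pressure antinode at the closed glottis, node at the open lips) are satisfied only by odd quarter-wavelength modes, giving resonances f_n = (2n-1)·c/(4L). A worked sketch, assuming a speed of sound of c ≈ 350 m/s in the warm air of the vocal tract:

```python
c = 350.0   # speed of sound in m/s (assumed value)
L = 0.175   # vocal tract length: 17.5 cm, as in the slide

# odd quarter-wave resonances of a tube closed at one end, open at the other
formants = [(2 * n - 1) * c / (4 * L) for n in (1, 2, 3)]
print(formants)  # approximately [500.0, 1500.0, 2500.0] Hz
```

These 500/1500/2500 Hz values are the classic formants of a neutral (schwa-like) vocal tract; moving the tongue and lips away from the uniform-tube shape shifts them, which is why [ae] above instead shows peaks near 930, 1860, and 3020 Hz.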

Acoustic Feature Sequence
Time slices of the signal are translated into acoustic feature vectors (~39 real numbers per slice), e.g. ... e12, e13, e14, e15, e16 ... These are the observations; now we need the hidden states X.

State Space
P(E|X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound). P(X|X') encodes how sounds can be strung together. We will have one state for each sound in each word. From some state x, we can only:
Stay in the same state (e.g. speaking slowly),
Move to the next position in the word, or
At the end of the word, move to the start of the next word.
We build a little state graph for each word and chain them together to form our state space X.

HMMs for Speech
[Schematic architecture for a (simplified) speech recognizer.]

HMM Model in Speech
A fine-grained HMM model represents each phone. The most common model used for speech is constrained, allowing a state to transition only to itself or to a single succeeding state.
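The word-chaining construction above can be sketched as building an explicit successor list for each (word, phone-position) state. The two-word lexicon and its phone lists below are invented for illustration; a real recognizer would read them from a pronunciation dictionary:

```python
# Left-to-right ("constrained") state space: each word contributes one state
# per phone; a state may loop on itself (speaking slowly) or advance, and the
# final state of a word may jump to the first state of any word.
lexicon = {"speech": ["s", "p", "iy", "ch"], "lab": ["l", "ae", "b"]}

word_starts = [(word, 0) for word in lexicon]
transitions = {}  # state -> list of allowed successor states
for word, phones in lexicon.items():
    for i in range(len(phones)):
        succ = [(word, i)]                 # self-loop: stay in the same sound
        if i + 1 < len(phones):
            succ.append((word, i + 1))     # advance within the word
        else:
            succ.extend(word_starts)       # word end: start the next word
        transitions[(word, i)] = succ

print(transitions[("speech", 3)])  # final phone of "speech" can restart any word
```

Note how sparse this transition structure is: every state has at most two in-word successors, which is exactly the constrained topology the slide describes and what makes Viterbi decoding over a large vocabulary tractable.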

Decoding
While there are some practical issues, finding the words given the acoustics is an HMM inference problem: we want to know which state sequence x_{1:T} is most likely given the evidence e_{1:T}. From the sequence x, we can simply read off the words.

Also Use a Language Model
For a given acoustic observation O = o_1, o_2, ..., o_n, the goal of speech recognition is to find the word sequence W = w_1, w_2, ..., w_n that has the maximum posterior probability P(W|O). By Bayes' rule this factors into an acoustic model term P(O|W) and a language model term P(W).
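The factorization above means the decoder scores each candidate word sequence by the product P(O|W)·P(W), which is proportional to P(W|O). A toy sketch with two acoustically confusable hypotheses; all of the probabilities are made-up numbers chosen only to show how the language model resolves the ambiguity:

```python
# candidate word sequences with (invented) acoustic and language model scores
candidates = {
    ("recognize", "speech"):         {"acoustic": 1e-5, "lm": 1e-3},
    ("wreck", "a", "nice", "beach"): {"acoustic": 2e-5, "lm": 1e-6},
}

def score(words):
    m = candidates[words]
    return m["acoustic"] * m["lm"]   # P(O|W) * P(W), proportional to P(W|O)

best = max(candidates, key=score)
print(best)
```

Here the second hypothesis actually fits the acoustics slightly better, but its language model probability is so much lower that "recognize speech" wins, which is precisely why ASR systems combine both models rather than trusting the acoustics alone.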