Audio processing methods on marine mammal vocalizations

Xanadu Halkias
Laboratory for the Recognition and Organization of Speech and Audio
http://labrosa.ee.columbia.edu

Sound to Signal
- Sound is pressure variation of the medium (e.g. speech: air pressure; marine mammals: water pressure)
- Pressure waves in water
- Converting waves to voltage through a microphone
- Time-varying voltage
Xanadu Halkias-www.ee.columbia.edu/~xanadu

Analog to digital
- Sampling + quantizing = digital signal
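
A minimal sketch of the two steps, assuming a sine test tone and a uniform quantizer. The function name `sample_and_quantize`, the 8 kHz rate, and the 8-bit depth are illustrative choices, not values from the slides.

```python
import numpy as np

def sample_and_quantize(freq_hz, fs_hz, duration_s, n_bits):
    """Sample a unit-amplitude sine and quantize it to n_bits per sample."""
    # Sampling: evaluate the continuous-time signal at multiples of 1/fs.
    t = np.arange(int(round(duration_s * fs_hz))) / fs_hz
    x = np.sin(2 * np.pi * freq_hz * t)
    # Quantizing: map [-1, 1] onto 2**n_bits uniformly spaced levels.
    levels = 2 ** n_bits
    step = 2.0 / (levels - 1)
    codes = np.round((x + 1.0) / step).astype(int)  # integer codes 0..levels-1
    x_q = codes * step - 1.0                        # quantized amplitudes
    return t, x, codes, x_q

t, x, codes, x_q = sample_and_quantize(freq_hz=100.0, fs_hz=8000.0,
                                       duration_s=0.01, n_bits=8)
print(len(t), codes.min(), codes.max(), np.max(np.abs(x - x_q)))
```

The quantization error is bounded by half a step, which is why more bits give a cleaner digital signal.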

Time to frequency and back
- Fourier transform = decompose a signal as a sum of sines and cosines
- Digital signal ↔ Fourier spectrum
- Spectrum = the frequency content of the signal (energy per frequency band)
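
As a small illustration, the sketch below builds a signal from two sinusoids, reads their frequencies off the spectrum, and transforms back. The 1 kHz rate and the 50 Hz and 120 Hz components are assumed values chosen so that each tone falls exactly on a frequency bin.

```python
import numpy as np

fs = 1000.0                      # assumed sampling rate in Hz
n = 1000                         # one second of samples, so bins are 1 Hz apart
t = np.arange(n) / fs
x = 1.0 * np.sin(2 * np.pi * 50 * t) + 0.5 * np.cos(2 * np.pi * 120 * t)

X = np.fft.rfft(x)                       # time to frequency
freqs = np.fft.rfftfreq(n, d=1 / fs)     # frequency of each bin in Hz
energy = np.abs(X) ** 2                  # energy per frequency bin

peak_freqs = np.sort(freqs[np.argsort(energy)[-2:]])  # two strongest bins
x_back = np.fft.irfft(X, n=n)            # frequency back to time

print(peak_freqs)
print(np.allclose(x_back, x))
```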

Back to sampling
- Signal has to be bandlimited, i.e. it has energy only up to some frequency $\Omega_M$
- Sampling needs to obey the Nyquist limit: $\Omega_T \geq 2\Omega_M$
- Audio is sampled at $\Omega_T = 2\pi \cdot 44100$ rad/s (44100 Hz), so the spectrum extends up to 22050 Hz
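
A sketch of what happens when the limit is violated, assuming pure cosine test tones. The helper `apparent_frequency` is a hypothetical name. A 30 kHz tone sampled at 44100 Hz lies above 22050 Hz and shows up at 44100 - 30000 = 14100 Hz instead.

```python
import numpy as np

def apparent_frequency(f_hz, fs_hz, n=44100):
    """Frequency (Hz) of the strongest spectral bin of a sampled cosine."""
    t = np.arange(n) / fs_hz
    x = np.cos(2 * np.pi * f_hz * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(n, d=1 / fs_hz)
    return freqs[np.argmax(spectrum)]

fs = 44100.0
nyquist = fs / 2                            # highest representable frequency
ok = apparent_frequency(10000.0, fs)        # below the limit: preserved
aliased = apparent_frequency(30000.0, fs)   # above the limit: folds back

print(nyquist, ok, aliased)
```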

Looking at sounds: the Spectrogram
- Looking at energy in time and frequency
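
One common way to compute a spectrogram is a short-time Fourier transform: cut the signal into overlapping windowed frames and take the energy spectrum of each. The sketch below assumes a Hann window, 256-sample frames, and a synthetic upsweep as input. None of these settings come from the slides.

```python
import numpy as np

def spectrogram(x, fs, frame_len=256, hop=128):
    """Return (freqs, times, S) with S of shape (n_bins, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # energy per bin
    freqs = np.fft.rfftfreq(frame_len, d=1 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centers
    return freqs, times, power.T

fs = 8000.0
t = np.arange(int(fs)) / fs
# Synthetic upsweep: instantaneous frequency 500 + 1000*t Hz over one second.
x = np.sin(2 * np.pi * (500 * t + 500 * t ** 2))

freqs, times, S = spectrogram(x, fs)
ridge = freqs[np.argmax(S, axis=0)]   # strongest frequency in each frame
print(S.shape, ridge[0], ridge[-1])
```

The ridge rises from roughly 500 Hz to roughly 1500 Hz, which is the diagonal line one would see in the plotted spectrogram.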

More on spectrograms

Overview of marine mammal research

Call detection
What is it good for?
- Detect different calls within the recording automatically
- Differentiate between species, or identify the number of marine mammals in the region through overlapping calls
- Track marine mammals through their calls
- Use calls to analyze and construct a possible language structure
Problems
- Data, data, data

Call detection approaches
- Noise is the biggest problem
- D. K. Mellinger et al. use the cross-correlation approach
- Cross-correlation is a way of measuring how similar two signals are
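
To make "how similar" concrete, the sketch below computes a normalized cross-correlation over all lags. The function name `normalized_xcorr` and the random test signals are assumptions for illustration. A delayed copy of a signal scores close to 1 at the correct lag, while unrelated noise stays near 0.

```python
import numpy as np

def normalized_xcorr(a, b):
    """Normalized cross-correlation c[k] = sum_n a[n+k] * b[n] for all lags k."""
    a = a - a.mean()
    b = b - b.mean()
    c = np.correlate(a, b, mode="full") / (np.linalg.norm(a) * np.linalg.norm(b))
    lags = np.arange(-(len(b) - 1), len(a))
    return lags, c

rng = np.random.default_rng(0)
x = rng.standard_normal(400)
y = np.concatenate((np.zeros(40), x[:360]))   # x delayed by 40 samples
z = rng.standard_normal(400)                  # unrelated signal

lags, c_delayed = normalized_xcorr(y, x)
best_lag = int(lags[np.argmax(c_delayed)])
_, c_unrelated = normalized_xcorr(z, x)

print(best_lag, c_delayed.max(), c_unrelated.max())
```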

Call detection: kernel cross-correlation
- This method requires manual intervention and is performed on the signal waveform
Image from D. K. Mellinger and C. W. Clark, "Methods for automatic detection of mysticete sounds", Mar. Fresh. Behav. Physiol., Vol. 29, pp. 163-181, 1997
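
A rough sketch of waveform-domain kernel detection, not the procedure from the cited paper. It assumes a synthetic upsweep as the kernel, white noise as the background, and a hand-picked threshold of 0.5. The helper `pick_peaks` is a hypothetical name.

```python
import numpy as np

def pick_peaks(d, threshold, min_sep):
    """Indices where d exceeds threshold and is the maximum within +/- min_sep."""
    peaks = []
    for i in np.flatnonzero(d > threshold):
        lo, hi = max(0, i - min_sep), min(len(d), i + min_sep + 1)
        if d[i] == d[lo:hi].max():
            peaks.append(int(i))
    return peaks

rng = np.random.default_rng(1)
fs = 2000.0
tk = np.arange(int(0.25 * fs)) / fs
# Hypothetical call kernel: 0.25 s upsweep from 100 Hz to 300 Hz, Hann-tapered.
kernel = np.sin(2 * np.pi * (100 * tk + 400 * tk ** 2)) * np.hanning(len(tk))

recording = 0.5 * rng.standard_normal(8000)   # 4 s of background noise
true_onsets = [1500, 5200]
for onset in true_onsets:
    recording[onset : onset + len(kernel)] += kernel

# Detection function: correlation with the kernel, scaled so a clean match is 1.
d = np.correlate(recording, kernel, mode="valid") / np.dot(kernel, kernel)
detections = pick_peaks(d, threshold=0.5, min_sep=len(kernel) // 2)
print(detections)
```

The kernel and the threshold are the manual choices here: a different call type would need a different kernel.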

Call detection: spectrogram correlation
Image from D. K. Mellinger and C. W. Clark, "Methods for automatic detection of mysticete sounds", Mar. Fresh. Behav. Physiol., Vol. 29, pp. 163-181, 1997
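
A simplified sketch of the idea: slide a time-frequency kernel along the time axis of the spectrogram and sum the products. As an assumption, the kernel here is just the zero-mean spectrogram of a clean synthetic call; the cited paper constructs its kernel differently. All names and settings are illustrative.

```python
import numpy as np

def spectrogram(x, frame_len=128, hop=64):
    """Energy spectrogram of shape (n_bins, n_frames) using a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T

def spectrogram_correlation(S, K):
    """Correlate kernel K (bins x k_frames) with S (bins x frames) along time."""
    K = K - K.mean()                 # zero mean, so flat noise scores near 0
    n = S.shape[1] - K.shape[1] + 1
    return np.array([np.sum(S[:, j : j + K.shape[1]] * K) for j in range(n)])

rng = np.random.default_rng(2)
fs = 2000.0
tk = np.arange(500) / fs
call = np.sin(2 * np.pi * (100 * tk + 400 * tk ** 2)) * np.hanning(500)

recording = 0.5 * rng.standard_normal(8000)
recording[3200:3700] += call         # call starts at sample 3200 = frame 50

score = spectrogram_correlation(spectrogram(recording), spectrogram(call))
best_frame = int(np.argmax(score))
print(best_frame)
```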

Voiced calls
- Energy appears at integer multiples of some fundamental frequency (= pitch)
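
A small check of this harmonic structure on a synthetic signal, assuming a 200 Hz pitch and five harmonics with amplitudes 1/k. These values are illustrative only.

```python
import numpy as np

fs = 8000.0
f0 = 200.0                                  # assumed pitch in Hz
t = np.arange(int(fs)) / fs                 # one second, so bins are 1 Hz apart
# Harmonic signal: components at f0, 2*f0, ..., 5*f0.
x = sum((1.0 / k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 6))

energy = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
harmonics = np.sort(freqs[np.argsort(energy)[-5:]])   # five strongest bins

print(harmonics)            # energy sits at multiples of f0
print(np.diff(harmonics))   # the spacing between harmonics equals the pitch
```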

Comments
- Both methods require manual measurements for the construction of the template
- The quality of the results depends highly on the noise present in the data
- Quality recordings at high sampling rates decide the course of action
- Correlation methods cannot capture all types of calls without constructing different kernels

Linear Predictive Coding
- Idea: the signal, x[n], is formed by adding white noise, e[n], to previous samples weighted by the linear predictive coefficients, a
- Block diagram: E[z] → 1/A[z] → X[z], i.e. $X[z] = E[z] / A[z]$
- The number of coefficients defines the detail that we capture of the original signal
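
A minimal sketch of this model, assuming the sign convention $x[n] = \sum_{k=1}^{p} a_k x[n-k] + e[n]$, so that $A[z] = 1 - \sum_k a_k z^{-k}$. It generates a signal from known coefficients and recovers them with the autocorrelation method. The function name `lpc` and the coefficient values are illustrative.

```python
import numpy as np

def lpc(x, order):
    """Estimate a_1..a_p by solving the autocorrelation normal equations."""
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:])

rng = np.random.default_rng(0)
true_a = np.array([1.5, -0.8])        # assumed coefficients (stable filter)
e = rng.standard_normal(20000)        # white-noise excitation e[n]
x = np.zeros_like(e)
for n in range(2, len(x)):
    # Previous samples weighted by the coefficients, plus white noise.
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + e[n]

a_hat = lpc(x, order=2)
print(a_hat)
```

With a higher `order` the model can follow more spectral detail, which matches the last bullet above.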

Linear Predictive Coding
- Used in speech for transmission purposes
- Intuition: LPCs model the spectral peaks of your signal
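
To illustrate the intuition, the sketch below evaluates the all-pole envelope $1/|A(e^{j\omega})|$ for a hypothetical pair of predictor coefficients and finds its peak. It assumes the convention $A[z] = 1 - \sum_k a_k z^{-k}$ and an 8 kHz sampling rate; neither is stated in the slides.

```python
import numpy as np

a = np.array([1.5, -0.8])   # hypothetical predictor coefficients
fs = 8000.0                 # assumed sampling rate in Hz
n_fft = 1024

# A(e^{jw}) = 1 - sum_k a_k e^{-jwk}, evaluated on an FFT grid via zero padding.
A = np.fft.rfft(np.concatenate(([1.0], -a)), n=n_fft)
envelope_db = -20 * np.log10(np.abs(A))     # 1/|A| in dB
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)

peak_hz = freqs[np.argmax(envelope_db)]
print(peak_hz)
```

Two coefficients give one resonance, here near 720 Hz. Each additional pair of coefficients lets the envelope follow one more spectral peak.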

LPCs in marine mammal recordings
- Model the peaks in the recordings that likely belong to calls; that way we alleviate the problem of noise
- Unveils harmonic structure not visible in the original spectrogram

Hidden Markov Models
- Machine learning involves training a general model based on your data in order to extract and predict desired features
- HMMs, M_j, are defined by their parameters a, b, π

HMMs some more
- Training: getting the parameters of the model, a, b, π
- Evaluating: we are given a sequence of observations and we want to know how likely it is that the model produced them
- Decoding: we have some observations and we want to find out the hidden states
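
The evaluation task can be sketched with the forward algorithm for a discrete HMM. Here `A` plays the role of the transition probabilities a, `B` the emission probabilities b, and `pi` the initial distribution π. The two-state model and its numbers are made up for illustration.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    alpha = pi * B[:, obs[0]]            # start in each state and emit obs[0]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # move one step, then emit o
    return alpha.sum()

pi = np.array([0.6, 0.4])                # initial state probabilities
A = np.array([[0.7, 0.3],                # A[i, j] = P(next state j | state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                # B[i, k] = P(observe symbol k | state i)
              [0.2, 0.8]])

p = forward_likelihood(pi, A, B, [0, 1])
print(p)
```

Comparing this likelihood across several trained models M_j is one way to decide which model best explains an observed sequence.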

HMMs in marine mammal vocalizations
- HMMs could provide a call detection tool
- The data has to be workable
- Use frequencies of the spectrogram as hidden states
- Observe the spectrogram and use it for learning
- Track the call in the spectrogram
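
One possible reading of this proposal, sketched with Viterbi decoding on a toy spectrogram. The assumptions are loud ones: hidden states are frequency bins, the emission score of a bin is its share of the frame's energy, and transitions favor small frequency jumps through a hand-set Gaussian rather than learned parameters. All names and numbers are hypothetical.

```python
import numpy as np

def viterbi_track(log_emit, log_trans):
    """Most likely state path; log_emit is (states x frames), uniform start."""
    n_states, n_frames = log_emit.shape
    delta = -np.log(n_states) + log_emit[:, 0]
    back = np.zeros((n_states, n_frames), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_trans        # scores[i, j]: from i to j
        back[:, t] = np.argmax(scores, axis=0)     # best predecessor of each j
        delta = scores[back[:, t], np.arange(n_states)] + log_emit[:, t]
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(delta)
    for t in range(n_frames - 1, 0, -1):           # trace back the best path
        path[t - 1] = back[path[t], t]
    return path

rng = np.random.default_rng(3)
n_bins, n_frames = 20, 30
true_bin = 5 + np.arange(n_frames) // 3            # slowly rising call contour
S = rng.uniform(0.1, 1.0, size=(n_bins, n_frames)) # background energy
S[true_bin, np.arange(n_frames)] += 8.0            # energy of the call
S[0, 15] += 20.0                                   # loud interference in one frame

log_emit = np.log(S / S.sum(axis=0, keepdims=True))
d = np.subtract.outer(np.arange(n_bins), np.arange(n_bins))
trans = np.exp(-0.5 * d.astype(float) ** 2)        # prefer small frequency jumps
log_trans = np.log(trans / trans.sum(axis=1, keepdims=True))

path = viterbi_track(log_emit, log_trans)
greedy = np.argmax(S, axis=0)                      # per-frame maximum, no model
print(path[15], greedy[15], true_bin[15])
```

Picking the loudest bin per frame jumps to the interference at frame 15, while the decoded path stays on the call contour because large jumps are improbable.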

References
- D. P. Ellis, www.ee.columbia.edu/~dpwe/e6820, www.ee.columbia.edu/~dpwe/e4810
- D. K. Mellinger and C. W. Clark, "Methods for automatic detection of mysticete sounds", Mar. Fresh. Behav. Physiol., Vol. 29, pp. 163-181, 1997
- R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, John Wiley & Sons, Inc., 2001