A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification


Wei Chu and Abeer Alwan
Speech Processing and Auditory Perception Laboratory, Department of Electrical Engineering, University of California, Los Angeles
Supported in part by the NSF

Outline
- Motivation
- Bird Call Analysis
- Bird Call Classifier Design
- Denoising Filter Design
- Experiments

Motivation of noise robust bird call classification
- Songs are important in the communication between birds of specific species.
- Behavioral and ecological studies could benefit from automatically detecting and identifying species from acoustic recordings.
- It is a challenge to correctly classify bird calls under noisy conditions.
- In this work, we analyze 5 types of Antbirds.
Now let us listen to several examples of Antbird calls:

Waveform and spectrogram of a Barred Antshrike (BAS) call

Waveform and spectrogram of a Dusky Antbird (DAB) call

Waveform and spectrogram of a Great Antshrike (GAS) call

Waveform and spectrogram of a Mexican Antthrush (MAT) call

Waveform and spectrogram of a Dot-winged Antwren (DWA) call

Antbird Call Properties
A bird call consists of a sequence of chirps. The interval between chirps and the chirp intensity gradually decrease over time.
[Figure: PMF of bird call duration for BAS, DAB, GAS, MAT and DWA; x-axis: call duration (sec), y-axis: PMF]
A histogram of the bird call duration of 2246 samples from 5 bird species. The duration ranges from 0.5 to 5 seconds.

Automatic bird call classification involves several aspects:
- Waveform denoising: the focus of this paper
- Feature extraction: Mel-Frequency Cepstral Coefficients (MFCCs)
- Acoustic modelling: Gaussian Mixture Model (GMM) and Hidden Markov Model (HMM)
  - Learning model parameters from observations
  - Decoding observations

Why is denoising needed?
Different kinds of background noise can be observed in the recordings:
- Other bird chirps
- Insect sounds
- Sounds of other animals
We propose a Correlation-Maximization based filter to suppress the background noise present in the bird calls.

Wiener Filter
A prevailing denoising approach is Wiener filtering. The clean spectrum $X(f)$ is corrupted by additive noise $N(f)$, giving the noisy spectrum $Y(f)$. Let $\widehat{SNR}(f)$ be an estimate of $SNR(f)$:

$$\widehat{SNR}(f) = \frac{|\hat{X}(f)|^2}{|\hat{N}(f)|^2} \quad (1)$$

The estimated clean spectrum is:

$$|\hat{X}(f)|^2 = H(f)\,|Y(f)|^2 = \frac{\widehat{SNR}(f)}{1 + \widehat{SNR}(f)}\,|Y(f)|^2 \quad (2)$$

The noncausal Wiener filter converts the denoising problem into an SNR estimation problem [1].
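The gain in Eq. (2) can be illustrated with a minimal sketch. The slide does not say how the SNR is estimated, so the power-subtraction estimate below is an assumption made only for illustration, and the function names `wiener_gain` and `enhance_power_spectrum` are hypothetical.

```python
def wiener_gain(snr):
    """Wiener gain H(f) = SNR / (1 + SNR) for one frequency bin (Eq. 2)."""
    return snr / (1.0 + snr)


def enhance_power_spectrum(noisy_power, noise_power, floor=1e-12):
    """Apply Eq. (2) bin by bin.

    noisy_power: list of |Y(f)|^2 values
    noise_power: list of estimated |N(f)|^2 values
    ASSUMPTION: the clean power is estimated by power subtraction,
    max(|Y|^2 - |N|^2, 0); the slide leaves the SNR estimator open.
    """
    enhanced = []
    for y2, n2 in zip(noisy_power, noise_power):
        n2 = max(n2, floor)              # avoid division by zero
        snr = max(y2 - n2, 0.0) / n2     # Eq. (1) with the assumed estimator
        enhanced.append(wiener_gain(snr) * y2)
    return enhanced


if __name__ == "__main__":
    print(enhance_power_spectrum([4.0, 1.0], [1.0, 2.0]))
```

Bins where the estimated noise power exceeds the observed power get an SNR of zero and are fully attenuated, which is why the quality of the SNR estimate dominates the result.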

Correlation-Maximization Filter: Further Analysis of the Bird Call
Two levels of bird call periodicity:
1. Short phonation period (left): ranges from 0.2 to 1.0 ms
2. Interval between chirps (right): ranges from 0.06 to 0.3 sec and slowly decreases with time
These two periodicities instruct the denoising.

Correlation-Maximization Filter
Suppose an FIR filter with $L$ taps,

$$h = [h[1], h[2], \ldots, h[L]]^T \quad (3)$$

is used for denoising the noisy bird call $y[n]$. The output of the filter is the estimated clean signal $\hat{x}[n]$:

$$\hat{x}[n] = \sum_{k=1}^{L} h[k]\, y[n-k] \quad (4)$$

$y[n]$ and $\hat{x}[n]$ are then segmented into frames.
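A minimal sketch of the filtering in Eq. (4) followed by framing. Treating samples before the start of the signal as zero and using non-overlapping frames are assumptions; the slide specifies neither, and the function names are hypothetical.

```python
def fir_filter(h, y):
    """Compute x_hat[n] = sum_{k=1}^{L} h[k] * y[n-k] (Eq. 4).

    The Python list h stores h[1]..h[L] at indices 0..L-1.
    ASSUMPTION: samples before the start of y are treated as zero.
    """
    L = len(h)
    x_hat = []
    for n in range(len(y)):
        acc = 0.0
        for k in range(1, L + 1):
            if n - k >= 0:
                acc += h[k - 1] * y[n - k]
        x_hat.append(acc)
    return x_hat


def split_frames(x, frame_len):
    """Split x into non-overlapping frames (ASSUMPTION: no overlap,
    and a trailing partial frame is dropped)."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_len)]


if __name__ == "__main__":
    x_hat = fir_filter([1.0, 0.5], [1.0, 2.0, 3.0])
    print(x_hat)
    print(split_frames([1, 2, 3, 4, 5], 2))
```

Because the sum in Eq. (4) starts at k = 1, the output at time n depends only on past input samples.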

Correlation-Maximization Filter (cont.)
Two assumptions:
1. $y[n]$ and $\hat{x}[n]$ are wide-sense stationary: the bird chirps repeat periodically.
2. A single $h$ is used for each bird call: the spectral distributions of different frames in a bird call are similar.
The cross-correlation function of $\hat{x}[n]$ at lag $k$ of frame $m$ is

$$\phi^{m}_{\hat{x}}[0, k] = h^T\, \Phi^{m}_{y}[0, k]\, h \quad (5)$$

where $h = [h[1], h[2], \ldots, h[L]]^T$ holds the coefficients of the FIR filter, as in Eq. (3), and $\Phi^{m}_{y}[0, k]$ is the cross-correlation function of $y[n]$, which is independent of $h$.
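The quadratic form in Eq. (5) follows from substituting the filter output of Eq. (4) into the lag-k correlation sum. The sketch below checks this numerically for one frame. It assumes an unnormalized correlation and a matrix with entries Phi[i][j] = sum_n y[n-i] * y[n+k-j]; neither detail is given on the slide, and all function names are hypothetical.

```python
def filter_output(h, y, n):
    """x_hat[n] = sum_{k=1}^{L} h[k] * y[n-k]; valid for n >= L."""
    return sum(h[k - 1] * y[n - k] for k in range(1, len(h) + 1))


def correlation_matrix(y, L, k):
    """L x L matrix with entry (i, j) = sum_n y[n-i] * y[n+k-j], i, j = 1..L.

    ASSUMPTION: unnormalized sums over n = L .. len(y)-k-1, so that every
    index stays inside the frame.
    """
    ns = range(L, len(y) - k)
    return [[sum(y[n - i] * y[n + k - j] for n in ns)
             for j in range(1, L + 1)]
            for i in range(1, L + 1)]


def quadratic_form(h, Phi):
    """h^T Phi h."""
    L = len(h)
    return sum(h[i] * Phi[i][j] * h[j] for i in range(L) for j in range(L))


def direct_correlation(h, y, k):
    """sum_n x_hat[n] * x_hat[n+k] over the same range of n."""
    L = len(h)
    return sum(filter_output(h, y, n) * filter_output(h, y, n + k)
               for n in range(L, len(y) - k))


if __name__ == "__main__":
    y = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 1.5, -0.5]
    h = [0.5, -0.25, 0.125]
    Phi = correlation_matrix(y, len(h), k=2)
    print(quadratic_form(h, Phi), direct_correlation(h, y, 2))
```

Since the matrix depends only on y, it can be computed once per frame and lag and then reused while h is being optimized.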

Use Dynamic Programming (DP) to Search the Chirp Interval
The chirp interval is searched in each frame over $\hat{x}[n]$. DP minimizes the distortion induced by background noise.
Local cost at lag $k$ of frame $m$: $\phi^{m}_{\hat{x}}[0, k]$
Transition cost from lag $k_i$ in one frame to lag $k_j$ in the next:

$$d(k_i, k_j) = e^{\alpha |k_i - \delta - k_j|} - 1 \quad (6)$$

Purpose: prevent the chirp interval from varying greatly between two consecutive frames.
A $K \times M$ trellis structure is built for dynamic programming.
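The trellis search can be sketched as a standard minimum-cost dynamic program. The transition cost below, exp(alpha * |k_i - delta - k_j|) - 1, is one reading of Eq. (6), and the values of `alpha` and `delta` are placeholders. How the correlation values are turned into local costs is left to the caller. All names are hypothetical.

```python
import math


def transition_cost(k_i, k_j, alpha=0.1, delta=1):
    """ASSUMED form of Eq. (6): zero when the lag drops by exactly delta."""
    return math.exp(alpha * abs(k_i - delta - k_j)) - 1.0


def best_path(local_cost, trans=transition_cost):
    """Minimum-cost path through an M x K trellis.

    local_cost[m][k] is the local cost of lag index k in frame m.
    Returns (total_cost, path) where path[m] is the chosen lag index.
    """
    M, K = len(local_cost), len(local_cost[0])
    acc = list(local_cost[0])        # accumulated cost per lag in frame 0
    back = []                        # back-pointers for frames 1..M-1
    for m in range(1, M):
        new_acc, ptr = [], []
        for kj in range(K):
            best_ki = min(range(K), key=lambda ki: acc[ki] + trans(ki, kj))
            new_acc.append(acc[best_ki] + trans(best_ki, kj)
                           + local_cost[m][kj])
            ptr.append(best_ki)
        acc = new_acc
        back.append(ptr)
    k = min(range(K), key=lambda i: acc[i])
    total = acc[k]
    path = [k]
    for ptr in reversed(back):       # trace the best path backwards
        k = ptr[k]
        path.append(k)
    path.reverse()
    return total, path


if __name__ == "__main__":
    costs = [[5, 5, 0], [5, 0, 5], [0, 5, 5]]
    print(best_path(costs))
```

The search costs on the order of K * K * M operations per pass, compared with enumerating every path through the trellis.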

Correlation-Maximization Filter (cont.)
The effect of an optimal filter $h^*$: removing the additive noise in the corrupted signal so that the minimum accumulative cost is achieved in the chirp interval search:

$$h^* = \arg\min_{h} F(h, s) \quad (7)$$

where
- $s$ is a valid path in the trellis, $s = s_1, s_2, \ldots, s_M$;
- $h^*$ is the optimal denoising filter;
- $F(h, s) = \Psi(h, s) + \Theta(h, s)$ is the accumulative cost, with $\Psi(h, s)$ the accumulative local cost and $\Theta(h, s)$ the accumulative transition cost.

Speed Up: From Brute Force to N-Best
There are $K^M$ possible paths in a $K \times M$ trellis. If the gradient search takes $\bar{I}$ iterations on average, the brute-force approach needs $K^M \bar{I}$ iterations, which is computationally unacceptable.
We can assume that $s$ lies within a path subset $S(h)$ in each iteration. The subset is composed of the top N-best paths from the dynamic programming over the trellis, so the gradient descent search only needs to be applied to the N-best paths rather than to all paths at each iteration.
Let $J$ denote the size of the N-best search; the total number of gradient search iterations is reduced to $J^2 \bar{I}$.
Typically, for Antbird calls, $K = 49$, $1 \le M \le 50$, $J = 20$.
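To get a feel for the saving, the short calculation below compares the two path counts for the typical values quoted on the slide, taking the largest M. It reads the reduced count as J squared times the average number of iterations, which is an interpretation of the slide's notation; the variable names are hypothetical.

```python
K, M, J = 49, 50, 20          # typical values quoted for Antbird calls

# Brute force: one gradient search per path in the K x M trellis.
brute_force_paths = K ** M

# N-best: multiplier on the average iteration count, read here as J^2
# (ASSUMPTION about the slide's notation).
nbest_factor = J ** 2

if __name__ == "__main__":
    print("brute-force paths have", len(str(brute_force_paths)), "decimal digits")
    print("N-best multiplier:", nbest_factor)
```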

The spectrograms of a GAS call before and after filtering
(a) Other non-target bird chirps occur at 0.6-1.6 seconds.
(b) Both target and non-target bird chirps are enhanced after Wiener filtering.
(c) The Correlation-Maximization filter suppresses the non-target chirps while enhancing the target chirps.
(d) Non-target chirps and background noise are suppressed when the two filters are cascaded.

The frequency response of the CM filter for a GAS call
The filter enhances the target bird call and minimizes the interference introduced by background noise and other birds.
Characteristics of the filter $h$:
- pass-band: 800-1750 Hz
- stop-band: 2600-8000 Hz
- dip: around 2800 Hz
[Figure: magnitude (dB) versus frequency (0-8000 Hz) of the filter for G060605K030003.wav]

Data Set
Researchers from the UCLA Ecology and Evolutionary Biology department collected 2 hours of bird calls (3366 calls) from 5 species. We split the corpus into a training set and a testing set with a ratio of 2:1.

Table 2.1: The number of bird calls in the training and test sets. BAS: Barred Antshrike; DAB: Dusky Antbird; GAS: Great Antshrike; MAT: Mexican Antthrush; DWA: Dot-winged Antwren.

            BAS   DAB   GAS   MAT   DWA   Total
Training    240   888   350   609   159   2246
Testing     120   444   175   304    77   1120

The training set has 85 minutes of recordings; the testing set is 42 minutes long.

Setting
- A band-pass filter with cutoff frequencies at 360 Hz and 6500 Hz is used to remove the irrelevant frequency components.
- The recordings are downsampled from 44.1 kHz to 16 kHz.
- Number of taps of the filter: L = 20.
- Frame length: N = 600 ms = 9600 samples.
- Dimension of the MFCC features: 39.
- GMM: 256 Gaussians; HMM: 6 states, 256 Gaussians per state.

Classification Results Analysis
Table 2.2: The classification error rate on the bird call test set. W+/CM+: features extracted from the output of the Wiener/Correlation-Maximization based denoising filter.

              GMM     HMM
MFCC          8.7%    5.4%
W+MFCC        5.9%    4.9%
CM+MFCC       5.3%    4.6%
CM+W+MFCC     4.7%    4.1%

- The HMM based classifier is better than the GMM classifier when using the same features.
- The Correlation-Maximization based denoising filter is effective before extracting MFCC features.
- Cascading the CM filter and the Wiener filter is most effective.
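The error rates in the table can also be read as relative reductions against the plain MFCC baseline. The sketch below derives them from the table values only; the helper name `relative_reduction` and the dictionary layout are hypothetical.

```python
# Error rates (%) copied from the results table.
error_rates = {
    "MFCC":      {"GMM": 8.7, "HMM": 5.4},
    "W+MFCC":    {"GMM": 5.9, "HMM": 4.9},
    "CM+MFCC":   {"GMM": 5.3, "HMM": 4.6},
    "CM+W+MFCC": {"GMM": 4.7, "HMM": 4.1},
}


def relative_reduction(baseline, improved):
    """Relative error reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    for frontend, rates in error_rates.items():
        for model in ("GMM", "HMM"):
            base = error_rates["MFCC"][model]
            print(f"{frontend:10s} {model}: "
                  f"{relative_reduction(base, rates[model]):.1f}% relative reduction")
```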

Conclusions and Future Work
The Correlation-Maximization based denoising filter is effective in reducing the classification errors for bird calls that have a quasi-periodic structure in the time domain and an invariant power spectral density across frames.
Future work:
- Extract better features for classification, such as long-term features and modulation frequency features.
- Detect the bird call in an audio stream.
- Use a Dynamic Bayesian Network to represent the probabilistic relationships between the observed bird calls and the bird species.

Thank you! Q & A?

[1] S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. on Acoustics, Speech and Signal Processing, vol. 27, no. 2, pp. 113-120, 1979.