Joint Position-Pitch Decomposition for Multi-Speaker Tracking

Joint Position-Pitch Decomposition for Multi-Speaker Tracking
SPSC Laboratory, TU Graz

Contents
1. Microphone Arrays
   - SPSC circular array
   - Beamforming
2. Source Localization
   - Direction of Arrival (DoA)
   - GCC and SRP
   - Performance tests
3. Joint position-pitch estimation (PoPi)
   - PoPi decomposition
   - Modifications
   - Performance tests
4. Conclusions and future work

1. Microphone Arrays
- Definition: arrangement of multiple spatially separated microphones
- Different designs:
  - Linear (1D)
  - Planar (2D)
  - Volumetric (3D)

SPSC circular microphone array
- Planar design
- Circular arrangement
- Diameter = 0.4 m
- 16 channels
- Omni-directional electret microphones
- Angular offset = 22.5°

Recording setup
- Preamplifiers: Behringer ADA 8000
- A/D converter: RME-Fireface
- Apple MacBook Pro
- PD recording patch

Near-field and Far-field
- Near-field:
  - Source-to-array distance is comparable to the array dimensions
  - Wavefront curvature is not negligible
  - Source distance can be estimated
- Far-field:
  - Source-to-array distance is much larger than the array dimensions
  - Planar wavefronts can be assumed
  - Source distance cannot be estimated

Beamforming
- Summing the microphone signals
- Higher sensitivity to signals that arrive at all microphones at the same time
- Beam is focused on the 90° direction
- Beam width depends on the number of microphones

Delay and Sum Beamforming
- Signals are individually delayed
- Steering delays correspond to the focus direction
- Signals impinging on the array from the steering direction add up constructively because of their phase alignment
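The delay-and-sum idea can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used for the SPSC array: the speed of sound (343 m/s), the azimuth convention, the helper names (`circular_array`, `steering_delays`, `apply_delays`, `delay_and_sum`) and the noise-free, circularly shifted test signal are all assumptions made for the example.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def circular_array(n_mics=16, diameter=0.4):
    """x/y microphone positions (metres) on a uniform circle."""
    angles = 2.0 * np.pi * np.arange(n_mics) / n_mics
    return 0.5 * diameter * np.stack([np.cos(angles), np.sin(angles)], axis=1)


def steering_delays(mics, azimuth_deg, fs):
    """Far-field arrival delay of each microphone relative to the array centre, in samples."""
    az = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(az), np.sin(az)])  # unit vector pointing at the source
    # Microphones nearer to the source hear the wavefront earlier (negative delay).
    return -(mics @ direction) / C * fs


def apply_delays(signals, delays):
    """Delay each row of `signals` by a (fractional) number of samples via the FFT."""
    n = signals.shape[-1]
    phase = np.exp(-2j * np.pi * np.fft.rfftfreq(n)[None, :] * delays[:, None])
    return np.fft.irfft(np.fft.rfft(signals, axis=-1) * phase, n=n, axis=-1)


def delay_and_sum(signals, delays):
    """Undo the per-channel arrival delays, then average the aligned channels."""
    return apply_delays(signals, -delays).mean(axis=0)


fs = 44100
mics = circular_array()
rng = np.random.default_rng(0)
source = rng.standard_normal(1024)
# Simulate a far-field source at 60 degrees (circular shifts, no noise, no reverberation).
received = apply_delays(np.tile(source, (len(mics), 1)), steering_delays(mics, 60.0, fs))
power_on = np.mean(delay_and_sum(received, steering_delays(mics, 60.0, fs)) ** 2)
power_off = np.mean(delay_and_sum(received, steering_delays(mics, 240.0, fs)) ** 2)
```

Steering at the true direction (`power_on`) realigns all 16 channels, so the output power stays close to the source power, while steering away (`power_off`) sums the channels incoherently and the power drops.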

Spatial Aliasing
- Similar to temporal aliasing
- Microphone distance d must be smaller than half the minimum wavelength to avoid spatial aliasing:
  $d < \frac{\lambda_{\min}}{2}$
- Frequency above which spatial aliasing occurs depends on microphone distance and angle:
  $f_{\mathrm{alias}} = \frac{c}{2 d \cos\theta}$
- Different effect on array designs:
  - Linear array (only one angle)
  - Circular array (different angles)
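As a quick numerical illustration of the aliasing formula, the sketch below evaluates f_alias for a 0.4 m microphone spacing and for the neighbour spacing of a 16-microphone circle of 0.4 m diameter. The speed of sound (343 m/s), the function names and the use of the chord length between adjacent microphones as "d" for the circular array are assumptions of this example.

```python
import math

C = 343.0  # assumed speed of sound in m/s


def alias_frequency(d, theta_deg):
    """Frequency above which spatial aliasing occurs: c / (2 d |cos(theta)|)."""
    cos_theta = abs(math.cos(math.radians(theta_deg)))
    if cos_theta < 1e-12:
        return math.inf  # no inter-microphone delay at this angle, so no aliasing
    return C / (2.0 * d * cos_theta)


def adjacent_spacing_circular(diameter, n_mics):
    """Chord length between neighbouring microphones on a uniform circular array."""
    return diameter * math.sin(math.pi / n_mics)


f_linear = alias_frequency(0.4, 0.0)        # 0.4 m spacing, worst-case angle
d_uca = adjacent_spacing_circular(0.4, 16)  # roughly 0.078 m
f_circular = alias_frequency(d_uca, 0.0)
```

With these assumptions the 0.4 m spacing aliases above roughly 430 Hz, whereas neighbouring microphones on the circle stay alias-free up to roughly 2.2 kHz.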

Linear (ULA) vs. Circular (UCA)
- Linear array: 16 microphones, d = 0.4 m (total length = 6 m)
- Circular array (SPSC): 16 microphones, diameter = 0.4 m

2. Source Localization
Direction of arrival (DoA)
- Direction from which a planar wavefront impinges on the array
- Vector from array origin to source position
- Defined by azimuth $\theta$ and elevation $\varphi$:
  $\zeta_o^s = \begin{bmatrix} \cos\varphi \sin\theta \\ \cos\varphi \cos\theta \\ \sin\varphi \end{bmatrix}$

Localization Strategies
- Time-Delay-Estimation (TDE)
  - Cross-correlation of microphone pairs leads to the time difference of arrival (TDoA)
  - TDoA and microphone positions lead to the DoA
- Steered beamforming
  - Beamformer is steered over a specific range
  - Output power of the beamformer reaches a maximum when it is focused on the source direction

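Time delay estimation using GCC
- Generalized cross-correlation:
  $R_{12}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \Psi_{12}(\omega)\, X_1(\omega)\, X_2^*(\omega)\, e^{j\omega\tau}\, d\omega$
- Phase transform: division of the cross-power spectrum by its magnitude:
  $\Psi_{12}^{\mathrm{PHAT}}(\omega) = \frac{1}{\left| X_1(\omega)\, X_2^*(\omega) \right|}$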
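A discrete-time version of the GCC with phase transform can be sketched with the FFT. The function names `gcc_phat` and `estimate_tdoa`, the zero-padding factor and the small constant that guards the division are choices made for this example only; the test signal is synthetic white noise with an integer delay.

```python
import numpy as np


def gcc_phat(x1, x2, max_lag):
    """Return lags and the PHAT-weighted cross-correlation of two equally long frames."""
    n = 2 * len(x1)  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)                    # cross-power spectrum
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT: keep the phase only
    r = np.fft.irfft(cross, n=n)
    # Reorder so that the first element corresponds to lag -max_lag.
    r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, r


def estimate_tdoa(x1, x2, max_lag):
    """Lag (in samples) of the GCC-PHAT maximum; positive if x1 lags behind x2."""
    lags, r = gcc_phat(x1, x2, max_lag)
    return int(lags[np.argmax(r)])


rng = np.random.default_rng(0)
x2 = rng.standard_normal(2048)
x1 = np.concatenate((np.zeros(7), x2[:-7]))  # x1 is x2 delayed by 7 samples
tdoa = estimate_tdoa(x1, x2, max_lag=51)
```

Because the PHAT weighting discards the magnitude, the correlation collapses to a narrow peak at the true delay, which is what makes the maximum easy to locate.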

GCC-Phat (1)
- Precise TDoA estimation
  - DoA-relevant range: -51 to +51 samples (0.4 m, fs = 44100 Hz)
  - Maximum can be easily located
- DoA estimation
  - TDoA leads to the DoA angle: $\theta = \arccos\left(\frac{c\,\tau}{d}\right)$
  - $\theta$ and $360^\circ - \theta$ are stored for every microphone pair
  - More GCC maximum peaks are stored for the multi-speaker scenario
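The DoA-relevant lag range and the arccos mapping can be checked with a few lines of plain Python. The speed of sound of 343 m/s and the function names are assumptions of this sketch; the spacing of 0.4 m and the sampling rate of 44100 Hz are the values quoted on the slide.

```python
import math

C = 343.0  # assumed speed of sound in m/s


def max_lag_samples(d, fs):
    """Largest physically possible TDoA (in samples) for a pair spaced d metres apart."""
    return int(math.floor(d / C * fs))


def tdoa_to_doa(tau_samples, d, fs):
    """Return the two mirror-image DoA candidates (degrees) for one microphone pair."""
    cos_theta = C * (tau_samples / fs) / d
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding outside [-1, 1]
    theta = math.degrees(math.acos(cos_theta))
    return theta, 360.0 - theta


lag_limit = max_lag_samples(0.4, 44100)
broadside = tdoa_to_doa(0, 0.4, 44100)
oblique = tdoa_to_doa(25, 0.4, 44100)
```

A single pair cannot tell front from back, which is why both theta and 360° - theta are kept until the estimates of all pairs are combined.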

GCC-Phat (2)
- Shifting of DoA estimations
  - According to the angular offset of the pairs: (m - 1) * 22.5° (m: microphone pair index)
  - Total number of estimations: 2 * M (M: number of pairs)
- Histogram
  - Leads to the final DoA estimation
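The shifting and histogram step can be illustrated as follows. This is a hedged sketch: the direction in which the pair offset is added, the 5° bin width, the function name `doa_histogram` and the synthetic pair-local angles are assumptions, since the slides only state the offset (m - 1) * 22.5° and the total of 2 * M candidates.

```python
import numpy as np


def doa_histogram(pair_angles_deg, pair_offset_deg=22.5, bin_width_deg=5.0):
    """pair_angles_deg[m] is the pair-local DoA (0..180 degrees) of pair m (0-based)."""
    candidates = []
    for m, theta in enumerate(pair_angles_deg):
        offset = m * pair_offset_deg  # (m - 1) * 22.5 degrees for a 1-based pair index
        # Every pair contributes two mirror-image candidates: theta and 360 - theta.
        candidates.append((theta + offset) % 360.0)
        candidates.append((360.0 - theta + offset) % 360.0)
    n_bins = int(round(360.0 / bin_width_deg))
    counts, edges = np.histogram(candidates, bins=n_bins, range=(0.0, 360.0))
    best = int(np.argmax(counts))
    return 0.5 * (edges[best] + edges[best + 1]), counts


# Synthetic example: 8 pairs observe a source at 102 degrees without any error.
true_azimuth = 102.0
pair_angles = [
    float(np.degrees(np.arccos(np.cos(np.radians(true_azimuth - m * 22.5)))))
    for m in range(8)
]
estimate, counts = doa_histogram(pair_angles)
```

The correct candidates of all pairs fall into the same bin, while the mirror candidates are scattered over the circle, so the histogram maximum resolves the front-back ambiguity.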

Steered Beamforming
- Delay & Sum beamformer
  - Focusing on every direction
  - Output power is computed
- Steered response power (SRP)
  - Output power reaches a maximum when focused in the source direction
  - Problems in the two-speaker scenario

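SRP-Phat (1)
- Defined by Hector DiBiase in 2000:
  $P(\Delta_1 \ldots \Delta_M) = \sum_{k=1}^{M} \sum_{l=1}^{M} \int \frac{1}{\left| X_k(\omega)\, X_l^*(\omega) \right|}\, X_k(\omega)\, X_l^*(\omega)\, e^{j\omega \Delta_{lk}}\, d\omega$, with $\Delta_{lk} = \Delta_l - \Delta_k$
- Sum of multiple shifted GCC-Phat functions
- Shifting according to focus direction
- Predefined steering delays in a look-up table (LUT):
  $\Delta_m = \frac{\zeta_o \cdot d_m}{c}$
- Steering in azimuth and elevation directions
- LUT is defined for every direction in the spherical half space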
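A compact way to evaluate the SRP-Phat sum is to normalise each channel spectrum to unit magnitude, steer it, and take the squared magnitude of the channel sum, which equals the double sum over all pairs. The sketch below does this for azimuth only (elevation 0). The speed of sound, the microphone layout, the 1° grid, the function names and the simulated noise-free source are assumptions for illustration, not the original implementation.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def steering_lut(mics, azimuths_deg, fs):
    """LUT of steering delays Delta_m = zeta . d_m / c in samples (elevation fixed to 0)."""
    az = np.deg2rad(np.asarray(azimuths_deg, dtype=float))
    zeta = np.stack([np.sin(az), np.cos(az)], axis=1)  # DoA unit vectors, one per row
    return (zeta @ mics.T) / C * fs                    # shape (n_directions, n_mics)


def srp_phat(signals, lut):
    """Steered response power with PHAT weighting for every direction in the LUT."""
    n = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)
    spectra /= np.maximum(np.abs(spectra), 1e-12)  # per-channel phase normalisation
    omega = 2.0 * np.pi * np.fft.rfftfreq(n)       # radians per sample
    power = np.empty(lut.shape[0])
    for i, delays in enumerate(lut):
        # Delay channel m by Delta_m; |sum over channels|^2 is the double sum over pairs.
        steered = spectra * np.exp(-1j * omega[None, :] * delays[:, None])
        power[i] = np.sum(np.abs(steered.sum(axis=0)) ** 2)
    return power


fs = 44100
angles = 2.0 * np.pi * np.arange(16) / 16
mics = 0.2 * np.stack([np.sin(angles), np.cos(angles)], axis=1)  # 0.4 m diameter circle
azimuths = np.arange(0, 360)
lut = steering_lut(mics, azimuths, fs)

rng = np.random.default_rng(1)
source = rng.standard_normal(2048)
# Simulated far-field source at 135 degrees: channel m leads by Delta_m samples.
omega = 2.0 * np.pi * np.fft.rfftfreq(source.size)
lead = np.exp(1j * omega[None, :] * lut[135][:, None])
received = np.fft.irfft(np.fft.rfft(source)[None, :] * lead, n=source.size, axis=1)
estimate = int(azimuths[np.argmax(srp_phat(received, lut))])
```

Precomputing the delays once per direction is what the LUT on the slide refers to; only the phase rotation and the summation remain in the per-frame loop.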

SRP-Phat (2)
- SRP can be used to locate multiple sources
- DoA estimation in the spherical half space is possible

GCC-Phat vs. SRP-Phat
- Localization performance in the presence of a disturbing noise source
  - 60 segments
  - Segment length = 2048 samples
- Multi-speaker scenario
  - 20 segments
  - Segment length = 2048 samples

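3. Joint position-pitch estimation (PoPi)
PoPi decomposition
- Reindexing of the cross-correlation:
  $\rho_t(\theta_s, f_0) = \frac{1}{2K+1} \sum_{i=1}^{P} \sum_{k=-K}^{K} R_{t,i}\left(k\, L(f_0) + O(\theta_{is})\right)$
- Position and pitch values are defined in LUTs:
  $O(\theta) = \frac{d \cos\theta\, f_s}{c}$, $L(f_0) = \frac{f_s}{f_0}$
- Search ranges: 0° to 360°, 80 to 280 Hz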
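The reindexing can be sketched as a sampling of each pair's cross-correlation at the lags k * L(f0) + O(theta). The sketch below is an illustration under assumptions: lags are rounded to the nearest sample, lags outside the stored range contribute zero, and the two-pair geometry, the coarse grids and the cosine-shaped test correlation are invented for the example.

```python
import numpy as np


def popi_plane(corr, max_lag, offsets, f0_grid, fs, K=2):
    """
    corr:    array (P, 2*max_lag+1), cross-correlation of pair i for lags -max_lag..max_lag
    offsets: array (P, n_theta), LUT O(theta) in samples for every pair and candidate angle
    Returns rho with shape (n_theta, n_f0).
    """
    P, n_theta = offsets.shape
    rho = np.zeros((n_theta, len(f0_grid)))
    ks = np.arange(-K, K + 1)
    for j, f0 in enumerate(f0_grid):
        period = fs / f0  # LUT L(f0): pitch period in samples
        for i in range(P):
            # Lags k*L(f0) + O(theta), rounded to the nearest sample.
            lags = np.rint(ks[None, :] * period + offsets[i][:, None]).astype(int)
            valid = np.abs(lags) <= max_lag
            idx = np.clip(lags + max_lag, 0, 2 * max_lag)
            rho[:, j] += np.sum(np.where(valid, corr[i][idx], 0.0), axis=1)
    return rho / (2 * K + 1)


# Synthetic example: two orthogonal pairs, source at 45 degrees with a 100 Hz pitch.
fs, max_lag = 8000, 256
lag_axis = np.arange(-max_lag, max_lag + 1)
thetas = np.array([0.0, 45.0, 90.0, 135.0, 180.0])
offsets = np.stack([10.0 * np.cos(np.radians(thetas)), 10.0 * np.sin(np.radians(thetas))])
true_offsets = offsets[:, 1]
corr = np.cos(2.0 * np.pi * (lag_axis[None, :] - true_offsets[:, None]) / 80.0)
f0_grid = np.array([80.0, 100.0, 125.0, 160.0])
rho = popi_plane(corr, max_lag, offsets, f0_grid, fs)
ti, fi = np.unravel_index(np.argmax(rho), rho.shape)
best_theta, best_f0 = thetas[ti], f0_grid[fi]
```

Only at the correct angle and pitch do all sampled lags coincide with correlation maxima, so the PoPi plane peaks at the joint (position, pitch) hypothesis.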

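The PoPi-Plane
- The matrix $\rho_t(\theta, f_0)$ is visualized in a 2D-plane
- Female speaker at 45°: undesired Gaussian at half pitch
- Solution: additional decomposition term
  $\rho_t(\theta_s, f_0) = \frac{1}{2K+1} \sum_{i=1}^{P} \sum_{k=-K}^{K} \left[ R_{t,i}\left(k\, L(f_0) + O(\theta_{is})\right) - \beta\, R_{t,i}\left(\frac{2k+1}{2}\, L(f_0) + O(\theta_{is})\right) \right]$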
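Why the additional term removes the half-pitch peak can be seen on a one-dimensional toy example. The sketch assumes that the extra term samples the correlation at odd multiples of half the candidate period and is subtracted with weight beta; the value beta = 1, the Gaussian-comb test correlation and the function name `comb_score` are assumptions made for illustration.

```python
import numpy as np


def comb_score(corr, max_lag, period, offset=0.0, K=2, beta=0.0):
    """Mean of corr at k*period + offset, minus beta times corr at (2k+1)/2*period + offset."""
    ks = np.arange(-K, K + 1)

    def sample(lags):
        idx = np.rint(lags).astype(int)
        valid = np.abs(idx) <= max_lag
        return np.where(valid, corr[np.clip(idx + max_lag, 0, 2 * max_lag)], 0.0)

    main = sample(ks * period + offset).sum()
    half = sample((2 * ks + 1) / 2.0 * period + offset).sum()
    return (main - beta * half) / (2 * K + 1)


# Toy correlation: narrow peaks at every multiple of the true period (80 samples).
max_lag = 512
lag_axis = np.arange(-max_lag, max_lag + 1)
centres = 80 * np.arange(-6, 7)
corr = np.exp(-0.5 * ((lag_axis[:, None] - centres[None, :]) / 3.0) ** 2).sum(axis=1)

true_plain = comb_score(corr, max_lag, period=80.0, beta=0.0)
half_plain = comb_score(corr, max_lag, period=160.0, beta=0.0)       # half-pitch hypothesis
true_suppressed = comb_score(corr, max_lag, period=80.0, beta=1.0)
half_suppressed = comb_score(corr, max_lag, period=160.0, beta=1.0)
```

Without the extra term the half-pitch hypothesis scores as high as the true pitch, because every second true peak is still hit. With the term, the half-pitch hypothesis is penalised by the true peaks that lie between its comb teeth, while the true pitch is left unchanged.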

Two speaker problem
- Analogous to SRP
- SRP for two speakers (90°, 270°): DoA estimation fails
- PoPi-Plane for two speakers (90° = female; 270° = male): PoPi estimation fails

PoPi-Phat
- Joining GCC and GCC-Phat
- Phat kills periodicity (pitch information)
- The DoA-relevant sample range (-60 to +60 samples) of the GCC function is replaced by the GCC-Phat values
- DoA precision is improved
- Pitch problem is not solved

PoPi-filter (1)
- Prefiltering the microphone signals
- Inspired by the auditory model (multi-pitch detection)
- Gammatone filterbank (17 bandpass filters)
- Normalized cross-correlation of the filtered signals
- Summing the cross-correlations
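The prefiltering chain can be sketched with FIR approximations of gammatone filters. This is a hypothetical illustration: the slides give only the number of bands (17), so the logarithmic centre-frequency spacing between 100 Hz and 4 kHz, the 4th-order filters with ERB bandwidths, the truncated impulse responses and the function names are all assumptions.

```python
import numpy as np


def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Truncated gammatone impulse response with an ERB-based bandwidth, unit energy."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth in Hz
    b = 1.019 * erb
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))


def summary_correlation(x1, x2, fs, max_lag, n_bands=17, f_lo=100.0, f_hi=4000.0):
    """Sum of the normalised cross-correlations of the band-filtered signal pair."""
    centres = np.geomspace(f_lo, f_hi, n_bands)  # assumed centre-frequency spacing
    total = np.zeros(2 * max_lag + 1)
    for fc in centres:
        h = gammatone_ir(fc, fs)
        y1 = np.convolve(x1, h, mode="same")
        y2 = np.convolve(x2, h, mode="same")
        full = np.correlate(y1, y2, mode="full")  # lags -(N-1) .. (N-1)
        mid = len(y2) - 1                          # index of lag 0
        norm = np.sqrt(np.sum(y1 ** 2) * np.sum(y2 ** 2)) + 1e-12
        total += full[mid - max_lag: mid + max_lag + 1] / norm
    return total


fs, max_lag = 16000, 32
rng = np.random.default_rng(0)
x2 = rng.standard_normal(2048)
x1 = np.concatenate((np.zeros(5), x2[:-5]))  # x1 is x2 delayed by 5 samples
summary = summary_correlation(x1, x2, fs, max_lag)
delay = int(np.argmax(summary)) - max_lag
```

Normalising each band before summing keeps strong low-frequency bands from dominating, so the periodicity carried by the low bands and the sharp delay peaks of the high bands both remain visible in the summary correlation.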

PoPi-filter (2)
- Filtered correlations: every filtered GCC makes a different contribution
- Low-frequency channels include pitch information
- High-frequency channels lead to precise DoA estimations

PoPi-filter (3)
- Summary correlation: includes pitch and position information for both sources
- PoPi plane: shows two dominant Gaussians at the correct pitch and position values

Performance of PoPi methods (1)
- Presence of a disturbing noise source: 60 segments (2048 samples)
- Percentage of correct estimations
- PoPi performs better than SRP-Phat and Cepstrum for high noise levels
- PoPi favors speech sources and suppresses noise sources
- PoPi-filter outperforms the other methods

Performance of PoPi methods (2)
- Presence of a disturbing noise source (joint pitch and position): 60 segments (2048 samples)
- Percentage of correct estimations
- Both pitch and position values must be correct
- PoPi-filter performs best if combined DoA and pitch information is desired

Two speaker scenario (1)
- IBK-Studio (T60 = 0.13 s)
- 15 segments (2048 samples)
- Male speaker (337°): vowel "o"
- Female speaker (22°): vowel "e"
- Position and pitch estimation of the three PoPi methods
- PoPi-filter clearly outperforms the other methods

Two speaker scenario (2)
- Seminar room (T60 = 0.5 s)
- 15 segments (2048 samples)
- Male speaker (45°): vowel "o"
- Female speaker (337°): vowel "e"
- All methods suffer strongly from reverberation
- PoPi-filter is the only method that gives usable results

Moving source (1)
- IBK-Studio (T60 = 0.13 s)
- Male speaker moving around the array
- Pronouncing the vowel "a"
- All three methods give usable results
- PoPi-filter gives a stable pitch estimation

Moving source (2)
- Seminar room (T60 = 0.5 s)
- Male speaker moving around the array
- Pronouncing the vowel "a"
- PoPi and PoPi-Phat fail in the presence of reverberation
- PoPi-filter clearly outperforms the other methods and gives accurate DoA and pitch estimations

Conclusion and future work
- Source Localization
  - State-of-the-art algorithms implemented for the SPSC array
  - SRP-Phat outperforms GCC-Phat
  - SRP-Phat: multi-speaker and elevation estimation possible
- PoPi estimation
  - PoPi method is less sensitive to noise sources
  - Original PoPi does not perform adequately for multiple speakers
  - PoPi-Phat: DoA estimation more robust and precise
  - PoPi-filter: multi-speaker PoPi estimation possible
- Future work
  - Combining with a VAD and an advanced tracking algorithm
  - Reducing the computational effort of the PoPi decomposition
  - Real-time implementation

Thank you for your attention!