Audio data fuzzy fusion for source localization


International Neural Network Society 13-16 September, 2013, Halkidiki, Greece Audio data fuzzy fusion for source localization M. Malcangi Università degli Studi di Milano Department of Computer Science DSP&RTS Research Laboratory Milan - Italy malcangi@di.unimi.it

@ a glance This work addresses the problem of audio source localization in multi-speaker indoor scenarios. Three different direction-of-arrival (DOA) algorithms are applied to measure the angular position of the primary audio source with respect to a reference microphone. A fuzzy logic-based method then fuses the three crisp measurements.

Sound Source Localization What Sound is the most important information medium in the interaction between human beings and the physical world. Among the several kinds of information that sound carries, the spatial location of the sources is of primary importance for executing auditory tasks such as: attention-less monitoring of the surrounding environment; source localization; beamforming.

Sound Source Localization Why Audio-based tasks performed by human beings, such as speech recognition, speaker tracking, and speaker identification, rely on sound source location and data fusion strategies to interact and communicate successfully with other human beings. To let a human being interact with a machine in a natural way, the machine needs similar sound source localization capabilities and a similar data fusion strategy. Many audio-based applications, such as automatic speech recognition (ASR) and audio/visual ASR (AV/ASR), automatic speaker identification (ASI), audio noise cancellation (ANC), and indoor audio navigation (IAN), can be improved using sound source localization (SSL).

Sound Source Localization How it works Human beings are able to recognize which sound is which, to locate where each sound is coming from, and to recognize which sounds can be ignored, in effect beamforming towards the target sound source.

Sound Source Localization How it works Sound source localization (SSL) is a signal processing task that applies time delay estimation methods to the signals received by an array of microphones. With a suitable microphone geometry, the 2D or 3D location of a sound source can be detected, and the resulting measurements can be used to recognize and isolate the source. Beamforming on the sound source can then be performed either by physically steering the primary microphone towards the source or by running application-specific signal processing algorithms on the captured sound.
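As an illustration of how a time delay measured on one microphone pair relates to a direction of arrival, the following is a minimal sketch, not the author's implementation. It assumes a far-field source, an angle measured from the broadside of the pair, and a speed of sound of 343 m/s; the function names and the 0.1 m spacing in the usage note are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def delay_to_angle(delay_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field DOA in degrees (from broadside) for one microphone pair."""
    # sin(theta) = c * tau / d; clamp so measurement noise cannot push
    # the argument of asin outside [-1, 1].
    s = max(-1.0, min(1.0, c * delay_s / mic_spacing_m))
    return math.degrees(math.asin(s))


def angle_to_delay(angle_deg, mic_spacing_m, c=SPEED_OF_SOUND):
    """Inverse mapping: expected inter-microphone delay in seconds."""
    return mic_spacing_m * math.sin(math.radians(angle_deg)) / c
```

For example, with a hypothetical spacing of 0.1 m, `delay_to_angle(angle_to_delay(30, 0.1), 0.1)` recovers 30 degrees up to rounding error. A single pair only resolves the angle on one side of the array axis, which is why the slide mentions microphone geometry for 2D and 3D localization.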

Sound Source Localization What's new The proposed sound source localization (SSL) system uses fuzzy logic to fuse the sound time delay measurements produced by a set of three algorithms that simultaneously process the sound signal captured by a pair of microphones. The purpose is to implement a basic SSL capability that can serve as a front-end between an array of microphones and the sound processing application.

SSL System framework The system consists of two layers. The lower layer implements three subsystems, each performing a sound time delay measurement. The upper layer implements a fuzzy logic-based inference engine tuned to fuse the decisions of the three time delay measurements. This two-level approach is robust and reliable because each module operates independently of the others, and the fuzzy inference engine can qualitatively evaluate the performance of each DOA measurement subsystem.

SSL System How it works Among the several available time delay measurement algorithms, three have been selected as independent estimators: cross-correlation (CC), phase transform (PHAT), and maximum likelihood (ML). Each of these functions combines the signals captured by a pair of microphones.
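The estimators named above can be viewed as variants of the generalized cross-correlation, differing in how the cross-spectrum is weighted. The sketch below, written with NumPy, shows the plain CC and the PHAT weighting only. The ML weighting is left out because it depends on an estimate of the coherence between the two channels, which the slides do not describe. The function name, signature, and sign convention are assumptions made for illustration.

```python
import numpy as np


def gcc_delay(x1, x2, fs, weighting="cc", max_delay_s=None):
    """Estimate the delay of x2 relative to x1 in seconds.

    A positive result means x2 lags x1. Also returns the correlation
    curve over the searched lags, from -max_lag to +max_lag samples.
    """
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    if weighting == "phat":
        # Phase transform: discard magnitude, keep only phase information.
        cross = cross / np.maximum(np.abs(cross), 1e-12)
    elif weighting != "cc":
        raise ValueError("weighting must be 'cc' or 'phat'")
    r = np.fft.irfft(cross, n=n)
    max_lag = n // 2
    if max_delay_s is not None:
        max_lag = min(max_lag, int(round(max_delay_s * fs)))
    max_lag = max(1, max_lag)
    # Reorder so that index 0 corresponds to lag -max_lag.
    r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))
    lag = int(np.argmax(r)) - max_lag
    return lag / fs, r
```

The returned curve is what a peak analysis would operate on: in clean conditions it has one dominant maximum, while noise and reflections raise competing secondary peaks.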

Fuzzy logic How it fuses The three methods perform well in low-noise, low-reverberation rooms, where each shows an unambiguous position of the maximum, but as noise increases and reflections occur, the performance of each method decreases. To improve the delay estimation, the three algorithms run independently of each other, measuring the time delay of the current sound frame while the upper layer fuses the time delays measured for the previous sound frame. Membership functions are used to fuzzify the measurements and to defuzzify the decision.
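The slides do not give the shapes or breakpoints of the membership functions, so the following is only a sketch of the fuzzification step under assumed values: peak amplitudes normalized to [0, 1] and three linguistic terms (Low, Medium, High) with linear shoulders and a triangular middle term. All names and breakpoints are hypothetical.

```python
def triangle(x, left, center, right):
    """Triangular membership function peaking at `center`."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def fuzzify_peak(amplitude):
    """Map a normalized peak amplitude to membership degrees.

    Breakpoints (0.0, 0.5, 1.0) are assumed for illustration only.
    """
    a = min(1.0, max(0.0, amplitude))
    return {
        "Low": max(0.0, 1.0 - a / 0.5),      # 1 at 0.0, falls to 0 at 0.5
        "Medium": triangle(a, 0.0, 0.5, 1.0),
        "High": max(0.0, (a - 0.5) / 0.5),   # 0 at 0.5, rises to 1 at 1.0
    }
```

With these assumed breakpoints, an amplitude of 0.75 is half Medium and half High, which lets the rule base weigh a moderately sharp correlation peak without forcing a hard threshold.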

Fuzzy logic How it fuses A set of rules has been compiled to fuse the information at decision level. The decision rules look like this: IF CCpeak1 IS High AND PHATpeak1 IS High AND MLpeak1 IS High AND CCpeak12 IS Medium AND PHATpeak12 IS High AND MLpeak12 IS Medium THEN Delay IS PHATdelay. Only two parameters are used in the rule set, peak1 and peak12. Peak1 is the amplitude of the primary prominent peak. Peak12 is the amplitude of the secondary prominent peak, the one closest to the primary. The basic rule set consists of 9 rules, but more rules are required when more secondary sound sources are close to the primary sound source. In this first release of the system, the rules are hand-tuned at compile time, but an adaptive tuning process will be implemented to take into account the evolving nature of sound scenarios.
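To make the rule format concrete, here is a minimal sketch of how one such rule could be evaluated. It encodes the example rule from the slide (with the phase transform variables spelled PHAT) and assumes that AND is implemented as the minimum of the membership degrees, a common choice that the slides do not confirm. The data layout and the sample degrees are hypothetical.

```python
# The example rule from the slide, as (variable, linguistic term) pairs.
EXAMPLE_RULE = {
    "if": [
        ("CCpeak1", "High"), ("PHATpeak1", "High"), ("MLpeak1", "High"),
        ("CCpeak12", "Medium"), ("PHATpeak12", "High"), ("MLpeak12", "Medium"),
    ],
    "then": "PHATdelay",
}


def firing_strength(rule, degrees):
    """Degree to which a rule fires, using min as the fuzzy AND (assumed)."""
    return min(degrees[var][term] for var, term in rule["if"])


# Hypothetical membership degrees for one sound frame.
sample_degrees = {
    "CCpeak1": {"High": 0.9}, "PHATpeak1": {"High": 0.8},
    "MLpeak1": {"High": 0.7}, "CCpeak12": {"Medium": 0.6},
    "PHATpeak12": {"High": 0.75}, "MLpeak12": {"Medium": 0.65},
}
strength = firing_strength(EXAMPLE_RULE, sample_degrees)
```

Under the min operator the weakest antecedent decides how strongly the rule supports its consequent, here the delay measured by the PHAT estimator.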

Fuzzy logic How it fuses The fuzzy decision is then defuzzified by a singleton membership function that produces as output the crisp measurement of the estimated delay. The center of gravity (CoG) defuzzification method has been applied in its weighted average (WA) implementation for singleton membership functions:
$$\text{Crisp\_out} = \frac{\sum_i \text{fuzzy\_out}_i \cdot \text{singleton\_position}_i}{\sum_i \text{fuzzy\_out}_i}$$
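The weighted-average defuzzification for singletons can be sketched in a few lines. Each fired rule is assumed to contribute a pair made of its fuzzy output (firing strength) and the position of its singleton, here a candidate delay. The function name and the handling of the case where no rule fires are assumptions.

```python
def defuzzify_weighted_average(fired):
    """Crisp output from (fuzzy_out, singleton_position) pairs.

    Returns None when no rule fired, since the average is then undefined.
    """
    total = sum(weight for weight, _ in fired)
    if total == 0:
        return None
    return sum(weight * position for weight, position in fired) / total
```

For instance, two equally strong rules pointing at singletons 2.0 and 4.0 yield 3.0, while a rule with zero strength has no influence on the result.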

SSL System How it performs A set of tests has been executed in a simulated context using pure-tone sound sources and short utterance frames. The simulation has been executed in the MATLAB environment, with an STMicroelectronics 8-microphone MEMS array prototype board connected to Audacity for simultaneous 8-channel data acquisition and editing, and a loudspeaker as the sound source.

SSL System How it performs The pure tone has been played at three different frequencies (500, 1000, 2000 Hz) and positioned at three different angles (0, 15, 30 degrees). The same test has been executed for a short utterance. The two tests have been executed with and without noise. The following success rates resulted:

Test                      CC      CC with PHAT & ML    Fuzzy fusion
Tone test, noiseless      100%    100%                 100%
Speech test, noiseless    95%     97%                  97%
Tone test, noisy          85%     88%                  93%
Speech test, noisy        75%     77%                  91%

The tests have confirmed that CC supported by PHAT and ML works better than CC alone and that fuzzy fusion improves the performance of DOA measurements, mainly in noisy contexts.

SSL System How it is made [Figure: DAQ board and MEMS microphone array]

SSL System How it applies

Conclusions Sound source localization is a very complex task which, at its core, involves the interaction among multiple microphones. Smart data fusion based on fuzzy logic is an effective approach to managing and fusing the gathered data, mainly because multi-level data fusion can be implemented.

Conclusions (cont.) To improve the reliability of the system, future developments will re-engineer the fuzzy logic-based data fusion, splitting the fuzzy data fusion engine into a multi-level hierarchy, with the lower level performing feature fusion and the upper level performing decision fusion. Each pair of microphones will work as a submodule, so that pairs can be composed into a specific geometry to match a specific application. To this end, a third fuzzy fusion layer will be developed to fuse the DOA decisions of the microphone pairs.

Thank you for your attention (any questions?) Mario Malcangi Università degli Studi di Milano Department of Computer Science Via Comelico 39 20135 Milano - Italy DSP&RTS Research Laboratory (Digital Signal Processing & Real-Time Systems) Please address any further questions to: malcangi@di.unimi.it