Implementation of Speaker Identification Using Speaker Localization for Conference System


Proceedings of the 2nd World Congress on Electrical Engineering and Computer Systems and Science (EECSS'16)
Budapest, Hungary, August 16-17, 2016
Paper No. MHCI 110
DOI: 10.11159/mhci16.110

Implementation of Speaker Identification Using Speaker Localization for Conference System

Sung-Woo Byun, Seok-Pil Lee
Dept. Computer Science, SangMyung University
20, Hongjimun 2-gil, Jongno-gu, Seoul, Korea
123234566@naver.com, esprit@smu.ac.kr

Abstract - As signal processing and computing environments have developed, research on speaker analysis technologies has been increasing. Speaker localization has become an active area of research with widespread applications in many speaker analysis fields. Much of the research on speaker localization has focused on steering cameras and tracking active speakers. We also focus on tracking active speakers precisely. In this paper, we estimate the 3-dimensional coordinates of the speaker using time delay estimation and implement speaker identification for a conference system. For this, a three-microphone array is used. To evaluate the performance of the proposed system, precision rate and recall rate are used.

Keywords: Time Delay Estimation, TDOA, Speaker Localization, Speaker Identification, Conference System

1. Introduction
As signal processing and computing environments have developed, research on speaker analysis technologies, such as speaker recognition, speaker emotion analysis, and speaker localization, has been increasing. Speaker localization is defined as the determination of the coordinates of a speaker in 3-dimensional space. It has become an active area of research with widespread applications in many speaker analysis fields, such as automatic camera steering and zooming and gesture recognition during video teleconferencing. Accurate information on the speaker's placement is also useful for various other applications and multimodal services [1].
In speaker localization, the location is estimated using time delay estimation (TDE), based on the differences in the positions of the microphones. Conventional time delay estimation strategies fall into two categories: one is based on the cross-spectral function and the other on the generalized cross-correlation (GCC) function [2]. In this research, we estimate the time difference of arrival (TDOA) using the generalized cross-correlation function. Previous research on conference systems using speaker localization has focused on steering cameras and tracking active speakers [3][4][5]. We also focus on tracking active speakers precisely. In this paper, we estimate the 3-dimensional coordinates of the speaker using time delay estimation and implement speaker identification for a conference system. To evaluate the performance of the proposed system, precision rate and recall rate are used. The experimental results show that the proposed system is promising.

The rest of this paper is organized as follows. Section 2 explains speaker localization. Section 3 describes the experimental configuration of our speaker-identification-based conference system and the results. Section 4 presents the summary and conclusions.

2. Speaker Localization

2.1. TDOA
The TDOA is defined as the difference between the arrival times of a speaker's signal at two microphones. In this research, we use 3 microphones to estimate the time difference of arrival at each microphone. First, to calculate the time difference, we compute the cross-correlation function of the two signals. The lag at which the cross-correlation function has its maximum is taken as the time delay between the two signals [2]. The difference between the distances from the speaker to the two microphones is then obtained by multiplying the time delay by the speed of sound in the acoustic medium (air, 330 m/s).
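The peak-picking step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses the plain, unweighted cross-correlation (no GCC frequency weighting such as PHAT), integer-sample lags only, and `numpy.correlate`. The function names `estimate_tdoa` and `range_difference` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 330.0  # m/s, the value the text assumes for air


def estimate_tdoa(s1, s2, fs):
    """Delay of s1 relative to s2 in seconds, taken as the lag that maximizes
    the (unweighted) cross-correlation.

    A positive result means the sound reached microphone 1 later than microphone 2.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    # corr[m] = sum_n s1[n + lag] * s2[n], for lag = -(len(s2) - 1) ... len(s1) - 1
    corr = np.correlate(s1, s2, mode="full")
    lags = np.arange(-(len(s2) - 1), len(s1))
    best_lag = lags[np.argmax(corr)]  # integer number of samples
    return best_lag / fs


def range_difference(tdoa, c=SPEED_OF_SOUND):
    """Difference of the source-to-microphone distances (m) implied by a TDOA."""
    return tdoa * c


if __name__ == "__main__":
    fs = 16_000  # 16 kHz, the sampling rate used in the experiments
    rng = np.random.default_rng(0)
    src = rng.standard_normal(1600)  # 0.1 s of white noise standing in for speech
    delay = 8  # samples; hypothetical test delay
    mic1 = np.concatenate([np.zeros(delay), src[:-delay]])  # delayed copy of src
    tdoa = estimate_tdoa(mic1, src, fs)
    print(tdoa, range_difference(tdoa))
```

At 16 kHz the lag resolution is one sample (62.5 µs), or about 2.06 cm of range difference at 330 m/s; sub-sample accuracy would require interpolating around the peak, which this sketch omits. The speed of sound is a parameter because the commonly quoted value for air at 20 °C is closer to 343 m/s than to the 330 m/s used in the text.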

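$$ T_{\mathrm{delay}} = \arg\max_{\tau} \rho_{s_1 s_2}(\tau) \tag{1} $$

In equation (1), $\rho_{s_1 s_2}(\tau)$ is the cross-correlation function of the two signals $s_1$ and $s_2$ at lag $\tau$.

2.2. 3-Dimensional coordinate estimation

Fig. 1: An example of 3-dimensional coordinate estimation.

When the source is located at the red point (x, y, z) in Fig. 1 and the three microphones are at (d, 0, 0), (0, d, 0), and (0, 0, d), at distances α, β, and γ from the source, respectively, the source position satisfies the following equations, which we solve for (x, y, z):

$$ (x-d)^2 + y^2 + z^2 = \alpha^2 \tag{2} $$
$$ x^2 + (y-d)^2 + z^2 = \beta^2 \tag{3} $$
$$ x^2 + y^2 + (z-d)^2 = \gamma^2 \tag{4} $$

Subtracting these equations pairwise gives

$$ x = \frac{\beta^2 - \alpha^2}{2d} + y \tag{5} $$
$$ y = \frac{\gamma^2 - \beta^2}{2d} + z \tag{6} $$
$$ z = \frac{\alpha^2 - \gamma^2}{2d} + x \tag{7} $$

Substituting equations (5), (6), and (7) into equations (2), (3), and (4), so that each equation contains only one unknown, gives

$$ 3x^2 + 2(i - d - k)x + d^2 + k^2 + i^2 - \alpha^2 = 0 \tag{8} $$
$$ 3y^2 + 2(k - d - j)y + k^2 + d^2 + j^2 - \beta^2 = 0 \tag{9} $$
$$ 3z^2 + 2(j - i - d)z + i^2 + j^2 + d^2 - \gamma^2 = 0 \tag{10} $$

where $k = (\beta^2 - \alpha^2)/(2d)$, $i = (\alpha^2 - \gamma^2)/(2d)$, and $j = (\gamma^2 - \beta^2)/(2d)$. Therefore, the 3-dimensional coordinates of the source can be estimated by solving the quadratic equation in each of x, y, and z.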
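The closed-form solution above can be sketched as follows, assuming the source-to-microphone distances α, β, and γ are already available (the sketch takes them as inputs) and the microphones sit at (d, 0, 0), (0, d, 0), and (0, 0, d). Rather than solving (8), (9), and (10) independently, this sketch solves (8) for x and recovers y and z from (5) and (7), so roots never need to be paired. The two roots of (8) are two points mirrored across the plane x + y + z = d that contains the three microphones; the rule for choosing between them (`far_side`), the clamping of a negative discriminant, and the names `quadratic_roots` and `locate_source` are hypothetical choices of this sketch, not something the text specifies.

```python
import math


def quadratic_roots(a, b, c):
    """Both real roots of a*t**2 + b*t + c = 0.

    A negative discriminant (e.g. from inconsistent, noisy distances) is clamped
    to zero, which makes the two roots equal. This is an assumption of the sketch.
    """
    disc = max(b * b - 4.0 * a * c, 0.0)
    r = math.sqrt(disc)
    return (-b + r) / (2.0 * a), (-b - r) / (2.0 * a)


def locate_source(alpha, beta, gamma, d, far_side=True):
    """Estimate (x, y, z) of a source at distances alpha, beta, gamma from
    microphones at (d, 0, 0), (0, d, 0) and (0, 0, d), following Eqs. (5), (7), (8)."""
    k = (beta**2 - alpha**2) / (2.0 * d)   # Eq. (5): x = k + y
    i = (alpha**2 - gamma**2) / (2.0 * d)  # Eq. (7): z = i + x
    # Eq. (8): 3x^2 + 2(i - d - k)x + d^2 + k^2 + i^2 - alpha^2 = 0
    xs = quadratic_roots(3.0, 2.0 * (i - d - k), d**2 + k**2 + i**2 - alpha**2)
    # y = x - k from Eq. (5), z = x + i from Eq. (7)
    candidates = [(x, x - k, x + i) for x in xs]
    # The candidates mirror each other across the plane x + y + z = d.
    # Hypothetical rule: by default keep the one on the far side from the origin.
    pick = max if far_side else min
    return pick(candidates, key=lambda p: p[0] + p[1] + p[2])


if __name__ == "__main__":
    d = 30.0  # cm, the microphone distance used in the experiments
    speaker = (100.0, 80.0, 50.0)  # hypothetical true position in cm
    mics = [(d, 0.0, 0.0), (0.0, d, 0.0), (0.0, 0.0, d)]
    alpha, beta, gamma = (math.dist(speaker, m) for m in mics)
    print(locate_source(alpha, beta, gamma, d))
```

With exact distances, the example recovers the hypothetical speaker position up to floating-point rounding; the other root, (-100/3, -160/3, -250/3), is equally consistent with (2) to (4), which is why some side-of-array prior is needed.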

2.3. Speaker localization experiment

Fig. 2: The experimental environment.

We performed speaker localization experiments while increasing the distance between the speaker and the microphone array. The speaker was placed randomly. The distance between each microphone and the origin was set to d = 30 cm, and one unit in the 3-dimensional coordinate space corresponds to 1 cm. The experiment was repeated 10 times for each case to reduce human error. The experimental environment is shown in Fig. 2.

Table 1: The result of the speaker localization experiment (mean absolute position error, cm).

Distance    x       y       z
100 cm      5.91    5.6     2.05
150 cm      6.59    7.02    2.6
200 cm      9.26    9.45    3.8

Table 1 shows the mean absolute error of the position along each axis. At a distance of 100 cm, the mean absolute error averaged over the three axes is about 4.52 cm. As the distance increases, the mean absolute error also increases.

3. Speaker Identification for the Conference System

3.1. Experiment configuration
We performed the experiment in an ordinary room measuring 500 cm wide, 630 cm long, and 250 cm high. The speakers were placed about 150 cm from the three-microphone array, and the experiment was performed 3 times, with 2, 4, and 6 speakers placed randomly. Each experiment lasted 10 minutes, and each speaker spoke for 30 seconds at a time in a free debate. After the experiments were finished, we extracted the voice segments and mapped each segment to the corresponding speaker according to the speaker localization result. The signals were recorded at a sampling rate of 16 kHz with 16-bit resolution.

3.2. Evaluation Criteria
In this research, we used the precision rate and the recall rate, which are commonly used basic measures, to evaluate the experiment. The precision rate is the ratio of the number of correctly labelled segments to the total number of extracted segments.
It is shown in (11):

$$ \text{Precision rate} = \frac{|T \cap R|}{|R|} \tag{11} $$

The recall rate is the ratio of the number of correctly labelled segments to the total number of correct segments. It is shown in (12):

$$ \text{Recall rate} = \frac{|T \cap R|}{|T|} \tag{12} $$

Here, R is the set of extracted segments and T is the set of correct segments, so |R| and |T| are their totals and |T ∩ R| is the number of correctly labelled segments.

3.3. Results

Fig. 3: The result of mapping the voice segments onto each speaker.

Fig. 3 shows the result of mapping the voice segments onto each speaker during the 10-minute conversation. The locations of the voice segments and the speakers are marked in 3-dimensional space.

Table 2: The result of the conference system experiment.

Number of speakers    Average precision rate    Average recall rate
2                     100%                      100%
4                     90.5%                     91%
6                     86.5%                     86.7%

Table 2 shows the results of the performance evaluation. Both the average precision rate and the average recall rate reached their maximum of 100% with 2 speakers and decreased as the number of speakers increased. With 4 speakers, the average precision rate was 90.5% and the average recall rate was 91%. With 6 speakers, the average precision rate was 86.5% and the average recall rate was 86.7%.

In previous research, Samborski and Ziolko implemented a conference system based on 2-dimensional information such as phase features, testing it on five male speakers seated randomly around a table over 28 minutes [3]. We compared their method with our 3-dimensional coordinate estimation method.

Table 3: A comparison of 3-dimensional coordinate estimation with the 2-dimensional information method.

Metric            3-dimensional coordinate estimation    2-dimensional information method [3]
Accuracy          86.2%                                  80%
Precision rate    87.7%                                  43%
Recall rate       86.1%                                  50%

As shown in Table 3, the accuracy, precision rate, and recall rate of the 3-dimensional coordinate estimation method were all higher than those of the 2-dimensional information method.

4. Conclusion
Speaker localization has become an active area of research with widespread applications in many speaker analysis fields. Previous research on conference systems using speaker localization has focused on steering cameras and tracking active speakers.
We also focus on tracking active speakers precisely. In this paper, we estimated the 3-dimensional coordinates of the speaker using time delay estimation and implemented speaker identification for a conference system. For this, the experiment was performed 3 times with a three-microphone array. To evaluate the performance of the proposed system, precision rate and recall rate were used. The experimental results show that the proposed system is promising: the average precision and recall rates were 100% with 2 speakers and remained above 86% with 6 speakers. Additional tests of 3-dimensional coordinate estimation in noisy environments are left for future work.

Acknowledgements
This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2016-R0992-16-1014) supervised by the IITP (Institute for Information & communications Technology Promotion).

References
[1] M. Hesam and H. Marvi, "Improvement of [...] multiple speaker localization [...]," in Signal Processing (ICSP), 2010 IEEE 10th International Conference on, 2010.
[2] A. K. Tellakula, "Acoustic Source Localization Using Time Delay Estimation," Degree Thesis, Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore, India.
[3] R. Samborski and M. Ziolko, "Speaker localization in conferencing systems employing phase features and wavelet transform," in Signal Processing and Information Technology (ISSPIT), 2013.
[4] H. Sayoud, S. Ouamour, and S. Khennouf, "[...] speaker localization using the filtered correlation," in Industrial Mechatronics and Automation (ICIMA), 2010 IEEE 2nd International Conference on, 2013.
[5] S. Ouamour and H. Sayoud, "Automatic speaker localization based on speaker identification - A smart room application," in Information and Communication Technology and Accessibility (ICTA), 2013 IEEE Fourth International Conference on.