Speaker Isolation in a Cocktail-Party Setting
M.K. Alisdairi
Columbia University
M.S. Candidate, Electrical Engineering
Spring

Abstract - The human auditory system is capable of performing many interesting tasks, several of which could find useful applications in engineering settings. One such capability is the ability to perceptually separate sound sources, allowing a listener to focus on a single speaker in a noisy environment. This effect is often referred to as the cocktail-party effect (in reference to a cocktail-party environment where several simultaneous conversations are taking place in the background) or as Auditory Scene Analysis. This paper introduces two methodologies for isolating a desired speaker's audio stream from a binaural recording of multiple speakers in conversation. An implementation of a system for speaker isolation based on one of these methods is also presented. Note that some of the graphics presented in this document are best viewed in color. For the electronic version please visit

INTRODUCTION

Systems capable of performing Auditory Scene Analysis (ASA) [10] could find numerous useful applications. The most evident application is as a front-end for speech recognition systems. The development of systems capable of ASA could provide improvements in speech recognition in unconstrained auditory environments [7] [9].

[Figure 1 - Speaker Extraction System as a Front-End to Voice Recognition: sound input from a noisy environment feeds the speaker extraction system, whose output feeds a voice recognition system]

Another possible use for an ASA-capable system could be in theatrical/movie settings as a substitute for wireless microphones. In such instances sound engineers could have a versatile means of controlling audio quality without the physical imposition of hardware on the speaker's person. This paper discusses two methodologies for speech extraction. The first method is based on the Interaural Intensity Difference, and the second on the Time Difference of Arrival.

Speech and Audio Processing - Speaker Isolation in a Cocktail-Party Setting
After the preliminary discussion an implementation of the TDOA-based method is presented. Analysis and implementation are carried out on sound recordings from the ShATR corpus.

The ShATR Corpus

The sound files used were taken from the ShATR corpus of dummy-head recordings. The recordings are of five speakers (Guy, Martin, Phil, Inge Marie, and Malcolm) oriented around the dummy head. Included in the ShATR corpus are two files: one file contains a recording of each of the five speakers introducing themselves,
the other file is a recording of the five carrying on in conversation. These two files are of primary interest in this document.

THEORETICAL BACKGROUND

THE HRTF

The theoretical undergirding of speaker isolation is the fact that the left and right channels traverse different paths, resulting in different filtering for each channel. The following figure and equations illustrate the concept.

[Figure 2 - Depiction of the Path-Dependent Filtering Effect: sound from a source reaches the left and right ear channels of the dummy head along different paths]

Y_Left(ω, φ) = H_Left(ω, φ) X(ω)   (Equation 1)
Y_Right(ω, φ) = H_Right(ω, φ) X(ω)   (Equation 2)

Where:
Y_Left, Y_Right - The signals received by the left and right ears.
H_Left, H_Right - The impulse responses of the paths to the left and right ears.
X(ω) - The original speech signal.

The two transfer functions H_Left and H_Right are called the Head-Related Transfer Functions (HRTF) and are functions of position as well as frequency (note the HRTF is more precisely a function of frequency ω, azimuth φ, and elevation θ) [5].

[Figure 3 - The TF Cell Concept: a sample spectrogram (frequency versus time) divided into time/frequency cells]

There are two methods which may be implemented towards the goal of categorizing and weighting the TF cells appropriately: the interaural intensity difference (IID) method and the time delay of arrival (TDOA) method.

The Interaural Intensity Difference

The IID method of cell-weight estimation is based on intensity differentials as a function of frequency [6]. For example, a sound originating from a source in the first quadrant of the figure below will be detected by the left ear as a signal that has undergone a low-pass filtering effect. The low-pass effect is a result of the shadowing of high-frequency components by the head. On the other hand, low frequencies are able to wrap around the head with little attenuation.

[Figure 4 - Sound Source Plane: the head at the origin, with the first and second quadrants labeled]
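The path-dependent filtering of Equations 1 and 2 can be sketched numerically (a minimal illustration in Python rather than the paper's MATLAB; the two short impulse responses are invented stand-ins for measured HRTFs):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)        # X: the original source signal
h_left = np.array([1.0, 0.5, 0.25])  # toy impulse response of the left path
h_right = np.array([0.6, 0.3])       # toy impulse response of the right path

# Each ear receives the source filtered by its own path:
# y_Left = h_Left * x and y_Right = h_Right * x (convolution in time,
# equivalently Y(w) = H(w) X(w) in the frequency domain).
y_left = np.convolve(x, h_left)
y_right = np.convolve(x, h_right)
```

Because the two paths differ, the two received signals differ in both magnitude and delay, which is precisely what the IID and TDOA cues exploit.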
It is averred that by decomposing a multi-speaker signal into several time/frequency (TF) cells and weighting each cell appropriately, a desired speaker's speech data may be extracted from an aggregate of speakers [3]. The localization cues that are implanted in the received signals by the HRTF may be used to determine the appropriate weightings for each of the TF cells.

The interaural intensity difference may be obtained by taking the ratio of the left and right channel magnitudes in the frequency domain (or, correspondingly, taking the difference between the frequency-domain magnitudes in dB). The math follows:

iid(ω, t) = log|Y_Left(ω, t)| - log|Y_Right(ω, t)|   (Equation 3)
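Equation 3 amounts to a per-TF-cell log magnitude ratio. A rough sketch follows (Python; the rectangular framing, 256-sample window, and white-noise test signal are illustrative choices, not details from the paper):

```python
import numpy as np

def iid_cells(y_left, y_right, win=256, eps=1e-12):
    """iid(w, t) = log|Y_Left(w, t)| - log|Y_Right(w, t)| per TF cell."""
    n = (min(len(y_left), len(y_right)) // win) * win
    YL = np.abs(np.fft.rfft(y_left[:n].reshape(-1, win), axis=1))
    YR = np.abs(np.fft.rfft(y_right[:n].reshape(-1, win), axis=1))
    return np.log(YL + eps) - np.log(YR + eps)

rng = np.random.default_rng(1)
x = rng.standard_normal(2048)
# A left channel attenuated by half gives iid = log(0.5) in every cell.
iid = iid_cells(0.5 * x, x)
```

A real head shadow attenuates mainly the high frequencies at the far ear, so the IID would vary with frequency rather than being constant as in this toy case.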
By comparing incoming speech data to predetermined categorical information the previously mentioned TF cells may be classified appropriately [4].

The Time Delay of Arrival

The distance differential between the propagation paths to the right and left ears causes a phase distortion between the two channels. This phase distortion may be used as a localization cue. As with the IID method, these localization cues may be used to derive the necessary weighting information for each TF cell. Theoretically, extraction of the desired time-delay information may be accomplished through direct analysis of the phase components as follows:

itd(ω, t) = arg{ Y_Left(ω, t) Y_Right*(ω, t) }   (Equation 4)

Despite the plausibility of the above approach it is avoided due to difficulties resulting from the nonlinearity of the phase functions. An alternative means towards the same end is the use of the crosscorrelation function. The TDOA is obtained by retaining the lag index of maximum crosscorrelation between the left and right channels.

[Figure 6 - Inge's Introduction Sample: spectrogram of Inge Marie on microphone four]

The sound sample is first passed through a Bark-scaled filter bank and each band-limited output is broken into short time windows. The left and right channel windows are then crosscorrelated, and the index of the maximum crosscorrelation is retained for each time window. The filter bank is a four-channel filter bank. The justification for this selection is based on the results given in [3], which show a maximization of the SNR for four frequency bands. Initially, the time-window length was based solely on calculations of the maximum possible lag index. However, [3] also shows a maximization of SNR for a window length of 256 samples, thus a window length of 256 samples was used.
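The per-window lag extraction described above can be sketched as follows (Python rather than the paper's MATLAB; the synthetic 5-sample interchannel delay is invented for the demonstration):

```python
import numpy as np

def tdoa_per_window(left, right, win=256):
    """For each time window, retain the lag index of maximum
    crosscorrelation between the left and right channels."""
    lags = []
    for k in range(min(len(left), len(right)) // win):
        seg_l = left[k * win:(k + 1) * win]
        seg_r = right[k * win:(k + 1) * win]
        c = np.correlate(seg_l, seg_r, mode="full")  # lags -(win-1)..(win-1)
        lags.append(int(np.argmax(c)) - (win - 1))
    return np.array(lags)

rng = np.random.default_rng(2)
src = rng.standard_normal(2048 + 8)
left = src[5:5 + 2048]   # the left channel leads the right by 5 samples
right = src[:2048]
lags = tdoa_per_window(left, right)  # each window reports a lag of -5
```

The lag sign convention here follows `np.correlate`: a channel that leads produces a negative lag against the channel it is compared with.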
[Figure 5 - Determining the TDOA by Crosscorrelation: crosscorrelation of the left and right channels plotted against correlation index]

IMPLEMENTATION

This paper concentrates on the TDOA method of speaker isolation. The first step is the analysis of the introduction sound samples from the ShATR corpus to develop a source model that describes each speaker's localization cues. The second step involves use of the localization cues extracted by the previously attained models to determine the appropriate weighting for each TF cell.

Analysis - Source Models

The first step in building the speech isolation system is to study the behavior of the TDOA as a speaker- (position-) dependent feature. This analysis was conducted on the speech samples of each speaker's introduction, such as Inge Marie's introduction sample (Figure 6).

The following figure depicts a histogram of the lag indexes that result from the process described above:

[Figure 7 - Histogram of Lag Index of Maximum Correlation]

As the above histograms show, the lag indexes of maximum crosscorrelation seem to be normally distributed. The appropriate mean and standard deviation are then manually extracted for each speaker. The results for Guy and Inge Marie follow:
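Fitting the per-speaker source model then reduces to estimating a mean and standard deviation from each band's lag indexes (a sketch; the synthetic lag sample below stands in for the histogram data):

```python
import numpy as np

rng = np.random.default_rng(3)
# Stand-in for one band's sequence of lag indexes for one speaker,
# assumed (as in the text) to be roughly normally distributed.
lag_indexes = rng.normal(loc=-8.0, scale=1.5, size=500)

mu = lag_indexes.mean()          # source-model mean for this band
sigma = lag_indexes.std(ddof=1)  # source-model standard deviation
```

In the paper these parameters were extracted manually per speaker and per filter-bank channel; automating this step is the enhancement suggested in the concluding remarks.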
Speaker: Guy
Channel | Mean (µ) | Standard Deviation (σ)
One     | -8       |
Two     | -6.5     |
Three   |          |
Four    | 5        |

Speaker: Inge Marie
Channel | Mean (µ) | Standard Deviation (σ)
One     |          |
Two     |          |
Three   | 7.5      |
Four    |          |

The weight of a particular TF cell as a function of lag index (i) is as follows:

w_ch(i) = e^(-(i - µ_ch)² / (2σ_ch²))   (Equation 5)

Speech Isolation System

Once the distributions of the lag indexes for each speaker have been determined, the sound file containing simultaneous speech may be analyzed and a desired speaker may be extracted. The simultaneous speech sample is first passed through the previously mentioned filter bank. Each of the four band-limited signals is then broken into time windows of length 256 samples. The left and right channels are then crosscorrelated for each time frame, and the index of maximum correlation is retained. The sequence of lag indexes is then compared to the desired speaker's lag-index distribution model, and a weight corresponding to the likelihood of each TF cell belonging to the desired speaker is used as that cell's weight.

[Figure 8 - Block Diagram of Speech Isolation System: speech data passes through the filter bank and TDOA analysis; weighting based on the desired speaker's TDOA model yields the isolated speech]

RESULTS

The above methodology of speech extraction was successful in isolating the desired speaker's signal.

[Figure 9 - Extracting Guy's Speech Signal]

Although the system performed the desired task of extracting a single speech track from the sequence, the weighting process introduced some undesirable noise. It was postulated that the source of the noise was the discontinuity of the weighting matrix over time, and that the problem could be ameliorated by filtering the weight matrix. The following figures illustrate short sections of the four weight sequences before and after filtering.

[Figure 10 - Illustration of the Weight Matrix Filtering Process: original and filtered weight sequences]
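Equation 5 and the weight-matrix filtering step can be sketched together (Python; the moving-average smoother and the specific lag values are illustrative stand-ins for the filtering actually used):

```python
import numpy as np

def cell_weight(lag, mu, sigma):
    """Equation 5: Gaussian weight for a TF cell whose window produced
    lag index `lag`, under the speaker model (mu, sigma)."""
    return np.exp(-((lag - mu) ** 2) / (2.0 * sigma ** 2))

def smooth_weights(w, k=5):
    """Moving-average filtering of the weight sequence, a simple
    stand-in for the weight-matrix filtering described in the text."""
    return np.convolve(w, np.ones(k) / k, mode="same")

lags = np.array([-8, -8, -7, 0, -8, -9, -8])  # one outlier window
w = cell_weight(lags, mu=-8.0, sigma=1.0)     # outlier weight is near zero
w_smooth = smooth_weights(w)                  # the abrupt dip is softened
```

Smoothing trades a small loss of suppression in the outlier window for continuity of the weight sequence, which is the property the paper identifies as reducing the audible artifacts.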
The following is a spectrogram of a sample of cocktail-party speech and the resulting extraction of Guy's stream. Reconstruction of the desired track using the newly filtered weighting matrix resulted in the desired improvement in quality. The following figures illustrate the reconstructed signals for Guy with and without weight-matrix filtering.
[Figure 11 - Before and After Filtering the Weighting Function]

CONCLUDING REMARKS

The design presented in this paper illustrates a simple implementation of a speech extraction system and establishes the feasibility of such a system. This design was successful in achieving the objective of extracting a single speaker's track from a group recording. Included in the appendices are the code for the implementation as well as larger spectrogram depictions of the results. Despite the successes presented in this paper, there exists room for enhancement in future work. For example, the source model in this implementation was obtained manually. Automating the source model would allow the system to behave in a more versatile manner, possibly allowing the relaxation of the a priori assumption that the speakers' positions remain constant. A second potential improvement could be the incorporation of a broader feature set, possibly including the IID in addition to the TDOA.

REFERENCES

[1] S. Mitra, Digital Signal Processing: A Computer-Based Approach, McGraw-Hill/Irwin.
[2] R. Ziemer and W. Tranter, Signals and Systems: Continuous and Discrete, Prentice Hall, 1998.
[3] E. Tessier and F. Berthommier, "Speech Enhancement and Segregation Based on the Localisation Cue for Cocktail-Party Processing."
[4] W. Chau and R. Duda, "Combined Monaural and Binaural Localization of Sound Sources," IEEE Proceedings of ASILOMAR.
[5] R. Duda, "Modeling Head Related Transfer Functions," IEEE Proceedings of ASILOMAR, 1993.
[6] K. Martin, "Estimating Azimuth and Elevation from Interaural Differences."
[7] S. Choi, H. Glotin, F. Berthommier, and E. Tessier, "A CASA Front-End Using the Localization Cue for Segregation and then Cocktail-Party Speech Recognition."
[8] F. Berthommier and S. Choi, "Evaluation of CASA and BSS Models for Subband Cocktail-Party Speech Separation."
[9] D. Wang and G. Brown, "Separation of Speech from Interfering Sounds Based on Oscillatory Correlation," IEEE Transactions on Neural Networks, Vol. 10, No. 3, May 1999.
[10] A. Bregman, Auditory Scene Analysis, Cambridge, MA: MIT Press, 1990.
APPENDIX A: MATLAB CODE

function banddat = bands(dd,nb)
% M.K. Alisdairi, Spring
% b = bands(ysound,number_bands)
% Given original data 'dd', bands() returns an nb-by-length(dd)-by-2
% matrix containing versions of 'dd' bandlimited according to the
% Bark scale. nb is the number of bands used. Note nb may be {8 4 2 1}.

if(nb~=1 & nb~=2 & nb~=4 & nb~=8)   % Make sure it fits the criteria
    clc
    disp(sprintf('Error: nb must be 8, 4, 2, or 1'))
    banddat = [];
    return
end

home
disp(sprintf('Working...'))

N = 2^9;                            % Number of frequency points for the STFT

% Separate into right and left channels
d_left  = dd(:,1)';
d_right = dd(:,2)';

% Calculate the STFTs
DL = stft(d_left,N,N,N/2);
DR = stft(d_right,N,N,N/2);

% Define the Bark scaled windows in Hz, then convert to FFT index (w).
% Note: eight channels max.
F = [ ];
w = floor(N*F./48000)+1;            % assumes a 48 kHz sampling rate

banddat = zeros(nb,((size(DL,2)+1)*N/2),2);   % Initializing is faster

% Go through and produce the proper bandlimited signals
inc = 8/nb;
for(i=1:inc:8)
    FtempL = zeros(size(DL));
    FtempL((w(i)):(w(i+inc)-1),:) = DL((w(i)):(w(i+inc)-1),:);
    FtempR = zeros(size(DR));
    FtempR((w(i)):(w(i+inc)-1),:) = DR((w(i)):(w(i+inc)-1),:);
    banddat(ceil(i/inc),:,1) = istft(FtempL,N,N,N/2);   % Forward the data
    banddat(ceil(i/inc),:,2) = istft(FtempR,N,N,N/2);
end

function [ii,yy,c] = tdoa(b,win,ch)
% M.K. Alisdairi, Spring
% [i,y,c] = tdoa(band_data,window_length,channels)
% Accepts matrix 'b' (produced by bands()), which contains nb channels of
% bandlimited audio data, and conducts a cross correlation of left and
% right time windows of length 'win'.

bl = b(:,:,1);                    % Break data into left and right channels
br = b(:,:,2);
nb = size(b,1);                   % Number of bands
stop = floor(size(b,2)/win);      % Number of windows
c = NaN*ones(2*win-1,stop);       % xcorr matrix
for k=1:length(ch)
    j = ch(k);
    for i=1:win:(stop*win)
        c(:,ceil(i/win)) = xcorr(bl(j,i:(i+win-1)),br(j,i:(i+win-1)))';
    end
    home
    disp(size(c))
    [y,i] = max(c);               % Determine the TDOA for each frame
    yy(k,:) = y;                  % Forward actual xcorr
    ii(k,:) = i-win;              % Forward lag indexes
end

function [ys,wgt] = extract(b,ii,person,win)
% M.K. Alisdairi, Spring
% [ys,wgt] = extract(band_data,lag_indexes,desired_person,window_size)
% This function extracts the desired voice from the sound data in b.
% If person == 1 then extract Guy. If == 2 then extract Inge.

u = [ ;                           % Means for Guy
      ];                          % Means for Inge Marie
s = [ .5 5;                       % Std. devs for Guy
      4 3.5];                     % Std. devs for Inge Marie

u = u(person,:);                  % Take the correct data
s = s(person,:);

w = zeros(1,length(ii)*win);      % Initialize short weight vector
len = min(length(b),length(w));
ys = zeros(2,len);                % Initialize place for new data
wgt = zeros(size(ii,1),length(w));        % Initialize actual weight matrix

for ch=1:size(ii,1)               % Look at all channels
    % Calculate the appropriate weight based on the lag index
    wf = exp(-(ii(ch,:)-u(ch)).^2/(2*s(ch)^2));
    w(:) = wf(ceil((1:length(w))/win));   % Convert to long vector
    wgt(ch,:) = w;                        % Forward the info
    ys(1,:) = ys(1,:) + b(ch,1:len,1).*w(1:len);  % Calculate the extracted voice
    ys(2,:) = ys(2,:) + b(ch,1:len,2).*w(1:len);  % Calculate the other channel
end
APPENDIX B: ENLARGED SPECTROGRAMS
More informationSpeech and Audio Processing Recognition and Audio Effects Part 3: Beamforming
Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Engineering
More informationSpeech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter
Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,
More informationComputational Perception /785
Computational Perception 15-485/785 Assignment 1 Sound Localization due: Thursday, Jan. 31 Introduction This assignment focuses on sound localization. You will develop Matlab programs that synthesize sounds
More informationAiro Interantional Research Journal September, 2013 Volume II, ISSN:
Airo Interantional Research Journal September, 2013 Volume II, ISSN: 2320-3714 Name of author- Navin Kumar Research scholar Department of Electronics BR Ambedkar Bihar University Muzaffarpur ABSTRACT Direction
More informationWAVELET-BASED SPECTRAL SMOOTHING FOR HEAD-RELATED TRANSFER FUNCTION FILTER DESIGN
WAVELET-BASE SPECTRAL SMOOTHING FOR HEA-RELATE TRANSFER FUNCTION FILTER ESIGN HUSEYIN HACIHABIBOGLU, BANU GUNEL, AN FIONN MURTAGH Sonic Arts Research Centre (SARC), Queen s University Belfast, Belfast,
More informationPitch-based monaural segregation of reverberant speech
Pitch-based monaural segregation of reverberant speech Nicoleta Roman a Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210 DeLiang Wang b Department of Computer
More informationToward Automatic Transcription -- Pitch Tracking In Polyphonic Environment
Toward Automatic Transcription -- Pitch Tracking In Polyphonic Environment Term Project Presentation By: Keerthi C Nagaraj Dated: 30th April 2003 Outline Introduction Background problems in polyphonic
More informationUsing Energy Difference for Speech Separation of Dual-microphone Close-talk System
ensors & Transducers, Vol. 1, pecial Issue, May 013, pp. 1-17 ensors & Transducers 013 by IF http://www.sensorsportal.com Using Energy Difference for peech eparation of Dual-microphone Close-talk ystem
More informationarxiv: v1 [cs.sd] 4 Dec 2018
LOCALIZATION AND TRACKING OF AN ACOUSTIC SOURCE USING A DIAGONAL UNLOADING BEAMFORMING AND A KALMAN FILTER Daniele Salvati, Carlo Drioli, Gian Luca Foresti Department of Mathematics, Computer Science and
More informationRobotic Spatial Sound Localization and Its 3-D Sound Human Interface
Robotic Spatial Sound Localization and Its 3-D Sound Human Interface Jie Huang, Katsunori Kume, Akira Saji, Masahiro Nishihashi, Teppei Watanabe and William L. Martens The University of Aizu Aizu-Wakamatsu,
More informationAutomotive three-microphone voice activity detector and noise-canceller
Res. Lett. Inf. Math. Sci., 005, Vol. 7, pp 47-55 47 Available online at http://iims.massey.ac.nz/research/letters/ Automotive three-microphone voice activity detector and noise-canceller Z. QI and T.J.MOIR
More informationRobust Speech Recognition Based on Binaural Auditory Processing
Robust Speech Recognition Based on Binaural Auditory Processing Anjali Menon 1, Chanwoo Kim 2, Richard M. Stern 1 1 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh,
More informationA BINAURAL HEARING AID SPEECH ENHANCEMENT METHOD MAINTAINING SPATIAL AWARENESS FOR THE USER
A BINAURAL EARING AID SPEEC ENANCEMENT METOD MAINTAINING SPATIAL AWARENESS FOR TE USER Joachim Thiemann, Menno Müller and Steven van de Par Carl-von-Ossietzky University Oldenburg, Cluster of Excellence
More informationSpatial Audio Transmission Technology for Multi-point Mobile Voice Chat
Audio Transmission Technology for Multi-point Mobile Voice Chat Voice Chat Multi-channel Coding Binaural Signal Processing Audio Transmission Technology for Multi-point Mobile Voice Chat We have developed
More informationLateralisation of multiple sound sources by the auditory system
Modeling of Binaural Discrimination of multiple Sound Sources: A Contribution to the Development of a Cocktail-Party-Processor 4 H.SLATKY (Lehrstuhl für allgemeine Elektrotechnik und Akustik, Ruhr-Universität
More informationDirectionality. Many hearing impaired people have great difficulty
Directionality Many hearing impaired people have great difficulty understanding speech in noisy environments such as parties, bars and meetings. But speech understanding can be greatly improved if unwanted
More informationSpeech and Music Discrimination based on Signal Modulation Spectrum.
Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we
More informationUsing RASTA in task independent TANDEM feature extraction
R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t
More informationOptimal Adaptive Filtering Technique for Tamil Speech Enhancement
Optimal Adaptive Filtering Technique for Tamil Speech Enhancement Vimala.C Project Fellow, Department of Computer Science Avinashilingam Institute for Home Science and Higher Education and Women Coimbatore,
More informationEnhancement of Speech Signal by Adaptation of Scales and Thresholds of Bionic Wavelet Transform Coefficients
ISSN (Print) : 232 3765 An ISO 3297: 27 Certified Organization Vol. 3, Special Issue 3, April 214 Paiyanoor-63 14, Tamil Nadu, India Enhancement of Speech Signal by Adaptation of Scales and Thresholds
More informationUsing Vision to Improve Sound Source Separation
Using Vision to Improve Sound Source Separation Yukiko Nakagawa y, Hiroshi G. Okuno y, and Hiroaki Kitano yz ykitano Symbiotic Systems Project ERATO, Japan Science and Technology Corp. Mansion 31 Suite
More informationBinaural Hearing- Human Ability of Sound Source Localization
MEE09:07 Binaural Hearing- Human Ability of Sound Source Localization Parvaneh Parhizkari Master of Science in Electrical Engineering Blekinge Institute of Technology December 2008 Blekinge Institute of
More informationFrom Monaural to Binaural Speaker Recognition for Humanoid Robots
From Monaural to Binaural Speaker Recognition for Humanoid Robots Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader Université Pierre et Marie Curie Institut des Systèmes Intelligents et de Robotique,
More informationSpeech Enhancement Using Microphone Arrays
Friedrich-Alexander-Universität Erlangen-Nürnberg Lab Course Speech Enhancement Using Microphone Arrays International Audio Laboratories Erlangen Prof. Dr. ir. Emanuël A. P. Habets Friedrich-Alexander
More informationPerceptual Distortion Maps for Room Reverberation
Perceptual Distortion Maps for oom everberation Thomas Zarouchas 1 John Mourjopoulos 1 1 Audio and Acoustic Technology Group Wire Communications aboratory Electrical Engineering and Computer Engineering
More informationLecture 14: Source Separation
ELEN E896 MUSIC SIGNAL PROCESSING Lecture 1: Source Separation 1. Sources, Mixtures, & Perception. Spatial Filtering 3. Time-Frequency Masking. Model-Based Separation Dan Ellis Dept. Electrical Engineering,
More informationAccurate Delay Measurement of Coded Speech Signals with Subsample Resolution
PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,
More informationA Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification
A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification Wei Chu and Abeer Alwan Speech Processing and Auditory Perception Laboratory Department
More informationORIENTATION IN SIMPLE VIRTUAL AUDITORY SPACE CREATED WITH MEASURED HRTF
ORIENTATION IN SIMPLE VIRTUAL AUDITORY SPACE CREATED WITH MEASURED HRTF F. Rund, D. Štorek, O. Glaser, M. Barda Faculty of Electrical Engineering Czech Technical University in Prague, Prague, Czech Republic
More informationFrom Binaural Technology to Virtual Reality
From Binaural Technology to Virtual Reality Jens Blauert, D-Bochum Prominent Prominent Features of of Binaural Binaural Hearing Hearing - Localization Formation of positions of the auditory events (azimuth,
More informationIvan Tashev Microsoft Research
Hannes Gamper Microsoft Research David Johnston Microsoft Research Ivan Tashev Microsoft Research Mark R. P. Thomas Dolby Laboratories Jens Ahrens Chalmers University, Sweden Augmented and virtual reality,
More informationSpeech Synthesis using Mel-Cepstral Coefficient Feature
Speech Synthesis using Mel-Cepstral Coefficient Feature By Lu Wang Senior Thesis in Electrical Engineering University of Illinois at Urbana-Champaign Advisor: Professor Mark Hasegawa-Johnson May 2018 Abstract
More informationA learning, biologically-inspired sound localization model
A learning, biologically-inspired sound localization model Elena Grassi Neural Systems Lab Institute for Systems Research University of Maryland ITR meeting Oct 12/00 1 Overview HRTF s cues for sound localization.
More informationAcoustics Research Institute
Austrian Academy of Sciences Acoustics Research Institute Spatial SpatialHearing: Hearing: Single SingleSound SoundSource Sourcein infree FreeField Field Piotr PiotrMajdak Majdak&&Bernhard BernhardLaback
More informationLab 8. Signal Analysis Using Matlab Simulink
E E 2 7 5 Lab June 30, 2006 Lab 8. Signal Analysis Using Matlab Simulink Introduction The Matlab Simulink software allows you to model digital signals, examine power spectra of digital signals, represent
More informationRobust Low-Resource Sound Localization in Correlated Noise
INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem
More informationBinaural Classification for Reverberant Speech Segregation Using Deep Neural Networks
2112 IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 22, NO. 12, DECEMBER 2014 Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks Yi Jiang, Student
More informationSound Source Localization in Median Plane using Artificial Ear
International Conference on Control, Automation and Systems 28 Oct. 14-17, 28 in COEX, Seoul, Korea Sound Source Localization in Median Plane using Artificial Ear Sangmoon Lee 1, Sungmok Hwang 2, Youngjin
More information