Power Normalized Cepstral Coefficient for Speaker Diarization and Acoustic Echo Cancellation
Sherbin Kanattil Kassim
P.G Scholar, Department of ECE, Engineering College, Edathala, Ernakulam, India

Abstract: Audio processing methods for extracting audio features are as important as conscious content for determining human behavior. Whereas conventional research concentrates on data recorded under constrained conditions, this work deals with data recorded in completely natural and unpredictable situations. To obtain speech and environmental sound signals, a benchmark is set up from a pattern of integrated algorithms for sound and speech detection and classification, voiced and non-voiced segmentation, speaker segmentation, and prediction. The acoustic system can also become unstable, which produces loud disturbances. The solution to this problem is to eliminate the echo with an echo suppression or echo cancellation algorithm. An audio feature extraction technique based on Power Normalized Cepstral Coefficients (PNCC) and gap statistics is used for speaker segmentation (diarization) and for predicting the number of speakers [1]. PNCC is based on auditory processing, and its major new features, including a module that accomplishes temporal masking, are used for speaker segmentation. Adaptive filtering based on the LMS algorithm is used to reduce unwanted echo and to increase communication quality.

I. INTRODUCTION

Audio signal processing is a rapidly growing field in which intelligent devices sense and understand human social behavior. It has already attracted researchers in areas such as psychology, ambient intelligence, healthcare, and telecommunication. Audio processing is relevant for extracting audio features that help in determining characteristics and behavior. This work deals with a set of evaluation criteria and test methods for speech recognition systems [2].
Standard methods to evaluate and measure audio signal processing have different limitations. Monitoring humans is very expensive, is limited to a small number of people per observer, and may suffer from inter-observer reliability issues. A large acoustic coupling between the loudspeaker and the microphone produces a loud echo that makes conversation difficult. Automatic speech recognition (ASR) has made great strides with the development of digital signal processing in both software and hardware. Different types of signal features have been proposed by the sound recognition community for the task of sound description. These features can be distinguished by their steadiness or dynamicity, that is, whether the feature represents a value extracted from the signal at a given time or a parameter from a model of the signal's behavior over time (e.g. mean, standard deviation, Markov model), and by the time extent of the description they provide: some descriptions apply to only part of the object (e.g. the attack of the sound), whereas others apply to the entire signal (e.g. loudness) [3].

II. SYSTEM ARCHITECTURE

This work presents the design and software implementation of a computing platform capable of automatically extracting and analyzing audio signals. An audio feature extraction technique based on power normalized cepstral coefficients and gap statistics is used for speaker segmentation and prediction. The speaker segmentation method is based on power normalized cepstral coefficients (PNCC) instead of linear prediction cepstral coefficients (LPCC) or Mel-frequency cepstral coefficients (MFCC) [4].

A. Proposed Audio Signal Extraction Architecture: The system is designed to continuously collect audio signals in completely natural and unpredictable situations. The proposed SASE architecture is organized in three stages.
Stage 1: Block detection of sound and speech, and classification of both environmental and speech sounds.
Stage 2: Voiced and non-voiced speech segmentation and sound level meter calculation.
Stage 3: Individual speaker segmentation, clustering, prediction of the unknown number of speakers, noise removal, and echo cancellation.

B. Voiced Speech Segmentation: In speech analysis, the voiced/non-voiced decision is usually tied to pitch analysis. Linking the voiced/non-voiced decision to pitch analysis can cause unnecessary issues. Instead, a pattern recognition approach is used to decide whether a given segment of a speech signal should be classified as voiced speech or as non-voiced (unvoiced speech and silence), based on measurements made on the signal [5]. The measured parameters are the zero-crossing rate, the energy of the speech, the correlation between adjacent speech samples, the first predictor coefficient from linear predictive coding (LPC) analysis, and the energy of the prediction error. The speech segment is assigned to a class by a minimum-distance rule obtained under the assumption that the measured parameters follow a multidimensional Gaussian probability density function. The means and covariances of the Gaussian distribution are determined from manually classified speech data.

1) Features and Segmentation: Features for the voiced
and non-voiced segmentation are computed on blocks of speech data, namely: a) the non-initial maximum of the normalized noisy autocorrelation, b) the number of correlation peaks, and c) the normalized spectral entropy. These are computed on a per-frame basis.

C. Speaker Segmentation and Diarization Using Power Normalized Cepstral Coefficients: Speaker diarization aims to detect who spoke when in large audio segments. Power Normalized Cepstral Coefficients (PNCC) is a feature extraction algorithm based on auditory processing. Speaker diarization is the process of partitioning an input audio stream into acoustically homogeneous segments and clustering the segments that belong to each speaker. It can be used as a pre-processing step for an automatic speech transcription system, and it is performed by combining speaker segmentation and clustering. Speech and non-speech segments can be separated using diarization, so the speech recognizer has to process only the audio segments containing speech.

Fig. 1. Block diagram for speaker diarization (audio input, Wiener filter, LPR, PNCC, HMM/GMM, output)

PNCC processing makes use of a power-law nonlinearity that replaces the traditional log nonlinearity used in Mel-frequency cepstral coefficients, a noise reduction algorithm based on asymmetric filtering that suppresses background excitation or noise, and a module that accomplishes temporal masking.

D. Predicting Number of Unknown Speakers: To resolve the ambiguity in clustering segments from unknown speakers, it is necessary to find out how many unknown speakers are involved in the conversation. The standard k-means algorithm and the gap statistic technique are used for this purpose. The underlying idea is that the within-cluster sum of squares can only decrease as the number of clusters increases, but after a certain point it decreases more slowly than for the previous numbers of clusters. This point is called the elbow and is taken as the optimal number of clusters [6].

E. Echo Reduction: In this global period of communication, user demand for enhanced voice quality over wireless networks has driven a new and key technology termed echo reduction, which can provide near-wireline voice quality across a wireless network. By employing echo cancellation, the quality of speech can be improved to a great extent. An adaptive filter calculates the difference between the desired signal and the adaptive filter output; this difference is the error signal e(n). The error signal e(n) is fed back into the adaptive filter, and its coefficients are altered algorithmically to minimize a function of that difference, known as the cost function. If the output of the adaptive filter equals the desired signal, the error signal goes to zero. Double talk is the situation where both talkers speak at the same time [7]. An important requirement for echo cancellation is handling double talk in a natural manner that does not cause divergence. Hybrid echoes have been inherent since the advent of the telephone system. This type of echo is the result of impedance mismatches in the analog local loop. Acoustic echo, also known as multipath echo, is produced as a result of poor voice coupling between the earpiece and the microphone in handsets and hands-free devices.

F. Adaptive Echo Reduction Approach:

Fig. 2. Adaptive echo reduction system

Echoes are major sources of annoyance in hands-free communication, where coupling from the far-end signal (loudspeaker) to the near-end signal (microphone) results in undesired acoustic echo.
An adaptive filter algorithmically adjusts its parameters in order to minimise a function of the difference between the desired output d(n) and its actual output y(n). This function is known as the cost function of the adaptive algorithm. The filter h(n) represents the impulse response of the acoustic environment, and w(n) represents the adaptive filter used to cancel the echo signal. The adaptive filter aims to make its output y(n) equal to the desired output d(n). A basic echo canceller used to remove echo in a communication system is shown in Fig. 2. Inverse filtering combined with Wiener-filter noise reduction is very effective in removing the echo.
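The adaptive filter described above can be sketched in a few lines. This is only an illustrative sketch, not the author's MATLAB implementation: the echo path h, the number of taps, the step size mu, and the white-noise far-end signal are all assumptions chosen for the demonstration, and no near-end speech or double talk is simulated. The ERLE helper uses the usual ratio of echo power to residual power in dB.

```python
import numpy as np

def lms_echo_canceller(far_end, mic, num_taps=8, mu=0.01):
    """Adapt w(n) with LMS so that y(n) tracks the echo in mic; return e(n) and w."""
    w = np.zeros(num_taps)        # adaptive filter coefficients w(n)
    x_buf = np.zeros(num_taps)    # most recent far-end samples, newest first
    e = np.zeros(len(mic))
    for n in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]
        y = w @ x_buf             # echo replica y(n)
        e[n] = mic[n] - y         # error e(n) = d(n) - y(n)
        w += mu * e[n] * x_buf    # LMS coefficient update
    return e, w

def erle_db(d, e, eps=1e-12):
    """Echo return loss enhancement: echo power over residual power, in dB."""
    return 10.0 * np.log10(np.sum(d ** 2) / (np.sum(e ** 2) + eps))

# Synthetic test: white-noise far-end signal and an assumed 4-tap echo path.
rng = np.random.default_rng(0)
far_end = rng.standard_normal(4000)
h = np.array([0.6, 0.3, -0.2, 0.1])            # hypothetical room response h(n)
d = np.convolve(far_end, h)[: len(far_end)]    # echo picked up by the microphone
e, w = lms_echo_canceller(far_end, d)
erle = erle_db(d[-1000:], e[-1000:])           # ERLE after convergence
```

In this noise-free setting the coefficients w converge towards h, so the residual echo becomes very small. With real recordings, a smaller step size or a normalized LMS variant may be needed for stable adaptation.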
The echo canceller replicates the transfer function of the echo path in order to synthesize a replica of the echo. It then subtracts the synthesized replica from the combined echo and near-end speech (or disturbance) signal to obtain the near-end signal. The transfer function is not known in advance, so it has to be estimated. At each iteration the error signal, e(n) = d(n) - y(n), is fed back into the filter. This adaptive echo cancellation technique is introduced alongside the asymmetric noise suppression. An audio sample of 5-10 s duration is recorded, and an echo effect is added using an MP3 audio editor in order to test the echo cancellation algorithm. The adaptive LMS algorithm is used to perform the echo cancellation. The desired signal is obtained by combining the filter output and the impulse response. A Wiener noise reduction algorithm is used to eliminate the noise present in the audio signal. Using the three signals, the echo return loss enhancement (ERLE) is plotted. An ERLE of 10 dB or above is considered a good echo reduction result.

Pitch detection algorithms can be divided into methods that operate in the frequency domain, in the time domain, or in both. One set of pitch detection methods uses the detection and timing of some time-domain features. Other time-domain methods use correlation functions or difference norms to detect similarity between the waveform and a time-lagged version of itself. Another set of methods operates in the frequency domain by locating sinusoidal peaks in the frequency transform of the input audio signal. The pitch is usually around 150 Hz for men, about 250 Hz for women, and somewhat higher, around 300 Hz, for children; the pitch is needed to construct this part of the speech signal. Commonly used detection methods are based on energy, the cepstrum, zero crossings, a difference function, and autocorrelation.
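Of the methods listed above, the cepstral approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the search range (fmin, fmax), the Hamming window, the 256-sample frame, and the synthetic harmonic test signal are all choices made for the example.

```python
import numpy as np

def cepstral_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch of one voiced frame from the peak of its real cepstrum."""
    windowed = frame * np.hamming(len(frame))
    # Real cepstrum: inverse transform of the log magnitude spectrum.
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.fft.irfft(log_mag)
    # Search only quefrencies (in samples) that correspond to fmax..fmin.
    qmin = int(fs / fmax)
    qmax = min(int(fs / fmin), len(cepstrum) // 2)
    peak = qmin + int(np.argmax(cepstrum[qmin:qmax + 1]))
    return fs / peak

# Synthetic voiced frame: harmonics of 200 Hz sampled at 8 kHz (32 ms frame).
fs = 8000
f0 = 200.0
t = np.arange(256) / fs
frame = sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, 20))
estimate = cepstral_pitch(frame, fs)
```

The harmonics of a voiced sound appear as regularly spaced peaks in the log spectrum, so the cepstrum shows a peak at the pitch period. The resolution is limited to whole samples of quefrency, and the function is meaningful only for voiced frames.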
Fig. 3. Screenshot of the ERLE

III. SYSTEM BLOCK REPRESENTATION

A. Pitch Extraction: To obtain speech and environmental sound signals using a wearable device built in house, a set of integrated algorithms was benchmarked (sound and speech detection and classification, voiced and non-voiced segmentation, sound level meter calculation, speaker prediction and segmentation, etc.). Cepstral-based pitch detection is used here.

Fig. 4. Pitch detection model

B. Zero Crossing Rate: In a time-domain feature detection method, the signal is usually pre-processed to accentuate some time-domain feature, and the time between occurrences of that feature is taken as the period of the audio signal. Because only the time between occurrences of a particular feature is used for the period estimate, feature detection schemes usually do not use all of the available data [8].

C. Separation of Voiced and Unvoiced Speech Using Zero Crossing Rate and Energy of the Speech Signal: In speech analysis, the voiced/non-voiced decision is usually made when extracting information from the speech signal. Two features are used to separate the voiced and non-voiced parts of speech: the zero crossing rate (ZCR) and the energy.

Fig. 5. Block representation of voiced and non-voiced time calculation

The results are evaluated by dividing the speech sample into segments and using the zero crossing rate and energy calculations to separate the voiced and non-voiced parts. The zero crossing rate is low for voiced parts and high for unvoiced parts, whereas the energy is high for voiced parts and low for non-voiced parts. These measures therefore prove effective for separating voiced and non-voiced speech and estimating their durations [9].

D. Power Normalized Cepstral Coefficients for Robust Speech Recognition:
Fig. 6. Structure of the PNCC feature extraction algorithm

The development of PNCC feature extraction was motivated by a desire to obtain a set of practical features for speech recognition that are more robust with respect to acoustical variability, without loss of performance. The initial processing in PNCC is followed by a series of nonlinear time-varying operations, performed using a longer-duration temporal analysis, that accomplish noise subtraction as well as a degree of robustness with respect to reverberation.

E. Speaker Diarization:

Fig. 7. Speaker diarization representation

Speaker diarization is the process of detecting the turns in speech caused by a change of speaker and clustering the speech from the same speaker together. It thus provides useful information for structuring and indexing the audio document. A recorded audio file (.wav) of 10 seconds duration with a sampling frequency of 8000 Hz is given as input. After the audio file is read, the signal is filtered with a Wiener filter to remove noise and smooth the signal. Since the linear prediction residual is used, privacy can be preserved [10]; the higher the linear prediction order, the better the privacy. The linear prediction residual is then represented as PNCC features, which are used as the observation sequence for an HMM whose states represent the speakers. Since the number of speakers is unknown, the data are segmented into an initial number of 10 clusters (assuming the number of speakers is less than 10). The parameters of the HMM are initialized by uniformly segmenting the data into 10 clusters and estimating the parameters of each cluster's GMM over these segments. For each pair of clusters, the log-likelihood of the combined model is compared with the sum of the log-likelihoods of the two original models. The pair for which the log-likelihood improvement is largest is combined, and a new HMM is built from the combined model. This process continues until there are no more states to combine.
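The merging loop can be illustrated with a simplified sketch. It is not the author's HMM/GMM system: each cluster is modelled here by a single diagonal Gaussian instead of a GMM, and because a single merged Gaussian can never score higher than two separate ones, a BIC-style penalty term is swapped in so that merging same-speaker clusters can win. The synthetic 4-dimensional features stand in for PNCC vectors, and names such as penalty_weight are hypothetical.

```python
import numpy as np

def gaussian_loglik(x):
    """Log-likelihood of frames x under a diagonal Gaussian fitted to x itself."""
    n = x.shape[0]
    var = x.var(axis=0) + 1e-6    # small floor keeps the logarithm finite
    return -0.5 * n * float(np.sum(np.log(2.0 * np.pi * var) + 1.0))

def merge_gain(a, b, penalty_weight=1.0):
    """Penalized log-likelihood change when clusters a and b share one model."""
    merged = np.vstack([a, b])
    n_params = 2 * a.shape[1]     # one mean and one variance per dimension
    penalty = 0.5 * penalty_weight * n_params * np.log(len(merged))
    return gaussian_loglik(merged) - gaussian_loglik(a) - gaussian_loglik(b) + penalty

def agglomerate(clusters, penalty_weight=1.0):
    """Repeatedly merge the best pair until no merge has a positive gain."""
    clusters = list(clusters)
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        gains = [merge_gain(clusters[i], clusters[j], penalty_weight)
                 for i, j in pairs]
        best = int(np.argmax(gains))
        if gains[best] <= 0.0:
            break
        i, j = pairs[best]
        merged = np.vstack([clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

# Two synthetic "speakers" with well-separated feature means, 200 frames each.
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(200, 4))
speaker_b = rng.normal(5.0, 1.0, size=(200, 4))
features = np.vstack([speaker_a, speaker_b])
initial = np.array_split(features, 10)    # uniform segmentation into 10 clusters
final = agglomerate(initial)
num_speakers = len(final)
```

In this toy case the ten initial clusters collapse to one cluster per synthetic speaker. On real speech the speakers alternate over time and overlap in feature space, so the outcome depends on the features, the model order, and the penalty weight.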
Each remaining state then corresponds to a single speaker in the audio file, and the data assigned to that state correspond to the speech of that speaker.

IV. CONCLUSIONS

This project deals with the analysis of longitudinal and unpredictable audio signals. An efficient architecture has been developed to enable continuous audio sensing, together with scalable methods to gather and analyse audio signals. It is used to capture the changeable characteristics of longitudinal and unpredictable audio signals. The analysis of the audio data captured by the device yielded high performance with the proposed architecture. The echo cancellation algorithm succeeds in providing a software solution to the problem of echoes in communication. The proposed method is a purely software approach that does not use any hardware components, and the algorithm can run on any PC with MATLAB installed. The technique provides almost perfect output when cancelling echo, without losing the speech signal. The results obtained were convincing: the output speech signals were highly satisfactory and validated the goals. Speaker diarization has emerged as an increasingly important and dedicated domain of speech research and has been developed in many domains. In this work, the linear prediction residual represented as PNCC features is used for speaker diarization. Privacy preservation is increasingly important; since the linear prediction residual is used, privacy can be preserved, as intelligible speech cannot be reconstructed. Using this information, the physical and mental health of a person can be analyzed.
V. REFERENCES

[1] Bin Gao and Wai Lok Woo, "Wearable Audio Monitoring: Content-Based Processing Methodology and Implementation", IEEE Transactions on Human-Machine Systems, May 24.
[2] Chanwoo Kim and Richard M. Stern, "Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition", IEEE Transactions on Audio, Speech, and Language Processing.
[3] Moore and B. R. Glasberg, "PNCC For Robust Speech Recognition", Carnegie Mellon University, Pittsburgh, PA, USA.
[4] Mark S. Hawley, Stuart P. Cunningham, Phil D. Green, "A Voice-Input Voice-Output Communication Aid for People With Severe Speech Impairment", IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 21, No. 1, January.
[5] Junghsi Lee and Hsu-Chang Huang, "A Robust Double-Talk Detector for Acoustic Echo Cancellation", IMECS.
[6] Radhika hinaboina, D. S. Ramkiran, "Adaptive Algorithms For Acoustic Echo Cancellation In Speech Processing", IJRRAS 7 (1), April.
[7] Mario Muñoz-Organero, Pedro J. Muñoz-Merino, "Adapting the Speed of Reproduction of Audio Content and Using Text Reinforcement for Maximizing the Learning Outcome through Mobile Phones", IEEE Transactions on Learning Technologies, Vol. 4, No. 3, July-September 2011.
[8] Sin-Horng Chen, Shaw-Hwa Hwang and Yih-Ru Wang, "An RNN-Based Prosodic Information Synthesizer for Mandarin Text-to-Speech", IEEE Transactions on Speech and Audio Processing, Vol. 6, No. 3, May 1998.
[9] J. Ajmera and C. Wooters, "A robust speaker clustering algorithm", in Proc. IEEE Automatic Speech Recognition Understand. Workshop, 2003, pp.
[10] Marro et al., "A Two-Step Noise Reduction Technique", in IEEE International Conference on Acoustics, Speech, Signal Processing, Montreal, Canada, Vol. 1, pp., May.
More informationEnhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis
Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins
More informationVoice Activity Detection for Speech Enhancement Applications
Voice Activity Detection for Speech Enhancement Applications E. Verteletskaya, K. Sakhnov Abstract This paper describes a study of noise-robust voice activity detection (VAD) utilizing the periodicity
More informationAutomatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs
Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems
More informationRobust Speech Recognition Group Carnegie Mellon University. Telephone: Fax:
Robust Automatic Speech Recognition In the 21 st Century Richard Stern (with Alex Acero, Yu-Hsiang Chiu, Evandro Gouvêa, Chanwoo Kim, Kshitiz Kumar, Amir Moghimi, Pedro Moreno, Hyung-Min Park, Bhiksha
More informationSpeech Compression Using Voice Excited Linear Predictive Coding
Speech Compression Using Voice Excited Linear Predictive Coding Ms.Tosha Sen, Ms.Kruti Jay Pancholi PG Student, Asst. Professor, L J I E T, Ahmedabad Abstract : The aim of the thesis is design good quality
More informationDigital Speech Processing and Coding
ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/
More informationAudio Fingerprinting using Fractional Fourier Transform
Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,
More informationClassification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise
Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to
More informationAdaptive Filters Wiener Filter
Adaptive Filters Wiener Filter Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory
More informationAudio Signal Compression using DCT and LPC Techniques
Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,
More information(i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods
Tools and Applications Chapter Intended Learning Outcomes: (i) Understanding the basic concepts of signal modeling, correlation, maximum likelihood estimation, least squares and iterative numerical methods
More informationVoiced/nonvoiced detection based on robustness of voiced epochs
Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies
More informationSPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT
SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com
More informationAcoustic Echo Cancellation for Noisy Signals
Acoustic Echo Cancellation for Noisy Signals Babilu Daniel Karunya University Coimbatore Jude.D.Hemanth Karunya University Coimbatore ABSTRACT Echo is the time delayed version of the original signal. Acoustic
More informationPOSSIBLY the most noticeable difference when performing
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 15, NO. 7, SEPTEMBER 2007 2011 Acoustic Beamforming for Speaker Diarization of Meetings Xavier Anguera, Associate Member, IEEE, Chuck Wooters,
More informationSpeech and Audio Processing Recognition and Audio Effects Part 3: Beamforming
Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Engineering
More informationSpeech Recognition using FIR Wiener Filter
Speech Recognition using FIR Wiener Filter Deepak 1, Vikas Mittal 2 1 Department of Electronics & Communication Engineering, Maharishi Markandeshwar University, Mullana (Ambala), INDIA 2 Department of
More informationMultirate Algorithm for Acoustic Echo Cancellation
Technology Volume 1, Issue 2, October-December, 2013, pp. 112-116, IASTER 2013 www.iaster.com, Online: 2347-6109, Print: 2348-0017 Multirate Algorithm for Acoustic Echo Cancellation 1 Ch. Babjiprasad,
More informationNoise Estimation and Noise Removal Techniques for Speech Recognition in Adverse Environment
Noise Estimation and Noise Removal Techniques for Speech Recognition in Adverse Environment Urmila Shrawankar 1,3 and Vilas Thakare 2 1 IEEE Student Member & Research Scholar, (CSE), SGB Amravati University,
More informationScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech
More informationDigitally controlled Active Noise Reduction with integrated Speech Communication
Digitally controlled Active Noise Reduction with integrated Speech Communication Herman J.M. Steeneken and Jan Verhave TNO Human Factors, Soesterberg, The Netherlands herman@steeneken.com ABSTRACT Active
More informationSound Synthesis Methods
Sound Synthesis Methods Matti Vihola, mvihola@cs.tut.fi 23rd August 2001 1 Objectives The objective of sound synthesis is to create sounds that are Musically interesting Preferably realistic (sounds like
More informationA variable step-size LMS adaptive filtering algorithm for speech denoising in VoIP
7 3rd International Conference on Computational Systems and Communications (ICCSC 7) A variable step-size LMS adaptive filtering algorithm for speech denoising in VoIP Hongyu Chen College of Information
More informationAnalysis on Extraction of Modulated Signal Using Adaptive Filtering Algorithms against Ambient Noises in Underwater Communication
International Journal of Signal Processing Systems Vol., No., June 5 Analysis on Extraction of Modulated Signal Using Adaptive Filtering Algorithms against Ambient Noises in Underwater Communication S.
More informationAuditory modelling for speech processing in the perceptual domain
ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract
More informationAdaptive Noise Reduction Algorithm for Speech Enhancement
Adaptive Noise Reduction Algorithm for Speech Enhancement M. Kalamani, S. Valarmathy, M. Krishnamoorthi Abstract In this paper, Least Mean Square (LMS) adaptive noise reduction algorithm is proposed to
More informationLecture 4 Biosignal Processing. Digital Signal Processing and Analysis in Biomedical Systems
Lecture 4 Biosignal Processing Digital Signal Processing and Analysis in Biomedical Systems Contents - Preprocessing as first step of signal analysis - Biosignal acquisition - ADC - Filtration (linear,
More informationCan binary masks improve intelligibility?
Can binary masks improve intelligibility? Mike Brookes (Imperial College London) & Mark Huckvale (University College London) Apparently so... 2 How does it work? 3 Time-frequency grid of local SNR + +
More informationRobust speech recognition using temporal masking and thresholding algorithm
Robust speech recognition using temporal masking and thresholding algorithm Chanwoo Kim 1, Kean K. Chin 1, Michiel Bacchiani 1, Richard M. Stern 2 Google, Mountain View CA 9443 USA 1 Carnegie Mellon University,
More informationFaculty of science, Ibn Tofail Kenitra University, Morocco Faculty of Science, Moulay Ismail University, Meknès, Morocco
Design and Simulation of an Adaptive Acoustic Echo Cancellation (AEC) for Hands-ree Communications using a Low Computational Cost Algorithm Based Circular Convolution in requency Domain 1 *Azeddine Wahbi
More informationPower Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition
Power Function-Based Power Distribution Normalization Algorithm for Robust Speech Recognition Chanwoo Kim 1 and Richard M. Stern Department of Electrical and Computer Engineering and Language Technologies
More informationA multi-class method for detecting audio events in news broadcasts
A multi-class method for detecting audio events in news broadcasts Sergios Petridis, Theodoros Giannakopoulos, and Stavros Perantonis Computational Intelligence Laboratory, Institute of Informatics and
More informationMultimedia Signal Processing: Theory and Applications in Speech, Music and Communications
Brochure More information from http://www.researchandmarkets.com/reports/569388/ Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications Description: Multimedia Signal
More informationPreeti Rao 2 nd CompMusicWorkshop, Istanbul 2012
Preeti Rao 2 nd CompMusicWorkshop, Istanbul 2012 o Music signal characteristics o Perceptual attributes and acoustic properties o Signal representations for pitch detection o STFT o Sinusoidal model o
More informationDetection, Interpolation and Cancellation Algorithms for GSM burst Removal for Forensic Audio
>Bitzer and Rademacher (Paper Nr. 21)< 1 Detection, Interpolation and Cancellation Algorithms for GSM burst Removal for Forensic Audio Joerg Bitzer and Jan Rademacher Abstract One increasing problem for
More informationAdaptive Systems Homework Assignment 3
Signal Processing and Speech Communication Lab Graz University of Technology Adaptive Systems Homework Assignment 3 The analytical part of your homework (your calculation sheets) as well as the MATLAB
More informationEE 6422 Adaptive Signal Processing
EE 6422 Adaptive Signal Processing NANYANG TECHNOLOGICAL UNIVERSITY SINGAPORE School of Electrical & Electronic Engineering JANUARY 2009 Dr Saman S. Abeysekera School of Electrical Engineering Room: S1-B1c-87
More informationGSM Interference Cancellation For Forensic Audio
Application Report BACK April 2001 GSM Interference Cancellation For Forensic Audio Philip Harrison and Dr Boaz Rafaely (supervisor) Institute of Sound and Vibration Research (ISVR) University of Southampton,
More informationA Novel Hybrid Technique for Acoustic Echo Cancellation and Noise reduction Using LMS Filter and ANFIS Based Nonlinear Filter
A Novel Hybrid Technique for Acoustic Echo Cancellation and Noise reduction Using LMS Filter and ANFIS Based Nonlinear Filter Shrishti Dubey 1, Asst. Prof. Amit Kolhe 2 1Research Scholar, Dept. of E&TC
More informationAnalysis of LMS and NLMS Adaptive Beamforming Algorithms
Analysis of LMS and NLMS Adaptive Beamforming Algorithms PG Student.Minal. A. Nemade Dept. of Electronics Engg. Asst. Professor D. G. Ganage Dept. of E&TC Engg. Professor & Head M. B. Mali Dept. of E&TC
More informationIEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 50, NO. 12, DECEMBER
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 50, NO. 12, DECEMBER 2002 1865 Transactions Letters Fast Initialization of Nyquist Echo Cancelers Using Circular Convolution Technique Minho Cheong, Student Member,
More informationA Comparison of the Convolutive Model and Real Recording for Using in Acoustic Echo Cancellation
A Comparison of the Convolutive Model and Real Recording for Using in Acoustic Echo Cancellation SEPTIMIU MISCHIE Faculty of Electronics and Telecommunications Politehnica University of Timisoara Vasile
More informationRECENTLY, there has been an increasing interest in noisy
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In
More informationActive Noise Cancellation System Using DSP Prosessor
International Journal of Scientific & Engineering Research, Volume 4, Issue 4, April-2013 699 Active Noise Cancellation System Using DSP Prosessor G.U.Priyanga, T.Sangeetha, P.Saranya, Mr.B.Prasad Abstract---This
More informationInternational Journal of Engineering and Techniques - Volume 1 Issue 6, Nov Dec 2015
RESEARCH ARTICLE OPEN ACCESS A Comparative Study on Feature Extraction Technique for Isolated Word Speech Recognition Easwari.N 1, Ponmuthuramalingam.P 2 1,2 (PG & Research Department of Computer Science,
More informationEnhancement of Speech in Noisy Conditions
Enhancement of Speech in Noisy Conditions Anuprita P Pawar 1, Asst.Prof.Kirtimalini.B.Choudhari 2 PG Student, Dept. of Electronics and Telecommunication, AISSMS C.O.E., Pune University, India 1 Assistant
More informationNarrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators
374 IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 52, NO. 2, MARCH 2003 Narrow-Band Interference Rejection in DS/CDMA Systems Using Adaptive (QRD-LSL)-Based Nonlinear ACM Interpolators Jenq-Tay Yuan
More informationAutomotive three-microphone voice activity detector and noise-canceller
Res. Lett. Inf. Math. Sci., 005, Vol. 7, pp 47-55 47 Available online at http://iims.massey.ac.nz/research/letters/ Automotive three-microphone voice activity detector and noise-canceller Z. QI and T.J.MOIR
More information