LOOK WHO S TALKING: SPEAKER DETECTION USING VIDEO AND AUDIO CORRELATION. Ross Cutler and Larry Davis

Size: px
Start display at page:

Download "LOOK WHO S TALKING: SPEAKER DETECTION USING VIDEO AND AUDIO CORRELATION. Ross Cutler and Larry Davis"

Transcription

1 LOOK WHO S TALKING: SPEAKER DETECTION USING VIDEO AND AUDIO CORRELATION Ross Cutler and Larry Davis Institute for Advanced Computer Studies University of Maryland, College Park ABSTRACT The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speechreading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned using a TDNN, which is then used to perform a spatio-temporal search for a speaking person. Applications include video conferencing, video indexing, and improving human computer interaction (HCI). An example HCI application is provided. 1. INTRODUCTION The visual motion of a speaker s mouth is highly correlated with the audio data generated from the voicebox and mouth [8]. This fact has been exploited for lip/speechreading (e.g, [17, 13]) and for combined audio-visual speech recognition (e.g., [5]). We utilize this correlation to detect speakers using video and audio input from a single microphone. We learn the audio-visual correlation in speaking using a timedelayed neural network (TDNN) [2], which is then used to search an audio-video input for speaking people. Applications of speaker detection include video conferencing, video indexing, and improving the human computer interface. In video conferencing, knowing where someone is speaking can cue a video camera to zoom in on the speaker; it can also be used to transmit only the speaker s video in bandwidth-limited conferencing applications. Speaker detection can also be used to index video (e.g., find me all clips of someone speaking ), and can be combined with face recognition techniques (e.g., find me all clips of Bill Clinton speaking ). Finally, speaker detection can be used to improve human computer interaction (HCI) by providing applications with the knowledge of when and where a user is speaking. We provide an example application in which the user speaks to the computer, and the computer performs an action using the time and location of the speaker Related work There has been a significant amount of work done in detecting faces from images and video (e.g., [16]). There has also been a significant amount of work done in locating speakers using arrays of microphones (e.g., [1]), and in identifying a specific individual speaking (e.g., [15]). Audio data can be used to animate lips for animated and real characters [3, 4]. Vision based techniques have also been used to detect people in front of kiosks [14, 7]. There are text-to-speech systems which utilize hand-coded phoneme-to-viseme rules to animate characters [18]. We are not aware of any previous work done that exploits the audio-visual correlation in speaking to detect speakers (spatially and temporally) in video with a single microphone Assumptions In this work, we assume that only one person is speaking at a time, and there is not significant background noise (though our audio S/N is not high in our test data). We also assume that the speaker does not move his head excessively during talking (though we suggest methods in Section 5 to handle this). 2. METHOD Our method exploits the correlation between mouth motions and audio data. Figure 1 shows a recurrence matrix [9, 6] of the mouth region image similarities and the corresponding audio data. A recurrence matrix is a qualitative tool used to perform time series analysis of non-linear dynamic systems. In this case, the recurrence matrix Ê is defined by Ê Ø ½ Ø ¾ µ Á ؽ Á ؾ µ where is the correlation of images Á ؽ and Á ؾ. In this figure, we see that times of change in the audio data are highly correlated with visual changes in the mouth. However, the relationship between the two signals is not simple, as changes in the audio signal do not necessarily imply changes in the visual signal (and vice versa), and the visual signal may lead or lag the audio signal sig-

2 Similarity of Image T 1 and T T T 1 Figure 1: Recurrence matrix of a 1 second talking sequence. The upper triangle is the similarity (absolute correlation) of the mouth region for images at times Ì ½ and Ì ¾, and the lower triangle is the similarity (Euclidean distance) of the corresponding audio signal at times Ì ½ and Ì ¾. Whiter pixels are more similar. nificantly (Bregler and Konig [5] use mutual information to show that audio data on average lagged behind the video data by approximately 12 ms in their dataset). In addition, the changes are highly context sensitive, analogous to the coarticulation problem in speech recognition. We utilize a TDNN to learn the context-dependent correlations between the audio and visual signals. The mel cepstrum coefficients Ø of the audio signal are used as the audio features, which are commonly used in speech recognition systems [1]. In our examples, we compute 12 mel cepstrum coefficients using a 1 ms window. For visual features, we utilize a simple measure of change between two images Á Ø and Á Ø ½ (absolute correlation): Ë Ø Ü Ýµ¾ÏØ Á Ø Ü Ýµ Á Ø ½ Ü Ýµ (1) where Ï Ø is a windowing function in Á Ø (typically a rectangle). In order to account for small translations of the head during speaking, the minimal Ë Ø is found by translating over a small search radius Ö: Ë ¼ Ø Ñ Ò Ü Ý Ö Ü Ýµ¾ÏØ Á Ø Ü Ü Ý Ýµ Á Ø ½ Ü Ýµ (2) The TDNN has an input layer consisting of [ Ø Æ,..., Ø,..., Ø Æ ] audio features and [Ë ¼,..., Ø ÆΠ˼ Ø,..., Ë ¼ ] Ø ÆÎ visual features. In our examples, we have used Æ and Æ Î Figure 2: Example image (64x48 pixels) used in training. For training, the positive visual features were computed using 6x3 window centered on the mouth. See Figure 3 for the corresponding feature vectors. such that approximately 2ms of context in each direction (symmetrically) is provided. There is one hidden layer, and only a single output node Ç Ø Ï, which indicates whether someone is speaking at time Ø in the window Ï Training The TDNN is trained using supervised learning and back propagation [12]. Specifically, for each image Á Ø, the output Ç Ø Ï is set to 1 where a person is talking, and otherwise. An example image is shown in Figure 2. An example of the feature vectors (the TDNN input) is shown in Figure 3. The training data consists of both positive data (Ç Ø Ï =1) and negative data (Ç Ø Ï =) Speaker detection Once the TDNN has been trained, it is evaluated on an audiovisual sequence to detect correlated motion and audio that is indicative of a person talking. Specifically, for a given image Á Ø and windowing function Ï centered at Ü Ýµ, we evaluate the TDNN output Ç Ø Ï at each Ü Ýµ ¾ Á Ø. The windowing function is typically rectangular, and of the size of an expected mouth. In our implementation, the dimensions of Ï are «by «Û pixels, where ¼, Û ¼, and «½ ¾ is a spatial scaling factor. During the search, we choose the ¾ that maximizes Ç Ø Ï to allow for a range of mouth sizes (primarily due to changes in person distance from the camera). The TDNN does not handle large changes in temporal scale from the training data. Therefore, the feature vectors Ø and Ë ¼ are linearly scaled by a time factor before Ø evaluating Ç Ø Ï. In our implementation, we selected the ¾ ¼ ¾ that maximizes Ç Ø Ï to allow for a significant

3 .4 Sound Mel Cepstrum Coefficients x Visual Features Time (frames) Figure 3: Training data example for a person saying computer nine times: (top) audio data (middle) mel cepstrum coefficients Ø, (bottom) visual features Ë ¼ Ø. variation in speaking rate between the training data and test data. Ç Ø Ï can be treated as a probability that there is someone speaking at time Ø, with a mouth in the window Ï. To achieve better robustness, Ç Ø Ï can be filtered to reduce spurious noise. While a Kalman filter could be used for this purpose, in our implementation we simply use a 3D moving average filter to compute Ç Ø Ï. 3. DESIGN DECISIONS In this section, we d like to discuss several important design decisions we have made. First, in detecting the audio-visual correlation, we could have hand-coded some rules that map phonemes-to-visemes. However, extracting phonemes is errorprone, as is the visemes. Moreover, to acurrately extract visemes would likely require greater resolution than we use in our test images (the mouth is about 1x3 pixels), and a sophisticated model-based visual feature extractor. There is also the problem of determining a suitable vismeme lexicon to map to; like phonemes, there are many visemes standards to choose from. Rather than use a rule-based system, we chose to use a TDNN. We structured the NN so that with sufficient training, it should learn similar phoneme-viseme rules, though without actually classifying a phoneme or viseme. While we have not done so, the trained NN could be tested with specific phoneme-visemes to determine if this correlation is actually learned. In choosing the visual features, we needed a feature that could be robustly determined at relatively low resolutions. It was also desireable to choose a method that could be im- Figure 4: Output of the HCI application which locates a speaking person and windows their head. The cross-hair marks the location of the detected mouth. The bounding box size is a function of Ï. plemented in real-time on a standard PC. The correlation feature satisfies both of these requirements. Finally, in choosing the audio features, we utilized features (mel cepstrum) that are commonly used in speech recognition systems. More sophisticated audio features could be utilized (e.g.,...), which could enhance the performance of the system, particularly in the precence of noise. 4. EXAMPLE APPLICATION In this section, we demonstrate the system with a simple HCI application. When the user speaks the word computer, the system will recognize when and where he is speaking, and will window his head for further processing (e.g., teleconferencing). An example image from a sequence is shown in Figure 4, and the corresponding features are shown in Figure 5. The output of the TDNN is shown in Figure 6. For this test sequence, the system correctly detected all 7 instances of the word computer. The system uses a Sony DFW-V5 64x48 3 FPS resolution camera, and is implemented on a standard PC workstation. The microphone is an inexpensive desktop microphone, and the audio is sampled at 2KHz 8-bit resolution. 5. CONCLUSIONS We used a TDNN to learn the audio-visual correlations of mouth regions during speaking. This was utilized to detect speakers (both spatially and temporally) using video and a single microphone as input. We demonstrated the utility of the system using an HCI application that recognized when and where a user was talking.

4 Sound Mel Cepstrum Coefficients Visual Features Time (frames) Figure 5: Features for the sequence shown in Figure 4. Note the audio signal has a much lower S/N ratio than in Figure 3, due to the greater distance from the microphone. Figure 6: Output of the TDNN for the image given in Figure 4. Dark pixels correspond to regions of high speaker probability. To utilize this method of speaker detection for more general applications, such as video conferencing and video indexing, a measure of image similarity must be chosen that is relatively invariant to translations and rotations of the head during speaking. One possible solution is to use an affine tracker to stabilize regions being tested [11]. Other visual feature vectors can also be extracted, such the optical flow around the mouth. To improve robustness and accuracy, our method can be combined with a face detector [16] and tracker [11], and voice detector [19]. 6. REFERENCES [1] S. Basu, M. Casey, W. Gardner, A. Azarbayejani, and A. Pentland. Vision-steered audio for interactive environments. Technical Report 373, MIT Media Lab Perceptual Computer Section, [2] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, [3] M. Brand. Voice puppetry. In SIGGRAPH, [4] C. Bregler, M. Covell, and M. Slaney. Video rewrite: driving visual speech with audio. In SIGGRAPH, [5] C. Bregler and Y. Konig. Eigenlips for robust speech recognition. In ICASSP, [6] M. Casdagli. Recurrence plots revisited. Physica D, 18:12 44, [7] T. Darrell, G. Gordon, J. Woodfill, H. Baker, and M. Harville. A virtual mirror interface using real-time robust face tracking. In FG, [8] B. Dodd and R. Campbell. Hearing by Eye: The Psychology of Lipreading. Lawrence Erlbaum Press, [9] P. Eckmann, S. O. Kamphorst, and D. Ruelle. Recurrence plots of dynamical systems. J. of Europhysics Letters, 4: , [1] B. Gold and N. Morgan. Speech and audio signal processing. John Wiley and Sons, Inc., [11] G. Hager and K. Toyama. The xvision system: A generalpurpose substrate for portable real-time vision applications. Computer Vision and Image Understanding, 69(1):23 37, [12] J. Hertz, A. Krogh, and R. Palmer. Introduction to the theory of neural computation. Addison Wesley, [13] K. Mase and A. Pentland. Lip reading: automatic visual recognition of spoken words. In Proc. Image Understanding and Machine Vision, Optical Society of America, June [14] J. M. Rehg, K. P. Murphy, and P. W. Fieguth. Vision-based speaker detection using bayesian networks. In Proceedings of the Computer Vision and Pattern Recognition, [15] D. Reynolds and R. Rose. Robust text-independent speaker identification using gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1):72 83, [16] H. A. Rowley, S. Baluja, and T. Kanade. Neural networkbased face detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2(1):23 38, January [17] D. Stork, G. Wolff, and E. Levine. Neural network lipreading system for improved speech recognition. IJCNN, 1992.

5 [18] K. Waters and T. Levergood. DECface: An automatic lipsynchronization algorithm for synthetic faces. Technical Report 93/4, DEC, September [19] T. Zhang and C.-C. Kuo. Hierarchical classification of audio data for archiving and retrieving. In ICCASP, 1999.

Vision-Based Speaker Detection Using Bayesian Networks

Vision-Based Speaker Detection Using Bayesian Networks Appears in Computer Vision and Pattern Recognition (CVPR 99), Ft. Collins, CO, June, 1999. Vision-Based Speaker Detection Using Bayesian Networks James M. Rehg Cambridge Research Lab Compaq Computer Corp.

More information

An Hybrid MLP-SVM Handwritten Digit Recognizer

An Hybrid MLP-SVM Handwritten Digit Recognizer An Hybrid MLP-SVM Handwritten Digit Recognizer A. Bellili ½ ¾ M. Gilloux ¾ P. Gallinari ½ ½ LIP6, Université Pierre et Marie Curie ¾ La Poste 4, Place Jussieu 10, rue de l Ile Mabon, BP 86334 75252 Paris

More information

Face Registration Using Wearable Active Vision Systems for Augmented Memory

Face Registration Using Wearable Active Vision Systems for Augmented Memory DICTA2002: Digital Image Computing Techniques and Applications, 21 22 January 2002, Melbourne, Australia 1 Face Registration Using Wearable Active Vision Systems for Augmented Memory Takekazu Kato Takeshi

More information

Frame-Rate Pupil Detector and Gaze Tracker

Frame-Rate Pupil Detector and Gaze Tracker Frame-Rate Pupil Detector and Gaze Tracker C.H. Morimoto Ý D. Koons A. Amir M. Flickner ÝDept. Ciência da Computação IME/USP - Rua do Matão 1010 São Paulo, SP 05508, Brazil hitoshi@ime.usp.br IBM Almaden

More information

Recent Advances in Acoustic Signal Extraction and Dereverberation

Recent Advances in Acoustic Signal Extraction and Dereverberation Recent Advances in Acoustic Signal Extraction and Dereverberation Emanuël Habets Erlangen Colloquium 2016 Scenario Spatial Filtering Estimated Desired Signal Undesired sound components: Sensor noise Competing

More information

Vision-based User-interfaces for Pervasive Computing. CHI 2003 Tutorial Notes. Trevor Darrell Vision Interface Group MIT AI Lab

Vision-based User-interfaces for Pervasive Computing. CHI 2003 Tutorial Notes. Trevor Darrell Vision Interface Group MIT AI Lab Vision-based User-interfaces for Pervasive Computing Tutorial Notes Vision Interface Group MIT AI Lab Table of contents Biographical sketch..ii Agenda..iii Objectives.. iv Abstract..v Introduction....1

More information

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS

AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND TRANSFER FUNCTIONS Kuldeep Kumar 1, R. K. Aggarwal 1 and Ankita Jain 2 1 Department of Computer Engineering, National Institute

More information

Auditory Based Feature Vectors for Speech Recognition Systems

Auditory Based Feature Vectors for Speech Recognition Systems Auditory Based Feature Vectors for Speech Recognition Systems Dr. Waleed H. Abdulla Electrical & Computer Engineering Department The University of Auckland, New Zealand [w.abdulla@auckland.ac.nz] 1 Outlines

More information

Mel Spectrum Analysis of Speech Recognition using Single Microphone

Mel Spectrum Analysis of Speech Recognition using Single Microphone International Journal of Engineering Research in Electronics and Communication Mel Spectrum Analysis of Speech Recognition using Single Microphone [1] Lakshmi S.A, [2] Cholavendan M [1] PG Scholar, Sree

More information

BODILY NON-VERBAL INTERACTION WITH VIRTUAL CHARACTERS

BODILY NON-VERBAL INTERACTION WITH VIRTUAL CHARACTERS KEER2010, PARIS MARCH 2-4 2010 INTERNATIONAL CONFERENCE ON KANSEI ENGINEERING AND EMOTION RESEARCH 2010 BODILY NON-VERBAL INTERACTION WITH VIRTUAL CHARACTERS Marco GILLIES *a a Department of Computing,

More information

High-speed Noise Cancellation with Microphone Array

High-speed Noise Cancellation with Microphone Array Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent

More information

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise

Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Classification of ships using autocorrelation technique for feature extraction of the underwater acoustic noise Noha KORANY 1 Alexandria University, Egypt ABSTRACT The paper applies spectral analysis to

More information

AUTOMATIC SPEECH RECOGNITION FOR NUMERIC DIGITS USING TIME NORMALIZATION AND ENERGY ENVELOPES

AUTOMATIC SPEECH RECOGNITION FOR NUMERIC DIGITS USING TIME NORMALIZATION AND ENERGY ENVELOPES AUTOMATIC SPEECH RECOGNITION FOR NUMERIC DIGITS USING TIME NORMALIZATION AND ENERGY ENVELOPES N. Sunil 1, K. Sahithya Reddy 2, U.N.D.L.mounika 3 1 ECE, Gurunanak Institute of Technology, (India) 2 ECE,

More information

Pose Invariant Face Recognition

Pose Invariant Face Recognition Pose Invariant Face Recognition Fu Jie Huang Zhihua Zhou Hong-Jiang Zhang Tsuhan Chen Electrical and Computer Engineering Department Carnegie Mellon University jhuangfu@cmu.edu State Key Lab for Novel

More information

Eyes n Ears: A System for Attentive Teleconferencing

Eyes n Ears: A System for Attentive Teleconferencing Eyes n Ears: A System for Attentive Teleconferencing B. Kapralos 1,3, M. Jenkin 1,3, E. Milios 2,3 and J. Tsotsos 1,3 1 Department of Computer Science, York University, North York, Canada M3J 1P3 2 Department

More information

Uplink and Downlink Beamforming for Fading Channels. Mats Bengtsson and Björn Ottersten

Uplink and Downlink Beamforming for Fading Channels. Mats Bengtsson and Björn Ottersten Uplink and Downlink Beamforming for Fading Channels Mats Bengtsson and Björn Ottersten 999-02-7 In Proceedings of 2nd IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications,

More information

Datong Chen, Albrecht Schmidt, Hans-Werner Gellersen

Datong Chen, Albrecht Schmidt, Hans-Werner Gellersen Datong Chen, Albrecht Schmidt, Hans-Werner Gellersen TecO (Telecooperation Office), University of Karlsruhe Vincenz-Prießnitz-Str.1, 76131 Karlruhe, Germany {charles, albrecht, hwg}@teco.uni-karlsruhe.de

More information

Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events

Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events INTERSPEECH 2013 Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events Rupayan Chakraborty and Climent Nadeu TALP Research Centre, Department of Signal Theory

More information

Effects of the Unscented Kalman Filter Process for High Performance Face Detector

Effects of the Unscented Kalman Filter Process for High Performance Face Detector Effects of the Unscented Kalman Filter Process for High Performance Face Detector Bikash Lamsal and Naofumi Matsumoto Abstract This paper concerns with a high performance algorithm for human face detection

More information

Speech Recognition using FIR Wiener Filter

Speech Recognition using FIR Wiener Filter Speech Recognition using FIR Wiener Filter Deepak 1, Vikas Mittal 2 1 Department of Electronics & Communication Engineering, Maharishi Markandeshwar University, Mullana (Ambala), INDIA 2 Department of

More information

Vision for a Smart Kiosk

Vision for a Smart Kiosk Appears in Computer Vision and Pattern Recognition, San Juan, PR, June, 1997, pages 690-696. Vision for a Smart Kiosk James M. Rehg Maria Loughlin Keith Waters Abstract Digital Equipment Corporation Cambridge

More information

Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming

Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming Speech and Audio Processing Recognition and Audio Effects Part 3: Beamforming Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Engineering

More information

Activity monitoring and summarization for an intelligent meeting room

Activity monitoring and summarization for an intelligent meeting room IEEE Workshop on Human Motion, Austin, Texas, December 2000 Activity monitoring and summarization for an intelligent meeting room Ivana Mikic, Kohsia Huang, Mohan Trivedi Computer Vision and Robotics Research

More information

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs

Automatic Text-Independent. Speaker. Recognition Approaches Using Binaural Inputs Automatic Text-Independent Speaker Recognition Approaches Using Binaural Inputs Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader 1 Outline Automatic speaker recognition: introduction Designed systems

More information

Background Pixel Classification for Motion Detection in Video Image Sequences

Background Pixel Classification for Motion Detection in Video Image Sequences Background Pixel Classification for Motion Detection in Video Image Sequences P. Gil-Jiménez, S. Maldonado-Bascón, R. Gil-Pita, and H. Gómez-Moreno Dpto. de Teoría de la señal y Comunicaciones. Universidad

More information

DERIVATION OF TRAPS IN AUDITORY DOMAIN

DERIVATION OF TRAPS IN AUDITORY DOMAIN DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.

More information

Mikko Myllymäki and Tuomas Virtanen

Mikko Myllymäki and Tuomas Virtanen NON-STATIONARY NOISE MODEL COMPENSATION IN VOICE ACTIVITY DETECTION Mikko Myllymäki and Tuomas Virtanen Department of Signal Processing, Tampere University of Technology Korkeakoulunkatu 1, 3370, Tampere,

More information

A TWO-PART PREDICTIVE CODER FOR MULTITASK SIGNAL COMPRESSION. Scott Deeann Chen and Pierre Moulin

A TWO-PART PREDICTIVE CODER FOR MULTITASK SIGNAL COMPRESSION. Scott Deeann Chen and Pierre Moulin A TWO-PART PREDICTIVE CODER FOR MULTITASK SIGNAL COMPRESSION Scott Deeann Chen and Pierre Moulin University of Illinois at Urbana-Champaign Department of Electrical and Computer Engineering 5 North Mathews

More information

VICs: A Modular Vision-Based HCI Framework

VICs: A Modular Vision-Based HCI Framework VICs: A Modular Vision-Based HCI Framework The Visual Interaction Cues Project Guangqi Ye, Jason Corso Darius Burschka, & Greg Hager CIRL, 1 Today, I ll be presenting work that is part of an ongoing project

More information

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches

Performance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art

More information

INTRODUCTION TO DEEP LEARNING. Steve Tjoa June 2013

INTRODUCTION TO DEEP LEARNING. Steve Tjoa June 2013 INTRODUCTION TO DEEP LEARNING Steve Tjoa kiemyang@gmail.com June 2013 Acknowledgements http://ufldl.stanford.edu/wiki/index.php/ UFLDL_Tutorial http://youtu.be/ayzoubkuf3m http://youtu.be/zmnoatzigik 2

More information

First generation mobile communication systems (e.g. NMT and AMPS) are based on analog transmission techniques, whereas second generation systems

First generation mobile communication systems (e.g. NMT and AMPS) are based on analog transmission techniques, whereas second generation systems 1 First generation mobile communication systems (e.g. NMT and AMPS) are based on analog transmission techniques, whereas second generation systems (e.g. GSM and D-AMPS) are digital. In digital systems,

More information

Airo Interantional Research Journal September, 2013 Volume II, ISSN:

Airo Interantional Research Journal September, 2013 Volume II, ISSN: Airo Interantional Research Journal September, 2013 Volume II, ISSN: 2320-3714 Name of author- Navin Kumar Research scholar Department of Electronics BR Ambedkar Bihar University Muzaffarpur ABSTRACT Direction

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

Text and Language Independent Speaker Identification By Using Short-Time Low Quality Signals

Text and Language Independent Speaker Identification By Using Short-Time Low Quality Signals Text and Language Independent Speaker Identification By Using Short-Time Low Quality Signals Maurizio Bocca*, Reino Virrankoski**, Heikki Koivo* * Control Engineering Group Faculty of Electronics, Communications

More information

FOCAL LENGTH CHANGE COMPENSATION FOR MONOCULAR SLAM

FOCAL LENGTH CHANGE COMPENSATION FOR MONOCULAR SLAM FOCAL LENGTH CHANGE COMPENSATION FOR MONOCULAR SLAM Takafumi Taketomi Nara Institute of Science and Technology, Japan Janne Heikkilä University of Oulu, Finland ABSTRACT In this paper, we propose a method

More information

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS

SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS SONG RETRIEVAL SYSTEM USING HIDDEN MARKOV MODELS AKSHAY CHANDRASHEKARAN ANOOP RAMAKRISHNA akshayc@cmu.edu anoopr@andrew.cmu.edu ABHISHEK JAIN GE YANG ajain2@andrew.cmu.edu younger@cmu.edu NIDHI KOHLI R

More information

Iris Recognition using Hamming Distance and Fragile Bit Distance

Iris Recognition using Hamming Distance and Fragile Bit Distance IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 06, 2015 ISSN (online): 2321-0613 Iris Recognition using Hamming Distance and Fragile Bit Distance Mr. Vivek B. Mandlik

More information

Using RASTA in task independent TANDEM feature extraction

Using RASTA in task independent TANDEM feature extraction R E S E A R C H R E P O R T I D I A P Using RASTA in task independent TANDEM feature extraction Guillermo Aradilla a John Dines a Sunil Sivadas a b IDIAP RR 04-22 April 2004 D a l l e M o l l e I n s t

More information

Session 2: 10 Year Vision session (11:00-12:20) - Tuesday. Session 3: Poster Highlights A (14:00-15:00) - Tuesday 20 posters (3minutes per poster)

Session 2: 10 Year Vision session (11:00-12:20) - Tuesday. Session 3: Poster Highlights A (14:00-15:00) - Tuesday 20 posters (3minutes per poster) Lessons from Collecting a Million Biometric Samples 109 Expression Robust 3D Face Recognition by Matching Multi-component Local Shape Descriptors on the Nasal and Adjoining Cheek Regions 177 Shared Representation

More information

Multi-Resolution Estimation of Optical Flow on Vehicle Tracking under Unpredictable Environments

Multi-Resolution Estimation of Optical Flow on Vehicle Tracking under Unpredictable Environments , pp.32-36 http://dx.doi.org/10.14257/astl.2016.129.07 Multi-Resolution Estimation of Optical Flow on Vehicle Tracking under Unpredictable Environments Viet Dung Do 1 and Dong-Min Woo 1 1 Department of

More information

I D I A P. On Factorizing Spectral Dynamics for Robust Speech Recognition R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b

I D I A P. On Factorizing Spectral Dynamics for Robust Speech Recognition R E S E A R C H R E P O R T. Iain McCowan a Hemant Misra a,b R E S E A R C H R E P O R T I D I A P On Factorizing Spectral Dynamics for Robust Speech Recognition a Vivek Tyagi Hervé Bourlard a,b IDIAP RR 3-33 June 23 Iain McCowan a Hemant Misra a,b to appear in

More information

Service Robots in an Intelligent House

Service Robots in an Intelligent House Service Robots in an Intelligent House Jesus Savage Bio-Robotics Laboratory biorobotics.fi-p.unam.mx School of Engineering Autonomous National University of Mexico UNAM 2017 OUTLINE Introduction A System

More information

Change Point Determination in Audio Data Using Auditory Features

Change Point Determination in Audio Data Using Auditory Features INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 0, VOL., NO., PP. 8 90 Manuscript received April, 0; revised June, 0. DOI: /eletel-0-00 Change Point Determination in Audio Data Using Auditory Features

More information

SOUND SOURCE RECOGNITION AND MODELING

SOUND SOURCE RECOGNITION AND MODELING SOUND SOURCE RECOGNITION AND MODELING CASA seminar, summer 2000 Antti Eronen antti.eronen@tut.fi Contents: Basics of human sound source recognition Timbre Voice recognition Recognition of environmental

More information

Voice Activity Detection

Voice Activity Detection Voice Activity Detection Speech Processing Tom Bäckström Aalto University October 2015 Introduction Voice activity detection (VAD) (or speech activity detection, or speech detection) refers to a class

More information

Research Seminar. Stefano CARRINO fr.ch

Research Seminar. Stefano CARRINO  fr.ch Research Seminar Stefano CARRINO stefano.carrino@hefr.ch http://aramis.project.eia- fr.ch 26.03.2010 - based interaction Characterization Recognition Typical approach Design challenges, advantages, drawbacks

More information

Data Flow 4.{1,2}, 3.2

Data Flow 4.{1,2}, 3.2 < = = Computer Science Program, The University of Texas, Dallas Data Flow 4.{1,2}, 3.2 Batch Sequential Pipeline Systems Tektronix Case Study: Oscilloscope Formalization of Oscilloscope "systems where

More information

AUDIO VISUAL TRACKING OF A SPEAKER BASED ON FFT AND KALMAN FILTER

AUDIO VISUAL TRACKING OF A SPEAKER BASED ON FFT AND KALMAN FILTER AUDIO VISUAL TRACKING OF A SPEAKER BASED ON FFT AND KALMAN FILTER Muhammad Muzammel, Mohd Zuki Yusoff, Mohamad Naufal Mohamad Saad and Aamir Saeed Malik Centre for Intelligent Signal and Imaging Research,

More information

Neural Network Part 4: Recurrent Neural Networks

Neural Network Part 4: Recurrent Neural Networks Neural Network Part 4: Recurrent Neural Networks Yingyu Liang Computer Sciences 760 Fall 2017 http://pages.cs.wisc.edu/~yliang/cs760/ Some of the slides in these lectures have been adapted/borrowed from

More information

ENF ANALYSIS ON RECAPTURED AUDIO RECORDINGS

ENF ANALYSIS ON RECAPTURED AUDIO RECORDINGS ENF ANALYSIS ON RECAPTURED AUDIO RECORDINGS Hui Su, Ravi Garg, Adi Hajj-Ahmad, and Min Wu {hsu, ravig, adiha, minwu}@umd.edu University of Maryland, College Park ABSTRACT Electric Network (ENF) based forensic

More information

Voice Recognition Technology Using Neural Networks

Voice Recognition Technology Using Neural Networks Journal of New Technology and Materials JNTM Vol. 05, N 01 (2015)27-31 OEB Univ. Publish. Co. Voice Recognition Technology Using Neural Networks Abdelouahab Zaatri 1, Norelhouda Azzizi 2 and Fouad Lazhar

More information

Direction-of-Arrival Estimation Using a Microphone Array with the Multichannel Cross-Correlation Method

Direction-of-Arrival Estimation Using a Microphone Array with the Multichannel Cross-Correlation Method Direction-of-Arrival Estimation Using a Microphone Array with the Multichannel Cross-Correlation Method Udo Klein, Member, IEEE, and TrInh Qu6c VO School of Electrical Engineering, International University,

More information

NCCF ACF. cepstrum coef. error signal > samples

NCCF ACF. cepstrum coef. error signal > samples ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based

More information

Gesture Recognition with Real World Environment using Kinect: A Review

Gesture Recognition with Real World Environment using Kinect: A Review Gesture Recognition with Real World Environment using Kinect: A Review Prakash S. Sawai 1, Prof. V. K. Shandilya 2 P.G. Student, Department of Computer Science & Engineering, Sipna COET, Amravati, Maharashtra,

More information

White Intensity = 1. Black Intensity = 0

White Intensity = 1. Black Intensity = 0 A Region-based Color Image Segmentation Scheme N. Ikonomakis a, K. N. Plataniotis b and A. N. Venetsanopoulos a a Dept. of Electrical and Computer Engineering, University of Toronto, Toronto, Canada b

More information

Study Of Sound Source Localization Using Music Method In Real Acoustic Environment

Study Of Sound Source Localization Using Music Method In Real Acoustic Environment International Journal of Electronics Engineering Research. ISSN 975-645 Volume 9, Number 4 (27) pp. 545-556 Research India Publications http://www.ripublication.com Study Of Sound Source Localization Using

More information

REFERENCES 4 CONCLUSIONS ACKNOWLEDGEMENT. Anticipated results for our investigations on acoustic and visual speech integration are:

REFERENCES 4 CONCLUSIONS ACKNOWLEDGEMENT. Anticipated results for our investigations on acoustic and visual speech integration are: Anticipated results for our investigations on acoustic and visual integration are: an improvement in recognition performance resulting from data fusion for normal input data and for a range of degraded

More information

OPPORTUNISTIC TRAFFIC SENSING USING EXISTING VIDEO SOURCES (PHASE II)

OPPORTUNISTIC TRAFFIC SENSING USING EXISTING VIDEO SOURCES (PHASE II) CIVIL ENGINEERING STUDIES Illinois Center for Transportation Series No. 17-003 UILU-ENG-2017-2003 ISSN: 0197-9191 OPPORTUNISTIC TRAFFIC SENSING USING EXISTING VIDEO SOURCES (PHASE II) Prepared By Jakob

More information

Controlling Humanoid Robot Using Head Movements

Controlling Humanoid Robot Using Head Movements Volume-5, Issue-2, April-2015 International Journal of Engineering and Management Research Page Number: 648-652 Controlling Humanoid Robot Using Head Movements S. Mounica 1, A. Naga bhavani 2, Namani.Niharika

More information

Audio Fingerprinting using Fractional Fourier Transform

Audio Fingerprinting using Fractional Fourier Transform Audio Fingerprinting using Fractional Fourier Transform Swati V. Sutar 1, D. G. Bhalke 2 1 (Department of Electronics & Telecommunication, JSPM s RSCOE college of Engineering Pune, India) 2 (Department,

More information

Separation and Recognition of multiple sound source using Pulsed Neuron Model

Separation and Recognition of multiple sound source using Pulsed Neuron Model Separation and Recognition of multiple sound source using Pulsed Neuron Model Kaname Iwasa, Hideaki Inoue, Mauricio Kugler, Susumu Kuroyanagi, Akira Iwata Nagoya Institute of Technology, Gokiso-cho, Showa-ku,

More information

Improved Detection by Peak Shape Recognition Using Artificial Neural Networks

Improved Detection by Peak Shape Recognition Using Artificial Neural Networks Improved Detection by Peak Shape Recognition Using Artificial Neural Networks Stefan Wunsch, Johannes Fink, Friedrich K. Jondral Communications Engineering Lab, Karlsruhe Institute of Technology Stefan.Wunsch@student.kit.edu,

More information

Real Time Video Analysis using Smart Phone Camera for Stroboscopic Image

Real Time Video Analysis using Smart Phone Camera for Stroboscopic Image Real Time Video Analysis using Smart Phone Camera for Stroboscopic Image Somnath Mukherjee, Kritikal Solutions Pvt. Ltd. (India); Soumyajit Ganguly, International Institute of Information Technology (India)

More information

5/17/2009. Digitizing Color. Place Value in a Binary Number. Place Value in a Decimal Number. Place Value in a Binary Number

5/17/2009. Digitizing Color. Place Value in a Binary Number. Place Value in a Decimal Number. Place Value in a Binary Number Chapter 11: Light, Sound, Magic: Representing Multimedia Digitally Digitizing Color Fluency with Information Technology Third Edition by Lawrence Snyder RGB Colors: Binary Representation Giving the intensities

More information

Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks

Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks Mariam Yiwere 1 and Eun Joo Rhee 2 1 Department of Computer Engineering, Hanbat National University,

More information

Robust Hand Gesture Recognition for Robotic Hand Control

Robust Hand Gesture Recognition for Robotic Hand Control Robust Hand Gesture Recognition for Robotic Hand Control Ankit Chaudhary Robust Hand Gesture Recognition for Robotic Hand Control 123 Ankit Chaudhary Department of Computer Science Northwest Missouri State

More information

Speech Signal Analysis

Speech Signal Analysis Speech Signal Analysis Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 2&3 14,18 January 216 ASR Lectures 2&3 Speech Signal Analysis 1 Overview Speech Signal Analysis for

More information

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT

SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT SPEECH ENHANCEMENT USING PITCH DETECTION APPROACH FOR NOISY ENVIRONMENT RASHMI MAKHIJANI Department of CSE, G. H. R.C.E., Near CRPF Campus,Hingna Road, Nagpur, Maharashtra, India rashmi.makhijani2002@gmail.com

More information

Face Registration Using Wearable Active Vision Systems for Augmented Memory

Face Registration Using Wearable Active Vision Systems for Augmented Memory DICTA2002: Digital Image Computing Techniques and Applications, 21 22 January 2002, Melbourne, Australia 1 Face Registration Using Wearable Active Vision Systems for Augmented Memory Takekazu Kato Takeshi

More information

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes

SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN. Yu Wang and Mike Brookes SPEECH ENHANCEMENT USING A ROBUST KALMAN FILTER POST-PROCESSOR IN THE MODULATION DOMAIN Yu Wang and Mike Brookes Department of Electrical and Electronic Engineering, Exhibition Road, Imperial College London,

More information

A Neural Solution for Signal Detection In Non-Gaussian Noise

A Neural Solution for Signal Detection In Non-Gaussian Noise 1 A Neural Solution for Signal Detection In Non-Gaussian Noise D G Khairnar, S N Merchant, U B Desai SPANN Laboratory Department of Electrical Engineering Indian Institute of Technology, Bombay, Mumbai-400

More information

Calibration of Microphone Arrays for Improved Speech Recognition

Calibration of Microphone Arrays for Improved Speech Recognition MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Calibration of Microphone Arrays for Improved Speech Recognition Michael L. Seltzer, Bhiksha Raj TR-2001-43 December 2001 Abstract We present

More information

Recognizing Talking Faces From Acoustic Doppler Reflections

Recognizing Talking Faces From Acoustic Doppler Reflections MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Recognizing Talking Faces From Acoustic Doppler Reflections Kaustubh Kalgaonkar, Bhiksha Raj TR2008-080 December 2008 Abstract Face recognition

More information

Keywords: - Gaussian Mixture model, Maximum likelihood estimator, Multiresolution analysis

Keywords: - Gaussian Mixture model, Maximum likelihood estimator, Multiresolution analysis Volume 4, Issue 2, February 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Expectation

More information

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

Introduction to Video Forgery Detection: Part I

Introduction to Video Forgery Detection: Part I Introduction to Video Forgery Detection: Part I Detecting Forgery From Static-Scene Video Based on Inconsistency in Noise Level Functions IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 5,

More information

Restoration of Motion Blurred Document Images

Restoration of Motion Blurred Document Images Restoration of Motion Blurred Document Images Bolan Su 12, Shijian Lu 2 and Tan Chew Lim 1 1 Department of Computer Science,School of Computing,National University of Singapore Computing 1, 13 Computing

More information

Color Constancy Using Standard Deviation of Color Channels

Color Constancy Using Standard Deviation of Color Channels 2010 International Conference on Pattern Recognition Color Constancy Using Standard Deviation of Color Channels Anustup Choudhury and Gérard Medioni Department of Computer Science University of Southern

More information

SOUND SOURCE RECOGNITION FOR INTELLIGENT SURVEILLANCE

SOUND SOURCE RECOGNITION FOR INTELLIGENT SURVEILLANCE Paper ID: AM-01 SOUND SOURCE RECOGNITION FOR INTELLIGENT SURVEILLANCE Md. Rokunuzzaman* 1, Lutfun Nahar Nipa 1, Tamanna Tasnim Moon 1, Shafiul Alam 1 1 Department of Mechanical Engineering, Rajshahi University

More information

Compensation of a position servo

Compensation of a position servo UPPSALA UNIVERSITY SYSTEMS AND CONTROL GROUP CFL & BC 9610, 9711 HN & PSA 9807, AR 0412, AR 0510, HN 2006-08 Automatic Control Compensation of a position servo Abstract The angular position of the shaft

More information

Prof Trivedi ECE253A Notes for Students only

Prof Trivedi ECE253A Notes for Students only ECE 253A: Digital Processing: Course Related Class Website: https://sites.google.com/a/eng.ucsd.edu/ece253fall2017/ Course Graduate Assistants: Nachiket Deo Borhan Vasili Kirill Pirozenko Piazza Grading:

More information

Advanced Functions of Java-DSP for use in Electrical and Computer Engineering Senior Level Courses

Advanced Functions of Java-DSP for use in Electrical and Computer Engineering Senior Level Courses Advanced Functions of Java-DSP for use in Electrical and Computer Engineering Senior Level Courses Andreas Spanias Robert Santucci Tushar Gupta Mohit Shah Karthikeyan Ramamurthy Topics This presentation

More information

3D and Sequential Representations of Spatial Relationships among Photos

3D and Sequential Representations of Spatial Relationships among Photos 3D and Sequential Representations of Spatial Relationships among Photos Mahoro Anabuki Canon Development Americas, Inc. E15-349, 20 Ames Street Cambridge, MA 02139 USA mahoro@media.mit.edu Hiroshi Ishii

More information

Hand & Upper Body Based Hybrid Gesture Recognition

Hand & Upper Body Based Hybrid Gesture Recognition Hand & Upper Body Based Hybrid Gesture Prerna Sharma #1, Naman Sharma *2 # Research Scholor, G. B. P. U. A. & T. Pantnagar, India * Ideal Institue of Technology, Ghaziabad, India Abstract Communication

More information

Image Processing by Bilateral Filtering Method

Image Processing by Bilateral Filtering Method ABHIYANTRIKI An International Journal of Engineering & Technology (A Peer Reviewed & Indexed Journal) Vol. 3, No. 4 (April, 2016) http://www.aijet.in/ eissn: 2394-627X Image Processing by Bilateral Image

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

NEURALNETWORK BASED CLASSIFICATION OF LASER-DOPPLER FLOWMETRY SIGNALS

NEURALNETWORK BASED CLASSIFICATION OF LASER-DOPPLER FLOWMETRY SIGNALS NEURALNETWORK BASED CLASSIFICATION OF LASER-DOPPLER FLOWMETRY SIGNALS N. G. Panagiotidis, A. Delopoulos and S. D. Kollias National Technical University of Athens Department of Electrical and Computer Engineering

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Architectural Acoustics Session 1pAAa: Advanced Analysis of Room Acoustics:

More information

SmartCanvas: A Gesture-Driven Intelligent Drawing Desk System

SmartCanvas: A Gesture-Driven Intelligent Drawing Desk System SmartCanvas: A Gesture-Driven Intelligent Drawing Desk System Zhenyao Mo +1 213 740 4250 zmo@graphics.usc.edu J. P. Lewis +1 213 740 9619 zilla@computer.org Ulrich Neumann +1 213 740 0877 uneumann@usc.edu

More information

Discriminative Training for Automatic Speech Recognition

Discriminative Training for Automatic Speech Recognition Discriminative Training for Automatic Speech Recognition 22 nd April 2013 Advanced Signal Processing Seminar Article Heigold, G.; Ney, H.; Schluter, R.; Wiesler, S. Signal Processing Magazine, IEEE, vol.29,

More information

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 1, FEBRUARY A Speech/Music Discriminator Based on RMS and Zero-Crossings

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 1, FEBRUARY A Speech/Music Discriminator Based on RMS and Zero-Crossings TRANSACTIONS ON MULTIMEDIA, VOL. 7, NO. 1, FEBRUARY 2005 1 A Speech/Music Discriminator Based on RMS and Zero-Crossings Costas Panagiotakis and George Tziritas, Senior Member, Abstract Over the last several

More information

Autonomous Vehicle Speaker Verification System

Autonomous Vehicle Speaker Verification System Autonomous Vehicle Speaker Verification System Functional Requirements List and Performance Specifications Aaron Pfalzgraf Christopher Sullivan Project Advisor: Dr. Jose Sanchez 4 November 2013 AVSVS 2

More information

Recurrent neural networks Modelling sequential data. MLP Lecture 9 / 13 November 2018 Recurrent Neural Networks 1: Modelling sequential data 1

Recurrent neural networks Modelling sequential data. MLP Lecture 9 / 13 November 2018 Recurrent Neural Networks 1: Modelling sequential data 1 Recurrent neural networks Modelling sequential data MLP Lecture 9 / 13 November 2018 Recurrent Neural Networks 1: Modelling sequential data 1 Recurrent Neural Networks 1: Modelling sequential data Steve

More information

Smart antenna for doa using music and esprit

Smart antenna for doa using music and esprit IOSR Journal of Electronics and Communication Engineering (IOSRJECE) ISSN : 2278-2834 Volume 1, Issue 1 (May-June 2012), PP 12-17 Smart antenna for doa using music and esprit SURAYA MUBEEN 1, DR.A.M.PRASAD

More information

Research on Hand Gesture Recognition Using Convolutional Neural Network

Research on Hand Gesture Recognition Using Convolutional Neural Network Research on Hand Gesture Recognition Using Convolutional Neural Network Tian Zhaoyang a, Cheng Lee Lung b a Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China E-mail address:

More information

A Comparison of Histogram and Template Matching for Face Verification

A Comparison of Histogram and Template Matching for Face Verification A Comparison of and Template Matching for Face Verification Chidambaram Chidambaram Universidade do Estado de Santa Catarina chidambaram@udesc.br Marlon Subtil Marçal, Leyza Baldo Dorini, Hugo Vieira Neto

More information

THE problem of acoustic echo cancellation (AEC) was

THE problem of acoustic echo cancellation (AEC) was IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 13, NO. 6, NOVEMBER 2005 1231 Acoustic Echo Cancellation and Doubletalk Detection Using Estimated Loudspeaker Impulse Responses Per Åhgren Abstract

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Video Synthesis System for Monitoring Closed Sections 1

Video Synthesis System for Monitoring Closed Sections 1 Video Synthesis System for Monitoring Closed Sections 1 Taehyeong Kim *, 2 Bum-Jin Park 1 Senior Researcher, Korea Institute of Construction Technology, Korea 2 Senior Researcher, Korea Institute of Construction

More information