Selected Research Signal & Information Processing Group


COST Action IC1206 - MC Meeting
Selected Research Activities @ Signal & Information Processing Group
Zheng-Hua Tan
Dept. of Electronic Systems, Aalborg Univ., Denmark
zt@es.aau.dk

Outline
Introduction to Signal and Information Processing Group, Department of Electronic Systems, Aalborg University
COST Action IC1206 related activities and work

Aalborg, Denmark

Aalborg University
Inaugurated in 1974 in Aalborg (population: 200,000).
20,000 students, 2,000 research personnel.
12.5% international students (most graduate programmes are taught in English).
Engineering, natural and social sciences, medicine, and humanities.
Department of Electronic Systems: 300+ employees.
Renowned for:
  Project-oriented, problem-based learning in teams
  Interdisciplinarity and cooperation with industry
Network university: campuses in Aalborg, Esbjerg, and Copenhagen.

Research areas of the SIP group
Speech and language processing, multimedia signal processing, machine learning, pattern recognition
Usability engineering, human computer/robot interaction
Signal processing, numerical linear algebra, statistics, compressed sensing, optimization
Reconfigurable architectures, resource optimal hardware/software co-design, computing, high performance scientific computing

Funding agencies and companies
Plus collaboration with a dozen universities and institutes worldwide.

Outline
Introduction to Signal and Information Processing Group, Department of Electronic Systems, Aalborg University
COST Action IC1206 related activities and work
  Denoising and VAD for SID; SID of disguised voice
  Age and gender identification for recommender systems
  Durable Interaction with Socially Intelligent Robots

On-going related projects
A Robust Audio-based Hybrid Recommendations Framework for Interactive TV. (TESCO UK's use of faces is in the headlines.)
  Bang & Olufsen A/S and The Danish Council for Technology and Innovation.
isrobot - Durable Interaction with Socially Intelligent Robots.
  The Danish Council for Independent Research in Technology and Production Sciences.
CoSound: A Cognitive Systems Approach to Enriched and Actionable Information from Audio Streams.
  Danish Strategic Research Council.
Speaker Recognition under Adverse Environments.
  Subproject supported by European Commission Erasmus Mobility for Life Scholarship.

Research topics
Speaker identification under adverse environments
  Acoustic noise (denoising, VAD)
  Disguised voice (multistyle training, multiple frame rates)
Age, gender and emotion identification
  For TV recommender systems
  For human-robot interaction
Audio-visual fusion based on sensor networks

VAD for speaker identification
Two-pass segment-based denoising and voice activity detection (VAD)
DARPA Robust Automatic Transcription of Speech (RATS) database

Challenge to denoising and VAD: non-stationary noise
Burst-like noise requires special attention because it makes existing methods fail.
Zheng-Hua Tan, Mataro, Spain, 11/2013

Two-pass segment-based de-noising and VAD
The method accounts for the very different characteristics of stationary and burst-like noise.
1st pass:
  1) High-energy segments are detected using the a posteriori SNR weighted energy difference (SNR-dE). [Z.-H. Tan and B. Lindberg, IEEE Journal of Selected Topics in Signal Processing, 2010.]
  2) Within a high-energy segment, if no pitch is found, the segment is classified as noise.
2nd pass:
  Stationary noise is removed by a modified MSNE method.
  The VAD approach is applied to the denoised data.
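The first pass can be illustrated with a rough Python sketch. It is not the authors' implementation: the function names, the noise estimate taken from the first frames, the autocorrelation-based pitch check, the 60-400 Hz pitch range and the 0.5 threshold are all assumptions made here for illustration. The slide only states that an a posteriori SNR weighted energy difference is used and that a high-energy segment without pitch is treated as noise.

```python
import math
import random

def log_energy(frame):
    """Natural-log energy of one frame, floored to avoid log(0)."""
    return math.log(max(sum(x * x for x in frame), 1e-10))

def snr_weighted_energy_difference(log_e, noise_frames=10):
    """Per-frame |change in log energy| weighted by an a posteriori SNR estimate.

    Assumption: the noise level is the mean log energy of the first
    `noise_frames` frames, and negative SNR estimates are clipped to zero.
    """
    noise = sum(log_e[:noise_frames]) / noise_frames
    weighted = [0.0]
    for t in range(1, len(log_e)):
        snr_post = max(log_e[t] - noise, 0.0)  # log-domain a posteriori SNR
        weighted.append(abs(log_e[t] - log_e[t - 1]) * snr_post)
    return weighted

def has_pitch(frame, sample_rate, f_min=60.0, f_max=400.0, threshold=0.5):
    """True if the normalised autocorrelation exceeds `threshold` at some
    lag inside the assumed pitch range [f_min, f_max]."""
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[:-lag], frame[lag:]
        norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if norm > 0.0:
            best = max(best, sum(x * y for x, y in zip(a, b)) / norm)
    return best > threshold

def classify_high_energy_segment(segment, sample_rate):
    """Rule from the slide: a high-energy segment with no pitch is noise."""
    return "speech candidate" if has_pitch(segment, sample_rate) else "noise"

# Hypothetical inputs: a 200 Hz tone (periodic) and a random burst (no pitch).
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(400)]
rng = random.Random(0)
burst = [rng.uniform(-1.0, 1.0) for _ in range(400)]
print(classify_high_energy_segment(tone, rate))
print(classify_high_energy_segment(burst, rate))
```

A real system would run the pitch check only on segments already flagged as high-energy by the SNR-dE measure, and would use a more robust pitch tracker than a single autocorrelation peak.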

De-noising and VAD results for known data
[Figure: known data, channel H]

De-noising and VAD results for unknown data
[Figure: unknown data, channel H]

Speaker ID system performance
O. Plchot, S. Matsoukas, P. Matejka, N. Dehak, J. Ma, S. Cumani, O. Glembek, H. Hermansky, S.H. Mallidi, N. Mesgarani, R. Schwartz, M. Soufifar, Z.-H. Tan, S. Thomas, B. Zhang and X. Zhou, "Developing a Speaker Identification System for the DARPA RATS project," ICASSP 2013.

Age and gender ID for recommender systems
Sven Ewan Shepstone, Zheng-Hua Tan and Søren Holdt Jensen, "Audio-based Age and Gender Identification to Enhance the Recommendation of TV Content," IEEE Transactions on Consumer Electronics, vol. 59, no. 3, pp. 721-729, August 2013.
Sven Ewan Shepstone, Zheng-Hua Tan and Søren Holdt Jensen, "Demographic Recommendation by means of Group Profile Elicitation Using Speaker Age and Gender Recognition," Interspeech 2013, Lyon, France, August 25-29, 2013.

Project overview
A user profile is needed to make good TV recommendations.
An audio classifier, as opposed to manual data or usage patterns, is used to implicitly gather data for the user profile.
Age and gender are useful parameters for recommendations.
Current accuracy for age and gender detection (7 classes) is just over 50% (on the aGender corpus).
There can be large confusion between age and gender classes.
Hypothesis: Are items that are recommended based on the age-and-gender extracted profile perceived to be better than random items?

Matching and recommendation
Recommendation strategy:
  Group profile adaptation (if necessary) to convert an M-user group profile to an N-slot content profile.
  Genetic Selection Algorithm (k chromosomes, where each chromosome is a sequence of items).
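The two steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method from the cited papers: the cyclic adaptation rule, the fitness function (number of slots whose item targets the slot's class), the elitist selection, the one-point crossover and the mutation scheme are all hypothetical choices, as are the item names. Class labels such as C or SF follow the deck's seven-class scheme.

```python
import random

def adapt_profile(group_profile, n_slots):
    """Convert an M-user group profile into an N-slot content profile.

    Assumption: users are cycled through in order until all slots are filled.
    """
    return [group_profile[i % len(group_profile)] for i in range(n_slots)]

def fitness(chromosome, slot_profile, item_targets):
    """Number of slots whose item targets the class assigned to that slot."""
    return sum(1 for item, cls in zip(chromosome, slot_profile)
               if cls in item_targets[item])

def genetic_select(items, slot_profile, item_targets,
                   k=20, generations=50, seed=0):
    """Evolve k chromosomes (item sequences) and return the fittest one."""
    rng = random.Random(seed)
    n = len(slot_profile)

    def score(chromosome):
        return fitness(chromosome, slot_profile, item_targets)

    population = [[rng.choice(items) for _ in range(n)] for _ in range(k)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: k // 2]           # keep the fitter half
        children = []
        while len(parents) + len(children) < k:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = mother[:cut] + father[cut:]  # one-point crossover
            child[rng.randrange(n)] = rng.choice(items)  # mutate one slot
            children.append(child)
        population = parents + children
    return max(population, key=score)

# Hypothetical items, each targeting one or more age-and-gender classes.
items = ["ad1", "ad2", "ad3", "ad4"]
targets = {"ad1": {"C"}, "ad2": {"SF", "AF"}, "ad3": {"YM"}, "ad4": {"C", "SF"}}
slots = adapt_profile(["C", "C", "SF"], 5)
best = genetic_select(items, slots, targets)
print(slots, best, fitness(best, slots, targets))
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. This sketch allows the same item to appear in several slots; a real recommender would probably penalise repeats.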

Age and gender classification
7 age-and-gender classes: Child (C), Young Male (YM), Young Female (YF), Adult Male (AM), Adult Female (AF), Senior Male (SM) and Senior Female (SF).
A Viewer Configuration is a profile for the group, e.g. C, C, SF.
Each speaker is connected to real speaker utterances from the aGender corpus. These are classified to determine each user's age and gender profile.
Age and gender classification uses both acoustic and prosodic features.
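As a small illustration of how per-speaker classifier output could be turned into a Viewer Configuration, the sketch below picks the highest-scoring class for each speaker. The function names and the score dictionaries are hypothetical; the slides do not say how the classifier output is represented.

```python
AGE_GENDER_CLASSES = ("C", "YM", "YF", "AM", "AF", "SM", "SF")

def classify_speaker(class_scores):
    """Return the label with the highest score.

    `class_scores` maps a class label to a classifier score (assumed format).
    """
    unknown = set(class_scores) - set(AGE_GENDER_CLASSES)
    if unknown:
        raise ValueError(f"unknown class labels: {sorted(unknown)}")
    return max(class_scores, key=class_scores.get)

def viewer_configuration(scores_per_speaker):
    """One label per speaker in the group, in speaker order."""
    return [classify_speaker(scores) for scores in scores_per_speaker]

# Made-up scores for a three-person group.
group_scores = [
    {"C": 0.7, "YF": 0.2},
    {"C": 0.6, "SM": 0.1},
    {"SF": 0.5, "AF": 0.4},
]
print(viewer_configuration(group_scores))
```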

User study and results
TV2 (Danish Broadcaster) advertisement corpus used.
Results: Significant increase in median rating for recommended ads (7.75 as opposed to 4.25).
Conclusion: This work shows the potential of using age and gender audio classification for recommending sequences of video clips to group viewers.

Durable Interaction with Socially Intelligent Robots (isrobot)
Socially assistive robots:
  increase the quality of life
  decrease the expense of social care
"Robotics will be as important tomorrow as computers are today." - Aldebaran Robotics.
"I can envision a future in which robotic devices will become a nearly ubiquitous part of our day to day lives." - Bill Gates.
The global service robotics market: 2012 $20 billion, 2017 $46 billion (17.4% increase annually).

isrobot cont.
Objective: To enable socially assistive robots to feel and express feelings, with the ultimate goal of establishing durable social interaction.
The Danish Council for Independent Research. 2013-2017.
Challenges:
  Low signal quality, due to environmental noise and imperfect placement of sensors, significantly degrades the robot's capability to sense.
  Lack of understanding of users and context makes a robot merely a pet with limited richness in expression.
Social intelligence and durable interaction require the robot to locate, recognize and feel its users and to respond with awareness.

Summary
Introduction to Signal and Information Processing Group, Department of Electronic Systems, Aalborg University
COST Action IC1206 related activities and work
  Denoising and VAD for SID; SID of disguised voice
  Age and gender identification for recommender systems
  Durable Interaction with Socially Intelligent Robots
Thank you for your attention!