Selected Research Activities @ Signal & Information Processing Group

COST Action IC1206 - MC Meeting
Selected Research Activities @ Signal & Information Processing Group
Zheng-Hua Tan
Dept. of Electronic Systems, Aalborg Univ., Denmark
zt@es.aau.dk

Outline
Introduction to Signal and Information Processing Group, Department of Electronic Systems, Aalborg University
COST Action IC1206 related activities and work

Aalborg, Denmark

Aalborg University
Inaugurated in 1974 in Aalborg (population: 200,000).
20,000 students and 2,000 research personnel; 12.5% international students (most graduate programmes are taught in English).
Engineering, natural and social sciences, medicine, and humanities.
Department of Electronic Systems: 300+ employees.
Renowned for:
Project-oriented, problem-based learning in teams.
Interdisciplinarity and cooperation with industry.
Network university: campuses in Aalborg, Esbjerg, and Copenhagen.

Research areas of the SIP group
Speech and language processing, multimedia signal processing, machine learning, pattern recognition.
Usability engineering, human-computer/robot interaction.
Signal processing, numerical linear algebra, statistics, compressed sensing, optimization.
Reconfigurable architectures, resource-optimal hardware/software co-design, computing, high-performance scientific computing.

Funding agencies and companies, plus collaboration with about a dozen universities and institutes worldwide.

Outline
Introduction to Signal and Information Processing Group, Department of Electronic Systems, Aalborg University
COST Action IC1206 related activities and work:
Denoising and voice activity detection (VAD) for speaker identification (SID); SID of disguised voice
Age and gender identification for recommender systems
Durable Interaction with Socially Intelligent Robots

Ongoing related projects
A Robust Audio-based Hybrid Recommendations Framework for Interactive TV (TESCO UK's use of face scanning is making headlines). Bang & Olufsen A/S and The Danish Council for Technology and Innovation.
isrobot: Durable Interaction with Socially Intelligent Robots. The Danish Council for Independent Research in Technology and Production Sciences.
CoSound: A Cognitive Systems Approach to Enriched and Actionable Information from Audio Streams. Danish Strategic Research Council.
Speaker Recognition under Adverse Environments. Subproject supported by the European Commission Erasmus Mobility for Life Scholarship.

Research topics
Speaker identification under adverse environments:
Acoustic noise (denoising, VAD)
Disguised voice (multistyle training, multiple frame rates)
Age, gender and emotion identification:
For TV recommender systems
For human-robot interaction
Audio-visual fusion based on sensor networks

VAD for speaker identification
Two-pass segment-based denoising and voice activity detection (VAD)
DARPA Robust Automatic Transcription of Speech (RATS) database

Challenge to denoising and VAD: non-stationary noise
Burst-like noise requires special attention, as it causes existing methods to fail.
Zheng-Hua Tan, Mataro, Spain, 11/2013

Two-pass segment-based denoising and VAD
Stationary and burst-like noise have very different characteristics, so they are handled in separate passes.
1st pass:
1) High-energy segments are detected using the a posteriori SNR weighted energy difference (SNR-dE) [Z.-H. Tan and B. Lindberg, IEEE Journal of Selected Topics in Signal Processing, 2010].
2) If no pitch is found within a high-energy segment, the segment is classified as noise.
2nd pass:
Stationary noise is removed by a modified MSNE method.
A VAD approach is then applied to the denoised data.
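The 1st pass can be illustrated with a minimal Python sketch. It assumes a specific form of SNR-dE: the absolute log-energy difference between consecutive frames, weighted by the a posteriori SNR (in dB, clipped at 0) of the louder of the two frames, with the noise floor taken as the mean energy of the quietest 10% of frames. It also substitutes a simple normalized-autocorrelation voicing check for the unspecified pitch detector. All function names, thresholds and frame sizes are hypothetical, and the MSNE denoising and VAD of the 2nd pass are not modelled.

```python
import math
from typing import List, Tuple

def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping frames; trailing samples are dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def frame_energies(frames: List[List[float]]) -> List[float]:
    """Mean-square energy per frame, floored to avoid log(0)."""
    return [max(sum(s * s for s in f) / len(f), 1e-12) for f in frames]

def snr_de(energies: List[float], noise: float, t: int) -> float:
    """Assumed SNR-dE form: |log-energy difference| of frames t-1 and t,
    weighted by the a posteriori SNR (dB, clipped at 0) of the louder frame."""
    snr_db = max(0.0, 10.0 * math.log10(max(energies[t], energies[t - 1]) / noise))
    return abs(math.log(energies[t]) - math.log(energies[t - 1])) * snr_db

def high_energy_segments(energies: List[float], thr: float = 50.0,
                         noise_frac: float = 0.1) -> List[Tuple[int, int]]:
    """Segments opened by a large SNR-dE rise and closed by the next large fall."""
    k = max(1, int(len(energies) * noise_frac))
    noise = sum(sorted(energies)[:k]) / k  # noise floor: quietest frames
    segments, start = [], None
    for t in range(1, len(energies)):
        if snr_de(energies, noise, t) <= thr:
            continue
        if energies[t] > energies[t - 1] and start is None:
            start = t  # rising edge opens a segment
        elif energies[t] < energies[t - 1] and start is not None:
            segments.append((start, t - 1))  # falling edge closes it
            start = None
    if start is not None:
        segments.append((start, len(energies) - 1))
    return segments

def has_pitch(frame: List[float], fs: int, f_lo: float = 80.0,
              f_hi: float = 400.0, thr: float = 0.6) -> bool:
    """Voicing check: peak normalized autocorrelation over the pitch-lag range."""
    n, best = len(frame), 0.0
    for lag in range(int(fs / f_hi), min(int(fs / f_lo), n - 1) + 1):
        a, b = frame[:n - lag], frame[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0:
            best = max(best, num / den)
    return best > thr

def first_pass(x: List[float], fs: int, frame_len: int = 256,
               hop: int = 128) -> List[Tuple[int, int, bool]]:
    """Return (first_frame, last_frame, is_noise) for each high-energy segment."""
    frames = frame_signal(x, frame_len, hop)
    segs = high_energy_segments(frame_energies(frames))
    return [(s, e, not any(has_pitch(frames[t], fs) for t in range(s, e + 1)))
            for s, e in segs]
```

Bounding segments by large SNR-dE rises and falls suits burst-like noise, whose sharp onsets and offsets stand out against the stationary noise floor. A segment without any voiced frame is then treated as noise, while a segment containing pitch is kept for the 2nd pass.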

Denoising and VAD results for known data
[Figure: known data, channel H]

Denoising and VAD results for unknown data
[Figure: unknown data, channel H]

Speaker ID system performance
O. Plchot, S. Matsoukas, P. Matejka, N. Dehak, J. Ma, S. Cumani, O. Glembek, H. Hermansky, S.H. Mallidi, N. Mesgarani, R. Schwartz, M. Soufifar, Z.-H. Tan, S. Thomas, B. Zhang and X. Zhou, "Developing a Speaker Identification System for the DARPA RATS Project," ICASSP 2013.

Age and gender ID for recommender systems
Sven Ewan Shepstone, Zheng-Hua Tan and Søren Holdt Jensen, "Audio-based Age and Gender Identification to Enhance the Recommendation of TV Content," IEEE Transactions on Consumer Electronics, vol. 59, no. 3, pp. 721-729, August 2013.
Sven Ewan Shepstone, Zheng-Hua Tan and Søren Holdt Jensen, "Demographic Recommendation by means of Group Profile Elicitation Using Speaker Age and Gender Recognition," Interspeech 2013, Lyon, France, August 25-29, 2013.

Project overview
A user profile is needed to make good TV recommendations.
An audio classifier, rather than manually entered data or usage patterns, is used to gather data for the user profile implicitly.
Age and gender are useful parameters for recommendations.
Current accuracy for age and gender detection (7 classes) is just over 50% on the aGender corpus, and there can be large confusion between age and gender classes.
Research question: are items recommended from the profile extracted via age and gender classification perceived to be better than random items?

Matching and recommendation
Recommendation strategy:
Group profile adaptation (if necessary) converts an M-user group profile to an N-slot content profile.
A genetic selection algorithm evolves k chromosomes, where each chromosome is a sequence of items.
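This strategy can be sketched in Python as follows. The source does not specify the adaptation rule, the fitness function or the genetic operators, so all of them are hypothetical stand-ins. The sketch cycles through the M viewers' classes to fill N slots, scores a chromosome by a made-up per-item class-affinity table, and uses truncation selection, one-point crossover and replace/swap mutation.

```python
import random
from typing import Dict, List, Tuple

Affinity = Dict[str, Dict[str, float]]  # hypothetical: item -> {class: affinity}

def adapt_profile(group: List[str], n_slots: int) -> List[str]:
    """Hypothetical M-user -> N-slot adaptation: cycle through the viewers' classes."""
    return [group[i % len(group)] for i in range(n_slots)]

def fitness(chrom: List[str], slots: List[str], affinity: Affinity) -> float:
    """Sum of the affinities between each placed item and its slot's class."""
    return sum(affinity[item].get(cls, 0.0) for item, cls in zip(chrom, slots))

def crossover(p1: List[str], p2: List[str], rng: random.Random) -> List[str]:
    """One-point crossover; the tail is filled from p2 so items stay unique."""
    cut = rng.randrange(1, len(p1))
    head = p1[:cut]
    tail = [it for it in p2 if it not in head][: len(p1) - cut]
    return head + tail

def mutate(chrom: List[str], catalogue: List[str], rng: random.Random,
           rate: float = 0.2) -> List[str]:
    """Possibly replace one item with an unused one, and possibly swap two slots."""
    chrom = chrom[:]
    unused = [it for it in catalogue if it not in chrom]
    if unused and rng.random() < rate:
        chrom[rng.randrange(len(chrom))] = rng.choice(unused)
    if rng.random() < rate:
        i, j = rng.randrange(len(chrom)), rng.randrange(len(chrom))
        chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

def genetic_select(group: List[str], n_slots: int, affinity: Affinity,
                   k: int = 20, generations: int = 100,
                   seed: int = 0) -> Tuple[List[str], float]:
    """Evolve k chromosomes (item sequences, n_slots >= 2); return the best one."""
    rng = random.Random(seed)
    catalogue = sorted(affinity)
    slots = adapt_profile(group, n_slots)

    def score(chrom: List[str]) -> float:
        return fitness(chrom, slots, affinity)

    pop = [rng.sample(catalogue, n_slots) for _ in range(k)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: k // 2]  # truncation selection; the best always survives
        children = []
        while len(elite) + len(children) < k:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), catalogue, rng))
        pop = elite + children
    best = max(pop, key=score)
    return best, score(best)
```

Keeping the top half unchanged each generation means the best sequence found so far is never lost. Because of this, running more generations can only keep or improve the returned fitness.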

Age and gender classification
7 age-and-gender classes: Child (C), Young Male (YM), Young Female (YF), Adult Male (AM), Adult Female (AF), Senior Male (SM) and Senior Female (SF).
A Viewer Configuration is a profile for the group, e.g. C, C, SF.
Each speaker in a configuration is associated with real speaker utterances from the aGender corpus; these are classified to determine each user's age and gender profile.
Age and gender classification uses both acoustic and prosodic features.
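The step from classified utterances to an estimated Viewer Configuration might look like the following Python sketch. It assumes the classifier outputs a posterior over the 7 classes for each utterance and that a viewer's class is the argmax of the averaged posteriors. Both the averaging rule and the function names are hypothetical.

```python
from typing import List, Sequence

# The 7 age-and-gender classes listed above.
CLASSES = ("C", "YM", "YF", "AM", "AF", "SM", "SF")

def viewer_class(utterance_posteriors: List[Sequence[float]]) -> str:
    """Assumed rule: average a viewer's per-utterance posteriors, take the argmax."""
    n = len(utterance_posteriors)
    avg = [sum(p[i] for p in utterance_posteriors) / n for i in range(len(CLASSES))]
    return CLASSES[max(range(len(CLASSES)), key=avg.__getitem__)]

def group_profile(viewers: List[List[Sequence[float]]]) -> List[str]:
    """Estimated Viewer Configuration: one predicted class per viewer, in order."""
    return [viewer_class(utts) for utts in viewers]
```

With a classifier that is right only about half the time, confusion enters the profile directly. For example, a child whose averaged posterior peaks at YF is recorded as YF, so a true configuration of C, C, SF may be estimated as C, YF, SF.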

User study and results
The TV2 (Danish broadcaster) advertisement corpus was used.
Results: a significant increase in median rating for recommended ads (7.75, as opposed to 4.25).
Conclusion: this work shows the potential of using age and gender audio classification to recommend sequences of video clips to groups of viewers.

Durable Interaction with Socially Intelligent Robots (isrobot)
Socially assistive robots can increase quality of life and decrease the expense of social care.
"Robotics will be as important tomorrow as computers are today." (Aldebaran Robotics)
"I can envision a future in which robotic devices will become a nearly ubiquitous part of our day to day lives." (Bill Gates)
The global service robotics market: $20 billion in 2012, projected to reach $46 billion in 2017 (17.4% annual growth).

isrobot (cont.)
Objective: to enable socially assistive robots to feel and express feelings, with the ultimate goal of establishing durable social interaction.
The Danish Council for Independent Research, 2013-2017.
Challenges:
Low signal quality, caused by environmental noise and imperfect sensor placement, significantly degrades the robot's ability to sense.
Lack of understanding of users and context reduces the robot to a pet with limited richness of expression.
Social intelligence and durable interaction require the robot to locate, recognize and feel its users and to respond with awareness.

Summary
Introduction to Signal and Information Processing Group, Department of Electronic Systems, Aalborg University
COST Action IC1206 related activities and work:
Denoising and VAD for SID; SID of disguised voice
Age and gender identification for recommender systems
Durable Interaction with Socially Intelligent Robots
Thank you for your attention!