Resynthesizing audiovisual perception with augmented reality

1 Resynthesizing audiovisual perception with augmented reality Parag K Mital Department of Computing, Goldsmiths, University of London http://pkmital.com Presented for Lunch BITES, CULTURE Lab, Newcastle on 30/06/11

2 Questions: What computational processes describe audiovisual perception in the real world? What can augmented reality reveal about our underlying perception? Objectives: Build computational models of audio-visual attention using controlled experiments. Interpret these models in a real-time context situated in real-life scenarios using augmented reality and re-synthesis techniques.

3 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

4 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

5 Experimental Psychology: What processes describe human cognition? Visual cognition, Vision research, Auditory scene analysis, Auditory attention, Psychophysics, Psychoacoustics, Multisensory/Crossmodal perception, Film cognition

6 Computational Cognition: What computational models best describe human cognition? Computer vision, Computational neuroscience, Machine learning, Speech recognition, Saliency models

7 Dynamic Images and Eye Movements John Henderson, Tim Smith, Robin Hill, Parag K Mital; awarded to John Henderson and funded by Leverhulme and ESRC. Question: What drives human attention and eye-movement behavior during moving images? Objectives: Build a corpus of eye-movement data and corresponding moving images. Develop theories and tools for understanding active visual cognition.

8 82 videos, ranging between 30 seconds and 3 minutes. 200+ viewers. Broad range of stimuli: adverts, film clips, real-world scenes, social scenes, film trailers, video game trailers, music videos, documentaries, news clips, animation

9 Eye-tracking data CARPE X/Y coords of eyes per millisecond per eye per person, plus various eye-movement events and messages. >1000 lines of 8-column data per second! Gaze videos, Gaussian Mixture Models, Low-level feature visualizations (optical flow, edges, Gabors, flicker, chromaticity, luminance), Dynamic Heatmap videos
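
To make the heatmap idea on this slide concrete, here is a minimal sketch of a single-frame gaze heatmap built by summing one isotropic Gaussian per gaze sample. It is not the CARPE implementation: the function name `gaze_heatmap`, the grid size, the `sigma` value and the sample coordinates are all assumptions chosen for illustration.

```python
import math


def gaze_heatmap(points, width, height, sigma=2.0):
    """Sum an isotropic Gaussian centred on each gaze point over a grid."""
    heat = [[0.0] * width for _ in range(height)]
    for gx, gy in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                heat[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    # Normalise so the hottest cell equals 1.0 (skipped when there are no points).
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[v / peak for v in row] for row in heat]
    return heat


# Hypothetical gaze samples (x, y) in pixels for one video frame:
# three viewers looking at roughly the same place, one looking elsewhere.
samples = [(4, 3), (5, 3), (4, 4), (12, 8)]
heat = gaze_heatmap(samples, width=16, height=10)
```

Cells near the cluster of three samples end up hotter than the cell under the isolated sample, which is the behaviour a dynamic heatmap video would show frame by frame.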

17 Auditory Attention Modeling

18 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

19 Vision Processing Detection: Features (SIFT, SURF, Harris Corners), Regions (Mean-shift, MSER), Haar-Features (Boosted Cascades, Viola-Jones), Templates (MI, SSD, Lucas-Kanade). Description: Vector codes (GIST, SIFT, SURF, BRIEF), Trees (FLANN, LSH), Model-based reconstruction (PCA, pLSA, LDA)
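
Of the detection approaches listed on this slide, template matching by sum of squared differences (SSD) is the simplest to show in a few lines. The sketch below is an exhaustive search over a tiny grayscale image stored as nested lists; the function name `ssd_match` and the toy data are assumptions, and a practical system would use an optimized library routine rather than these nested loops.

```python
def ssd_match(image, template):
    """Return (row, col, score) of the template position with the lowest SSD."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Sum of squared pixel differences at this offset.
            score = 0.0
            for dr in range(th):
                for dc in range(tw):
                    diff = image[r + dr][c + dc] - template[dr][dc]
                    score += diff * diff
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


# Toy 4x5 image containing the 2x2 template at row 1, column 1.
image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
match = ssd_match(image, template)
```

A score of zero means the template was found exactly; larger scores indicate progressively worse matches.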

20 J. Matas, O. Chum, M. Urban, and T. Pajdla. "Robust wide baseline stereo from maximally stable extremal regions." Proc. of British Machine Vision Conference, pp , Stanislav Basovnik, Lukas Mach, Andrej Mikulik, and David Obdrzalek. "Detecting Scene Elements Using Maximally Stable Colour Regions." IEEE Computer Vision and Pattern Recognition, 2007.

24 Source Separation Question: How can we describe a chunk of audio in terms of semantic factors? Paris Smaragdis et al., Sparse and Shift-Invariant Feature Extraction From Non-Negative Data
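
One way to see what "describing audio in terms of factors" can mean is a plain non-negative matrix factorization (NMF) of a magnitude spectrogram V into spectral templates W and their activations over time H. The sketch below uses the standard multiplicative updates for the Euclidean cost; it is a simpler stand-in, not the sparse shift-invariant method cited on the slide. The helper names, the toy spectrogram and the iteration count are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update activations: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # Update templates: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def frobenius_error(v, w, h):
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


# Toy "spectrogram": two hypothetical sources with distinct spectra (columns
# of `spectra`) mixed according to their activations over five time frames.
spectra = [[1, 0], [0, 3], [2, 0], [0, 1]]
activations = [[1, 0, 1, 2, 1], [0, 1, 1, 0, 2]]
v = matmul(spectra, activations)
w, h = nmf(v, rank=2)
```

Each column of `w` plays the role of a learned spectral shape and each row of `h` says when that shape is active; whether such factors are semantically meaningful depends on the audio and on extra constraints such as sparsity.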

25 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

26 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

27 Interpreting the Model in Real-Time Question: How can technology employing cognitive models help us to better understand the model?

28 Human-Computer Interaction Question: How can we build interfaces to our own perceptual processes? Augmented reality, Interfaces for musical expression, Robot perception

29 Corpus-based resynthesis: CataRT, Soundspotter. "A new approach to creating musical streams by selecting and concatenating source segments from a large audio database using methods from music information retrieval" (Casey, 2009). Casey, M. Soundspotting: a new kind of process? In The Oxford Handbook of Computer Music, ed. R. Dean. New York: Oxford University Press.
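
The selecting-and-concatenating step described on this slide can be sketched as a nearest-neighbour lookup: each target frame is described by a feature vector, the closest corpus segment in feature space is chosen, and the chosen segments are joined end to end. This is only an illustration of the general idea, not how CataRT or Soundspotter are implemented; the function names, the two-dimensional features and the sample values are hypothetical.

```python
def nearest_segment(target_feat, corpus_feats):
    """Index of the corpus segment whose features are closest (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for i, feat in enumerate(corpus_feats):
        dist = sum((a - b) ** 2 for a, b in zip(target_feat, feat))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def resynthesize(target_feats, corpus_feats, corpus_audio):
    """Concatenate the best-matching corpus segment for every target frame."""
    out = []
    for feat in target_feats:
        out.extend(corpus_audio[nearest_segment(feat, corpus_feats)])
    return out


# Hypothetical corpus: three segments, each with a 2-D feature vector
# and two audio samples.
corpus_feats = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
corpus_audio = [[0.1, 0.1], [0.9, 0.9], [0.5, 0.5]]

# Hypothetical target described by three feature frames.
target_feats = [[0.9, 0.1], [0.1, 0.8], [0.45, 0.55]]
stream = resynthesize(target_feats, corpus_feats, corpus_audio)
```

Real systems typically use richer descriptors and cross-fade between segments instead of butting them together.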

31 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

32 Sound Spatialization HRIR using both MIT and IRCAM LISTEN [1]. Perceptual filter encoding source of sound. [1] http://recherche.ircam.fr/equipes/salles/listen

33 Loca5on of Impulse Responses

34 Convolution Convolution Impulse response Binaural Audio
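
The step on this slide, convolving a source signal with a pair of head-related impulse responses to obtain binaural audio, can be sketched as follows. The impulse responses here are made-up three-tap filters, not measurements from the MIT or IRCAM LISTEN sets, and the function names are assumptions.

```python
def convolve(signal, kernel):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Filter a mono signal with left and right impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


mono = [1.0, 0.5, 0.25]
# Hypothetical impulse responses for a source on the listener's left:
# the right ear hears it one sample later and at half the level.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.5, 0.0]
left, right = binauralize(mono, hrir_left, hrir_right)
```

The delay and level difference between the two output channels are the cues that let a listener place the source in space; measured impulse responses also encode direction-dependent spectral filtering.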

35 http://pkmital.com


More information

Reinventing movies How do we tell stories in VR? Diego Gutierrez Graphics & Imaging Lab Universidad de Zaragoza

Reinventing movies How do we tell stories in VR? Diego Gutierrez Graphics & Imaging Lab Universidad de Zaragoza Reinventing movies How do we tell stories in VR? Diego Gutierrez Graphics & Imaging Lab Universidad de Zaragoza Computer Graphics Computational Imaging Virtual Reality Joint work with: A. Serrano, J. Ruiz-Borau

More information

An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet

An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet Journal of Information & Computational Science 8: 14 (2011) 3027 3034 Available at http://www.joics.com An Audio Fingerprint Algorithm Based on Statistical Characteristics of db4 Wavelet Jianguo JIANG

More information

Recurrent Timing Neural Networks for Joint F0-Localisation Estimation

Recurrent Timing Neural Networks for Joint F0-Localisation Estimation Recurrent Timing Neural Networks for Joint F0-Localisation Estimation Stuart N. Wrigley and Guy J. Brown Department of Computer Science, University of Sheffield Regent Court, 211 Portobello Street, Sheffield

More information

MPEG-4 Structured Audio Systems

MPEG-4 Structured Audio Systems MPEG-4 Structured Audio Systems Mihir Anandpara The University of Texas at Austin anandpar@ece.utexas.edu 1 Abstract The MPEG-4 standard has been proposed to provide high quality audio and video content

More information

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues

Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction Human performance Reverberation

More information

Convention e-brief 400

Convention e-brief 400 Audio Engineering Society Convention e-brief 400 Presented at the 143 rd Convention 017 October 18 1, New York, NY, USA This Engineering Brief was selected on the basis of a submitted synopsis. The author

More information

Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions

Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions Downloaded from orbit.dtu.dk on: Dec 28, 2018 Exploiting deep neural networks and head movements for binaural localisation of multiple speakers in reverberant conditions Ma, Ning; Brown, Guy J.; May, Tobias

More information

RIR Estimation for Synthetic Data Acquisition

RIR Estimation for Synthetic Data Acquisition RIR Estimation for Synthetic Data Acquisition Kevin Venalainen, Philippe Moquin, Dinei Florencio Microsoft ABSTRACT - Automatic Speech Recognition (ASR) works best when the speech signal best matches the

More information

Spring 2018 CS543 / ECE549 Computer Vision. Course webpage URL:

Spring 2018 CS543 / ECE549 Computer Vision. Course webpage URL: Spring 2018 CS543 / ECE549 Computer Vision Course webpage URL: http://slazebni.cs.illinois.edu/spring18/ The goal of computer vision To extract meaning from pixels What we see What a computer sees Source:

More information

Roberto Togneri (Signal Processing and Recognition Lab)

Roberto Togneri (Signal Processing and Recognition Lab) Signal Processing and Machine Learning for Power Quality Disturbance Detection and Classification Roberto Togneri (Signal Processing and Recognition Lab) Power Quality (PQ) disturbances are broadly classified

More information

Mobile and Ubiquitous Compu3ng. Wireless Signals. George Roussos.

Mobile and Ubiquitous Compu3ng. Wireless Signals. George Roussos. Mobile and Ubiquitous Compu3ng Wireless Signals George Roussos g.roussos@dcs.bbk.ac.uk Overview Signal characteris3cs Represen3ng digital informa3on with wireless Transmission and propaga3on Accessing

More information

ApProgXimate Audio: A Distributed Interactive Experiment in Sound Art and Live Coding

ApProgXimate Audio: A Distributed Interactive Experiment in Sound Art and Live Coding ApProgXimate Audio: A Distributed Interactive Experiment in Sound Art and Live Coding Chris Kiefer Department of Music & Sussex Humanities Lab, University of Sussex, Brighton, UK. School of Media, Film

More information

- applications on same or different network node of the workstation - portability of application software - multiple displays - open architecture

- applications on same or different network node of the workstation - portability of application software - multiple displays - open architecture 12 Window Systems - A window system manages a computer screen. - Divides the screen into overlapping regions. - Each region displays output from a particular application. X window system is widely used

More information

VISUAL PITCH CLASS PROFILE A Video-Based Method for Real-Time Guitar Chord Identification

VISUAL PITCH CLASS PROFILE A Video-Based Method for Real-Time Guitar Chord Identification VISUAL PITCH CLASS PROFILE A Video-Based Method for Real-Time Guitar Chord Identification First Author Name, Second Author Name Institute of Problem Solving, XYZ University, My Street, MyTown, MyCountry

More information

Techniques for Designing GPGPU Games. Mark Joselli Esteban Clua

Techniques for Designing GPGPU Games. Mark Joselli Esteban Clua Techniques for Designing GPGPU Games Mark Joselli Esteban Clua Presenta?on; Background; Mo?va?on; Objec?ves; Games and GPGPU; Techniques analyzed; Examples; Conclusions; Agenda Presenta?on: Mark Joselli

More information

Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks

Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks Distance Estimation and Localization of Sound Sources in Reverberant Conditions using Deep Neural Networks Mariam Yiwere 1 and Eun Joo Rhee 2 1 Department of Computer Engineering, Hanbat National University,

More information

Challenge to Open Systems Problems

Challenge to Open Systems Problems Stanford Unversity EE380 Computer Systems Colloquium Challenge to Open Systems Problems September 29, 2010 Mario Tokoro President & CEO Sony Computer Science Laboratories, Inc. Victory of Science and Technology

More information

Multi-Resolution Estimation of Optical Flow on Vehicle Tracking under Unpredictable Environments

Multi-Resolution Estimation of Optical Flow on Vehicle Tracking under Unpredictable Environments , pp.32-36 http://dx.doi.org/10.14257/astl.2016.129.07 Multi-Resolution Estimation of Optical Flow on Vehicle Tracking under Unpredictable Environments Viet Dung Do 1 and Dong-Min Woo 1 1 Department of

More information

IMAGE PROCESSING IEEE TITLES

IMAGE PROCESSING IEEE TITLES 2017 2018 IMAGE IEEE TITLES S.no TITLE DOMAIN 1 An Watermarking Scheme Using Threshold Based Secret Sharing 2 Brain Tumor Detection And Segmentation Using Conditional Random Field 3 A Reversible Rie Based

More information