Resynthesizing audiovisual perception with augmented reality


1 Resynthesizing audiovisual perception with augmented reality. Parag K Mital, Department of Computing, Goldsmiths, University of London. http://pkmital.com Presented for Lunch BITES, CULTURE Lab, Newcastle on 30/06/11

2 Questions: What computational processes describe audiovisual perception in the real world? What can augmented reality reveal about our underlying perception? Objectives: Build computational models of audiovisual attention using controlled experiments. Interpret these models in a real-time context situated in real-life scenarios using augmented reality and resynthesis techniques.

3 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

4 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

5 Experimental Psychology: What processes describe human cognition? Visual cognition, Vision research, Auditory scene analysis, Auditory attention, Psychophysics, Psychoacoustics, Multisensory/Crossmodal perception, Film cognition.

6 Computational Cognition: What computational models best describe human cognition? Computer vision, Computational neuroscience, Machine learning, Speech recognition, Saliency models.

7 Dynamic Images and Eye Movements. John Henderson, Tim Smith, Robin Hill, Parag K Mital; awarded to John Henderson and funded by Leverhulme and ESRC. Question: What drives human attention and eye-movement behavior during moving images? Objectives: Build a corpus of eye-movement data and corresponding moving images. Develop theories and tools for understanding active visual cognition.

8 82 videos. Range between 30 seconds and 3 minutes. 200+ viewers. Broad range of stimuli: adverts, film clips, real-world scenes, social scenes, film trailers, video game trailers, music videos, documentaries, news clips, animation.


9 Eye-tracking data. CARPE. X/Y coords of eyes per millisecond per eye per person, plus various eye-movement events and messages. >1000 lines of 8-column data per second! Gaze videos. Gaussian Mixture Models. Low-level feature visualizations: optical flow, edges, Gabors, flicker, chromaticity, luminance. Dynamic heatmap videos.
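A dynamic heatmap of the kind listed on this slide can be approximated, for a single video frame, by summing one isotropic Gaussian per gaze sample. The sketch below is only an illustration of that idea and is not taken from CARPE; the function name `gaze_heatmap`, the grid size, the gaze samples, and the `sigma` value are all assumptions.

```python
import math

def gaze_heatmap(points, width, height, sigma=2.0):
    """Sum an isotropic Gaussian centred on each (x, y) gaze point.

    Returns a height x width grid (list of rows), scaled so the peak is 1.0.
    """
    grid = [[0.0] * width for _ in range(height)]
    for gx, gy in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - gx) ** 2 + (y - gy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return grid

# Hypothetical gaze samples for one frame: three viewers looking near
# (5, 5) and one viewer looking near (15, 3).
heat = gaze_heatmap([(5, 5), (6, 5), (5, 6), (15, 3)], width=20, height=10)
```

Repeating this per frame and overlaying the grid on the video would give a heatmap that moves with the viewers' attention. Fitting a Gaussian Mixture Model instead would replace the fixed `sigma` with clusters estimated from the data.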

17 Auditory Attention Modeling

18 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

19 Vision Processing. Detection: Features (SIFT, SURF, Harris Corners), Regions (Mean-shift, MSER), Haar-Features (Boosted Cascades, Viola-Jones), Templates (MI, SSD, Lucas-Kanade). Description: Vector codes (GIST, SIFT, SURF, BRIEF), Trees (FLANN, LSH), Model-based reconstruction (PCA, pLSA, LDA).

20 J. Matas, O. Chum, M. Urban, and T. Pajdla. "Robust wide baseline stereo from maximally stable extremal regions." Proc. of British Machine Vision Conference, pp. 384-396, 2002. Stanislav Basovnik, Lukas Mach, Andrej Mikulik, and David Obdrzalek. "Detecting Scene Elements Using Maximally Stable Colour Regions." IEEE Computer Vision and Pattern Recognition, 2007.


24 Source Separation. Question: How can we describe a chunk of audio in terms of semantic factors? Paris Smaragdis et al., "Sparse and Shift-Invariant Feature Extraction From Non-Negative Data".
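One common way to describe a magnitude spectrogram in terms of a few factors is non-negative matrix factorization: the spectrogram V (frequency bins by time frames) is approximated as W times H, where the columns of W act as spectral templates and the rows of H as their activations over time. The sketch below uses plain NMF with multiplicative updates, which is a simpler stand-in and not the sparse, shift-invariant model of the cited paper. The toy spectrogram, the rank, and all function names are assumptions for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Activations: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Templates: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    """Euclidean distance between v and its reconstruction w @ h."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

# Hypothetical toy "spectrogram": 4 frequency bins x 6 frames, built from
# two made-up sources that overlap in frame 4.
W_true = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.8]]
H_true = [[1.0, 1.0, 0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 0.5]]
V = matmul(W_true, H_true)
W, H = nmf(V, rank=2)
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative, which is what lets each factor be read as an additive part of the sound rather than a signed correction.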

25 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

26 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

27 Interpreting the Model in Real-Time. Question: How can technology employing cognitive models help us to better understand the model?

28 Human-Computer Interaction. Question: How can we build interfaces to our own perceptual processes? Augmented reality, Interfaces for musical expression, Robot perception.

29 Corpus-based resynthesis: CataRT, Soundspotter. A new approach to creating musical streams by selecting and concatenating source segments from a large audio database using methods from music information retrieval (Casey, 2009). Casey, M. 2009. "Soundspotting: a new kind of process?" In The Oxford Handbook of Computer Music, ed. R. Dean. New York: Oxford University Press.
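The select-and-concatenate idea described on this slide can be sketched as a nearest-neighbour lookup: cut the corpus into segments, describe each with a feature vector, and for every segment of the target pick the closest corpus segment. This is a minimal illustration and not how CataRT or Soundspotter are implemented; the two-value descriptor (RMS energy and zero-crossing rate), the fixed segment length, and all names are assumptions.

```python
import math

def features(segment):
    """Toy descriptor: (RMS energy, zero-crossing rate) of a sample list."""
    rms = math.sqrt(sum(s * s for s in segment) / len(segment))
    crossings = sum(1 for a, b in zip(segment, segment[1:])
                    if (a < 0) != (b < 0))
    return (rms, crossings / (len(segment) - 1))

def resynthesize(target, corpus, seg_len):
    """Rebuild target from the corpus segments that best match it."""
    corpus_segs = [corpus[i:i + seg_len]
                   for i in range(0, len(corpus) - seg_len + 1, seg_len)]
    corpus_feats = [features(s) for s in corpus_segs]
    out, picks = [], []
    for i in range(0, len(target) - seg_len + 1, seg_len):
        query = features(target[i:i + seg_len])
        # Index of the corpus segment closest in feature space.
        idx = min(range(len(corpus_feats)),
                  key=lambda j: math.dist(query, corpus_feats[j]))
        picks.append(idx)
        out.extend(corpus_segs[idx])
    return out, picks

# Hypothetical corpus: one quiet low-frequency segment, one loud
# high-frequency segment (8 samples each).
quiet = [0.1 * math.sin(2 * math.pi * n / 8) for n in range(8)]
loud = [0.9 * (-1) ** n for n in range(8)]
corpus = quiet + loud

# Hypothetical target: a loud-ish high-frequency segment followed by a
# quiet-ish low-frequency one.
target = [0.8 * (-1) ** n for n in range(8)] + \
         [0.2 * math.sin(2 * math.pi * n / 8) for n in range(8)]
output, picks = resynthesize(target, corpus, seg_len=8)
```

The output contains only corpus material, reordered to follow the target. Real systems would use richer descriptors and indexed search over a large database rather than this linear scan.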

31 Modeling Attention Prior Spectral/Region Segmentation Temporal Event Segmentation Synthesis Retrieval/Indexing Scene Reconstruction

32 Sound Spatialization. HRIR using both MIT and IRCAM LISTEN [1]. Perceptual filter encoding source of sound. [1] http://recherche.ircam.fr/equipes/salles/listen

33 Location of Impulse Responses

34 Convolution Impulse response Binaural Audio
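The step from impulse response to binaural audio can be sketched as two convolutions: the mono source is convolved with the left-ear and the right-ear head-related impulse response (HRIR) for the desired source location. The sketch below assumes made-up three-tap HRIRs purely for illustration; measured HRIRs such as those in the MIT and IRCAM LISTEN sets are considerably longer, and the function names here are hypothetical.

```python
def convolve(signal, kernel):
    """Direct (time-domain) linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def binauralize(mono, hrir_left, hrir_right):
    """Return (left, right) channels for a mono source and one HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Hypothetical toy HRIRs for a source on the listener's left: the right
# ear hears the sound one sample later and at half the amplitude.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.5, 0.0]
mono = [1.0, 2.0, 3.0]
left, right = binauralize(mono, hrir_left, hrir_right)
```

The delay and level difference between `left` and `right` are the cues that place the source to one side. For real-time use with long impulse responses, FFT-based convolution would normally replace the direct loop.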

35 http://pkmital.com
