The psychoacoustics of reverberation

1 The psychoacoustics of reverberation. Steven van de Par, July 19, 2016. Thanks to Julian Grosse and Andreas Häußler. 2016 AES International Conference on Sound Field Control

2 Introduction
The psychoacoustics of reverberation: what is this talk about?
- Reverberation is nearly always present in our daily life
- It creates large distortions of the physical waveform
- Yet it mostly has only a small effect on (speech) perception

3 Introduction (continued)
Figure: waveforms of clean speech and reverberated speech (T60 = 250 ms); axes: amplitude vs. time (s).
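The waveform distortion shown on this slide can be illustrated with a few lines of code. The sketch below is not the stimulus used in the talk: it assumes a toy room impulse response (exponentially decaying noise with T60 = 250 ms) and gated noise as a stand-in for speech. The function name `synthetic_rir` and all signal parameters are illustrative assumptions.

```python
import numpy as np

def synthetic_rir(t60_s, fs, rng):
    """Toy room impulse response: white noise with an exponential decay."""
    n = int(round(t60_s * fs))
    t = np.arange(n) / fs
    # Energy falls by 60 dB over t60_s, so amplitude follows 10**(-3 t / T60).
    envelope = 10.0 ** (-3.0 * t / t60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct sound
    return rir

fs = 16000
rng = np.random.default_rng(0)
rir = synthetic_rir(0.25, fs, rng)

# Hypothetical stand-in for speech: noise gated on and off at 4 Hz.
t = np.arange(fs) / fs
gate = np.sin(2 * np.pi * 4 * t) > 0
clean = rng.standard_normal(fs) * gate

# Reverberation is modelled as convolution with the impulse response.
reverberant = np.convolve(clean, rir)[:len(clean)]

# Energy that falls into the silent gaps of the clean signal.
gap_energy_clean = float(np.sum(clean[~gate] ** 2))
gap_energy_reverb = float(np.sum(reverberant[~gate] ** 2))
print(gap_energy_clean, gap_energy_reverb)
```

In this toy case the clean signal has no energy in its pauses, while the reverberated version fills them with decaying tails, which is the kind of smearing visible in the reverberated waveform.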

4 Introduction (continued)
Outline:
- Principles and mechanisms in perception that help to overcome reverberation
- Some ideas about controlling sound fields in a perceptually motivated manner

5 The Peripheral Auditory System

6 The inner ear
Cochlea:
- Mechanical energy (oval window) is converted into a neural signal (auditory nerve)
- Performs a time-frequency analysis

7 The Cochlea
1. cochlear duct
2. scala vestibuli
3. scala tympani
4. spiral ganglion
5. auditory nerve fibres
The red arrow is from the oval window. The blue arrow points to the round window. The cochlea is about 2 mm in diameter.

8 Inner Ear: the Basilar Membrane Frequency-to-place transformation: Each point on BM acts as a band-pass filter

9 Cochleagram Simulates basilar-membrane filtering and represents magnitudes in dB. The brain captures a relatively coarse spectro-temporal representation.
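A rough cochleagram can be sketched as follows. This is a simplification, not the model behind the slide: instead of a basilar-membrane (e.g. gammatone) filterbank it pools short-time FFT power into bands spaced on the ERB-number scale of Glasberg and Moore (1990) and converts the result to dB. The function names, band count and frame sizes are assumptions chosen for illustration.

```python
import numpy as np

def erb_space(f_lo, f_hi, n):
    """n frequencies equally spaced on the ERB-number scale."""
    to_erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return from_erb(np.linspace(to_erb(f_lo), to_erb(f_hi), n))

def cochleagram_db(x, fs, n_bands=32, frame=1024, hop=256, f_lo=100.0):
    """Band energies in dB, shape (n_bands, n_frames), plus band edges in Hz."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    edges = erb_space(f_lo, fs / 2.0, n_bands + 1)
    # Sum FFT power inside each band: narrow bands at low frequencies,
    # wide bands at high frequencies.
    bands = np.stack([power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
                      for lo, hi in zip(edges[:-1], edges[1:])])
    return 10.0 * np.log10(bands + 1e-12), edges

fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
cgram, edges = cochleagram_db(tone, fs)
peak_band = int(np.argmax(cgram.mean(axis=1)))
print(cgram.shape, edges[peak_band], edges[peak_band + 1])
```

For the 1 kHz test tone, the band with the highest average level is the one whose edges enclose 1000 Hz. The coarse band and frame grid is what makes the representation "relatively coarse" compared with the waveform.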

10 Auditory signal representation
The cochleagram is a reasonable first-order approximation of perception (loudness, timbre).
Additional perceptual cues ("texture cues"):
- Timing information for binaural processing
  - ITDs, 20 µs JND (source direction)
  - Interaural cross-correlation (source width, listener envelopment)
- Temporal pitch cues
- Modulation cues (e.g. roughness of a sound)
Included in advanced models by e.g. Patterson, Meddis and colleagues, and Dau et al. (1996, 1997)
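The two binaural timing cues named on this slide can be estimated from a pair of ear signals with a normalised cross-correlation. The sketch below is a minimal broadband version, assuming equal-length signals and a lag range of ±1 ms. The function names and the sign convention (positive ITD means the left signal lags the right) are choices made for this example, not something taken from the talk.

```python
import numpy as np

def lagged_product(left, right, k):
    """Sum over n of left[n + k] * right[n] for equal-length signals."""
    if k >= 0:
        return float(np.dot(left[k:], right[:len(right) - k]))
    return float(np.dot(left[:k], right[-k:]))

def itd_and_iacc(left, right, fs, max_lag_s=1e-3):
    """Return (ITD in seconds, IACC) from the normalised cross-correlation."""
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    norm = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    iacf = np.array([lagged_product(left, right, int(k)) for k in lags]) / norm
    best = int(np.argmax(np.abs(iacf)))
    return lags[best] / fs, float(np.abs(iacf[best]))

fs = 48000
rng = np.random.default_rng(1)
source = rng.standard_normal(4800)

# Hypothetical ear signals: the left ear receives the sound 10 samples later.
delay = 10
right = source
left = np.concatenate([np.zeros(delay), source[:-delay]])

itd, iacc = itd_and_iacc(left, right, fs)
print(round(itd * 1e6, 1), "microseconds, IACC =", round(iacc, 3))
```

One sample at 48 kHz corresponds to about 20.8 µs, so resolving smaller interaural delays with this approach would require interpolating the correlation function. A high IACC indicates nearly identical ear signals, while lower values are associated with wider or more enveloping sound.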

11 Another function within the auditory system
Source segregation:
- Often multiple sources are present simultaneously
- We can focus on one source
Cocktail party processing:
- Listen to one speaker only
- Spatial separation helps
How does the brain do it?

12 Complex acoustical scenes
Figure: cochleagram of a mix of two speakers and binary mask indicating source dominance; axes: frequency (Hz) vs. time (s); colour scales: energy and azimuth (deg).
- Acoustic mixtures are often spectro-temporally sparse: for each time-frequency interval, one source dominates in level
- Grouping of signal components is essential to make sense of the speech signal
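A binary dominance mask of the kind described here can be computed when the individual sources are available separately. The sketch below assumes a plain short-time FFT grid rather than a cochleagram, and uses two gated tones as hypothetical stand-ins for the two speakers. The names `tf_energy` and `dominance_mask` are illustrative.

```python
import numpy as np

def tf_energy(x, frame=512, hop=256):
    """Short-time power spectrum, shape (n_frames, frame // 2 + 1)."""
    window = np.hanning(frame)
    n_frames = 1 + (len(x) - frame) // hop
    frames = np.stack([x[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def dominance_mask(energy_a, energy_b, criterion_db=0.0):
    """True where source A exceeds source B by more than criterion_db."""
    eps = 1e-12  # avoids log of zero in silent tiles
    local_snr_db = 10.0 * np.log10((energy_a + eps) / (energy_b + eps))
    return local_snr_db > criterion_db

fs = 16000
t = np.arange(fs) / fs
# Source A: 500 Hz tone, present only in the first half second.
source_a = np.sin(2 * np.pi * 500 * t) * (t < 0.5)
# Source B: 2000 Hz tone, present throughout.
source_b = np.sin(2 * np.pi * 2000 * t)

mask_a = dominance_mask(tf_energy(source_a), tf_energy(source_b))
bin_500, bin_2000 = 16, 64  # FFT bins at 500 Hz and 2000 Hz (31.25 Hz spacing)
print(mask_a[0, bin_500], mask_a[0, bin_2000], mask_a[-1, bin_500])
```

Because the two toy sources occupy different time-frequency regions, each tile is clearly assigned to one of them, which mirrors the sparsity argument on the slide. Raising `criterion_db` keeps only tiles where source A dominates by a larger margin.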

13 Auditory grouping / segregation
Bregman 1990: Auditory Scene Analysis
Primitive grouping cues:
- Common onset
- Common pitch
- Common AM/FM modulation
- Common location
All have to do with the physics of sound generation
See also:

14 Auditory grouping / segregation Fusion by common frequency change Common frequency modulation is a grouping cue (Bregman,

15 Visual grouping / occlusion Apparent continuity Difficult to see what we are dealing with

16 Visual grouping / occlusion Apparent continuity. Without providing extra parts of the letters, we can now see the letter B. We added information about where the letters are cut. The overlay is a physically plausible cause for not seeing part of the letters.

17 Is the auditory equivalent a two speaker situation?
Figure: spectrograms (frequency vs. time) of a female speaker, a male speaker, and the two speakers together.

18 Is the auditory equivalent a two speaker situation?
Diagram labels: female speaker, male speaker, 2 speakers (linear sum), 30 ms frames, 1 critical band, mask (dominating voice only), female mask, male mask.

19 Role of low-SNR speech glimpses?
Schoenmaker and van de Par (Advances in Experimental Medicine and Biology, 2016)
- Remove speech target tiles with an SNR below a criterion value
- Speech intelligibility is impaired only when the criterion is raised beyond about 0 dB SNR
- Only positive-SNR parts of speech contribute to intelligibility

20 What about reverberation?
Diagram labels: female speaker, male speaker, 2 speakers (linear sum), 30 ms frames, 1 critical band, T60 = 750 ms, mask (reverberated), female mask, male mask.

21 Reverberation and the Auditory Representation
Reverberation will temporally smear the auditory signal:
- Multiple delayed reflections will add to the direct sound
- Often the reverberant field is stronger than the direct sound (critical radius)
- Speech phonemes will start to overlap (speech rate: 10 syllables/s)
- Music is slower (Allegro, 150 bpm, 3 notes/s)
- Segregation will become more difficult
Remember the primitive grouping cues:
- Common onset (largely preserved)
- Common pitch (pitch unaffected, changes will be smeared)
- Common AM/FM modulation (high rates changed and converted)
- Common location (much reduced reliability)
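The numbers on this slide can be put into a quick back-of-the-envelope calculation. The sketch below multiplies an event rate by the reverberation time to estimate how many new syllables or notes start while an earlier one is still decaying, and evaluates the common Sabine-based approximation for the critical distance of an omnidirectional source, r_c ≈ 0.057·sqrt(V/T60). The T60 of 750 ms is taken from the preceding example slide; the room volume of 200 m³ is purely hypothetical, and the full 60 dB decay overstates the audible overlap.

```python
import math

def events_during_decay(rate_per_s, t60_s):
    """Number of new events that start within one full 60 dB decay."""
    return rate_per_s * t60_s

def critical_distance_m(volume_m3, t60_s):
    """Distance at which direct and reverberant energy are equal
    (Sabine-based approximation, omnidirectional source)."""
    return 0.057 * math.sqrt(volume_m3 / t60_s)

t60_s = 0.75                                   # as in the two-speaker example
speech_overlap = events_during_decay(10.0, t60_s)   # 10 syllables/s
allegro_rate = 150.0 / 60.0                    # 150 bpm in beats per second
music_overlap = events_during_decay(allegro_rate, t60_s)
r_c = critical_distance_m(200.0, t60_s)        # 200 m^3 is an assumed volume

print(speech_overlap, allegro_rate, music_overlap, round(r_c, 2))
```

Under these assumptions several syllables fall within a single decay, whereas fewer than two beats of an Allegro do, and a listener farther than roughly a metre from the source is already in a field dominated by reverberant energy.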

22 Target at 10º Measure distribution of binaural cues

23 Target at 10º Reverberation Measure distribution of binaural cues

24 Sound localization: Precedence effect (Haas effect) Precedence effect: - The first arriving wave front determines perceived direction - Allows spatial cues to contribute to segregation in reverberant conditions

25 Intermediate summary
How does perception cope with reverberation?
- Reverberation is not represented well in the brain due to the coarse spectro-temporal resolution of the auditory system
- Important perceptual segregation cues are robust against reverb:
  - Common onset
  - Pitch
  - Common low-rate AM/FM
  - Spatial cues (due to precedence effect)

26 How to use this knowledge for sound field control The auditory principles that cope with reverberation are implemented in the transformed auditory domain. It is not possible to apply these processing principles directly on acoustical signals. Two examples will be given that use perceptual processing knowledge for sound field control

27 Authentic Audio reproduction Recording room Playback room

28 Authentic Audio reproduction
Approach to authentic reproduction:
- Optimizing spatial parameters on a coarse spectro-temporal scale is enough:
  - Direct sound for directional information
  - Reverberant sound for ASW and LEV (IACC)
- Texture cues are represented in microphone signals
- Consider the (reverberant) acoustics at the reproduction side
Grosse and van de Par (IEEE Journal of Selected Topics in Signal Processing, 2015)

29 Room-in-room reproduction

30 Room-in-room reproduction Only direct sound can be reproduced optimally No control over reverberant sound field

31 Perceptual approach Perceptual Optimization

32 Perceptual approach Perceptual Optimization Optimization targeting perceptually relevant statistical properties of reverberant sound field

33 Perceptual approach Perceptual Optimization Optimization targeting perceptually relevant statistical properties of reverberant sound field The acoustics of the playback room is an integral part of the optimization

34 Optimization Optimize perceptually relevant statistical parameters: Auditory Transfer Function Direct sound (front loudspeakers) Reverberant sound (dipole loudspeakers) Interaural Cross Correlation (frequency dependent) Cross-talk dipole loudspeakers T60 Direct-to-reverberant ratio
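Two of the statistical parameters listed here, T60 and the direct-to-reverberant ratio, can be estimated from a room impulse response. The sketch below is a generic textbook procedure, not the optimization used in the talk: it computes the energy decay curve by Schroeder backward integration, fits a line between -5 and -25 dB to extrapolate T60, and splits the response 2.5 ms after its peak for the direct-to-reverberant ratio. The synthetic impulse response, the function names and the split time are assumptions.

```python
import numpy as np

def energy_decay_curve_db(rir):
    """Schroeder backward integration, normalised to 0 dB at t = 0."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0] + 1e-30)

def t60_from_edc(edc_db, fs, upper_db=-5.0, lower_db=-25.0):
    """Fit the decay between upper_db and lower_db, extrapolate to -60 dB."""
    idx = np.where((edc_db <= upper_db) & (edc_db >= lower_db))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # dB per second
    return -60.0 / slope

def direct_to_reverberant_db(rir, fs, direct_ms=2.5):
    """Energy up to direct_ms after the peak versus everything later."""
    split = int(np.argmax(np.abs(rir))) + int(direct_ms * 1e-3 * fs)
    return 10.0 * np.log10(np.sum(rir[:split] ** 2) / np.sum(rir[split:] ** 2))

# Toy impulse response: unit direct sound plus decaying noise, T60 = 0.7 s.
fs = 16000
true_t60 = 0.7
rng = np.random.default_rng(2)
t = np.arange(int(2 * true_t60 * fs)) / fs
rir = 0.05 * rng.standard_normal(len(t)) * 10.0 ** (-3.0 * t / true_t60)
rir[0] = 1.0

edc = energy_decay_curve_db(rir)
t60_est = t60_from_edc(edc, fs)
drr = direct_to_reverberant_db(rir, fs)
print(round(t60_est, 2), round(drr, 1))
```

Starting the fit at -5 dB keeps the initial drop caused by the direct sound out of the slope estimate. In a frequency-dependent analysis the same steps would be applied per band after band-pass filtering the impulse response.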

35 Perceptual approach

36 Evaluation
- All reproduction methods simulated over headphones with Room Impulse Responses convolved with dry instrument recordings
- Compare objective parameters
Recording rooms: Seminar room (699 ms) and Church (3040 ms)
Playback rooms (PBR): Small Lab (371 ms) and Seminar Room (697 ms)

37 Objective parameters
Figure: three panels. L (dB) vs. frequency (Hz) with curves E RinR, E RinR,Opt and E mch; IACC vs. frequency (Hz) with curves ref, RinR, RinR-Opt and mch; E (dB) vs. t (ms) with energy decay curves (edc) ref, RinR, RinR-Opt and mch.
- Ref = Recording room
- RinR = Conventional reproduction without optimization
- RinR,Opt = Our proposed optimization
- mch = Multi-channel reproduction with surround speakers
Observations:
- Coloration can be reduced compared to RinR
- Spatial properties (IACC) are better conserved

39 Listening test
- All reproduction methods simulated over headphones with Room Impulse Responses convolved with dry instrument recordings
- MUSHRA test
- Ref = Recording room
Recording rooms: Seminar room (699 ms) and Church (3040 ms)
Playback rooms (PBR): Small Lab (371 ms) and Seminar Room (697 ms)

40 Results and Conclusions
Figure panels: Seminar room, Church.
A simple loudspeaker setup allows:
- Perceptually authentic reproduction
- Individualization by considering playback acoustics
Grosse and van de Par (2015), IEEE Journal of Selected Topics in Signal Processing

41 Perceptual dereverberation
Scenario:
- Speech reproduction in a reverberant room
- Preprocessing of the speech signal to enhance speech intelligibility
Diagram labels: speech signal, preprocessing, reverberant room.

42 Perceptual dereverberation
Main idea:
- Conserve spectro-temporal pattern
- Use time-variant filtering (Hodoshima et al., 2006)
Figure: reverberated sine and clean sine; axes: amplitude vs. time (s).

43 Perceptual dereverberation Approach: Preprocessing of Loudspeaker inputs Adapt current frame based on past Optimize algorithm parameters with perceptual model. (Jørgensen et al. 2013)

44 Perceptual dereverberation Listening test: - Reverberated (pre-processed) speech with reverberated noise - Measure Speech Reception Threshold

45 Perceptual dereverberation Listening test: - Robustness for position - Measure Speech Reception Threshold

46 Summary
The auditory system:
- Uses a low-resolution spectro-temporal representation
- Extracts some special texture cues
- Uses robust cues for segregation/grouping
Two examples for sound field control were shown:
- Authentic audio reproduction in a reverberant playback room
- Perceptual dereverberation

47 Thank you for your attention Questions
