Perceptual Distortion Maps for Room Reverberation

Thomas Zarouchas (1), John Mourjopoulos (1)
(1) Audio and Acoustic Technology Group, Wire Communications Laboratory, Electrical Engineering and Computer Engineering Department, University of Patras, Greece
thozar@wcl.ee.upatras.gr, mourjop@wcl.ee.upatras.gr

ABSTRACT

From reverberated audio signals, and using the input (anechoic) audio as reference, a number of distortion maps are extracted, indicating how room reverberation distorts, in time-frequency scales, perceived features in the received signal. These maps are simplified to describe the monaural time-frequency / level distortions and the distortion of the spatial cues (i.e. inter-channel cues and coherence), which are very important for sound localization in a reverberant environment. Such maps are studied here as functions of room parameters (size, acoustics, distance, etc.) as well as of input signal properties. Overall perceptual distortion ratings are produced and reverberation-resilient signal features are extracted.

1. INTRODUCTION

Room acoustics introduce reverberation to audio signals, which is usually described formally by linear system response functions (e.g. convolutional input/output relationships using appropriate Room Impulse Responses). Such an approach helps to describe, up to a certain degree, features of reverberation that are important from a signal processing perspective [1, 2, 3]. However, the perception of reverberation is a very complex phenomenon resulting from time-frequency, delay, level, directional and signal-dependent cues [ ]. Currently there is a significant gap between the objective and the subjective approach for analyzing such phenomena. This work extends earlier published results in signal processing-based methodology to deal with room reverberation [8, 9] and also recent attempts to introduce perceptually motivated models for similar applications.
Specifically, in this work a Computational Auditory Masking Model (CAMM) [10, 11], complemented by a novel Inter-channel Cue Mapping Module (ICMM), is used for the perceptual description of reverberation distortions in audio signals and of the degradations of the stereo image in a typical audio reproduction setup. According to this approach, it is possible to locate, from the evaluated internal representations of the corresponding auditory model, time/frequency regions with significant degradation due to reverberation. Furthermore, the output of the inter-channel cue processing module indicates the modification of the relevant spatial cues due to reverberation. In both cases, the outputs of the CAMM and ICMM are presented in the form of time-frequency (2-D) maps [12].

The paper is organized as follows: in section 2 the analysis scheme for the extraction of the distortion maps of room reverberation is presented. In section 3 the utilization of the Computational Auditory Masking Model to derive the distortion maps due to reverberation is presented. In section 4 the Inter-channel Cue Mapping Model is presented. Simulation results are given in section 5. Finally, some conclusions are drawn in section 6.

2. DISTORTION MAPS FOR REVERBERATION

The proposed structure for the evaluation of the distortion maps of room reverberation is shown in Fig. 1 for stereo reproduction. The concept can be extended to multi-channel audio signal reproduction. The proposed method requires as inputs the anechoic audio signal and the corresponding reverberant audio signal generated in any real room, as can be measured via an omnidirectional microphone and as described in detail in [14]. Alternatively, a simulated reverberant signal may be obtained via convolution with a measured room impulse response. The CAMM derives the monaural internal representations of the audio signal in a number of frequency bands, which are input to a Decision-Threshold Device (DTD). The output of the DTD represents time-frequency maps with significant reverberation distortions. The concept of the DTD is based on the Just Noticeable Intensity Difference of the internal signal representations.

Figure 1: Analysis scheme for the assessment of the perceptual distortion cues

The scheme shown in Fig. 1 employs a monaural Masking Model for the estimation of perceived distortion due to reverberation, complemented by an Inter-channel Cue Mapping Module (ICMM) for the evaluation of the alterations in the relevant spatial cues. Inputs to the proposed method are the source audio signals and estimates or measurements of the reverberant audio signals generated in any real room. For the analysis in the monaural (CAMM) and spatial cue (ICMM) paths, the signals are transformed into the time-frequency domain. For the purpose of this processing, a novel filterbank is utilized with near-perfect reconstruction properties, enabling flexible signal modification. This filterbank provides non-uniform analysis bands with sufficient frequency resolution to capture the perceptually relevant cues at low frequencies, following closely the critical band / ERB scale. The sub-band domain signals $s_k(n)$ are obtained as:

$$s_k(n) = \sum_{m=0}^{M-1} s(n-m)\, h(m) \cos\big(\Theta_k(m)\big) \qquad (1)$$

where $s(n)$ is the input signal, $M$ is the length of a prototype filter $h(n)$, and the modulation phase $\Theta_k(m)$ can be considered as a function of the sub-band index $k$, the number of sub-bands $K$ and a phase parameter $\varphi$ [15, 16].

3. DISTORTION MAPS BASED ON MONAURAL MASKING MODEL

The Computational Auditory Masking Model used here can successfully emulate many aspects of the monaural signal processing of the auditory system. Inputs to the auditory model are a single channel of the original (anechoic) audio signal and the corresponding reverberant signal. The detailed structure for the evaluation of the distortions due to reverberation is shown in Figure 2. Input to the filterbank is the original (source) single-channel audio signal $x(n)$ and the corresponding reverberant single-channel audio signal $\tilde{x}(n)$. The sub-band signals (see Eq. (1)) $x_k(n)$ and $\tilde{x}_k(n)$ are then input to the CAMM, producing the monaural internal representations $z_k(n)$ and $\tilde{z}_k(n)$ respectively. The Decision-Threshold Device (DTD), accompanied by a set of thresholds $T_k(n)$, is utilized to extract the difference $\Delta_k(n)$ according to

$$\Delta_k(n) = \tilde{z}_k(n) - z_k(n) \qquad (2)$$

and therefore to derive the parameter

$$D_k(n) = \Delta_k(n) - T_k(n) \qquad (3)$$

The $D_k(n)$ parameter indicates the degree of the perceived distortions due to reverberation above the specified threshold, i.e. when $D_k(n) > 0$, in the time-frequency domain for single-channel audio signals, and generally

$$0 \le D_k(n) \le 1 \qquad (4)$$

It is clear that the CAMM can easily be extended to evaluate separately the parameters $D_k(n)$ for the 2 channels of a stereo signal. However, in such a case any binaural masking mechanisms are not considered.
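The decision-threshold stage described above can be illustrated with a minimal sketch. It assumes that the internal representations of the source and reverberant signals are already available as band-by-time arrays (the CAMM itself is not modelled here), that the difference is taken as reverberant minus source, and that the distortion parameter is the excess of that difference over the threshold, limited to the range 0 to 1. The function name `distortion_map` and the nested-list layout are hypothetical choices for illustration, not part of the original method.

```python
def distortion_map(z_ref, z_rev, thresholds):
    """Return D[k][n] for band k and time index n (illustrative sketch).

    z_ref, z_rev and thresholds are nested lists of equal shape holding the
    source representation, the reverberant representation and the thresholds.
    """
    d_map = []
    for ref_band, rev_band, thr_band in zip(z_ref, z_rev, thresholds):
        row = []
        for z, z_tilde, t in zip(ref_band, rev_band, thr_band):
            delta = z_tilde - z   # difference of representations (sign is an assumption)
            excess = delta - t    # amount above the threshold (assumed form)
            # Limit to the range 0..1; 0 means no distortion above threshold.
            row.append(min(1.0, max(0.0, excess)))
        d_map.append(row)
    return d_map


# Two bands, two time indices, a constant threshold of 0.1.
example = distortion_map(
    [[0.2, 0.5], [0.1, 0.9]],
    [[0.6, 0.5], [0.05, 2.5]],
    [[0.1, 0.1], [0.1, 0.1]],
)
```

Cells where the reverberant representation stays within the threshold of the source map to 0, so only the time-frequency regions with a noticeable change remain visible in the resulting map.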

Figure 2: Analysis scheme for processing of reference and reverberant audio signals in order to derive the time-frequency map of parameter $D_k(n)$

4. DISTORTION MAPS BASED ON INTER-CHANNEL CUE MAPPING

Inputs to the Inter-channel Cue Mapping Module are stereo channels for both the source audio signal and the corresponding reverberant signal. The relevant spatial cues [4, 13] examined here are the inter-channel level difference (ILD), inter-channel time difference (ITD) and inter-channel coherence (IC). These are derived for all signals and channels independently in each frequency band and as a function of time (see Figure 3).

Figure 3: Analysis scheme of the Inter-channel Cue Mapping Module

Short-time estimates of the power for each channel (i.e. left, right for a typical stereo setup) and for each sub-band are considered for a window size of $N$ samples according to:

$$p_{L,k}(n) = \sum_{m=1}^{N} x_{L,k}^{2}(n-m), \qquad p_{R,k}(n) = \sum_{m=1}^{N} x_{R,k}^{2}(n-m) \qquad (5)$$

and a cross-power estimate between the two channels (left-right) is also performed according to

$$p_{LR,k}(n) = \sum_{m=1}^{N} x_{L,k}(n-m)\, x_{R,k}(n-m) \qquad (6)$$

As is known [4, 13], the Inter-Channel Level Difference (ICLD [dB]) denotes the level/intensity difference between the two (left, right) channels:

$$\mathrm{ICLD}_k(n) = 10 \log_{10} \frac{p_{L,k}(n)}{p_{R,k}(n)} \qquad (7)$$

with a typical level range of

$$\mathrm{ICLD}_{\min} \le \mathrm{ICLD}_k(n) \le \mathrm{ICLD}_{\max} \qquad (8)$$

The Inter-Channel Time Difference (ICTD [samples]) describes the time difference between the two channels and is the time instance at which the maximum value of a short-time estimate of the normalized cross-correlation function occurs, where the normalized cross-correlation is estimated as:

$$\Phi_k(n) = \frac{p_{LR,k}(n)}{\sqrt{p_{L,k}(n)\, p_{R,k}(n)}} \qquad (9)$$

ICTD has a typical time range (in samples) of

$$\mathrm{ICTD}_{\min} \le \mathrm{ICTD}_k(n) \le \mathrm{ICTD}_{\max} \qquad (10)$$

The Inter-Channel Coherence (ICC) defines the coherence between the two channels (L and R) and can be expressed as:

$$\mathrm{ICC}_k(n) = \max\, \Phi_k(n) \qquad (11)$$

considering the maximum value of the instantaneous normalized cross-correlation. ICC has a range of:

$$0 \le \mathrm{ICC}_k(n) \le 1 \qquad (12)$$

where 1 indicates that L and R are perfectly coherent.
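As a rough illustration of the three cues defined above, the following sketch estimates them for a single sub-band window. It is written under several assumptions that are not confirmed by the text: the level difference is taken as left over right, the time difference is found by scanning a hypothetical lag range `max_lag` and picking the lag with the largest normalized cross-correlation, and the coherence is that largest value. The function name `interchannel_cues` is illustrative only.

```python
import math


def interchannel_cues(left, right, max_lag):
    """Estimate (ICLD in dB, ICTD in samples, ICC) for one sub-band window.

    left and right are equal-length sequences of sub-band samples with
    non-zero power. A positive ICTD means the right channel lags the left.
    """
    n = len(left)
    p_left = sum(v * v for v in left)     # short-time power, left channel
    p_right = sum(v * v for v in right)   # short-time power, right channel
    icld = 10.0 * math.log10(p_left / p_right)  # ratio direction is an assumption

    norm = math.sqrt(p_left * p_right)
    best_lag, best_corr = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        # Cross-power between left[i] and right[i + lag], using only the
        # index pairs that fall inside the window.
        cross = sum(left[i] * right[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        corr = cross / norm               # normalized cross-correlation
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return icld, best_lag, best_corr


# Right channel: the left channel attenuated by 6 dB and delayed by 2 samples.
left = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0.5, 1, 1.5, 1, 0.5, 0, 0, 0]
icld, ictd, icc = interchannel_cues(left, right, max_lag=4)
```

Running the same function on the source and on the reverberant sub-band signals gives the pair of cue values per band and window from which difference maps can be formed.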

Based on the definitions of equations (7-12), the evaluation of the spatial cues for both the source and the reverberant signal (denoted by ~) is performed as:

$$\begin{aligned}
\mathrm{ICLD}_{\min} \le \mathrm{ICLD}_k(n) \le \mathrm{ICLD}_{\max} \quad &\text{and} \quad \mathrm{ICLD}_{\min} \le \widetilde{\mathrm{ICLD}}_k(n) \le \mathrm{ICLD}_{\max} \\
\mathrm{ICTD}_{\min} \le \mathrm{ICTD}_k(n) \le \mathrm{ICTD}_{\max} \quad &\text{and} \quad \mathrm{ICTD}_{\min} \le \widetilde{\mathrm{ICTD}}_k(n) \le \mathrm{ICTD}_{\max} \\
0 \le \mathrm{ICC}_k(n) \le 1 \quad &\text{and} \quad 0 \le \widetilde{\mathrm{ICC}}_k(n) \le 1
\end{aligned} \qquad (13)$$

From these maps, which correspond to the source and received signals, distortion maps are evaluated (in the time-frequency domain), defined by the differences of these maps. Therefore, the differential metric (distortion map) for each cue is introduced according to:

$$\begin{aligned}
\Delta^{L}_k(n) &= \widetilde{\mathrm{ICLD}}_k(n) - \mathrm{ICLD}_k(n) \\
\Delta^{t}_k(n) &= \widetilde{\mathrm{ICTD}}_k(n) - \mathrm{ICTD}_k(n) \\
\Delta^{c}_k(n) &= \widetilde{\mathrm{ICC}}_k(n) - \mathrm{ICC}_k(n)
\end{aligned} \qquad (14)$$

Based on the above equations, the typical level, time and coherence range for each differential metric will be:

$$\begin{aligned}
-(\mathrm{ICLD}_{\max} - \mathrm{ICLD}_{\min}) &\le \Delta^{L}_k(n) \le \mathrm{ICLD}_{\max} - \mathrm{ICLD}_{\min} \\
-(\mathrm{ICTD}_{\max} - \mathrm{ICTD}_{\min}) &\le \Delta^{t}_k(n) \le \mathrm{ICTD}_{\max} - \mathrm{ICTD}_{\min} \\
-1 &\le \Delta^{c}_k(n) \le 1
\end{aligned} \qquad (15)$$

Hence the output of the ICMM, according to equation (15), indicates the variation of the inter-channel cues in the time-frequency domain for both signals in the form of a differential cue mapping. Typical differential maps for a number of test cases are shown in Figures 6 and 7. For the differential metrics ($\Delta^{c}_k(n)$, $\Delta^{L}_k(n)$, $\Delta^{t}_k(n)$) and for the distortion parameter ($D_k(n)$), the mean values on a frame-by-frame basis can also be evaluated for each test case. Additionally, a logarithmic expression of the corresponding mean values (except for $\Delta^{L}_k(n)$) can also be estimated according to:

$$X_{\mathrm{dB}}(i) = 20 \log_{10} X(i), \qquad i = 1 \dots M \qquad (16)$$

where $X(i)$ is the mean value of the corresponding differential metric or the distortion parameter in frame $i$, for a number of $M$ frames and a typical frame length of 1024 samples, leading to a simplified interpretation of the overall perceived signal-dependent distortion which can be assigned to each map.

5. TESTS AND RESULTS

Preliminary tests were conducted having as reference typical stereo 16-bit resolution signals at f_s = 44100 Hz. These tests used input (reference) audio signals of different categories, i.e. big band jazz (JAZZ), solo classical piano (PIANO) and male speech (SPEECH), and as second input the corresponding signals recorded under various reverberation conditions in different real enclosures (see Table 1), ranging from an acoustically treated laboratory to a large sports hall. From this set of distortion maps, the local variations and the overall distortion metrics for each specific test case are evaluated for typical audio signals and reverberation conditions.

Table 1: Properties of rooms used for tests (columns: Room, Dimensions L×W×H (m), RT (sec) (freq. avg.), Type; room types: Laboratory, Classroom, Sports hall)

Table 2 indicates the variation of the monaural masking distortion parameter D (dB) for different audio signals recorded in three different enclosures with varying acoustical properties. As shown, room 3 (large sports hall) exhibits a higher degree of perceived distortion for all types of signal.

Table 2: Monaural masking distortion parameter D (dB) for different real enclosures and different audio signals (columns: Signal, Room; signals: JAZZ, PIANO, SPEECH)

The variation of the distortion parameter D (dB) and the corresponding distortion map based on CAMM for a reverberant audio signal segment recorded in room 3 are shown in Figure 4. As can be observed in Fig. 4(c), the frequency-averaged distortion metric

D increases during the reverberant decay of the piano note (shown without reverberation in Fig. 4(a)), indicating the increase of the perceived distortion due to the reverberant tail. The corresponding 2-D perceptually motivated map of Fig. 4(d) gives a more detailed illustration of the corresponding time-frequency distortions, indicating in red the signal regions with a higher degree of perceived distortion.

The effect of different room acoustics on the variation of the distortion parameter D is shown in Figure 5. Note that the dashed line in each case indicates the frequency-averaged mean value of the perceived distortion for the corresponding audio signal segment. It is clear that the mean metric increases with reverberation time, and the perceived effects of reverberation are more pronounced for the larger rooms (Fig. 5(a) and 5(b)) than for the acoustically treated room (Fig. 5(c)). Furthermore, heavy reverberation seems to lower the frequency and depth of the modulations of the perceived distortion.

Table 3 shows the variation of the inter-channel differential metrics for the above enclosures using the PIANO as input signal. As shown, the divergence in the inter-channel coherence differential $\Delta^{c}$ and the inter-channel time differential $\Delta^{t}$ metrics between room 1 and room 3 is close to 3 dB.

Table 3: Differential metrics for different real enclosures and PIANO as a test signal (columns: Differential Metric, Room; metrics: Coherence $\Delta^{c}$, Level $\Delta^{L}$, Time $\Delta^{t}$)

For the differential inter-channel metrics, similar overall trends as with the previously described monaural metric can be observed in Figures 6 and 7, corresponding to the acoustically treated room (Fig. 6) and the large sports hall (Fig. 7). For low reverberation, all differential inter-channel metrics display low dispersion (the distortion maps have large time-frequency regions close to green, i.e. 0 dB for the metric). Furthermore, deviations around this value are low. With heavy reverberation conditions, higher dispersion can be observed for each differential inter-channel metric (Fig. 7).

6. CONCLUSIONS

The work illustrates the efficiency of the proposed Computational Auditory Masking Model and the novel Inter-channel Cue Mapping Module in describing, with appropriate 2-D maps, the perceived distortion and the general degradation of audio signals due to reverberation. As was shown, signal-dependent perceived distortions can be isolated into specific time-frequency regions.

Figure 4: (a) source audio signal segment (PIANO), (b) corresponding reverberant signal recorded in Room 3, (c) distortion parameter (dB), (d) Distortion Map D based on CAMM for reverberant signal

Figure 5: Distortion parameter for PIANO audio segment: (a) room 3, (b) room 2, (c) room 1. The dashed line indicates the overall mean value

These distortion maps illustrate different aspects of perceived degradations, from monaural masking due to reverberant decay to inter-channel level, time and coherence variations in stereo signals.

This detailed identification of the distortions can allow novel signal-processing methods to evolve so that such distortions can be suppressed without, or in conjunction with, the more traditional inverse-filter-based methods [14]. It is also promising that both short-term and long-term trends of the proposed distortion metrics seem to follow the trends in the established physical acoustical parameters of the recorded space. However, unlike existing acoustical measurements, the proposed distortion maps vary dynamically with the signal evolution and depend on the specific audio signal. Hence such maps may help to reconsider the problem of reverberation from a signal-processing perspective that is closer to perception and to the specific signal reproduced inside such an enclosure.

Figure 6: Differential Cue Mapping: (a) inter-channel coherence, (b) inter-channel level difference, (c) inter-channel time difference for room 1 and JAZZ as test signal

Figure 7: Differential Cue Mapping: (a) inter-channel coherence, (b) inter-channel level difference, (c) inter-channel time difference for room 3 and JAZZ as test signal

Future work will examine the variation and sensitivity of the differential metrics introduced here with respect to different source-receiver positions, leading to a hierarchical structure of the relative importance of each differential metric.

7. REFERENCES

[1] M. R. Schroeder, B. F. Logan, "Colorless Artificial Reverberation", Journal of the Audio Engineering Society, Vol. 9, p.

[2] S. T. Neely, J. B. Allen, "Invertibility of a Room Impulse Response", Journal of the Acoustical Society of America, Vol. 66, pp.
[3] J. N. Mourjopoulos, "Digital Equalization of Room Acoustics", Journal of the Audio Engineering Society, Vol. 42, No. 11, pp.
[4] J. Blauert (1997). Spatial Hearing: The Psychophysics of Human Localization (Revised Edition), The MIT Press, USA-Cambridge.
[5] J. M. Buchholz, J. Mourjopoulos, J. Blauert, "Room Masking: Understanding and Modeling the Masking of Room Reflections", 110th AES Convention, Amsterdam, May 2001, preprint (5312).
[6] R. H. Bolt, A. D. MacDonald, "Theory of Speech Masking by Reverberation", Journal of the Acoustical Society of America, Vol. 21(6), pp.
[7] F. E. Toole, "Loudspeakers and Rooms for Sound Reproduction - A Scientific Review", Journal of the Audio Engineering Society, Vol. 54, No. 6, June
[8] J. L. Flanagan, R. C. Lummis, "Signal Processing to Reduce Multipath Distortions in Small Rooms", Journal of the Audio Engineering Society, Vol. 47, pp.
[9] J. B. Allen, D. A. Berkley, J. Blauert, "Multimicrophone Signal Processing Technique to Remove Room Reverberation from Speech Signals", Journal of the Acoustical Society of America, Vol. 64(2), pp.
[10] J. M. Buchholz, J. Mourjopoulos, "A Computational Auditory Masking Model Based on Signal-Dependent Compression. I. Model Description and Performance Analysis", Acta Acustica United with Acustica, Vol. 90, pp. (2004).
[11] J. M. Buchholz, J. Mourjopoulos, "A Computational Auditory Masking Model Based on Signal-Dependent Compression. II. Model Simulations and Analytical Approximations", Acta Acustica United with Acustica, Vol. 90, pp. (2004).
[12] S. Harding, J. Barker, G. J. Brown, "Mask Estimation for Missing Data Speech Recognition Based on Statistics of Binaural Interaction", IEEE Transactions on Audio, Speech and Language Processing, Vol. 14, No. 1, January
[13] C. Faller, J. Merimaa, "Source Localization in Complex Listening Situations: Selection of Binaural Cues Based on Interaural Coherence", Journal of the Acoustical Society of America, Vol. 116(5), pp., November
[14] T. Zarouchas, J. Mourjopoulos, J. Buchholz, P. Hatziantoniou, "A Perceptual Measure for Assessing and Removing Reverberation from Audio Signals", 120th AES Convention, Paris, May 2006, preprint (6702).
[15] ISO/IEC, "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s, Part 3: Audio".
[16] J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, "Parametric Coding of Stereo Audio", EURASIP Journal on Applied Signal Processing, Vol., Issue 9, pages
