PAPER Enhanced Vertical Perception through Head-Related Impulse Response Customization Based on Pinna Response Tuning in the Median Plane
IEICE TRANS. FUNDAMENTALS, VOL.E91-A, NO.1 JANUARY 2008

PAPER

Ki Hoon SHIN a), Nonmember and Youngjin PARK, Member

SUMMARY Humans' ability to perceive the elevation of a sound and to distinguish whether a sound is coming from the front or the rear depends strongly on the monaural spectral features of the pinnae. To realize an effective virtual auditory display by HRTF (head-related transfer function) customization, the pinna responses were isolated from the median-plane HRIRs (head-related impulse responses) of 45 subjects in the CIPIC HRTF database and modeled as linear combinations of 4 or 5 basic temporal shapes (basis functions) at each elevation on the median plane by PCA (principal components analysis) in the time domain. By tuning the weight on each basis function computed for a specific elevation, replacing the pinna response in the KEMAR HRIR at that elevation with the resulting customized pinna response, and listening to the filtered stimuli over headphones, 4 individuals with normal hearing sensitivity were able to create sets of HRIRs that outperformed the KEMAR HRIRs in producing vertical effects with reduced front/back ambiguity in the median plane. Since the monaural spectral features of the pinnae are almost independent of the azimuth of the source direction, similar vertical effects could also be generated at other azimuths simply by varying the ITD (interaural time difference) according to the source direction and the size of each listener's own head.

key words: HRTF customization, HRIR, pinna response tuning, principal components analysis

1. Introduction

The ability of humans to use sonic cues to localize a sound in the surrounding 3-dimensional space is referred to as auditory localization.
At its very core lies the head-related transfer function (HRTF), which comprises the major cues for spatial hearing: the ITD (interaural time difference), the ILD (interaural level difference), and the spectral modification induced by the pinna folds. Synthesis of spatial hearing based on HRTFs is of great practical and research importance, and non-individualized HRTFs measured with a dummy-head microphone system (the KEMAR, for instance) are used in most virtual audio syntheses. However, subjective evaluations of these non-individualized HRTFs across groups of individuals often report front/back reversals and poor vertical effects. Both front/back distinction and vertical perception in humans are mainly triggered by the spectral features (peaks and notches) produced by the direction-dependent filtering of the pinna, as described by Shaw and Teranishi [1]. In particular, the importance of spectral notches (or nulls) as localization cues in the median plane (0° azimuth) is supported by Blauert [2] and by Hebrank and Wright [3], who concluded that elevation in the median plane, where both ITD and ILD are zero, is cued by a spectral notch whose frequency has a dependence on elevation similar to that previously observed by Shaw and Teranishi in the lateral plane. Further results confirmed this conclusion both in the median plane [4] and in the lateral plane [5]. In an attempt to explain this prominent feature of HRTFs, Lopez-Poveda and Meddis [6] suggested a diffraction/reflection model based on the posterior wall of the human concha and were able to predict the notch frequencies with reasonable accuracy.

[Manuscript received April 9; manuscript revised June 29. The first author is with Samsung Electronics, Suwon-City, Republic of Korea. The second author is with KAIST, Science Town, Daejeon, Republic of Korea. a) E-mail: kihoon221.shin@samsung.com DOI: /ietfec/e91 a]
More recently, Langendijk and Bronkhorst [7] were able to isolate the frequency bands responsible for front/back and up/down cues in human HRTFs via a series of subjective listening tests. They concluded that front/back cues and up/down cues are located mainly in the 8–16-kHz band and in the 6–12-kHz band, respectively. Both bands lie in the spectral region of the pinna response, which generally spans from 2 kHz to above 14 kHz [8]. Individual pinnae vary widely in size and shape, and the artificial pinnae mounted on the KEMAR are manufactured from the average dimensions of human pinna cavities. Therefore, the pinna response of a non-individualized HRTF generally cannot match that of each individual HRTF, resulting in front/back confusion and compromised vertical effects for most listeners. Based on the hypothesis that the structure of an HRTF is closely related to the dimensions and orientation of each individual body part, i.e. head, torso, shoulders, and pinnae, a variety of HRTF customization techniques that modify other people's HRTFs have been introduced to accomplish perceptual fidelity in virtual audio synthesis. Studies such as HRTF clustering and selection of a few most representative sets by Shimada et al. [9], a structural model for composition and decomposition of HRTFs by Algazi et al. [10], HRTF frequency scaling by Middlebrooks [11], and database matching by Zotkin et al. [12] already suggest that the hypothesis is somewhat valid, although localization equivalent to that obtained with the listener's own HRTFs was never closely achieved.

[Copyright © 2008 The Institute of Electronics, Information and Communication Engineers]

For example, the work of Middlebrooks is based on the idea that
the HRTF shifts toward lower frequencies while maintaining its shape when the pinna is scaled up in size. If the listener deduces the source elevation from the positions of peaks and notches in the oncoming sound spectrum, localization with a scaled-up pinna larger than the listener's own will result in a systematic bias in elevation perception, and personalization may be achieved simply by scaling the HRTF back down. However, the pinnae of different individuals differ in many more respects than a simple scaling, and even a seemingly insignificant change in the shape of the pinna can cause dramatic changes in the HRTF. The database matching technique suggested by Zotkin et al. [12] relies on the HRTF database released by the CIPIC Interface Laboratory at UC Davis, which contains 43 sets of individual HRTFs and 2 sets of KEMAR HRTFs along with some anthropometric information. By taking a picture of the listener's ear and comparing the anthropometric parameters measured from the image to those provided in the database, they selected the best-matching set of individual HRTFs for virtual auditory synthesis. Although localization performance on source elevation improved by 20–30% for 4 out of 6 subjects, this method requires a sophisticated imaging system that can capture the subject's ear at its real-life size and automatically compute the anthropometric dimensions from the image. In 1984, Morimoto and Aokata [13] introduced the interaural-polar coordinate system and showed that spectral cues similar to those observed in the median plane occur in any sagittal plane. Moreover, Wightman and Kistler [14] conducted a series of experiments in which the stimuli contained an ITD signaling one direction and ILD and pinna cues signaling another direction, through manipulation of the ITD in the measured HRTFs of several individuals.
The apparent lateral directions of such stimuli with conflicting cues almost always followed the ITD cue as long as the stimuli included low frequencies. Morimoto et al. [15] proposed a new sound localization method based on [13] that successfully rendered 3-d sound images in a sagittal plane by simulating interaural differences (ITD and ILD) together with individual HRTFs measured in the median plane. They further showed that the ITD dominates lateral perception by performing localization tests in which either the ITD or the ILD was manipulated while the other was kept at zero. In this paper, a measurement-free yet effective HRTF customization method that can be built on any individual HRTF database of substantial size is proposed. The goal of our study is not the retrieval of exact individual HRTFs. Rather, it lies in the development of hybrid HRTFs that deliver the necessary vertical perception better than non-individual HRTFs while reducing front/back reversals for any particular listener. The basic idea is similar to that suggested in [15]. Vertical perception is controlled by modifying the pinna responses extracted from the median-plane HRIRs of any individual HRTF database that does not contain the HRTF of the target subject, and lateral perception is controlled by introducing the head shadow effect to compensate for ILDs, together with proper ITDs represented as simple linear delays. Justification for approximating the HRTF phase as linear (a pure delay, independent of frequency) can be found in the work of Kulkarni et al. [16]. Our method is developed primarily in the time domain because structural decomposition of an HRTF is generally not easy in the frequency domain. An HRIR is a sequence of temporal events of sound waves reaching the ears over multiple paths.
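The linear-delay representation of the ITD described above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the Woodworth-style spherical-head formula, the head radius, and the sampling rate used here are assumptions for the example.

```python
import numpy as np

def apply_itd(left_hrir, right_hrir, itd_samples):
    """Simulate an ITD as a simple linear delay: shift the far ear's
    HRIR by an integer number of samples (positive delays the right ear)."""
    def delayed(h, n):
        out = np.zeros_like(h)
        out[n:] = h[:len(h) - n] if n > 0 else h
        return out
    if itd_samples >= 0:
        return left_hrir.copy(), delayed(right_hrir, itd_samples)
    return delayed(left_hrir, -itd_samples), right_hrir.copy()

def itd_in_samples(azimuth_deg, fs=44100, head_radius_m=0.0875, c=343.0):
    """Woodworth-style spherical-head ITD estimate (an assumption here),
    rounded to whole samples at the given sampling rate."""
    az = np.deg2rad(azimuth_deg)
    return int(round((head_radius_m / c) * (az + np.sin(az)) * fs))
```

For a source at 0° azimuth the delay is zero, consistent with the median-plane case considered in this paper; scaling `head_radius_m` per listener is how the individualized ITDs mentioned above would enter.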
Therefore, the pinna response can be extracted from an HRIR simply by clipping away the shoulder/torso response and keeping only the early response, since the pinna is located closest to the ear canal. Brown and Duda [17] argued that most pinna activity occurs in the first 0.7 ms after the arrival of the direct pulse, based on a comparison of KEMAR HRIRs measured with and without pinnae. However, a more detailed comparison of the data presented in their work reveals that the difference is not prominent after the first 0.2 ms. Examination of the HRIRs from our HRTF database [18] and those from the CIPIC HRTF database [19] also indicates that most pinna activity with the largest intersubject variation is concentrated in the first 0.2 ms, which corresponds to 10 samples at the sampling rate of the database. The proposed HRTF customization procedure consists of the following steps (see Fig. 1). First, the temporal pinna responses, each containing exactly 10 samples from the beginning of the direct pulse, are extracted from a group of individual HRIRs measured in the median plane after all initial time delays are removed. Then, principal components analysis (PCA) is performed on the isolated pinna

Fig. 1 Outline of procedures for the proposed HRTF customization method.
responses at each selected elevation angle to model them as linear combinations of 4 or 5 basis functions (or principal components) using the covariance method [20]. A graphical user interface (GUI) designed in MATLAB™ allows the subject to tune the pinna response by changing the weight on each basis function and listening to a broadband stimulus (100 Hz–20 kHz), filtered with the resulting pinna response aligned with a shoulder/torso response extracted from the KEMAR HRIR at the same elevation angle, over a set of headphones (Sennheiser HD 250 Linear II). KEMAR's shoulder/torso response at each elevation angle can be obtained simply by clipping away the pinna response and linear delay from the corresponding KEMAR HRIR; this step is indicated by the dashed crosses shown in Fig. 1. Adjustment of the weight on each basis function can continue until a satisfactory elevation perception is achieved. The proposed HRIR customization procedure also includes steps for introducing the head shadow effect and individualized ITDs to the customized pinna responses, as shown in Fig. 1, for accurate virtual auditory synthesis in the entire 3-d space around a target listener's head. However, these interaural differences were ignored in this study because we wanted first to verify the effectiveness of the proposed HRIR customization method in rendering enhanced elevation perception and reduced front/back confusion in the median plane only, where all interaural differences are zero. A total of 4 subjects with normal hearing sensitivity participated in this study. For performance comparison, the individual HRTFs of these 4 participants were measured in the median plane. Subjective listening tests were performed on the customized HRIRs, individual HRIRs, and KEMAR HRIRs in order to verify the feasibility of the proposed method.
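The 10-sample pinna-response extraction described in the first step above can be sketched as follows; the threshold-based onset detector is a hypothetical choice for illustration, since the paper removes the initial delays without specifying a detector.

```python
import numpy as np

def extract_pinna_response(hrir, n_samples=10, onset_fraction=0.2):
    """Strip the initial time delay and keep the early (pinna) response:
    the first n_samples from the onset of the direct pulse. The onset is
    taken as the first sample whose magnitude exceeds onset_fraction of
    the peak magnitude (an assumed detector, for illustration only)."""
    h = np.asarray(hrir, float)
    threshold = onset_fraction * np.max(np.abs(h))
    onset = int(np.argmax(np.abs(h) >= threshold))
    return h[onset:onset + n_samples]
```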
2. Method

2.1 PCA of Pinna Responses in the Time Domain

A typical HRIR can be decomposed into a series of temporal sound events as shown in Fig. 2. There is first an initial time delay due to the distance of the source with respect to the ears. Then a direct pulse, whose amplitude depends on the source distance and shadowing, arrives, followed by a ridge-trough combination caused by reflection and diffraction in the pinna cavities. The rest of the signal contains reflections from the shoulders, torso, and measurement devices such as the turntable and the vertical hoop stand holding the point source at the desired angle. Strictly speaking, the direct pulse is not part of the pinna response, but the early response lasting about 0.2 ms after the arrival of the direct pulse is referred to as the pinna response throughout the rest of this paper for convenience. Note that the individual HRIRs used in our analysis are those from the CIPIC HRTF database [19], which contains HRTFs of 43 individual subjects plus the KEMAR with 2 sets of pinnae of different sizes. The procedure of the covariance method [20] used for PCA is as follows. Let X be an M by N data matrix containing the extracted pinna responses at a selected elevation angle, where M is the number of dimensions (10 in this case) and N is the number of available data sets (45 in this case). The empirical mean of X along each dimension m = 1,...,M is

u[m] = \frac{1}{N} \sum_{n=1}^{N} X[m, n].   (1)

The empirical mean of the 45 individual pinna responses measured at 45° elevation is shown for both ears in Fig. 3 as an example. This mean vector u is then subtracted from each column of X to get a mean-subtracted data matrix B:

B = X - u h   (2)

where h is a 1 by N row vector of all 1's. The M by M covariance matrix C is obtained from the outer product of B with itself:

C = E[B \otimes B] = \frac{1}{N-1} B B^{*}   (3)

where * is the conjugate transpose operator.
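Equations (1)–(3) amount to a few lines of linear algebra. A minimal sketch, assuming real-valued pinna responses stacked as the columns of X:

```python
import numpy as np

def covariance_of_pinna_responses(X):
    """X: M x N matrix of pinna responses (M samples, N subjects).
    Returns the empirical mean u (Eq. (1)), the mean-subtracted data
    B = X - u h (Eq. (2)), and the covariance C = B B* / (N-1) (Eq. (3))."""
    M, N = X.shape
    u = X.mean(axis=1)                 # Eq. (1): mean along each dimension
    B = X - u[:, None]                 # Eq. (2): subtract u from every column
    C = (B @ B.conj().T) / (N - 1)     # Eq. (3): M x M covariance matrix
    return u, B, C
```

For the data described in the text, X would be 10 x 45 and C 10 x 10; the result matches NumPy's own `np.cov` with its default N−1 normalization.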
Fig. 2 Structural decomposition of an HRIR measured with a B&K HATS (Head And Torso Simulator) with an acoustic point source located at 0° azimuth and 0° elevation [18].

Fig. 3 Empirical mean of 45 pinna responses per ear collected from the CIPIC HRIRs measured at 45° elevation.

Next, the eigenvalue matrix D and the orthonormal eigenvector matrix V of the covariance matrix C are computed satisfying the following relationship:

C V = V D   (4)

where D is an M by M diagonal matrix with the eigenvalues of C on its diagonal. Matrices V and D must be rearranged in order of decreasing eigenvalue. The eigenvalues then represent the energy distribution of the data X among the eigenvectors, which form a basis for the data. The cumulative energy content g is the sum of the energy content across the eigenvectors from 1 through m:

g[m] = \sum_{q=1}^{m} \lambda_q   (5)

where \lambda_q is the qth eigenvalue and m = 1,...,M. By choosing a suitable accuracy bound, set to more than 90% of the total energy of the original data in our analysis, a subset of the eigenvectors is selected as basis vectors (principal components). The first L columns of V that satisfy the following accuracy bound on the cumulative energy ratio (CER) are chosen as the principal components (PCs):

CER (%) = \frac{g[L]}{g[M]} \times 100 > 90%.   (6)

The CER computed for the pinna responses at 45° elevation using the above equation with L = 1,...,10 is shown in Fig. 4. It can be seen that at least 5 PCs are required for the modeled data to represent more than 90% of the energy in the original data for both ears, so L = 5 in this case. These 5 PCs obtained for each ear are shown in Fig. 5. Note that the PCs obtained for the left ear pinna responses are almost identical to those obtained for the right ear; this was generally the case for the data at other elevation angles as well. Depending on the elevation angle, the required number of PCs was sometimes 4. Now let W be an M by L matrix with the L PCs as its column vectors:

W[p, q] = V[p, q]   (7)

for p = 1,...,M and q = 1,...,L. A new data matrix Y, which is the projection of X onto the L principal components, can be obtained simply by

Y = W^{*} B
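The eigen-selection of Eqs. (4)–(6), the projection of Eq. (8), and the reconstruction of Eq. (9) can be sketched as follows. This assumes real-valued data; `numpy.linalg.eigh` is used because C is a symmetric covariance matrix.

```python
import numpy as np

def choose_pcs(C, cer_bound=90.0):
    """Solve C V = V D (Eq. (4)), rearrange in order of decreasing
    eigenvalue, and keep the first L eigenvectors whose cumulative
    energy ratio (Eqs. (5)-(6)) exceeds cer_bound percent."""
    eigvals, V = np.linalg.eigh(C)        # ascending order for symmetric C
    order = np.argsort(eigvals)[::-1]     # rearrange: decreasing eigenvalue
    eigvals, V = eigvals[order], V[:, order]
    g = np.cumsum(eigvals)                # Eq. (5): cumulative energy content
    cer = 100.0 * g / g[-1]               # Eq. (6): cumulative energy ratio
    L = int(np.argmax(cer > cer_bound)) + 1
    return V[:, :L]                       # W: first L columns of V (Eq. (7))

def project_and_reconstruct(X, W):
    """Eq. (8): PCWs Y = W* B; Eq. (9): truncated reconstruction W Y + u h."""
    u = X.mean(axis=1)
    B = X - u[:, None]
    Y = W.conj().T @ B                    # L x N matrix of PCWs
    X_hat = W @ Y + u[:, None]            # approximate recovery of X
    return Y, X_hat
```

When the data truly lie in the span of W (plus the mean), the reconstruction is exact; with the 4 or 5 PCs retained in the text it recovers the original pinna responses to within the discarded 10% of energy.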
(8)

This new data matrix Y (an L by N matrix) can then be used to retrieve a truncated version of the original data X by

\hat{X} = W Y + u h.   (9)

In essence, a linear superposition of the L PCs in W, with the nth column of Y as the set of L principal component weights (PCWs), approximately recovers the nth column of the original data X.

Fig. 4 Cumulative energy ratio (CER in Eq. (6)) plotted against an increasing number of PCs for 45° elevation. The number of PCs on the horizontal axis represents L in Eq. (6).

Fig. 5 Five basis functions (principal components: PC1–PC5) of the pinna responses at 45° elevation. The solid lines denote the left ear principal components and the dashed lines the right ear principal components.

Fig. 6 Pinna responses at 45° elevation of subject 50 (solid) in the CIPIC HRTF database and their approximations (dashed) computed as linear combinations of the 5 PCs per ear shown in Fig. 5. Left ear responses are plotted in the upper panel and right ear responses in the lower panel.

Fig. 7 Five sets of PCWs required to recover the original pinna responses in the CIPIC HRTF database as linear combinations of the five PCs (left ear) depicted in Fig. 5. Note that the distribution of the PCWs becomes narrower as the eigenvalue decreases.

Fig. 8 Left ear pinna responses at 45° elevation of 4 randomly selected subjects from the CIPIC HRTF database.

Fig. 9 Left ear pinna responses of subject 8 (solid), subject 60 (dashed), and subject 153 (dotted) from the CIPIC HRTF database at various elevation angles. The numbers at the right indicate the corresponding angles.

The left and right pinna responses at 45° elevation for subject 50 from the CIPIC HRTF database, along with the approximations computed using Eq. (9), are plotted for comparison in Fig. 6. It can be seen that 5 PCs are enough to recover the original data with close resemblance. The 45 sets of 5 PCWs for the left ear PCs shown in Fig. 5, required to model all 45 left ear pinna responses in the CIPIC HRTF database, are captured in Fig. 7. Note that the spread of the PCWs is largest for PC 1 and smallest for PC 5. This is a direct consequence of rearranging V and D (Eq. (4)) in order of decreasing eigenvalue, since a larger eigenvalue implies a larger energy distribution of the original data along the corresponding eigenvector. In other words, the first 2 PCs are more important basis functions than the latter 3 PCs in representing the variation of the original data. The left ear pinna responses of 4 randomly selected individuals at 45° elevation, depicted in Fig. 8, show large intersubject variations around 0.08, 0.11, and 0.16 ms. One can easily observe from the left ear PCs in Fig. 5 that the first 3 PCs have ridges at these temporal positions, indicating that a linear combination of the first 3 PCs with appropriate PCWs can cover most intersubject variation in the shape and amplitude of the ridge-trough pair following the direct pulse. Amplitude variation of the direct pulse can be covered with PC 5 because it has a ridge in the region where the direct pulse is likely to reside. Therefore, by allowing a subject to tune the weight on each PC for customization, one is merely adding a timed ridge-trough pair with adjusted amplitude and an overall level shift to the mean pinna response in Fig. 3. The left ear pinna responses of 3 randomly selected individuals at elevations from −30° through 210° are plotted in Fig. 9 in order to observe the intersubject variation pattern per elevation angle in the median plane. The most common and salient change in the individual pinna responses as the source climbs in elevation lies in the arrival time and level of the first reflection (second ridge) immediately after the direct pulse (first ridge), and also in the shape and duration of the trough that follows. The temporal interval between the arrivals of the direct pulse and the first reflection contracts as the source rises in the frontal hemisphere, up to 60° where the two pulses merge into a single ridge. The two pulses stay merged for all rear source positions. Meanwhile, the width of the following trough decreases as the source rises to 90°, directly over the head, and increases again as the source descends in the rear hemisphere. The above
phenomenon is similar to that observed by Hiranaka and Yamasaki [21]. After examining many individual pinna responses in the CIPIC HRTF database, we conclude that most intersubject variation in pinna responses lies in the amplitude and arrival times of either the direct pulse or the ridge-trough pair, depending on the elevation angle of the source. Note that these intersubject variations become quite small as the source moves into the rear hemisphere, especially when the source lies directly behind the listener at 180°. However, it can be shown that even a very small difference in the time domain yields a large difference in the frequency domain.

2.2 PCW Tuning for Customization

As mentioned above, letting a subject tune the weight on each PC brings an actual change in the shape of the pinna response. Four male subjects with normal hearing sensitivity participated in making customized HRTFs using the GUI (graphical user interface) depicted in Fig. 10, whose sectors are bound by boxes and labeled by function. A subject may choose any elevation angle from −45° to 230° in the median plane, since the HRTFs from the CIPIC HRTF database are available over that angular range at regular intervals. However, customization was carried out only at 9 specific elevation angles, from −30° to 210° at 30° intervals in the median plane, in order to compare the localization performance of the customized HRTFs to that of the individual HRTFs of the participants measured at those angles. The balance control in the GUI adjusts the gains applied to the left and right channels, since it is necessary to render sound images in the center before tuning commences, and an interaural difference in perceived levels between the left and right ears is quite common even for individuals with normal hearing sensitivity.
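The tuning step itself reduces to splicing a weighted-PC pinna response into the KEMAR HRIR. A simplified sketch follows; the onset index and the hard overwrite are assumptions for illustration, whereas the actual GUI aligns the tuned pinna response with KEMAR's shoulder/torso response as described in Sect. 1.

```python
import numpy as np

def synthesize_custom_hrir(kemar_hrir, onset, mean_pinna, pcs, pcws):
    """Build a customized pinna response as mean + PCs @ weights (the
    tuned PCWs) and overwrite the pinna portion (the 10 samples after
    the direct-pulse onset) of the KEMAR HRIR, leaving KEMAR's
    shoulder/torso tail untouched."""
    custom_pinna = mean_pinna + pcs @ np.asarray(pcws, float)
    out = np.asarray(kemar_hrir, float).copy()
    out[onset:onset + len(mean_pinna)] = custom_pinna
    return out
```

Moving one slider in the GUI corresponds to changing one entry of `pcws`, i.e. adding one timed ridge-trough shape (or level shift) to the mean pinna response, exactly as described in Sect. 2.1.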
As mentioned in the previous section, the PCs obtained for the left and right ears turned out to be similar to each other at most elevation angles, despite the interaural shape differences in the pinna responses of some individuals in the CIPIC HRTF database. As a result, ear symmetry was assumed and customization was performed by tuning the PCWs of one ear only. The slider on each slide-bar of the GUI represents the PCW value for one PC. After entering the elevation angle at which customization is to be performed, principal components analysis is executed on the isolated pinna responses measured at the specified angle, and the corresponding PCs are computed by pushing the PCA button. Then each participant adjusts the slide-bars to set the PCW on each PC and listens to an input stimulus (100 Hz–20 kHz) filtered by the newly created HRIR (marked as Custom HRIR in Fig. 10) by pushing the PLAY button. This Custom HRIR is formed by aligning the pinna response obtained as a linear combination of the tuned PCs with the shoulder/torso response of the KEMAR HRIR measured at the same angle. The PLAY KEMAR button is for listening to the same input stimulus filtered by the KEMAR HRIR.

Fig. 10 A MATLAB™ GUI for pinna response customization based on tuning of PCWs (see text for details).

Some listeners may find the vertical perception produced by the KEMAR HRIRs good enough, in which case they can tune the PCWs so
that the resulting pinna response, shown as a solid line in the top-right panel of the GUI, takes a shape similar to that of the KEMAR's, shown as a dashed line in the same plot, or simply keep the KEMAR HRIR as their customized HRIR at each angle of concern. On the other hand, if the KEMAR HRIR performs poorly in producing the necessary vertical effects, the tuning can continue until the participant is satisfied with the resulting vertical effect he or she perceives. In our study, all participants reported unsatisfactory vertical perception with the KEMAR HRIRs, so tuning was performed at all target angles. Note that the headphone-pinna coupling effect for each subject was cancelled using the subject's own headphone-to-meatus-entrance transfer function for all output stimuli produced in the above tuning experiment.

2.3 Individual HRTF Measurement

The individual HRTFs of the four subjects who participated in the above tuning experiment were measured at the elevation angles where the pinna customization took place. Subjects were seated in a chair coupled to a vertical hoop designed to hold an acoustic point source. Details on the measurement apparatus and method can be found in our previous work on modeling HRTFs for nearby sources [18]. For correct headphone-presented simulation of free-field listening when evaluating these individual HRTFs for their localization capabilities, the headphone-pinna coupling effect was cancelled using the headphone-to-meatus-entrance transfer function measured on each subject according to the method suggested by Wightman and Kistler [22]. A typical HRTF measurement for an individual is carried out by placing a probe tube in the ear canal at a position very close to the eardrum, which is obviously a very difficult task.
Møller, Sørensen, Hammershøi, and Jensen [23] demonstrated that HRTF measurements can also be made by measuring free-field and headphone responses at the entrance of a blocked ear canal. Their technique, however, requires a miniature microphone embedded in an earplug that can be fitted in each subject's ear canal. Instead of dealing with the laborious procedures involved in the conventional measurement techniques, we adopted the blocked-meatus measurement technique using a B&K Binaural Microphone Type 4101 mounted inside each subject's pinna, as shown in Fig. 11, for the sake of convenience and efficiency.

Fig. 11 B&K Binaural Microphone Type 4101 (right) for measuring individual HRTFs, mounted inside a subject's pinna at the entrance to the ear canal (left).

Although this stethoscope-like microphone set simplifies the overall measurement process considerably, it was difficult to bend the microphone arms so that the microphone tips could be fitted precisely at the ear canal entrance without touching the tragus. Anchoring them in exactly the same positions throughout the measurement was another difficulty we faced. The microphone arms were taped to each subject's lower cheeks in an effort to anchor the microphone tips, and the subjects were instructed to refrain from making any noticeable movement during the experiment. However, as the evaluation results in the next section suggest, we believe that our individual HRTFs contain some errors induced by imprecise positioning of the microphone tips.

3. Subjective Evaluation Results

Subjective listening tests were carried out on all four subjects (ID: SK, HS, KB, and CH) to assess the performance of three HRIR sets: customized HRIRs, individual HRIRs, and KEMAR HRIRs. In an attempt to prevent any learning acquired by the subjects during the tuning process from affecting the overall evaluation result, the evaluation experiment was conducted several days after all subjects had completed tuning.
The subjects listened to broadband stimuli filtered by HRIRs from each of the above three HRIR sets over the headphones and gave their perceived responses by typing into a GUI designed for the evaluation test. Each of the 9 elevation angles was simulated 10 times in random order, yielding in total 90 stimuli to evaluate per HRIR set. The subjective evaluation results are shown in Figs. 12–15 for all 4 subjects. Evaluations of the KEMAR, individual, and customized HRIRs are displayed in the left, center, and right panels, respectively, of each figure. The horizontal axis denotes the actual source positions and the vertical axis the perceived source positions in each panel. Note the response frequency scale drawn in a small box in the right panel of Fig. 15. The response frequency is represented by the size of the square, with the largest square indicating 10 identical responses and the smallest square indicating 1 response per source location. The positive-sloped diagonal line in each panel indicates the perfect hearing condition in which the perceived source position corresponds exactly to the actual source position. The following observations are based on the evaluation responses presented in Figs. 12–15. All subjects reported difficulties of varying degree in making correct judgments of the source elevation on most trials with the KEMAR HRIRs. Either front/back reversal was frequent (especially for subjects SK and CH), which is evident from the many off-diagonal responses in positions symmetric with respect to the diagonal, or localization performance was low (for all 4 subjects), judging by the large response spread about
the diagonal.

Fig. 12 Subjective evaluation result for subject SK on 3 HRIR sets: KEMAR, individual, and customized HRIRs (refer to text for details).

Fig. 13 Subjective evaluation result for subject HS (refer to text for details).

Fig. 14 Subjective evaluation result for subject KB (refer to text for details).

Fig. 15 Subjective evaluation result for subject CH (refer to text for details).

With individual HRIRs, front/back reversals were reduced for all subjects except subject HS, who often perceived the frontal sources at −30° and 0° to be in the rear instead. Subject KB made quite a few errors in
localizing the rear sources even with his own HRIRs, and the scattered responses produced by subject CH for sources at −30°, 0°, and 30° suggest that he too had difficulty localizing the frontal sources near the horizontal plane. In general, however, all subjects performed better with their own individual HRIRs than with the KEMAR HRIRs, judging by the tighter distribution of the responses around the diagonal. Comparison of the responses made with customized HRIRs to those made with the KEMAR HRIRs reveals the following. Front/back reversals were reduced for all subjects with customized HRIRs, except for subject HS, who made confusion errors for the sources on and below the horizontal plane similar to those he made with his own HRIR set. Localization performance was enhanced for all subjects at most source positions, judging by the smaller spread about the diagonal. Although subject HS's localization performance with customized HRIRs was poor for sources near the horizontal plane, it was slightly improved for sources positioned at other elevation angles, i.e. from 0° to 150°. Subject KB made poor elevation judgments with customized HRIRs as the source shifted from 90° to 210° into the rear hemisphere, but it should be noted that his localization performance on rear sources was poor with all 3 HRIR sets.

Table 1 Localization errors s in Eq. (10) and front/back confusion counts computed by resolution of the responses shown in Figs. 12–15. The letters denote confusion clusters, i.e. C indicates the total confusions, B the backward confusions, and F the forward confusions.

Fig. 16 Illustration of resolving front/back confusions. The confusions are reflected about the vertical plane (horizontal dashed line) onto the correct hemisphere.
When computing error indices to characterize the localization performance associated with a particular set of HRIRs, it has been common practice to treat front/back confusions and localization accuracy separately by resolving the confusions in order to avoid error inflation [24]. On the other hand, resolving the confusions can be misleading if we assume that the responses correctly reflect the subject's perception. However, since our primary goal was to compare the three HRIR sets in terms of localization performance, we too elected to resolve all apparent confusions and to report the incidence of confusions associated with each set of HRIRs. If the angle between the actual source position and the perceived response is made smaller by reflecting the response about the vertical plane passing through the subject's ears, as shown in Fig. 16, the response is entered in reflected form and the confusion count is increased by one. The localization error is then computed in the root-mean-square sense, including both the responses lying in the same hemisphere as the sources and the confusions in reflected form, by the following definition:

$$ s = \left[ \frac{1}{90} \sum_{i=1}^{90} \bigl( x_i - \phi_{\mathrm{source}}(i) \bigr)^2 \right]^{1/2} \tag{10} $$

where $x_i$ is the perceived response for the $i$th stimulus corresponding to the actual source position $\phi_{\mathrm{source}}(i)$, and 90 is the total number of stimuli presented per HRIR set. Table 1 lists these RMS errors and the confusion counts organized per subject and per HRIR set. The RMS error appears in the top row of each cell, and the confusion counts follow in the bottom row in the form: no. of total confusions (no. of backward confusions + no. of forward confusions). From the error indices in Table 1 we can draw the following conclusions regarding the localization performance associated with each set of HRIRs.
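The confusion-resolution and scoring procedure above can be sketched in a few lines. This is a hedged illustration, assuming polar elevation angles in degrees on the median plane and a reflection of the form φ → 180° − φ; the function names and the angle-wrapping helper are our own, not the paper's code.

```python
import numpy as np

def angle_diff(a, b):
    """Signed angular difference a - b, wrapped to (-180, 180] degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def resolve_and_score(sources_deg, responses_deg):
    """Sketch of the scoring around Eq. (10): a response is counted as a
    front/back confusion and entered in reflected form (phi -> 180 - phi,
    i.e. mirrored about the vertical plane through the ears) whenever the
    reflection brings it closer to the actual source; the RMS error s is
    then taken over all resolved responses."""
    src = np.asarray(sources_deg, dtype=float)
    rsp = np.asarray(responses_deg, dtype=float)
    mirrored = 180.0 - rsp
    # Reflect only when reflection reduces the angular error.
    confused = np.abs(angle_diff(mirrored, src)) < np.abs(angle_diff(rsp, src))
    resolved = np.where(confused, mirrored, rsp)
    s = np.sqrt(np.mean(angle_diff(resolved, src) ** 2))  # Eq. (10)
    return s, int(confused.sum())
```

For example, a response of 180° to a source at 0° is resolved to 0° and counted as one backward confusion, contributing nothing to the RMS error after resolution.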
Comparison of the localization errors produced with the KEMAR HRIRs to those with the customized HRIRs reveals that localization accuracy improved markedly with the customized HRIRs for subjects KB and CH, whereas subjects SK and HS showed slightly better accuracy with the KEMAR HRIRs. Obviously this is a direct result of resolving the confusions, because it appears to be otherwise for subjects SK and HS in Figs. 12 and 13. Of course, with the customized HRIRs front/back confusions were reduced for all subjects; in particular, subjects SK and CH showed dramatic improvements, i.e., the confusion counts went from 29 to 9 for SK and from 43 to 6 for CH. On the contrary, the localization performance with individual HRIRs was not quite satisfactory for all subjects. Individual HRIRs are generally known to produce good localization results, but past studies such as that by Wightman and Kistler [24] show that headphone simulation of free-field listening tends to produce more frequent front/back confusions and less well-defined source elevation than the free-field condition. With individual HRIRs, subjects HS and KB produced the best overall localization accuracy, and subject KB's front/back confusions were the fewest of all three HRIR cases. On the other hand, the localization performance indices for the customized and individual HRIRs indicate that subjects SK and CH showed better localization accuracy, and subjects HS and CH produced fewer confusions, with the customized HRIRs than with the individual HRIRs. In short, with the customized HRIRs most subjects produced fewer confusions, and 2 out of 4 subjects (SK and CH) performed best in terms of both localization accuracy and front/back confusion.

Fig. 17 KEMAR (solid), individual (dashed), and customized (dotted) HRIRs (left) and the corresponding HRTFs (right) for subject SK.
Fig. 18 KEMAR (solid), individual (dashed), and customized (dotted) HRIRs (left) and the corresponding HRTFs (right) for subject CH.
The customized and individual HRIRs for subjects SK and CH, along with the KEMAR HRIRs and the corresponding HRTFs (direct Fourier transforms of the temporal responses), are depicted in Figs. 17 and 18 for example. These plots immediately reveal that most spectral deviations among the HRTFs occur in the high-frequency region and that the differences between the KEMAR and customized HRTFs mostly occur above 6 kHz, a direct consequence of the pinna response modification by tuning. It is also clear that even a small variation in the time response produces a substantial difference in the frequency response. In our study, we had hoped to find some similarity between the customized and individual responses in both the temporal and spectral shapes, because in theory the two sets of responses should capture and reflect the individual pinna features better than the KEMAR HRIRs if the tuning worked well, as it did for these two subjects in particular. Unfortunately, however, as was expected during the measurement phase of our study and also from the analysis of the evaluation results, there was very little similarity, or none at all, between the customized and individual HRTFs. The spectral notches and roll-offs that are known to be responsible for elevation perception barely coincide except in a few spectral regions, i.e., the notches at 7 kHz at 210°, the roll-offs at 10 kHz at 150°, the notches at 16 kHz at 120°, and the notches at 11.3 kHz at 90° for subject SK in Fig. 17.
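The spectral comparison described above can be reproduced numerically: the HRTF is the discrete Fourier transform of the HRIR, and the pinna notches appear as local minima of the magnitude response above roughly 6 kHz. A minimal sketch, assuming 44.1 kHz sampling as in the CIPIC database; the notch picker is a crude illustration, not the analysis method of the paper:

```python
import numpy as np

def hrtf_magnitude(hrir, fs=44100, nfft=512):
    """Magnitude of the HRTF in dB, computed as the DFT of the HRIR."""
    spectrum = np.fft.rfft(hrir, n=nfft)
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    mag_db = 20.0 * np.log10(np.maximum(np.abs(spectrum), 1e-12))
    return freqs, mag_db

def notch_frequencies(freqs, mag_db, fmin=6000.0):
    """Local minima of the magnitude response above fmin (Hz) -- a rough
    stand-in for locating elevation-dependent pinna notches."""
    notches = [freqs[i] for i in range(1, len(mag_db) - 1)
               if freqs[i] >= fmin
               and mag_db[i] < mag_db[i - 1]
               and mag_db[i] < mag_db[i + 1]]
    return np.array(notches)
```

As a sanity check, a toy two-tap HRIR (a direct path plus an equal reflection two samples later) produces its first comb-filter notch at fs/4, i.e. near 11 kHz, which the picker recovers.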
Although the localization performance of subjects SK and CH using their own HRIRs was passable compared with the headphone simulations of free-field conditions achieved by others in the past, we believe that the individual HRIRs measured in this study contain errors, probably induced by imprecise positioning of the microphone tips at the ear-canal entrance, as mentioned earlier. As a result, we cannot confirm at this point whether the spectral features in the HRTFs obtained by the proposed customization method indeed represent each individual's pinna characteristics, even though they have been shown to improve localization performance.

4. Discussion and Future Work

The proposed HRIR customization method, based on tuning of the basis functions obtained by time-domain PCA decomposition of the pinna responses, was shown to be effective in producing the necessary vertical effects while reducing front/back reversals. We confirmed this by a series of subjective listening tests. With the customized HRIRs, in comparison to the KEMAR HRIRs, 2 out of 4 subjects showed explicit improvements with a noticeable decrease in front/back reversals, while the other 2 subjects demonstrated enhanced elevation perception to some degree. All subjects reported that the sources at 60°, 90°, and 120° in elevation angle were among the toughest to discriminate from one another with both the individual and customized HRIRs, and that they had to guess the source elevation on most trials with the KEMAR HRIRs. We also verified that similar vertical effects could be generated in other azimuthal directions simply by adding proper ITDs to the customized HRIRs developed using the proposed method. The localization performance in other sagittal planes, along with detailed analysis, will follow in a subsequent paper.
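The last point, reusing the median-plane customization at other azimuths by adding a direction-dependent ITD, can be sketched as follows. The Woodworth spherical-head formula stands in here for whatever head model was actually used; the default head radius, sampling rate, and integer-sample delay are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Approximate ITD (s) from the Woodworth spherical-head formula,
    ITD = (a / c) * (theta + sin(theta)), valid for |azimuth| <= 90 deg.
    The 8.75 cm radius is a common default and should be scaled to the
    listener's own head, as the paper suggests."""
    theta = np.deg2rad(azimuth_deg)
    return (head_radius / c) * (theta + np.sin(theta))

def delay_hrir(hrir, delay_s, fs=44100):
    """Delay the contralateral-ear HRIR by the ITD, rounded to whole
    samples (a fractional-delay filter would be more precise)."""
    n = int(round(delay_s * fs))
    out = np.zeros_like(hrir)
    out[n:] = hrir[:len(hrir) - n]
    return out
```

Applying `delay_hrir` to the contralateral ear of a customized median-plane HRIR pair then shifts the virtual source toward the given azimuth while keeping the tuned pinna spectrum intact.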
Acknowledgments

This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the National Research Laboratory Program (M J ) and the BK21 Project (2006) of the Republic of Korea.

References

[1] E.A.G. Shaw and R. Teranishi, "Sound pressure generated in an external-ear replica and real human ears by a nearby point source," J. Acoust. Soc. Am., vol.44, pp. –.
[2] J. Blauert, "Sound localization in the median plane," Acustica, vol.22, pp. –, 1969/1970.
[3] J. Hebrank and D. Wright, "Spectral cues used in the localization of sound sources on the median plane," J. Acoust. Soc. Am., vol.56, pp. –.
[4] R.A. Butler and K. Belendiuk, "Spectral cues utilized in the localization of sound in the median sagittal plane," J. Acoust. Soc. Am., vol.61, pp. –.
[5] P.J. Bloom, "Determination of monaural sensitivity changes due to the pinna by use of minimum-audible-field measurements in the lateral vertical plane," J. Acoust. Soc. Am., vol.61, pp. –.
[6] E.A. Lopez-Poveda and R. Meddis, "A physical model of sound diffraction and reflections in the human concha," J. Acoust. Soc. Am., vol.100, pp. –.
[7] E.H.A. Langendijk and A.W. Bronkhorst, "Contribution of spectral cues to human sound localization," J. Acoust. Soc. Am., vol.112, pp. –.
[8] H.W. Gierlich, "The application of binaural technology," Applied Acoustics, vol.36, pp. –.
[9] S. Shimada, M. Hayashi, and S. Hayashi, "A clustering method for sound localization transfer functions," J. Audio Eng. Soc., vol.42, pp. –.
[10] V.R. Algazi, R.O. Duda, R.P. Morrison, and D.M. Thompson, "Structural composition and decomposition of HRTFs," Proc. WASPAA'01, pp. –, New Paltz, NY.
[11] J.C. Middlebrooks, "Virtual localization improved by scaling nonindividualized external-ear transfer functions in frequency," J. Acoust. Soc. Am., vol.106, pp. –.
[12] D.N. Zotkin, R. Duraiswami, and L.S. Davis, "Customizable auditory displays," Proc. Int. Conf. on Auditory Display (ICAD), pp. –, Kyoto, Japan.
[13] M. Morimoto and H. Aokata, "Localization cues of sound sources in the upper hemisphere," J. Acoust. Soc. Jpn. (E), vol.5, pp. –.
[14] F.L. Wightman and D.J. Kistler, "The dominant role of low-frequency interaural time differences in sound localization," J. Acoust. Soc. Am., vol.91, pp. –.
[15] M. Morimoto, M. Itoh, and K. Iida, "3-D sound image localization by interaural differences and the median plane HRTF," Proc. Int. Conf. on Auditory Display (ICAD), Kyoto, Japan, July.
[16] A. Kulkarni, S.K. Isabelle, and H.S. Colburn, "Sensitivity of human subjects to head-related transfer-function phase spectra," J. Acoust. Soc. Am., vol.105, pp. –.
[17] C.P. Brown and R.O. Duda, "A structural model for binaural sound synthesis," IEEE Trans. Speech Audio Process., vol.6, no.5, pp. –.
[18] K. Shin and Y. Park, "Modeling of non-individualized head-related transfer functions for nearby sources," Proc. 9th Western Pacific Acoustics Conf. (WESPAC), pp. –, Seoul, Korea, June.
[19] CIPIC HRTF Database Files, Release 1.1, August 2001, CIPIC Interface Laboratory, U.C. Davis, available from ucdavis.edu/
[20] J.E. Jackson, A User's Guide to Principal Components, pp.1–25, John Wiley & Sons.
[21] Y. Hiranaka and H. Yamasaki, "Envelope representation of pinna impulse responses relating to three-dimensional localization of sound sources," J. Acoust. Soc. Am., vol.73, pp. –.
[22] F.L. Wightman and D.J. Kistler, "Headphone simulation of free-field listening. I: Stimulus synthesis," J. Acoust. Soc. Am., vol.85, pp. –.
[23] H. Møller, M.F. Sorensen, D. Hammershøi, and C.B. Jensen, "Head-related transfer functions of human subjects," J. Audio Eng. Soc., vol.43, pp. –.
[24] F.L. Wightman and D.J. Kistler, "Headphone simulation of free-field listening. II: Psychophysical validation," J. Acoust. Soc. Am., vol.85, pp. –.

Ki Hoon Shin was born in Seoul, Korea. He received his B.S. and M.S. degrees in mechanical engineering from the University of Rochester, NY, in 1996 and 1998, respectively. From 1998 to 2000, he was enrolled in a Ph.D. program in aerospace engineering at Georgia Tech, GA. Since 2001, he has been engaged in research on virtual audio synthesis for a Ph.D. in mechanical engineering at Korea Advanced Institute of Science and Technology (KAIST). He is now at the Digital Media R&D Center of Samsung Electronics, developing audio algorithms for DTVs and home theaters.

Youngjin Park was born in Seoul, Korea. He received his B.S. and M.S. degrees in mechanical engineering from Seoul National University in 1980 and 1982, respectively, and the Ph.D. in mechanical engineering from the University of Michigan, MI. From 1987 to 1988, he worked as a research fellow at the University of Michigan.
He also worked as an assistant professor at NJIT, NJ, from 1988 to . He joined the faculty of Korea Advanced Institute of Science and Technology (KAIST) in 1990, where he is a Professor of Mechanical Engineering. His research interests include general control theories, virtual audio synthesis, active control of noise and vibration, and system identification.
Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract
More informationIntensity Discrimination and Binaural Interaction
Technical University of Denmark Intensity Discrimination and Binaural Interaction 2 nd semester project DTU Electrical Engineering Acoustic Technology Spring semester 2008 Group 5 Troels Schmidt Lindgreen
More informationANALYSIS AND EVALUATION OF IRREGULARITY IN PITCH VIBRATO FOR STRING-INSTRUMENT TONES
Abstract ANALYSIS AND EVALUATION OF IRREGULARITY IN PITCH VIBRATO FOR STRING-INSTRUMENT TONES William L. Martens Faculty of Architecture, Design and Planning University of Sydney, Sydney NSW 2006, Australia
More informationInterference in stimuli employed to assess masking by substitution. Bernt Christian Skottun. Ullevaalsalleen 4C Oslo. Norway
Interference in stimuli employed to assess masking by substitution Bernt Christian Skottun Ullevaalsalleen 4C 0852 Oslo Norway Short heading: Interference ABSTRACT Enns and Di Lollo (1997, Psychological
More informationAN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES
Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFX-), Verona, Italy, December 7-9,2 AN AUDITORILY MOTIVATED ANALYSIS METHOD FOR ROOM IMPULSE RESPONSES Tapio Lokki Telecommunications
More informationAalborg Universitet. Binaural Technique Hammershøi, Dorte; Møller, Henrik. Published in: Communication Acoustics. Publication date: 2005
Aalborg Universitet Binaural Technique Hammershøi, Dorte; Møller, Henrik Published in: Communication Acoustics Publication date: 25 Link to publication from Aalborg University Citation for published version
More informationSimulation of wave field synthesis
Simulation of wave field synthesis F. Völk, J. Konradl and H. Fastl AG Technische Akustik, MMK, TU München, Arcisstr. 21, 80333 München, Germany florian.voelk@mytum.de 1165 Wave field synthesis utilizes
More informationConvention e-brief 433
Audio Engineering Society Convention e-brief 433 Presented at the 144 th Convention 2018 May 23 26, Milan, Italy This Engineering Brief was selected on the basis of a submitted synopsis. The author is
More informationValidation of lateral fraction results in room acoustic measurements
Validation of lateral fraction results in room acoustic measurements Daniel PROTHEROE 1 ; Christopher DAY 2 1, 2 Marshall Day Acoustics, New Zealand ABSTRACT The early lateral energy fraction (LF) is one
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Architectural Acoustics Session 1pAAa: Advanced Analysis of Room Acoustics:
More informationA binaural auditory model and applications to spatial sound evaluation
A binaural auditory model and applications to spatial sound evaluation Ma r k o Ta k a n e n 1, Ga ë ta n Lo r h o 2, a n d Mat t i Ka r ja l a i n e n 1 1 Helsinki University of Technology, Dept. of Signal
More informationAalborg Universitet. Audibility of time switching in dynamic binaural synthesis Hoffmann, Pablo Francisco F.; Møller, Henrik
Aalborg Universitet Audibility of time switching in dynamic binaural synthesis Hoffmann, Pablo Francisco F.; Møller, Henrik Published in: Journal of the Audio Engineering Society Publication date: 2005
More informationDistortion products and the perceived pitch of harmonic complex tones
Distortion products and the perceived pitch of harmonic complex tones D. Pressnitzer and R.D. Patterson Centre for the Neural Basis of Hearing, Dept. of Physiology, Downing street, Cambridge CB2 3EG, U.K.
More informationWAVELET-BASED SPECTRAL SMOOTHING FOR HEAD-RELATED TRANSFER FUNCTION FILTER DESIGN
WAVELET-BASE SPECTRAL SMOOTHING FOR HEA-RELATE TRANSFER FUNCTION FILTER ESIGN HUSEYIN HACIHABIBOGLU, BANU GUNEL, AN FIONN MURTAGH Sonic Arts Research Centre (SARC), Queen s University Belfast, Belfast,
More informationTDE-ILD-HRTF-Based 2D Whole-Plane Sound Source Localization Using Only Two Microphones and Source Counting
TDE-ILD-HRTF-Based 2D Whole-Plane Sound Source Localization Using Only Two Microphones Source Counting Ali Pourmohammad, Member, IACSIT Seyed Mohammad Ahadi Abstract In outdoor cases, TDOA-based methods
More informationImproving room acoustics at low frequencies with multiple loudspeakers and time based room correction
Improving room acoustics at low frequencies with multiple loudspeakers and time based room correction S.B. Nielsen a and A. Celestinos b a Aalborg University, Fredrik Bajers Vej 7 B, 9220 Aalborg Ø, Denmark
More informationOn the Estimation of Interleaved Pulse Train Phases
3420 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 48, NO. 12, DECEMBER 2000 On the Estimation of Interleaved Pulse Train Phases Tanya L. Conroy and John B. Moore, Fellow, IEEE Abstract Some signals are
More informationProceedings of Meetings on Acoustics
Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Engineering Acoustics Session 2pEAb: Controlling Sound Quality 2pEAb10.
More informationURBANA-CHAMPAIGN. CS 498PS Audio Computing Lab. 3D and Virtual Sound. Paris Smaragdis. paris.cs.illinois.
UNIVERSITY ILLINOIS @ URBANA-CHAMPAIGN OF CS 498PS Audio Computing Lab 3D and Virtual Sound Paris Smaragdis paris@illinois.edu paris.cs.illinois.edu Overview Human perception of sound and space ITD, IID,
More informationEvaluating HRTF Similarity through Subjective Assessments: Factors that can Affect Judgment
Evaluating HRTF Similarity through Subjective Assessments: Factors that can Affect Judgment Areti Andreopoulou Audio Acoustics Group, LIMSI - CNRS andreopoulou@limsi.fr Agnieszka Roginska Music and Audio
More informationCitation for published version (APA): Nutma, T. A. (2010). Kac-Moody Symmetries and Gauged Supergravity Groningen: s.n.
University of Groningen Kac-Moody Symmetries and Gauged Supergravity Nutma, Teake IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please
More informationCapturing 360 Audio Using an Equal Segment Microphone Array (ESMA)
H. Lee, Capturing 360 Audio Using an Equal Segment Microphone Array (ESMA), J. Audio Eng. Soc., vol. 67, no. 1/2, pp. 13 26, (2019 January/February.). DOI: https://doi.org/10.17743/jaes.2018.0068 Capturing
More informationSpeech Compression. Application Scenarios
Speech Compression Application Scenarios Multimedia application Live conversation? Real-time network? Video telephony/conference Yes Yes Business conference with data sharing Yes Yes Distance learning
More informationBEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor
BEAT DETECTION BY DYNAMIC PROGRAMMING Racquel Ivy Awuor University of Rochester Department of Electrical and Computer Engineering Rochester, NY 14627 rawuor@ur.rochester.edu ABSTRACT A beat is a salient
More informationENHANCED PRECISION IN SOURCE LOCALIZATION BY USING 3D-INTENSITY ARRAY MODULE
BeBeC-2016-D11 ENHANCED PRECISION IN SOURCE LOCALIZATION BY USING 3D-INTENSITY ARRAY MODULE 1 Jung-Han Woo, In-Jee Jung, and Jeong-Guon Ih 1 Center for Noise and Vibration Control (NoViC), Department of
More informationBinaural Speaker Recognition for Humanoid Robots
Binaural Speaker Recognition for Humanoid Robots Karim Youssef, Sylvain Argentieri and Jean-Luc Zarader Université Pierre et Marie Curie Institut des Systèmes Intelligents et de Robotique, CNRS UMR 7222
More informationSOPA version 3. SOPA project. July 22, Principle Introduction Direction of propagation Speed of propagation...
SOPA version 3 SOPA project July 22, 2015 Contents 1 Principle 2 1.1 Introduction............................ 2 1.2 Direction of propagation..................... 3 1.3 Speed of propagation.......................
More information