Next Generation Surround Decoding and Upmixing for Consumer and Professional Applications


Mark Vinton 1, David McGrath 2, Charles Robinson 3, Phil Brown 4
1 Dolby Laboratories, Inc., USA, mvint@dolby.com
2 Dolby Laboratories, Inc., USA, dxm@dolby.com
3 Dolby Laboratories, Inc., USA, cqr@dolby.com
4 Dolby Laboratories, Inc., USA, cbrow@dolby.com

Abstract

This paper describes a new spatial audio algorithm that creates a channel-based, three-dimensional sound scene from two or more input channels. The algorithm was designed to decode matrix-encoded programs (Lt/Rt). It is also an effective stereo upmixer; the signal relationships that guide the decoding algorithm (e.g. cross-correlation) also provide appropriate cues to the intended spatial scene for standard, unencoded programs; in effect, we decode the artist's intent. Input channel configurations with more than two channels are decomposed into channel pairs which are then processed independently. Improvements relative to existing surround decoding systems include improved selectivity and separation due to multi-band processing; increased listener envelopment through independent processing of direct and diffuse signal components and user-adjustable decorrelation; and support for an arbitrary number of output channels at user-specified locations, including elevation. The system described here has recently been deployed in consumer and professional products for home, mobile, and cinema applications. In this paper we give a detailed description of the signal processing and provide results from a subjective listening test which indicate a significant improvement relative to legacy systems.

1. Introduction

With the widespread adoption of DVD, Blu-ray, digital broadcast, and internet-based over-the-top delivery, multi-channel (more than two channels) audio playback devices have become commonplace. Despite the advent of new high-bandwidth formats and content delivery mechanisms, a large amount of legacy stereo and matrix-encoded material is still delivered to consumers. For this reason, there is a natural desire to re-render stereo content so that the benefits of multi-channel reproduction can be attained. Furthermore, as immersive audio content delivery becomes more prevalent, consumers will install audio playback systems that go well beyond the 5.1 or 7.1 channels supported by DVD and current digital broadcast. Therefore, there will be a growing need not only to render stereo and matrix-encoded material to these new high-count speaker configurations, but also to render 5.1 and 7.1 channel material to yet higher output channel configurations.

While surround matrix decoders have been commonplace for more than 30 years, the availability of increasingly powerful and affordable signal processors has led to improvements in matrix decoding algorithms. The use of frequency-domain techniques and new signal models has enabled matrix decoders with improved selectivity and sense of envelopment. This paper introduces a new matrix decoding system with the capability to render matrix-encoded and stereo material to an arbitrary number of speaker channels. Furthermore, the matrix decoder can be used to re-render 5.1 and 7.1 channel content to even higher channel formats. Key to the operation of this new matrix decoder is the separation of steered (direct) signals from diffuse signals in both time and frequency.
The steered signal components can be accurately spatially rendered irrespective of the output speaker configuration, while the diffuse signal content is spread to improve the sense of envelopment.

The paper is organized as follows: in section 2, conventional matrix encoding is presented and an extension for encoding arbitrary speaker locations is formulated. The new matrix decoding system that renders a 2-channel input to arbitrary output channels is detailed in section 3. Section 4 discusses the use of the core matrix decoder to convert multi-channel content (5.1 and 7.1) to even higher output speaker configurations. In section 5, subjective test results are presented that demonstrate the performance improvements achieved by the new matrix decoder.

2. Matrix Encoding for Arbitrary Horizontal Directions

Two-channel, matrix-encoded signals have been used to encode 4-channel, 5-channel and 7-channel surround sound, according to the "rules" of Dolby Pro Logic encoding/decoding. The goal of this section is to formulate equations that define the encoding (and hence, also, the decoding) rules for audio objects or speaker channels at arbitrary locations in a ring around the listener. Figure 1 shows the arrangement of a number of speaker channels on the matrix-coding circle. Early versions of Dolby Surround made use of the L, C, R and S channels, and these channels appear at 90° intervals around the matrix-coding circle.

Figure 1: The matrix-coding circle, with the L, C, R and S channels at 90° intervals and the Ls and Rs channels that replace S for 5-channel surround.

5-channel surround sound was accommodated by replacing the S channel with Ls and Rs channels, and these are also shown in Figure 1. A traditional matrix encoder was supplied with a number of input channels, each corresponding to a specific speaker. The two-channel (Lt/Rt) output was formed by a linear matrix operation. For example, a 5-channel matrix encoder operates according to equation (1):

$$\begin{bmatrix} L_T \\ R_T \end{bmatrix} = \mathbf{E} \begin{bmatrix} L & R & C & L_s & R_s \end{bmatrix}^{T} \qquad (1)$$

where E is the 2-by-5 encoding matrix, and the imaginary unit, j, which appears in the surround-channel entries of E, indicates a 90° phase shift.

2.1. Encoding Gains for Arbitrary Angles

The following functions, given by equations (2) through (4), generate the panning gains for arbitrary speaker locations with respect to the encode angle θ:

$$f(\theta) = \begin{bmatrix} G_L(\theta) \\ G_R(\theta) \end{bmatrix} \qquad (2)$$

$$G_L(\theta) = \cos\left(\frac{\theta}{2} - \frac{\pi}{4}\right) e^{\,j\varphi(\theta)} \qquad (3)$$

$$G_R(\theta) = \cos\left(\frac{\theta}{2} + \frac{\pi}{4}\right) e^{\,j\varphi(\theta)} \qquad (4)$$

This function is not yet fully defined, because it includes an additional phase shift, φ(θ), applied to both channels. This phase shift is optional, but without it the equations contain an ambiguity. For example, if φ(θ) = 0, then f(θ) is anti-periodic over an interval of 2π. This means that f(θ − 2π) = −f(θ), so a speaker channel placed at θ = π would not have its panning function uniquely defined, since the same channel could have been assigned to the angle θ = −π. The solution to this ambiguity is to define an appropriate phase-shifting function that makes e^{jφ(θ)} anti-periodic as well (the product of two anti-periodic functions is periodic). Therefore, φ(θ) is constructed so that it has the following properties:

$$e^{\,j\varphi(\theta)} = \begin{cases} 1 & \text{when } |\theta| \le \pi/2 \\ j & \text{when } \theta \ge 3\pi/4 \\ -j & \text{when } \theta \le -3\pi/4 \end{cases} \qquad (5)$$

A function for φ(θ) is constructed by smoothly interpolating (with a cubic spline) in the transition regions π/2 ≤ |θ| ≤ 3π/4. So, for all θ in the range [−π, π], equations (6) and (7) are used to calculate φ(θ):

$$u(\theta) = \max\left(0,\ \min\left(1,\ \sqrt{2}\,\sin\left(|\theta| - \frac{\pi}{2}\right)\right)\right) \qquad (6)$$

$$\varphi(\theta) = \begin{cases} \ \ \frac{\pi}{2}\, u(\theta)^2 \left(3 - 2u(\theta)\right) & \theta \ge 0 \\ -\frac{\pi}{2}\, u(\theta)^2 \left(3 - 2u(\theta)\right) & \theta < 0 \end{cases} \qquad (7)$$

The final panning functions are shown in Figure 2.

Figure 2: L_T and R_T gains, and the additional phase added outside the region from −90° to +90°, as a function of azimuth.

2.2. Channel Warping

The panning function defined above is only valid if the speaker channels are defined in an unconventional manner, for example by placing the L and R channels at +90° and −90°, respectively. In practice, the assumed (physical) location of these speakers would be +30° and −30°. In order to accommodate this convention, the physical speaker locations are warped. Assuming the speaker angle is θ_s (in the range [−π, π]), the encode angle θ is computed as a piecewise linear function of θ_s, as shown in equation (8):

$$\theta = \begin{cases} 3\theta_s & \text{when } |\theta_s| \le \pi/6 \\ (6\theta_s + 4\pi)/10 & \text{when } \theta_s > \pi/6 \\ (6\theta_s - 4\pi)/10 & \text{when } \theta_s < -\pi/6 \end{cases} \qquad (8)$$
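To make the encoding rules of equations (3) through (8) concrete, the sketch below evaluates the warped encode angle and the complex Lt/Rt panning gains for a speaker at an arbitrary physical azimuth. It follows the reconstructed forms given above; the function names and the exact smoothstep used for the phase transition are illustrative rather than taken verbatim from the paper. Note that |G_L(θ)|² + |G_R(θ)|² = 1 for every θ, so a constant-power pan is maintained around the entire circle.

```python
import numpy as np

def warp_angle(theta_s):
    """Warp a physical speaker angle to an encode angle, per the piecewise map of eq. (8)."""
    if abs(theta_s) <= np.pi / 6:
        return 3.0 * theta_s
    return (6.0 * theta_s + 4.0 * np.pi * np.sign(theta_s)) / 10.0

def phase_term(theta):
    """Anti-periodic phase factor e^{j*phi(theta)} of eqs. (5)-(7), using a cubic smoothstep."""
    u = np.clip(np.sqrt(2.0) * np.sin(abs(theta) - np.pi / 2), 0.0, 1.0)
    phi = np.sign(theta) * (np.pi / 2) * u * u * (3.0 - 2.0 * u)
    return np.exp(1j * phi)

def encode_gains(theta):
    """Complex Lt/Rt panning gains G_L, G_R for a source at encode angle theta, eqs. (3)-(4)."""
    p = phase_term(theta)
    g_l = np.cos(theta / 2 - np.pi / 4) * p
    g_r = np.cos(theta / 2 + np.pi / 4) * p
    return g_l, g_r

# Example: a physical left speaker at +30 degrees maps to an encode angle of +90 degrees,
# where it receives full gain in Lt and (numerically) zero gain in Rt.
theta = warp_angle(np.deg2rad(30.0))
print(np.rad2deg(theta), encode_gains(theta))
```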

3. Matrix Decoding from 2 Channels to N Output Channels

One of the primary limitations of legacy matrix decoding systems such as Dolby Pro Logic and Pro Logic II [1] is that they can only identify and pan a single sound source at any given time. To alleviate this limitation, the proposed matrix decoder operates in the frequency domain, which allows for improved signal separation because each analysis band can separate and pan a sound source independently. Figure 3 shows a high-level schematic of the new decoding system. The 2-channel input signal is first transformed into the frequency domain. The frequency-domain representation is then grouped into 20 bands, each having approximately 2-ERB spacing (equivalent rectangular bandwidth [2]). Estimates of the input signal statistics in each of the 20 bands are calculated and used to drive the matrix decoding to N arbitrary output channels. Finally, the spectral-domain representation of the output channels is converted back to the time domain with a frequency-time transform.

Figure 3: A high-level block diagram of the matrix decoding system, showing the use of spectral-domain processing for matrix decoding to arbitrary output channels.

The operation of the matrix decoder in each of the 20 processing bands is depicted in Figure 4. In the upper path, 3 estimates of the input signal statistics are calculated. The statistical estimates are then used to separate the input signal into the left and right steered signals (s_L(m,k) and s_R(m,k)) and the left and right diffuse signals (d_L(m,k) and d_R(m,k)). The steered signals are then panned to the appropriate output channels, while the diffuse signals are decorrelated and spread to all of the output channels. The statistical estimation, steered and diffuse separation, steered signal panning, and diffuse signal decorrelation and spreading are described in detail in the following sections.

Figure 4: A block diagram showing the operation of the matrix decoder in each of the 20 processing bands.

3.1. Estimating Input Signal Statistics

As described in the previous section, the matrix decoder estimates 3 signal statistics from the 2-channel input. Specifically, the decoder estimates the real part of the cross-correlation between the input signals (X(m,b)), the difference between the powers of the input signals (Y(m,b)), and the sum of the powers of the input signals (T(m,b)). Each of the statistical estimates is accumulated over the processing band (b) and then further smoothed over time blocks (m) using a frequency-dependent leaky integrator (first-order IIR filter). Because the band grouping approximates 2-ERB spacing while the analysis time-frequency transform is linearly spaced, more data is available for the statistical estimates at high frequencies than at low frequencies. Therefore, the decay times of the leaky integrators at low frequencies are longer than those used at higher frequencies. Equations (9) through (11) show the estimation of X(m,b), Y(m,b), and T(m,b), respectively.
$$X(m,b) = \alpha_b\, X(m-1,b) + (1-\alpha_b) \sum_{k=B_s(b)}^{B_e(b)} \operatorname{Re}\left\{ L_T(m,k)\, R_T^*(m,k) \right\} \qquad (9)$$

$$Y(m,b) = \alpha_b\, Y(m-1,b) + (1-\alpha_b) \sum_{k=B_s(b)}^{B_e(b)} \left( |L_T(m,k)|^2 - |R_T(m,k)|^2 \right) \qquad (10)$$

$$T(m,b) = \alpha_b\, T(m-1,b) + (1-\alpha_b) \sum_{k=B_s(b)}^{B_e(b)} \left( |L_T(m,k)|^2 + |R_T(m,k)|^2 \right) \qquad (11)$$

where m is the time block index, k is the frequency index, b is the processing band index, B_s(b) is the frequency index k of the start of processing band b, B_e(b) is the frequency index k of the end of processing band b, and α_b is the frequency-dependent smoothing coefficient for band b.
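The band statistics above map directly onto a few lines of array code. The sketch below assumes an STFT front end and the leaky-integrator form of equations (9) through (11) as reconstructed above; the band-edge arrays and the function name are illustrative, not taken from the paper.

```python
import numpy as np

def update_band_stats(LT, RT, band_start, band_end, alpha, prev=None):
    """One time block of the per-band statistics X, Y, T (eqs. 9-11).

    LT, RT     : complex STFT bins of the two input channels for this block
    band_start : start bin index B_s(b) of each of the 20 bands
    band_end   : end bin index   B_e(b) of each band (inclusive)
    alpha      : per-band leaky-integrator coefficients alpha_b
    prev       : (X, Y, T) from the previous block, or None for the first block
    """
    n_bands = len(band_start)
    X = np.zeros(n_bands)
    Y = np.zeros(n_bands)
    T = np.zeros(n_bands)
    for b in range(n_bands):
        sl = slice(band_start[b], band_end[b] + 1)
        cross = np.real(LT[sl] * np.conj(RT[sl])).sum()   # Re{ LT * conj(RT) }
        pl = np.abs(LT[sl]) ** 2
        pr = np.abs(RT[sl]) ** 2
        X[b], Y[b], T[b] = cross, (pl - pr).sum(), (pl + pr).sum()
    if prev is not None:
        # frequency-dependent leaky integration over time blocks
        X = alpha * prev[0] + (1 - alpha) * X
        Y = alpha * prev[1] + (1 - alpha) * Y
        T = alpha * prev[2] + (1 - alpha) * T
    return X, Y, T
```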

3.2. Separation of the Steered and Diffuse Signals

The core of the matrix decoding is the separation, in each band, of the steered signal common to the two input signals from the diffuse signal in each input signal. The separation is achieved by assuming a simple signal model (similar to the model proposed by Faller in [3]) that allows for a single steered signal shared between the inputs L_T and R_T plus a diffuse signal in each input. The signal model is represented by equations (12) through (17); for simplicity, the time, frequency, and complex signal notation has been omitted.

$$L_T = G_L\, s + d_L \qquad (12)$$

$$R_T = G_R\, s + d_R \qquad (13)$$

From equation (12), L_T is constructed from a gain G_L multiplied by the steered signal s, plus a diffuse signal d_L; R_T is similarly constructed, as shown in equation (13). It is further assumed that the power of the steered signal is S², as shown in equation (14); that the cross-correlations between s, d_L, and d_R are all zero, as shown in equation (15); and that the power in the left diffuse signal (d_L) equals the power in the right diffuse signal (d_R), both being D², as shown in equation (16). With these assumptions, the covariance matrix of the input signals L_T and R_T is given by equation (17).

$$E\{|s|^2\} = S^2 \qquad (14)$$

$$E\{s\, d_L^*\} = E\{s\, d_R^*\} = E\{d_L\, d_R^*\} = 0 \qquad (15)$$

$$E\{|d_L|^2\} = E\{|d_R|^2\} = D^2 \qquad (16)$$

$$E\left\{ \begin{bmatrix} L_T \\ R_T \end{bmatrix} \begin{bmatrix} L_T^* & R_T^* \end{bmatrix} \right\} = \begin{bmatrix} G_L^2 S^2 + D^2 & G_L G_R S^2 \\ G_L G_R S^2 & G_R^2 S^2 + D^2 \end{bmatrix} \qquad (17)$$

In order to separate the steered signals from L_T and R_T, a 2x2 signal-dependent separation matrix W is calculated using the least-squares method, as shown in equation (18). The solution to the least-squares problem is given by equation (19), and the separated steered signal is therefore estimated by equation (20).

$$W = \arg\min_{W} E\left\{ \left\| \begin{bmatrix} G_L\, s \\ G_R\, s \end{bmatrix} - W \begin{bmatrix} L_T \\ R_T \end{bmatrix} \right\|^2 \right\} \qquad (18)$$

$$W = E\left\{ \begin{bmatrix} G_L\, s \\ G_R\, s \end{bmatrix} \begin{bmatrix} L_T^* & R_T^* \end{bmatrix} \right\} \left( E\left\{ \begin{bmatrix} L_T \\ R_T \end{bmatrix} \begin{bmatrix} L_T^* & R_T^* \end{bmatrix} \right\} \right)^{-1} \qquad (19)$$

$$\begin{bmatrix} s_L \\ s_R \end{bmatrix} = W \begin{bmatrix} L_T \\ R_T \end{bmatrix} \qquad (20)$$

The signal-dependent separation matrix W for time block m in processing band b is then expressed in terms of the signal statistic estimates described in section 3.1, as given by equation (21). The 3 measured signal statistics (X, Y, and T), written in terms of the assumed signal model, are given by equations (22) through (24). Substituting equations (22), (23), and (24) into equation (21) yields a biased estimate of the least-squares solution, given by equation (25).

$$X = G_L\, G_R\, S^2 \qquad (22)$$

$$Y = \left( G_L^2 - G_R^2 \right) S^2 \qquad (23)$$

$$T = \left( G_L^2 + G_R^2 \right) S^2 + 2D^2 \qquad (24)$$
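A minimal sketch of the least-squares separation of equations (18) through (20) is given below, driven directly by the band statistics X, Y, and T. The closed-form estimates of the steered and diffuse powers assume the reconstructed model of equations (22) through (24) together with G_L² + G_R² = 1; this is an illustrative estimator and not necessarily the exact biased estimator of the paper's equations (21) and (25).

```python
import numpy as np

def separation_matrix(X, Y, T, eps=1e-12):
    """Least-squares steered/diffuse separation matrix W (eqs. 18-20) for one band.

    Illustrative estimator: under the model of eqs. (12)-(17) with G_L^2 + G_R^2 = 1,
    the steered power is S^2 = sqrt(Y^2 + 4X^2) and the steered-component covariance
    entries follow from X and Y.
    """
    S2 = np.sqrt(Y * Y + 4.0 * X * X)                  # steered power S^2
    # covariance of the steered components [G_L s, G_R s] with the inputs
    C_sx = np.array([[(S2 + Y) / 2.0, X],
                     [X, (S2 - Y) / 2.0]])
    # measured input covariance, expressed through X, Y, T
    C_x = np.array([[(T + Y) / 2.0, X],
                    [X, (T - Y) / 2.0]])
    return C_sx @ np.linalg.inv(C_x + eps * np.eye(2))

def separate(LT, RT, W):
    """Apply W to obtain the steered components; the diffuse residual is input minus steered (eq. 31)."""
    x = np.vstack([LT, RT])
    steered = W @ x
    diffuse = x - steered
    return steered, diffuse
```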

3.3. Panning the Steered Signal to Arbitrary Output Channels

Section 3.2 described the separation of the steered signal components from the 2 input signals. In order to generate the output channel signals, the steered content is panned to the 2 output channels that bound the sound source. Figure 5 illustrates the location of the bounding output channels for a simple 5-channel output configuration (relative to the encode angle discussed in section 2.1). In the example shown in Figure 5, the sound source is located between the left output channel and the left surround output channel.

Figure 5: The bounding output channels (L at encode angle θ_1 and Ls at encode angle θ_2) for a steered signal at angle θ(m,b) located between the left and left surround output channels.

To calculate the angle of the steered sound source, the generalized encode equations discussed in section 2.1 are substituted into the signal-model covariance matrix of equation (17) to give equation (26). From equation (26) it is clear that the angle of the steered signal is given by equation (27), since with the gains of equations (3) and (4) we have G_L(θ)² − G_R(θ)² = sin θ and 2 G_L(θ) G_R(θ) = cos θ.

$$E\left\{ \begin{bmatrix} L_T \\ R_T \end{bmatrix} \begin{bmatrix} L_T^* & R_T^* \end{bmatrix} \right\} = S^2 \begin{bmatrix} G_L(\theta)^2 & G_L(\theta) G_R(\theta) \\ G_L(\theta) G_R(\theta) & G_R(\theta)^2 \end{bmatrix} + D^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad (26)$$

$$\theta = \tan^{-1}\left( \frac{\left( G_L(\theta)^2 - G_R(\theta)^2 \right) S^2}{2\, G_L(\theta)\, G_R(\theta)\, S^2} \right) \qquad (27)$$

The calculation of the steered signal angle for time block m and processing band b from the measured statistics of section 3.1 is therefore given by equation (28).

$$\theta(m,b) = \tan^{-1}\left( \frac{Y(m,b)}{2\, X(m,b)} \right) \qquad (28)$$

To construct the panned signals for the 2 bounding output channels, the separated left and right steered signals are multiplied by the inverse of the assumed encode matrix for the 2 bounding channels. In the example illustrated in Figure 5, the left output channel has encode angle θ_1 and the left surround channel has encode angle θ_2. Hence, for this example the panned signals for the left and left surround output channels are given by equation (29). Note that the steered signal is the separation matrix W(m,b) multiplied by the 2 input signals L_T and R_T.

$$\begin{bmatrix} Z_L(m,k) \\ Z_{Ls}(m,k) \end{bmatrix} = \begin{bmatrix} G_L(\theta_1) & G_L(\theta_2) \\ G_R(\theta_1) & G_R(\theta_2) \end{bmatrix}^{-1} \left( W(m,b) \begin{bmatrix} L_T(m,k) \\ R_T(m,k) \end{bmatrix} \right) \qquad (29)$$

The steered signal for all the output channels (Z_S(m,k)) is a zero vector with the panned signals from equation (29) inserted in the elements corresponding to the 2 bounding output channels, as shown in equation (30).

$$Z_S(m,k) = \begin{bmatrix} 0 & \cdots & Z_L(m,k) & Z_{Ls}(m,k) & \cdots & 0 \end{bmatrix}^{T} \qquad (30)$$

Figure 6 shows the resulting panning coefficients for a circular pan to a standard 5-channel output configuration. While the examples in this section have used a standard 5-channel output, the solution is easily applied to any arbitrary output channel configuration.

Figure 6: The resulting gains (for the C, L, Ls, Rs and R outputs, as a function of angle in degrees) applied to a standard 5-channel output configuration for a circular pan.

3.4. Decorrelating and Spreading the Diffuse Signals to the Output Channels

The last component of the matrix decoder separates the diffuse signals and spreads them to each of the output channels. The left and right diffuse signals are simply the left and right total signals minus the left and right steered signals, as shown in equation (31).

$$\begin{bmatrix} d_L(m,k) \\ d_R(m,k) \end{bmatrix} = \begin{bmatrix} L_T(m,k) \\ R_T(m,k) \end{bmatrix} - W(m,b) \begin{bmatrix} L_T(m,k) \\ R_T(m,k) \end{bmatrix} \qquad (31)$$

To improve the diversity of the diffuse content that will be spread to all of the output channels, 3 additional diffuse signals are created as functions of the left and right diffuse signals derived in equation (31), so that there are a total of 5 diffuse channels. The functions are designed to (ideally) ensure that there is zero correlation between any of the diffuse signals. The creation of the 5 intermediate diffuse signals is shown in equation (32): the additional signals are sums of frequency-dependent delayed versions of d_L and d_R, with the delays chosen such that low frequencies are delayed more than high frequencies. The resulting 5-element column vector of diffuse signals is given by equation (33).

$$\mathbf{d}(m,k) = \begin{bmatrix} d_L(m,k) & d_R(m,k) & d_3(m,k) & d_4(m,k) & d_5(m,k) \end{bmatrix}^{T} \qquad (33)$$

Finally, the diffuse signals are spread to all of the output channels to create the output diffuse signal Z_d(m,k). To achieve this, the 5-element column vector from equation (33) is multiplied by an N-by-5 matrix O (where N is the number of output channels), as shown in equation (34). The spreading matrix O is designed such that each row is a unit vector.

$$Z_d(m,k) = \mathbf{O}\, \mathbf{d}(m,k) \qquad (34)$$

The diffuse signal is applied to all output channels, including channels that are not on the horizontal plane (e.g. top surround channels). The steered signals Z_S(m,k) from equation (30) and the diffuse signals Z_d(m,k) from equation (34) are then added together, as shown in equation (35), prior to converting the matrix-decoded signal back to the time domain.

$$Z(m,k) = Z_S(m,k) + Z_d(m,k) \qquad (35)$$
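As a worked sketch of the panning step of section 3.3, the snippet below estimates the steered angle from the band statistics (equation (28)) and distributes the separated steered signal to the two bounding output channels by inverting their 2x2 encode matrix (equation (29)). It reuses encode_gains from the sketch following section 2; the handling of the wrap-around at ±180° is simplified, and the names are illustrative.

```python
import numpy as np

def pan_steered(steered, X, Y, out_angles):
    """Pan the separated steered signal to the two bounding output channels.

    steered    : 2 x K array of separated steered components [s_L; s_R] (eq. 20)
    X, Y       : band statistics for the current block and band (section 3.1)
    out_angles : sorted encode angles of the output channels, in radians
    Returns an N x K array of steered output-channel signals (eq. 30).
    """
    out_angles = np.asarray(out_angles)
    theta = np.arctan2(Y, 2.0 * X)                                   # steered angle, eq. (28)
    hi = int(np.clip(np.searchsorted(out_angles, theta), 1, len(out_angles) - 1))
    lo = hi - 1                                                      # bounding channel indices
    gl1, gr1 = encode_gains(out_angles[lo])                          # from the section 2 sketch
    gl2, gr2 = encode_gains(out_angles[hi])
    E = np.array([[gl1, gl2], [gr1, gr2]])                           # 2x2 encode matrix, eq. (29)
    pair = np.linalg.inv(E) @ steered
    Z = np.zeros((len(out_angles), steered.shape[1]), dtype=complex) # eq. (30)
    Z[lo], Z[hi] = pair[0], pair[1]
    return Z
```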

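The spreading step of equation (34) only requires an N x 5 matrix whose rows are unit vectors. One simple way to construct such a matrix is sketched below; the random construction is purely illustrative and is not the specific spreading matrix used in the deployed system.

```python
import numpy as np

def make_spreading_matrix(n_out, n_diffuse=5, seed=0):
    """Build an N x 5 spreading matrix O with unit-norm rows, as required by eq. (34)."""
    rng = np.random.default_rng(seed)
    O = rng.standard_normal((n_out, n_diffuse))
    return O / np.linalg.norm(O, axis=1, keepdims=True)   # normalize each row to a unit vector

# Example: spread 5 intermediate diffuse signals (eq. 33) to 7 output channels.
O = make_spreading_matrix(7)
d = np.zeros((5, 1024), dtype=complex)                     # placeholder diffuse signals d(m,k)
Z_d = O @ d                                                # eq. (34)
```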
4. Upmixing and Extension to Multichannel Input

4.1. Upmixing Stereo Signals

The methods described in section 3 for separating and processing the steered and diffuse content of matrix-encoded material also work very well for standard stereo mixes. Audio elements panned between the left and right channels can be identified, extracted, and re-panned to the loudspeakers available in the playback environment. Extraction of a center signal that is then played through a center loudspeaker is an important example. Likewise, extraction of the diffuse signal component and the creation of multiple uncorrelated loudspeaker signals generate a strong sense of envelopment from a stereo program.

The process of creating a higher channel count program from non-encoded content is commonly referred to as upmixing. Previous matrix decoders have demonstrated efficacy operating in an upmixing mode. Many of the features introduced in this algorithm to improve matrix decoding (multi-band processing, arbitrary output channel count and location, and added diversity for the diffuse signal component) also add benefit when operating in an upmixing mode.

4.2. Extension to Multi-channel Input

A generalized M input channel to N output channel (M-to-N) upmixing system can be constructed by employing one or more instantiations of the 2-to-N upmixer. The basic premise is to use the 2-to-N upmixer as a building block to expand the existing surround channels to additional rear channels, as well as to expand the front and surround channels to top surround channels (top surround channels provide audio signals from above the listener). Up to this point, the input to the 2-to-N upmixer/matrix decoder has been assumed to be matrix-encoded Lt/Rt or stereo signals. If instead the input is the surround channels from a 5.1 mix, side and rear surround channels can be created simply by specifying an output configuration that mimics the positions of the side surround and rear surround channels. In this example, the upmixer is fed Ls/Rs from the 5.1 mix and configured with 4 output speaker channels: 2 at ±10° and 2 at ±30° (referring back to section 2, 0° is center front). The output of the upmixer provides the side surround channels (the outer pair) as well as the rear surround channels (the inner pair).

Furthermore, the 2-to-N upmixer can be used to create top surround channels from a legacy 5.1 mix. While a legacy 5.1 mix does not intrinsically carry height information, the upmixer can be used to extract ambient or diffuse information from the input signals and steer it toward the top surround channels. Spreading ambient content to the top surround channels greatly improves the sense of envelopment. In the case of 2 top surround channels, the diffuse signals from the left and right channels as well as from the left and right surround channels can be combined. In the case of four top surround channels, the diffuse signals can be combined or kept separate; if the diffuse signals from the front and back are kept separate, then the front tops receive only left and right ambience and the rear tops receive only the left and right surround ambience.

Figures 7 through 13 show various M-to-N upmix scenarios, and Table 1 shows the channel notation used in the figures. As an example, Figure 7 shows an upmix from a traditional 5.1 mix that essentially adds two height channels. The 2-to-4 block is the 2-to-N matrix decoder configured to output two front channels and two top channels.
The front channels are simply the traditional left and right channels at ±30°, and the top channels are overhead mirrors of these. The height information is derived from the diffuse content as described in the preceding sections; no steered content is delivered to the top speakers. In the case of Figure 7, this 2-to-4 block is used for both the front and surround speakers, and the height channels are pairwise summed (see the sketch after Table 1). As another example, Figure 9 shows an upmix from a traditional 5.1 mix to 7.1, essentially adding two back surround channels. The back block is the 2-to-N matrix decoder configured to output two inner and two outer front channels. The outer front channels are configured as the traditional left and right channels at ±30° (which become side surround channels in our upmixer), and the inner front channels are placed at ±10° and become the back surround channels in our upmixer.

Notation  Description
L         Left
R         Right
C         Center
LFE       Low Frequency Effects
Ls        Left Surround
Rs        Right Surround
Lss       Left Side Surround
Rss       Right Side Surround
Lrs       Left Rear Surround
Rrs       Right Rear Surround
Lts       Left Top Surround
Rts       Right Top Surround
Ltf       Left Top Front
Rtf       Right Top Front
Ltr       Left Top Rear
Rtr       Right Top Rear

Table 1: Channel notation used in the figures.
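As a concrete illustration of the Figure 7 wiring described above, the sketch below combines two instances of a 2-to-4 decoder block and pairwise-sums their height outputs. The upmix_2_to_4 callable is a placeholder for the 2-to-N decoder configured for two front and two top outputs; the routing follows the description above, but the names are illustrative.

```python
def upmix_5_1_plus_heights(L, R, C, LFE, Ls, Rs, upmix_2_to_4):
    """Derive two top surround channels from a 5.1 mix using two 2-to-4 blocks (Figure 7).

    upmix_2_to_4 : callable taking (left, right) and returning
                   (out_left, out_right, top_left, top_right); it stands in for the
                   2-to-N matrix decoder configured for two front and two top outputs.
    """
    # One block processes the front pair, another processes the surround pair.
    Lout, Rout, Lt_front, Rt_front = upmix_2_to_4(L, R)
    Lsout, Rsout, Lt_surr, Rt_surr = upmix_2_to_4(Ls, Rs)
    # The height outputs of the two blocks are pairwise summed into Lts/Rts.
    Lts = Lt_front + Lt_surr
    Rts = Rt_front + Rt_surr
    return Lout, Rout, C, LFE, Lsout, Rsout, Lts, Rts
```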

Figure 7: A 5.1 upmix scenario where the Top Surround channels are derived from the Left, Right and Surround channels.

Figure 8: A 5.1 upmix scenario where the Top Front channels are derived from the Left and Right channels, and the Top Rear channels are derived from the Surround channels.

Figure 9: A 5.1 to 7.1 upmix scenario where the Side Surround and Rear Surround channels are derived from the Surround channels.

Figure 10: A 5.1 upmix scenario where the Top Surround channels are derived from the Left, Right and Surround channels, and the Side Surround and Rear Surround channels are derived from the Surround channels.

Figure 11: A 5.1 upmix scenario where the Top Front channels are derived from the Left and Right channels, the Top Rear channels are derived from the Surround channels, and the Side Surround and Rear Surround channels are derived from the Surround channels.

Figure 12: A 7.1 upmix scenario where the Top Surround channels are derived from the Left, Right, Side Surround and Rear Surround channels.

Figure 13: A 7.1 upmix scenario where the Top Front channels are derived from the Left and Right channels, and the Top Rear channels are derived from the Side Surround and Rear Surround channels.

5. Validation Subjective Testing

Subjective testing was conducted to determine whether the new algorithm provides an improvement over legacy systems. In particular, we wished to verify that Dolby Surround performs better than Dolby's existing surround system, Dolby Pro Logic II. The algorithms were tested operating in decode mode on matrix-encoded 5.0 surround sound programs.

Test Design

The test was designed to measure impairments to a surround audio program resulting from spatial coding (matrix encode/decode). The audio under test is easily distinguished from the source, so a MUSHRA-like testing methodology [4] was selected. The reference signal is the original multichannel program; the original was also included among the items under test as a hidden reference. Dolby Pro Logic (I) was used as an anchor signal. This is a departure from a true MUSHRA test, which calls for band-limited anchors (3.5 kHz and optionally 7 kHz band-limited signals). For spatial audio coding, bandwidth limitation is not a typical artifact and is therefore not an appropriate anchor.

Systems under Test

Table 2 lists the systems included in the subjective test along with the purpose of including each system.

System              Identifier   Purpose
Dolby Pro Logic     PL1          Anchor
Dolby Pro Logic II  PL2          System under test
Dolby Surround      DS           System under test
Source 5.0 signal   Hidden Ref   Hidden reference

Table 2: Coding systems compared in the subjective test.

Test Items

Eight items were selected for the subjective test from a large set of surround content. Candidate items were evaluated and selected according to the following objectives:

- Relevance: signals that are likely to be applied to spatial coding (i.e. real-world programs)
- Diversity of content: signals from each of three categories: TV/Movies, Sports, and Music
- Range of performance: signals that are good and bad cases for the systems under test

In addition, it was decided to include one worst-case test signal, applause, which has commonly been used in past testing and is therefore a good touchstone for performance. All items are 5.0; LFE signals, if present in the original program, were deleted. All files were level-balanced and time-aligned after encoding, and the presentation order of the coders was varied to decrease presentation-order bias. Listeners were allowed to adjust the playback volume.

Test Execution

Ten listeners were briefly familiarized with the systems under test as well as with the software interface used to perform the test. All of the listeners were experienced with this type of testing, so training was brief. The test was performed in a purpose-built listening room equipped for 5-channel playback using Event ASP8 studio monitors; the specific acoustic properties of the listening room are detailed in Table 3.

The test was built and run using a custom subjective testing software GUI.

Parameter               Measured values
Room dimensions         30.6 m² area; l = 6.60 m, w = 4.64 m, h = 2.60 m
Reverberation time      measured per octave band
Speaker configuration   R, L at ±30°; C at 0°; Rs, Ls at ±

Table 3: Measured room parameters.

Analysis

Listener Post-Screening

All of the listeners' scores were included in the analysis. None of the listeners' results deviates by more than 1.5 IQR from the middle 50% of listeners for any of the items. Three listeners each had a single misdetect of the hidden reference; removing these listeners does not significantly change the mean results.

Statistical Analysis

The mean score with 95% confidence intervals is computed independently for each system across all listeners and test items. Figure 14 shows the results for each stimulus; the overall result across all stimuli is shown above the label "Total". The trend from PL1 to PL2 to DS shows increasing mean scores. The confidence intervals do not overlap between PL1 and PL2, which is a strong indication that PL1 and PL2 are statistically distinct. The confidence intervals do overlap between PL2 and DS. To better resolve the differences between the systems, we turn to the ANOVA method [5]. ANOVA is a more powerful method for determining statistical differences; the increased power is achieved by using all the data to compute a single variance for all the systems under test. An underlying assumption of ANOVA is that the systems do in fact have equal variance. Visual inspection of Figure 14 shows that PL1 has noticeably lower variance, and both the F-max test [6] and the Bartlett test [7] for uniformity of variance fail, confirming this observation. Thus ANOVA is not appropriate for this data set.

Figure 14: Mean and 95% confidence interval for all listeners and all 8 items.

As stated earlier, the purpose of this test is to verify that Dolby Surround (DS), the new surround decoder, performs better than Dolby Pro Logic II (PL2). As such, a simple paired t-test can be applied. This is a relatively powerful method for comparing two data sets that have paired measurements; in this test, each measurement of PL2 is replicated on DS under similar conditions (same listener, same test item). Comparing PL2 and DS in this way indicates that the two systems are statistically different. The results of this analysis are given in Table 4; the p-value indicates high statistical significance.

Table 4: PL2 versus DS t-test results (mean, variance, number of observations, t statistic, two-tailed P(T<=t), and the two-tailed critical t value at 95% confidence).
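For reference, the paired comparison described above can be reproduced with standard statistics tooling. The sketch below uses placeholder score arrays, not the actual listening-test data; it simply shows a paired (dependent-samples) t-test applied to matched PL2/DS scores.

```python
import numpy as np
from scipy import stats

# Placeholder scores: one entry per (listener, item) pair, same pairing for both systems.
pl2_scores = np.array([63, 70, 61, 68, 59, 72, 66, 64], dtype=float)
ds_scores  = np.array([71, 74, 69, 75, 65, 78, 70, 73], dtype=float)

# Paired t-test: each PL2 measurement is paired with the DS measurement
# from the same listener and test item.
t_stat, p_value = stats.ttest_rel(ds_scores, pl2_scores)
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.4f}")
```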

Discussion of Test Results

DS is statistically significantly better than PL2. How might this translate into a practical difference for listeners? Looking at the mean test scores (PL1: 56, PL2: 65, DS: 72), PL2 scored 9 points better than PL1 and DS scored 7 points better than PL2. Thus, for Lt/Rt decoding, DS is almost as large an improvement over PL2 as PL2 was over PL1. PL2 is widely considered a significant improvement over PL1 and has been widely adopted; this test indicates that DS would likely be well received in the marketplace. Additional insight into DS performance can be gained from Figure 15. Briefly, DS shows the highest scores, and the most improvement over PL2, on A4V content ("Audio for Video", i.e. TV and movies), which is by far the most important use case. DS shows little or no improvement on the music items in this test (Lt/Rt-encoded 5.0 music), a use case that is virtually non-existent. Lastly, on the applause item DS scored between PL2 and PL1; there is not a statistical difference, but (based on listener verbal feedback) the systems each have distinct artifacts, and listener scores vary based on individual sensitivity to the impairment type (spaciousness, timbre change, etc.).

Figure 15: Mean and 95% confidence interval for all listeners, shown by content type.

6. Conclusion

Conventional matrix encoding was examined and a generalized encoding equation was formulated for arbitrary speaker locations. A new matrix decoder was described that separates steered and diffuse signals in both time and frequency. The separation of steered and diffuse signals allows accurate panning of the steered signals to an arbitrary output channel configuration, while the diffuse signal can be spread to all output channels for increased envelopment. The use of multiple instantiations of the core matrix decoder was discussed as a way to re-render multichannel content to even higher channel configurations. Subjective testing procedures and results were presented which show that the new matrix decoder is statistically better than previous matrix decoders.

7. References

[1] K. Gundry, "A New Matrix Decoder for Surround Sound," AES 19th International Conference, Schloss Elmau, Germany, 2001.

[2] B. C. J. Moore and B. R. Glasberg, "Suggested Formulae for Calculating Auditory-Filter Bandwidths and Excitation Patterns," Journal of the Acoustical Society of America, 74: 750-753, 1983.

[3] C. Faller, "Matrix Surround Revisited," AES 30th International Conference, Saariselkä, Finland, 2007.

[4] ITU-R, "Method for the Subjective Assessment of Intermediate Quality Levels of Coding Systems," Recommendation ITU-R BS.1534-1, Geneva, 2003.

[5] H. B. Mann, Analysis and Design of Experiments: Analysis of Variance and Analysis of Variance Designs, Dover Publications, New York, 1949.

[6] H. O. Hartley, "The Use of Range in Analysis of Variance," Biometrika, 37, 1950.

[7] M. S. Bartlett, "Properties of Sufficiency and Statistical Tests," Proceedings of the Royal Society of London, Series A, 160: 268-282, 1937.


More information

DIALOGUE ENHANCEMENT OF STEREO SOUND. Huawei European Research Center, Munich, Germany

DIALOGUE ENHANCEMENT OF STEREO SOUND. Huawei European Research Center, Munich, Germany DIALOGUE ENHANCEMENT OF STEREO SOUND Jürgen T. Geiger, Peter Grosche, Yesenia Lacouture Parodi juergen.geiger@huawei.com Huawei European Research Center, Munich, Germany ABSTRACT Studies show that many

More information

Midterm Examination CS 534: Computational Photography

Midterm Examination CS 534: Computational Photography Midterm Examination CS 534: Computational Photography November 3, 2015 NAME: SOLUTIONS Problem Score Max Score 1 8 2 8 3 9 4 4 5 3 6 4 7 6 8 13 9 7 10 4 11 7 12 10 13 9 14 8 Total 100 1 1. [8] What are

More information

The Subjective and Objective. Evaluation of. Room Correction Products

The Subjective and Objective. Evaluation of. Room Correction Products The Subjective and Objective 2003 Consumer Clinic Test Sedan (n=245 Untrained, n=11 trained) Evaluation of 2004 Consumer Clinic Test Sedan (n=310 Untrained, n=9 trained) Room Correction Products Text Text

More information

RECOMMENDATION ITU-R BS

RECOMMENDATION ITU-R BS Rec. ITU-R BS.1194-1 1 RECOMMENDATION ITU-R BS.1194-1 SYSTEM FOR MULTIPLEXING FREQUENCY MODULATION (FM) SOUND BROADCASTS WITH A SUB-CARRIER DATA CHANNEL HAVING A RELATIVELY LARGE TRANSMISSION CAPACITY

More information

Audio Engineering Society. Convention Paper. Presented at the 115th Convention 2003 October New York, New York

Audio Engineering Society. Convention Paper. Presented at the 115th Convention 2003 October New York, New York Audio Engineering Society Convention Paper Presented at the 115th Convention 2003 October 10 13 New York, New York This convention paper has been reproduced from the author's advance manuscript, without

More information

Discrimination of Virtual Haptic Textures Rendered with Different Update Rates

Discrimination of Virtual Haptic Textures Rendered with Different Update Rates Discrimination of Virtual Haptic Textures Rendered with Different Update Rates Seungmoon Choi and Hong Z. Tan Haptic Interface Research Laboratory Purdue University 465 Northwestern Avenue West Lafayette,

More information

University of Huddersfield Repository

University of Huddersfield Repository University of Huddersfield Repository Moore, David J. and Wakefield, Jonathan P. Surround Sound for Large Audiences: What are the Problems? Original Citation Moore, David J. and Wakefield, Jonathan P.

More information

Envelopment and Small Room Acoustics

Envelopment and Small Room Acoustics Envelopment and Small Room Acoustics David Griesinger Lexicon 3 Oak Park Bedford, MA 01730 Copyright 9/21/00 by David Griesinger Preview of results Loudness isn t everything! At least two additional perceptions:

More information

Development and Validation of an Unintrusive Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings

Development and Validation of an Unintrusive Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings Development and Validation of an Unintrusive Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings Sunish George 1*, Slawomir Zielinski 1, Francis Rumsey 1, Philip Jackson

More information

Binaural auralization based on spherical-harmonics beamforming

Binaural auralization based on spherical-harmonics beamforming Binaural auralization based on spherical-harmonics beamforming W. Song a, W. Ellermeier b and J. Hald a a Brüel & Kjær Sound & Vibration Measurement A/S, Skodsborgvej 7, DK-28 Nærum, Denmark b Institut

More information

PSYCHOACOUSTIC EVALUATION OF DIFFERENT METHODS FOR CREATING INDIVIDUALIZED, HEADPHONE-PRESENTED VAS FROM B-FORMAT RIRS

PSYCHOACOUSTIC EVALUATION OF DIFFERENT METHODS FOR CREATING INDIVIDUALIZED, HEADPHONE-PRESENTED VAS FROM B-FORMAT RIRS 1 PSYCHOACOUSTIC EVALUATION OF DIFFERENT METHODS FOR CREATING INDIVIDUALIZED, HEADPHONE-PRESENTED VAS FROM B-FORMAT RIRS ALAN KAN, CRAIG T. JIN and ANDRÉ VAN SCHAIK Computing and Audio Research Laboratory,

More information

A binaural auditory model and applications to spatial sound evaluation

A binaural auditory model and applications to spatial sound evaluation A binaural auditory model and applications to spatial sound evaluation Ma r k o Ta k a n e n 1, Ga ë ta n Lo r h o 2, a n d Mat t i Ka r ja l a i n e n 1 1 Helsinki University of Technology, Dept. of Signal

More information

Reducing comb filtering on different musical instruments using time delay estimation

Reducing comb filtering on different musical instruments using time delay estimation Reducing comb filtering on different musical instruments using time delay estimation Alice Clifford and Josh Reiss Queen Mary, University of London alice.clifford@eecs.qmul.ac.uk Abstract Comb filtering

More information

Development of multichannel single-unit microphone using shotgun microphone array

Development of multichannel single-unit microphone using shotgun microphone array PROCEEDINGS of the 22 nd International Congress on Acoustics Electroacoustics and Audio Engineering: Paper ICA2016-155 Development of multichannel single-unit microphone using shotgun microphone array

More information

Speech Coding in the Frequency Domain

Speech Coding in the Frequency Domain Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.

More information

Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback

Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback PURPOSE This lab will introduce you to the laboratory equipment and the software that allows you to link your computer to the hardware.

More information

ONE of the most common and robust beamforming algorithms

ONE of the most common and robust beamforming algorithms TECHNICAL NOTE 1 Beamforming algorithms - beamformers Jørgen Grythe, Norsonic AS, Oslo, Norway Abstract Beamforming is the name given to a wide variety of array processing algorithms that focus or steer

More information

QuantumLogic by Dr. Gilbert Soulodre. Intro: Rob Barnicoat, Director Business Development and Global Benchmarking, Harman International

QuantumLogic by Dr. Gilbert Soulodre. Intro: Rob Barnicoat, Director Business Development and Global Benchmarking, Harman International QuantumLogic by Dr. Gilbert Soulodre Intro: Rob Barnicoat, Director Business Development and Global Benchmarking, Harman International Ref:HAR-FHRB -copyright 2013 QuantumLogic Surround Technology QuantumLogic

More information

Convention Paper 7480

Convention Paper 7480 Audio Engineering Society Convention Paper 7480 Presented at the 124th Convention 2008 May 17-20 Amsterdam, The Netherlands The papers at this Convention have been selected on the basis of a submitted

More information

AUDL GS08/GAV1 Auditory Perception. Envelope and temporal fine structure (TFS)

AUDL GS08/GAV1 Auditory Perception. Envelope and temporal fine structure (TFS) AUDL GS08/GAV1 Auditory Perception Envelope and temporal fine structure (TFS) Envelope and TFS arise from a method of decomposing waveforms The classic decomposition of waveforms Spectral analysis... Decomposes

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Architectural Acoustics Session 2aAAa: Adapting, Enhancing, and Fictionalizing

More information

International Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 7, February 2013)

International Journal of Digital Application & Contemporary research Website:   (Volume 1, Issue 7, February 2013) Performance Analysis of OFDM under DWT, DCT based Image Processing Anshul Soni soni.anshulec14@gmail.com Ashok Chandra Tiwari Abstract In this paper, the performance of conventional discrete cosine transform

More information

Sound localization with multi-loudspeakers by usage of a coincident microphone array

Sound localization with multi-loudspeakers by usage of a coincident microphone array PAPER Sound localization with multi-loudspeakers by usage of a coincident microphone array Jun Aoki, Haruhide Hokari and Shoji Shimada Nagaoka University of Technology, 1603 1, Kamitomioka-machi, Nagaoka,

More information

Discrete-Time Signal Processing (DTSP) v14

Discrete-Time Signal Processing (DTSP) v14 EE 392 Laboratory 5-1 Discrete-Time Signal Processing (DTSP) v14 Safety - Voltages used here are less than 15 V and normally do not present a risk of shock. Objective: To study impulse response and the

More information