Next Generation Surround Decoding and Upmixing for Consumer and Professional Applications


Mark Vinton 1, David McGrath 2, Charles Robinson 3, Phil Brown 4
1 Dolby Laboratories, Inc., USA, mvint@dolby.com
2 Dolby Laboratories, Inc., USA, dxm@dolby.com
3 Dolby Laboratories, Inc., USA, cqr@dolby.com
4 Dolby Laboratories, Inc., USA, cbrow@dolby.com

Abstract

This paper describes a new spatial audio algorithm that creates a channel-based, three-dimensional sound scene from two or more input channels. The algorithm was designed to decode matrix-encoded programs (Lt/Rt). It is also an effective stereo upmixer; the signal relationships that guide the decoding algorithm (e.g. cross-correlation) also provide appropriate cues to the intended spatial scene for standard, unencoded programs; in effect, we decode the artist's intent. Input channel configurations with more than two channels are decomposed into channel pairs which are then processed independently. Improvements relative to existing surround decoding systems include improved selectivity and separation due to multi-band processing; increased listener envelopment through independent processing of direct and diffuse signal components and user-adjustable decorrelation; and support for an arbitrary number of output channels at user-specified locations, including elevation. The system described here has recently been deployed in consumer and professional products for home, mobile, and cinema applications. In this paper we give a detailed description of the signal processing and provide results from a subjective listening test which indicate a significant improvement relative to legacy systems.

1. Introduction

With the widespread adoption of DVD, Blu-ray, digital broadcast, and internet-based over-the-top delivery, multi-channel (more than two channels) audio playback devices have become commonplace. Despite the advent of new high-bandwidth formats and content delivery mechanisms, a large amount of legacy stereo and matrix-encoded material is still delivered to consumers. For this reason, there is a natural desire to re-render stereo content so that the benefits of multi-channel reproduction can be attained. Furthermore, as immersive audio content delivery becomes more prevalent, consumers will install audio playback systems that go well beyond the 5.1 or 7.1 channels supported by DVD and current digital broadcast. Therefore, there will be a growing need not only to render stereo and matrix-encoded material to these new high-count speaker configurations, but also to render 5.1 and 7.1 channel material to yet higher output channel configurations.

While surround matrix decoders have been commonplace for more than 30 years, the availability of increasingly powerful and affordable signal processors has led to improvements in matrix decoding algorithms. The use of frequency-domain techniques and new signal models has enabled matrix decoders with improved selectivity and sense of envelopment. This paper introduces a new matrix decoding system with the capability to render matrix-encoded and stereo material to an arbitrary number of speaker channels. Furthermore, the matrix decoder can be used to re-render 5.1 and 7.1 channel content to even higher channel formats. Key to the operation of this new matrix decoder is the separation of steered (direct) signals from diffuse signals in both time and frequency.
The steered signal components can be accurately spatially rendered irrespective of the output speaker configuration, while the diffuse signal content is spread to improve the sense of envelopment.

The paper is organized as follows: in section 2, conventional matrix encoding is presented and an extension for encoding arbitrary speaker locations is formulated. The new matrix decoding system that renders a 2-channel input to arbitrary output channels is detailed in section 3. Section 4 discusses the use of the core matrix decoder to convert multi-channel content (5.1 and 7.1) to even higher output speaker configurations. In section 5, subjective test results are presented that demonstrate the performance improvements achieved by the new matrix decoder.

2. Matrix Encoding for Arbitrary Horizontal Directions

Two-channel, matrix-encoded signals have been used to encode 4-channel, 5-channel and 7-channel surround sound, according to the "rules" of Dolby Pro Logic encoding/decoding. The goal of this section is to formulate equations that define the encoding (and hence, also, the decoding) rules for audio objects or speaker channels at arbitrary locations in a ring around the listener. Figure 1 shows the arrangement of a number of speaker channels on the matrix-coding circle. Early versions of Dolby Surround made use of the L, C, R and S channels, and these channels appear at 90° intervals around the matrix-coding circle.

Figure 1: The matrix-coding circle, with the L, C, R and S channels at 90° intervals and the Ls and Rs channels that replace S for 5-channel surround.

5-channel surround sound was accommodated by replacing the S channel with Ls and Rs channels, and these are also shown in Figure 1. A traditional matrix encoder was supplied with a number of input channels, each corresponding to a specific speaker. The two-channel (Lt/Rt) output was formed by a linear matrix operation. For example, a 5-channel matrix encoder operates according to equation (1):

$$\begin{bmatrix} L_T \\ R_T \end{bmatrix} = \mathbf{E} \begin{bmatrix} L & R & C & L_s & R_s \end{bmatrix}^{T} \qquad (1)$$

where E is the 2-by-5 encoding matrix, and the imaginary unit, j, which appears in the surround-channel entries of E, indicates a 90° phase shift.

2.1. Encoding Gains for Arbitrary Angles

The following functions, given by equations (2) through (4), generate the panning gains for arbitrary speaker locations with respect to the encode angle θ:

$$f(\theta) = \begin{bmatrix} G_L(\theta) \\ G_R(\theta) \end{bmatrix} \qquad (2)$$

$$G_L(\theta) = \cos\left(\frac{\theta}{2} - \frac{\pi}{4}\right) e^{\,j\varphi(\theta)} \qquad (3)$$

$$G_R(\theta) = \cos\left(\frac{\theta}{2} + \frac{\pi}{4}\right) e^{\,j\varphi(\theta)} \qquad (4)$$

This function is not yet fully defined, because it includes an additional phase shift, φ(θ), applied to both channels. This phase shift is optional, but without it the equations contain an ambiguity. For example, if φ(θ) = 0, then f(θ) is anti-periodic over an interval of 2π. This means that f(θ − 2π) = −f(θ), so a speaker channel placed at θ = π would not have its panning function uniquely defined, since the same channel could have been assigned to the angle θ = −π. The solution to this ambiguity is to define an appropriate phase-shifting function that makes e^{jφ(θ)} anti-periodic as well (the product of two anti-periodic functions is periodic). Therefore, φ(θ) is constructed so that it has the following properties:

$$e^{\,j\varphi(\theta)} = \begin{cases} 1 & \text{when } |\theta| \le \pi/2 \\ j & \text{when } \theta \ge 3\pi/4 \\ -j & \text{when } \theta \le -3\pi/4 \end{cases} \qquad (5)$$

A function for φ(θ) is constructed by smoothly interpolating (with a cubic spline) in the transition regions π/2 ≤ |θ| ≤ 3π/4. So, for all θ in the range [−π, π], equations (6) and (7) are used to calculate φ(θ):

$$u(\theta) = \max\left(0,\ \min\left(1,\ \sqrt{2}\,\sin\left(|\theta| - \frac{\pi}{2}\right)\right)\right) \qquad (6)$$

$$\varphi(\theta) = \begin{cases} \ \ \frac{\pi}{2}\, u(\theta)^2 \left(3 - 2u(\theta)\right) & \theta \ge 0 \\ -\frac{\pi}{2}\, u(\theta)^2 \left(3 - 2u(\theta)\right) & \theta < 0 \end{cases} \qquad (7)$$

The final panning functions are shown in Figure 2.

Figure 2: L_T and R_T gains, and the additional phase added outside the region from −90° to +90°, as a function of azimuth.

2.2. Channel Warping

The panning function defined above is only valid if the speaker channels are defined in an unconventional manner, for example by placing the L and R channels at +90° and −90°, respectively. In practice, the assumed (physical) location of these speakers would be +30° and −30°. In order to accommodate this convention, the physical speaker locations are warped. Assuming the speaker angle is θ_s (in the range [−π, π]), the encode angle θ is computed as a piecewise linear function of θ_s, as shown in equation (8):

$$\theta = \begin{cases} 3\theta_s & \text{when } |\theta_s| \le \pi/6 \\ (6\theta_s + 4\pi)/10 & \text{when } \theta_s > \pi/6 \\ (6\theta_s - 4\pi)/10 & \text{when } \theta_s < -\pi/6 \end{cases} \qquad (8)$$
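To make the encoding rules of equations (3) through (8) concrete, the sketch below evaluates the warped encode angle and the complex Lt/Rt panning gains for a speaker at an arbitrary physical azimuth. It follows the reconstructed forms given above; the function names and the exact smoothstep used for the phase transition are illustrative rather than taken verbatim from the paper. Note that |G_L(θ)|² + |G_R(θ)|² = 1 for every θ, so a constant-power pan is maintained around the entire circle.

```python
import numpy as np

def warp_angle(theta_s):
    """Warp a physical speaker angle to an encode angle, per the piecewise map of eq. (8)."""
    if abs(theta_s) <= np.pi / 6:
        return 3.0 * theta_s
    return (6.0 * theta_s + 4.0 * np.pi * np.sign(theta_s)) / 10.0

def phase_term(theta):
    """Anti-periodic phase factor e^{j*phi(theta)} of eqs. (5)-(7), using a cubic smoothstep."""
    u = np.clip(np.sqrt(2.0) * np.sin(abs(theta) - np.pi / 2), 0.0, 1.0)
    phi = np.sign(theta) * (np.pi / 2) * u * u * (3.0 - 2.0 * u)
    return np.exp(1j * phi)

def encode_gains(theta):
    """Complex Lt/Rt panning gains G_L, G_R for a source at encode angle theta, eqs. (3)-(4)."""
    p = phase_term(theta)
    g_l = np.cos(theta / 2 - np.pi / 4) * p
    g_r = np.cos(theta / 2 + np.pi / 4) * p
    return g_l, g_r

# Example: a physical left speaker at +30 degrees maps to an encode angle of +90 degrees,
# where it receives full gain in Lt and (numerically) zero gain in Rt.
theta = warp_angle(np.deg2rad(30.0))
print(np.rad2deg(theta), encode_gains(theta))
```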

3. Matrix Decoding from 2 Channels to N Output Channels

One of the primary limitations of legacy matrix decoding systems such as Dolby Pro Logic and Pro Logic II [1] is that they can only identify and pan a single sound source at any given time. To alleviate this limitation, the proposed matrix decoder operates in the frequency domain, which allows for improved signal separation because each analysis band can separate and pan a sound source independently. Figure 3 shows a high-level schematic of the new decoding system. The 2-channel input signal is first transformed into the frequency domain. The frequency-domain representation is then grouped into 20 bands, each having approximately 2-ERB spacing (equivalent rectangular bandwidth [2]). Estimates of the input signal statistics in each of the 20 bands are calculated and used to drive the matrix decoding to N arbitrary output channels. Finally, the spectral-domain representation of the output channels is converted back to the time domain with a frequency-time transform.

Figure 3: A high-level block diagram of the matrix decoding system, showing the use of spectral-domain processing for matrix decoding to arbitrary output channels.

The operation of the matrix decoder in each of the 20 processing bands is depicted in Figure 4. In the upper path, 3 estimates of the input signal statistics are calculated. The statistical estimates are then used to separate the input signal into the left and right steered signals (s_L(m,k) and s_R(m,k)) and the left and right diffuse signals (d_L(m,k) and d_R(m,k)). The steered signals are then panned to the appropriate output channels, while the diffuse signals are decorrelated and spread to all of the output channels. The statistical estimation, steered and diffuse separation, steered signal panning, and diffuse signal decorrelation and spreading are described in detail in the following sections.

Figure 4: A block diagram showing the operation of the matrix decoder in each of the 20 processing bands.

3.1. Estimating Input Signal Statistics

As described in the previous section, the matrix decoder estimates 3 signal statistics from the 2-channel input. Specifically, the decoder estimates the real part of the cross-correlation between the input signals (X(m,b)), the difference between the powers of the input signals (Y(m,b)), and the sum of the powers of the input signals (T(m,b)). Each of the statistical estimates is accumulated over the processing band (b) and then further smoothed over time blocks (m) using a frequency-dependent leaky integrator (first-order IIR filter). Because the band grouping approximates 2-ERB spacing while the analysis time-frequency transform is linearly spaced, more data is available for the statistical estimates at high frequencies than at low frequencies. Therefore, the decay times of the leaky integrators at low frequencies are longer than those used at higher frequencies. Equations (9) through (11) show the estimation of X(m,b), Y(m,b), and T(m,b), respectively.
$$X(m,b) = \alpha_b\, X(m-1,b) + (1-\alpha_b) \sum_{k=B_s(b)}^{B_e(b)} \operatorname{Re}\left\{ L_T(m,k)\, R_T^*(m,k) \right\} \qquad (9)$$

$$Y(m,b) = \alpha_b\, Y(m-1,b) + (1-\alpha_b) \sum_{k=B_s(b)}^{B_e(b)} \left( |L_T(m,k)|^2 - |R_T(m,k)|^2 \right) \qquad (10)$$

$$T(m,b) = \alpha_b\, T(m-1,b) + (1-\alpha_b) \sum_{k=B_s(b)}^{B_e(b)} \left( |L_T(m,k)|^2 + |R_T(m,k)|^2 \right) \qquad (11)$$

where m is the time block index, k is the frequency index, b is the processing band index, B_s(b) is the frequency index k of the start of processing band b, B_e(b) is the frequency index k of the end of processing band b, and α_b is the frequency-dependent smoothing coefficient for band b.
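The band statistics above map directly onto a few lines of array code. The sketch below assumes an STFT front end and the leaky-integrator form of equations (9) through (11) as reconstructed above; the band-edge arrays and the function name are illustrative, not taken from the paper.

```python
import numpy as np

def update_band_stats(LT, RT, band_start, band_end, alpha, prev=None):
    """One time block of the per-band statistics X, Y, T (eqs. 9-11).

    LT, RT     : complex STFT bins of the two input channels for this block
    band_start : start bin index B_s(b) of each of the 20 bands
    band_end   : end bin index   B_e(b) of each band (inclusive)
    alpha      : per-band leaky-integrator coefficients alpha_b
    prev       : (X, Y, T) from the previous block, or None for the first block
    """
    n_bands = len(band_start)
    X = np.zeros(n_bands)
    Y = np.zeros(n_bands)
    T = np.zeros(n_bands)
    for b in range(n_bands):
        sl = slice(band_start[b], band_end[b] + 1)
        cross = np.real(LT[sl] * np.conj(RT[sl])).sum()   # Re{ LT * conj(RT) }
        pl = np.abs(LT[sl]) ** 2
        pr = np.abs(RT[sl]) ** 2
        X[b], Y[b], T[b] = cross, (pl - pr).sum(), (pl + pr).sum()
    if prev is not None:
        # frequency-dependent leaky integration over time blocks
        X = alpha * prev[0] + (1 - alpha) * X
        Y = alpha * prev[1] + (1 - alpha) * Y
        T = alpha * prev[2] + (1 - alpha) * T
    return X, Y, T
```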

3.2. Separation of the Steered and Diffuse Signals

The core of the matrix decoding is the separation, in each band, of the steered signal common to the two input signals from the diffuse signal in each input signal. The separation is achieved by assuming a simple signal model (similar to the model proposed by Faller in [3]) that allows for a single steered signal shared between the inputs L_T and R_T plus a diffuse signal in each input. The signal model is represented by equations (12) through (17); for simplicity, the time, frequency, and complex signal notation has been omitted.

$$L_T = G_L\, s + d_L \qquad (12)$$

$$R_T = G_R\, s + d_R \qquad (13)$$

From equation (12), L_T is constructed from a gain G_L multiplied by the steered signal s, plus a diffuse signal d_L; R_T is similarly constructed, as shown in equation (13). It is further assumed that the power of the steered signal is S², as shown in equation (14); that the cross-correlations between s, d_L, and d_R are all zero, as shown in equation (15); and that the power in the left diffuse signal (d_L) equals the power in the right diffuse signal (d_R), both being D², as shown in equation (16). With these assumptions, the covariance matrix of the input signals L_T and R_T is given by equation (17).

$$E\{|s|^2\} = S^2 \qquad (14)$$

$$E\{s\, d_L^*\} = E\{s\, d_R^*\} = E\{d_L\, d_R^*\} = 0 \qquad (15)$$

$$E\{|d_L|^2\} = E\{|d_R|^2\} = D^2 \qquad (16)$$

$$E\left\{ \begin{bmatrix} L_T \\ R_T \end{bmatrix} \begin{bmatrix} L_T^* & R_T^* \end{bmatrix} \right\} = \begin{bmatrix} G_L^2 S^2 + D^2 & G_L G_R S^2 \\ G_L G_R S^2 & G_R^2 S^2 + D^2 \end{bmatrix} \qquad (17)$$

In order to separate the steered signals from L_T and R_T, a 2x2 signal-dependent separation matrix W is calculated using the least-squares method, as shown in equation (18). The solution to the least-squares problem is given by equation (19), and the separated steered signal is therefore estimated by equation (20).

$$W = \arg\min_{W} E\left\{ \left\| \begin{bmatrix} G_L\, s \\ G_R\, s \end{bmatrix} - W \begin{bmatrix} L_T \\ R_T \end{bmatrix} \right\|^2 \right\} \qquad (18)$$

$$W = E\left\{ \begin{bmatrix} G_L\, s \\ G_R\, s \end{bmatrix} \begin{bmatrix} L_T^* & R_T^* \end{bmatrix} \right\} \left( E\left\{ \begin{bmatrix} L_T \\ R_T \end{bmatrix} \begin{bmatrix} L_T^* & R_T^* \end{bmatrix} \right\} \right)^{-1} \qquad (19)$$

$$\begin{bmatrix} s_L \\ s_R \end{bmatrix} = W \begin{bmatrix} L_T \\ R_T \end{bmatrix} \qquad (20)$$

The signal-dependent separation matrix W for time block m in processing band b is then expressed in terms of the signal statistic estimates described in section 3.1, as given by equation (21). The 3 measured signal statistics (X, Y, and T), written in terms of the assumed signal model, are given by equations (22) through (24). Substituting equations (22), (23), and (24) into equation (21) yields a biased estimate of the least-squares solution, given by equation (25).

$$X = G_L\, G_R\, S^2 \qquad (22)$$

$$Y = \left( G_L^2 - G_R^2 \right) S^2 \qquad (23)$$

$$T = \left( G_L^2 + G_R^2 \right) S^2 + 2D^2 \qquad (24)$$
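A minimal sketch of the least-squares separation of equations (18) through (20) is given below, driven directly by the band statistics X, Y, and T. The closed-form estimates of the steered and diffuse powers assume the reconstructed model of equations (22) through (24) together with G_L² + G_R² = 1; this is an illustrative estimator and not necessarily the exact biased estimator of the paper's equations (21) and (25).

```python
import numpy as np

def separation_matrix(X, Y, T, eps=1e-12):
    """Least-squares steered/diffuse separation matrix W (eqs. 18-20) for one band.

    Illustrative estimator: under the model of eqs. (12)-(17) with G_L^2 + G_R^2 = 1,
    the steered power is S^2 = sqrt(Y^2 + 4X^2) and the steered-component covariance
    entries follow from X and Y.
    """
    S2 = np.sqrt(Y * Y + 4.0 * X * X)                  # steered power S^2
    # covariance of the steered components [G_L s, G_R s] with the inputs
    C_sx = np.array([[(S2 + Y) / 2.0, X],
                     [X, (S2 - Y) / 2.0]])
    # measured input covariance, expressed through X, Y, T
    C_x = np.array([[(T + Y) / 2.0, X],
                    [X, (T - Y) / 2.0]])
    return C_sx @ np.linalg.inv(C_x + eps * np.eye(2))

def separate(LT, RT, W):
    """Apply W to obtain the steered components; the diffuse residual is input minus steered (eq. 31)."""
    x = np.vstack([LT, RT])
    steered = W @ x
    diffuse = x - steered
    return steered, diffuse
```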

3.3. Panning the Steered Signal to Arbitrary Output Channels

Section 3.2 described the separation of the steered signal components from the 2 input signals. In order to generate the output channel signals, the steered content is panned to the 2 output channels that bound the sound source. Figure 5 illustrates the location of the bounding output channels for a simple 5-channel output configuration (relative to the encode angle discussed in section 2.1). In the example shown in Figure 5, the sound source is located between the left output channel and the left surround output channel.

Figure 5: The bounding output channels (L at encode angle θ_1 and Ls at encode angle θ_2) for a steered signal at angle θ(m,b) located between the left and left surround output channels.

To calculate the angle of the steered sound source, the generalized encode equations discussed in section 2.1 are substituted into the signal-model covariance matrix of equation (17) to give equation (26). From equation (26) it is clear that the angle of the steered signal is given by equation (27), since with the gains of equations (3) and (4) we have G_L(θ)² − G_R(θ)² = sin θ and 2 G_L(θ) G_R(θ) = cos θ.

$$E\left\{ \begin{bmatrix} L_T \\ R_T \end{bmatrix} \begin{bmatrix} L_T^* & R_T^* \end{bmatrix} \right\} = S^2 \begin{bmatrix} G_L(\theta)^2 & G_L(\theta) G_R(\theta) \\ G_L(\theta) G_R(\theta) & G_R(\theta)^2 \end{bmatrix} + D^2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad (26)$$

$$\theta = \tan^{-1}\left( \frac{\left( G_L(\theta)^2 - G_R(\theta)^2 \right) S^2}{2\, G_L(\theta)\, G_R(\theta)\, S^2} \right) \qquad (27)$$

The calculation of the steered signal angle for time block m and processing band b from the measured statistics of section 3.1 is therefore given by equation (28).

$$\theta(m,b) = \tan^{-1}\left( \frac{Y(m,b)}{2\, X(m,b)} \right) \qquad (28)$$

To construct the panned signals for the 2 bounding output channels, the separated left and right steered signals are multiplied by the inverse of the assumed encode matrix for the 2 bounding channels. In the example illustrated in Figure 5, the left output channel has encode angle θ_1 and the left surround channel has encode angle θ_2. Hence, for this example the panned signals for the left and left surround output channels are given by equation (29). Note that the steered signal is the separation matrix W(m,b) multiplied by the 2 input signals L_T and R_T.

$$\begin{bmatrix} Z_L(m,k) \\ Z_{Ls}(m,k) \end{bmatrix} = \begin{bmatrix} G_L(\theta_1) & G_L(\theta_2) \\ G_R(\theta_1) & G_R(\theta_2) \end{bmatrix}^{-1} \left( W(m,b) \begin{bmatrix} L_T(m,k) \\ R_T(m,k) \end{bmatrix} \right) \qquad (29)$$

The steered signal for all the output channels (Z_S(m,k)) is a zero vector with the panned signals from equation (29) inserted in the elements corresponding to the 2 bounding output channels, as shown in equation (30).

$$Z_S(m,k) = \begin{bmatrix} 0 & \cdots & Z_L(m,k) & Z_{Ls}(m,k) & \cdots & 0 \end{bmatrix}^{T} \qquad (30)$$

Figure 6 shows the resulting panning coefficients for a circular pan to a standard 5-channel output configuration. While the examples in this section have used a standard 5-channel output, the solution is easily applied to any arbitrary output channel configuration.

Figure 6: The resulting gains (for the C, L, Ls, Rs and R outputs, as a function of angle in degrees) applied to a standard 5-channel output configuration for a circular pan.

3.4. Decorrelating and Spreading the Diffuse Signals to the Output Channels

The last component of the matrix decoder separates the diffuse signals and spreads them to each of the output channels. The left and right diffuse signals are simply the left and right total signals minus the left and right steered signals, as shown in equation (31).

$$\begin{bmatrix} d_L(m,k) \\ d_R(m,k) \end{bmatrix} = \begin{bmatrix} L_T(m,k) \\ R_T(m,k) \end{bmatrix} - W(m,b) \begin{bmatrix} L_T(m,k) \\ R_T(m,k) \end{bmatrix} \qquad (31)$$

To improve the diversity of the diffuse content that will be spread to all of the output channels, 3 additional diffuse signals are created as functions of the left and right diffuse signals derived in equation (31), so that there are a total of 5 diffuse channels. The functions are designed to (ideally) ensure that there is zero correlation between any of the diffuse signals. The creation of the 5 intermediate diffuse signals is shown in equation (32): the additional signals are sums of frequency-dependent delayed versions of d_L and d_R, with the delays chosen such that low frequencies are delayed more than high frequencies. The resulting 5-element column vector of diffuse signals is given by equation (33).

$$\mathbf{d}(m,k) = \begin{bmatrix} d_L(m,k) & d_R(m,k) & d_3(m,k) & d_4(m,k) & d_5(m,k) \end{bmatrix}^{T} \qquad (33)$$

Finally, the diffuse signals are spread to all of the output channels to create the output diffuse signal Z_d(m,k). To achieve this, the 5-element column vector from equation (33) is multiplied by an N-by-5 matrix O (where N is the number of output channels), as shown in equation (34). The spreading matrix O is designed such that each row is a unit vector.

$$Z_d(m,k) = \mathbf{O}\, \mathbf{d}(m,k) \qquad (34)$$

The diffuse signal is applied to all output channels, including channels that are not on the horizontal plane (e.g. top surround channels). The steered signals Z_S(m,k) from equation (30) and the diffuse signals Z_d(m,k) from equation (34) are then added together, as shown in equation (35), prior to converting the matrix-decoded signal back to the time domain.

$$Z(m,k) = Z_S(m,k) + Z_d(m,k) \qquad (35)$$
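As a worked sketch of the panning step of section 3.3, the snippet below estimates the steered angle from the band statistics (equation (28)) and distributes the separated steered signal to the two bounding output channels by inverting their 2x2 encode matrix (equation (29)). It reuses encode_gains from the sketch following section 2; the handling of the wrap-around at ±180° is simplified, and the names are illustrative.

```python
import numpy as np

def pan_steered(steered, X, Y, out_angles):
    """Pan the separated steered signal to the two bounding output channels.

    steered    : 2 x K array of separated steered components [s_L; s_R] (eq. 20)
    X, Y       : band statistics for the current block and band (section 3.1)
    out_angles : sorted encode angles of the output channels, in radians
    Returns an N x K array of steered output-channel signals (eq. 30).
    """
    out_angles = np.asarray(out_angles)
    theta = np.arctan2(Y, 2.0 * X)                                   # steered angle, eq. (28)
    hi = int(np.clip(np.searchsorted(out_angles, theta), 1, len(out_angles) - 1))
    lo = hi - 1                                                      # bounding channel indices
    gl1, gr1 = encode_gains(out_angles[lo])                          # from the section 2 sketch
    gl2, gr2 = encode_gains(out_angles[hi])
    E = np.array([[gl1, gl2], [gr1, gr2]])                           # 2x2 encode matrix, eq. (29)
    pair = np.linalg.inv(E) @ steered
    Z = np.zeros((len(out_angles), steered.shape[1]), dtype=complex) # eq. (30)
    Z[lo], Z[hi] = pair[0], pair[1]
    return Z
```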

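The spreading step of equation (34) only requires an N x 5 matrix whose rows are unit vectors. One simple way to construct such a matrix is sketched below; the random construction is purely illustrative and is not the specific spreading matrix used in the deployed system.

```python
import numpy as np

def make_spreading_matrix(n_out, n_diffuse=5, seed=0):
    """Build an N x 5 spreading matrix O with unit-norm rows, as required by eq. (34)."""
    rng = np.random.default_rng(seed)
    O = rng.standard_normal((n_out, n_diffuse))
    return O / np.linalg.norm(O, axis=1, keepdims=True)   # normalize each row to a unit vector

# Example: spread 5 intermediate diffuse signals (eq. 33) to 7 output channels.
O = make_spreading_matrix(7)
d = np.zeros((5, 1024), dtype=complex)                     # placeholder diffuse signals d(m,k)
Z_d = O @ d                                                # eq. (34)
```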
4. Upmixing and Extension to Multichannel Input

4.1. Upmixing Stereo Signals

The methods described in section 3 for separating and processing the steered and diffuse content of matrix-encoded material also work very well for standard stereo mixes. Audio elements panned between the left and right channels can be identified, extracted, and re-panned to the loudspeakers available in the playback environment. Extraction of a center signal that is then played through a center loudspeaker is an important example. Likewise, extraction of the diffuse signal component and the creation of multiple uncorrelated loudspeaker signals generate a strong sense of envelopment from a stereo program.

The process of creating a higher channel count program from non-encoded content is commonly referred to as upmixing. Previous matrix decoders have demonstrated efficacy operating in an upmixing mode. Many of the features introduced in this algorithm to improve matrix decoding (multi-band processing, arbitrary output channel count and location, and added diversity for the diffuse signal component) also add benefit when operating in an upmixing mode.

4.2. Extension to Multi-channel Input

A generalized M input channel to N output channel (M-to-N) upmixing system can be constructed by employing one or more instantiations of the 2-to-N upmixer. The basic premise is to use the 2-to-N upmixer as a building block to expand the existing surround channels to additional rear channels, as well as to expand the front and surround channels to top surround channels (top surround channels provide audio signals from above the listener). Up to this point, the input to the 2-to-N upmixer/matrix decoder has been assumed to be matrix-encoded Lt/Rt or stereo signals. If instead the input is the surround channels from a 5.1 mix, side and rear surround channels can be created simply by specifying an output configuration that mimics the positions of the side surround and rear surround channels. In this example, the upmixer is fed Ls/Rs from the 5.1 mix and configured with 4 output speaker channels: 2 at ±10° and 2 at ±30° (referring back to section 2, 0° is center front). The output of the upmixer provides the side surround channels (the outer pair) as well as the rear surround channels (the inner pair).

Furthermore, the 2-to-N upmixer can be used to create top surround channels from a legacy 5.1 mix. While a legacy 5.1 mix does not intrinsically carry height information, the upmixer can be used to extract ambient or diffuse information from the input signals and steer it toward the top surround channels. Spreading ambient content to the top surround channels greatly improves the sense of envelopment. In the case of 2 top surround channels, the diffuse signals from the left and right channels as well as from the left and right surround channels can be combined. In the case of four top surround channels, the diffuse signals can be combined or kept separate; if the diffuse signals from the front and back are kept separate, then the front tops receive only left and right ambience and the rear tops receive only the left and right surround ambience.

Figures 7 through 13 show various M-to-N upmix scenarios, and Table 1 shows the channel notation used in the figures. As an example, Figure 7 shows an upmix from a traditional 5.1 mix that essentially adds two height channels. The 2-to-4 block is the 2-to-N matrix decoder configured to output two front channels and two top channels.
The front channels are simply the traditional left and right channels at ±30°, and the top channels are overhead mirrors of these. The height information is derived from the diffuse content as described in the preceding sections; no steered content is delivered to the top speakers. In the case of Figure 7, this 2-to-4 block is used for both the front and surround speakers, and the height channels are pairwise summed (see the sketch after Table 1). As another example, Figure 9 shows an upmix from a traditional 5.1 mix to 7.1, essentially adding two back surround channels. The back block is the 2-to-N matrix decoder configured to output two inner and two outer front channels. The outer front channels are configured as the traditional left and right channels at ±30° (which become side surround channels in our upmixer), and the inner front channels are placed at ±10° and become the back surround channels in our upmixer.

Notation  Description
L         Left
R         Right
C         Center
LFE       Low Frequency Effects
Ls        Left Surround
Rs        Right Surround
Lss       Left Side Surround
Rss       Right Side Surround
Lrs       Left Rear Surround
Rrs       Right Rear Surround
Lts       Left Top Surround
Rts       Right Top Surround
Ltf       Left Top Front
Rtf       Right Top Front
Ltr       Left Top Rear
Rtr       Right Top Rear

Table 1: Channel notation used in the figures.
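As a concrete illustration of the Figure 7 wiring described above, the sketch below combines two instances of a 2-to-4 decoder block and pairwise-sums their height outputs. The upmix_2_to_4 callable is a placeholder for the 2-to-N decoder configured for two front and two top outputs; the routing follows the description above, but the names are illustrative.

```python
def upmix_5_1_plus_heights(L, R, C, LFE, Ls, Rs, upmix_2_to_4):
    """Derive two top surround channels from a 5.1 mix using two 2-to-4 blocks (Figure 7).

    upmix_2_to_4 : callable taking (left, right) and returning
                   (out_left, out_right, top_left, top_right); it stands in for the
                   2-to-N matrix decoder configured for two front and two top outputs.
    """
    # One block processes the front pair, another processes the surround pair.
    Lout, Rout, Lt_front, Rt_front = upmix_2_to_4(L, R)
    Lsout, Rsout, Lt_surr, Rt_surr = upmix_2_to_4(Ls, Rs)
    # The height outputs of the two blocks are pairwise summed into Lts/Rts.
    Lts = Lt_front + Lt_surr
    Rts = Rt_front + Rt_surr
    return Lout, Rout, C, LFE, Lsout, Rsout, Lts, Rts
```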

Figure 7: A 5.1 upmix scenario where the Top Surround channels are derived from the Left, Right and Surround channels.

Figure 8: A 5.1 upmix scenario where the Top Front channels are derived from the Left and Right channels, and the Top Rear channels are derived from the Surround channels.

Figure 9: A 5.1 to 7.1 upmix scenario where the Side Surround and Rear Surround channels are derived from the Surround channels.

Figure 10: A 5.1 upmix scenario where the Top Surround channels are derived from the Left, Right and Surround channels, and the Side Surround and Rear Surround channels are derived from the Surround channels.

Figure 11: A 5.1 upmix scenario where the Top Front channels are derived from the Left and Right channels, the Top Rear channels are derived from the Surround channels, and the Side Surround and Rear Surround channels are derived from the Surround channels.

Figure 12: A 7.1 upmix scenario where the Top Surround channels are derived from the Left, Right, Side Surround and Rear Surround channels.

Figure 13: A 7.1 upmix scenario where the Top Front channels are derived from the Left and Right channels, and the Top Rear channels are derived from the Side Surround and Rear Surround channels.

5. Validation Subjective Testing

Subjective testing was conducted to determine whether the new algorithm provides an improvement over legacy systems. In particular, we wished to verify that Dolby Surround performs better than Dolby's existing surround system, Dolby Pro Logic II. The algorithms were tested operating in decode mode on matrix-encoded 5.0 surround sound programs.

Test Design

The test was designed to measure impairments to a surround audio program resulting from spatial coding (matrix encode/decode). The audio under test is easily distinguished from the source, so a MUSHRA-like testing methodology [4] was selected. The reference signal is the original multichannel program; the original was also included among the items under test as a hidden reference. Dolby Pro Logic (I) was used as an anchor signal. This is a departure from a true MUSHRA test, which calls for band-limited anchors (3.5 kHz and optionally 7 kHz band-limited signals). For spatial audio coding, bandwidth limitation is not a typical artifact and is therefore not an appropriate anchor.

Systems under Test

Table 2 lists the systems included in the subjective test along with the purpose of including each system.

System              Identifier   Purpose
Dolby Pro Logic     PL1          Anchor
Dolby Pro Logic II  PL2          System under test
Dolby Surround      DS           System under test
Source 5.0 signal   Hidden Ref   Hidden reference

Table 2: Coding systems compared in the subjective test.

Test Items

Eight items were selected for the subjective test from a large set of surround content. Candidate items were evaluated and selected according to the following objectives:

- Relevance: signals that are likely to be applied to spatial coding (i.e. real-world programs)
- Diversity of content: signals from each of three categories: TV/Movies, Sports, and Music
- Range of performance: signals that are good and bad cases for the systems under test

In addition, it was decided to include one worst-case test signal, applause, which has commonly been used in past testing and is therefore a good touchstone for performance. All items are 5.0; LFE signals, if present in the original program, were deleted. All files were level-balanced and time-aligned after encoding, and the presentation order of the coders was varied to decrease presentation-order bias. Listeners were allowed to adjust the playback volume.

Test Execution

Ten listeners were briefly familiarized with the systems under test as well as with the software interface used to perform the test. All of the listeners were experienced with this type of testing, so training was brief. The test was performed in a purpose-built listening room equipped for 5-channel playback using Event ASP8 studio monitors; the specific acoustic properties of the listening room are detailed in Table 3.

The test was built and run using a custom subjective testing software GUI.

Parameter               Measured values
Room dimensions         30.6 m² area; l = 6.60 m, w = 4.64 m, h = 2.60 m
Reverberation time      measured per octave band
Speaker configuration   R, L at ±30°; C at 0°; Rs, Ls at ±

Table 3: Measured room parameters.

Analysis

Listener Post-Screening

All of the listeners' scores were included in the analysis. None of the listeners' results deviates by more than 1.5 IQR from the middle 50% of listeners for any of the items. Three listeners each had a single misdetect of the hidden reference; removing these listeners does not significantly change the mean results.

Statistical Analysis

The mean score with 95% confidence intervals is computed independently for each system across all listeners and test items. Figure 14 shows the results for each stimulus; the overall result across all stimuli is shown above the label "Total". The trend from PL1 to PL2 to DS shows increasing mean scores. The confidence intervals do not overlap between PL1 and PL2, which is a strong indication that PL1 and PL2 are statistically distinct. The confidence intervals do overlap between PL2 and DS. To better resolve the differences between the systems, we turn to the ANOVA method [5]. ANOVA is a more powerful method for determining statistical differences; the increased power is achieved by using all the data to compute a single variance for all the systems under test. An underlying assumption of ANOVA is that the systems do in fact have equal variance. Visual inspection of Figure 14 shows that PL1 has noticeably lower variance, and both the F-max test [6] and the Bartlett test [7] for uniformity of variance fail, confirming this observation. Thus ANOVA is not appropriate for this data set.

Figure 14: Mean and 95% confidence interval for all listeners and all 8 items.

As stated earlier, the purpose of this test is to verify that Dolby Surround (DS), the new surround decoder, performs better than Dolby Pro Logic II (PL2). As such, a simple paired t-test can be applied. This is a relatively powerful method for comparing two data sets that have paired measurements; in this test, each measurement of PL2 is replicated on DS under similar conditions (same listener, same test item). Comparing PL2 and DS in this way indicates that the two systems are statistically different. The results of this analysis are given in Table 4; the p-value indicates high statistical significance.

Table 4: PL2 versus DS t-test results (mean, variance, number of observations, t statistic, two-tailed P(T<=t), and the two-tailed critical t value at 95% confidence).
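For reference, the paired comparison described above can be reproduced with standard statistics tooling. The sketch below uses placeholder score arrays, not the actual listening-test data; it simply shows a paired (dependent-samples) t-test applied to matched PL2/DS scores.

```python
import numpy as np
from scipy import stats

# Placeholder scores: one entry per (listener, item) pair, same pairing for both systems.
pl2_scores = np.array([63, 70, 61, 68, 59, 72, 66, 64], dtype=float)
ds_scores  = np.array([71, 74, 69, 75, 65, 78, 70, 73], dtype=float)

# Paired t-test: each PL2 measurement is paired with the DS measurement
# from the same listener and test item.
t_stat, p_value = stats.ttest_rel(ds_scores, pl2_scores)
print(f"t = {t_stat:.2f}, two-tailed p = {p_value:.4f}")
```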

Discussion of Test Results

DS is statistically significantly better than PL2. How might this translate into a practical difference for listeners? Looking at the mean test scores (PL1: 56, PL2: 65, DS: 72), PL2 scored 9 points better than PL1 and DS scored 7 points better than PL2. Thus, for Lt/Rt decoding, DS is almost as large an improvement over PL2 as PL2 was over PL1. PL2 is widely considered a significant improvement over PL1 and has been widely adopted; this test indicates that DS would likely be well received in the marketplace. Additional insight into DS performance can be gained from Figure 15. Briefly, DS shows the highest scores, and the most improvement over PL2, on A4V content ("Audio for Video", i.e. TV and movies), which is by far the most important use case. DS shows little or no improvement on the music items in this test (Lt/Rt-encoded 5.0 music), a use case that is virtually non-existent. Lastly, on the applause item DS scored between PL2 and PL1; there is not a statistical difference, but (based on listener verbal feedback) the systems each have distinct artifacts, and listener scores vary based on individual sensitivity to the impairment type (spaciousness, timbre change, etc.).

Figure 15: Mean and 95% confidence interval for all listeners, shown by content type.

6. Conclusion

Conventional matrix encoding was examined and a generalized encoding equation was formulated for arbitrary speaker locations. A new matrix decoder was described that separates steered and diffuse signals in both time and frequency. The separation of steered and diffuse signals allows accurate panning of the steered signals to an arbitrary output channel configuration, while the diffuse signal can be spread to all output channels for increased envelopment. The use of multiple instantiations of the core matrix decoder was discussed as a way to re-render multichannel content to even higher channel configurations. Subjective testing procedures and results were presented which show that the new matrix decoder is statistically better than previous matrix decoders.

7. References

[1] K. Gundry, "A New Matrix Decoder for Surround Sound," AES 19th International Conference, Schloss Elmau, Germany, 2001.

[2] B. C. J. Moore and B. R. Glasberg, "Suggested Formulae for Calculating Auditory-Filter Bandwidths and Excitation Patterns," Journal of the Acoustical Society of America, 74: 750-753, 1983.

[3] C. Faller, "Matrix Surround Revisited," AES 30th International Conference, Saariselkä, Finland, 2007.

[4] ITU-R, "Method for the Subjective Assessment of Intermediate Quality Levels of Coding Systems," Recommendation ITU-R BS.1534-1, Geneva, 2003.

[5] H. B. Mann, Analysis and Design of Experiments: Analysis of Variance and Analysis of Variance Designs, Dover Publications, New York, 1949.

[6] H. O. Hartley, "The Use of Range in Analysis of Variance," Biometrika, 37, 1950.

[7] M. S. Bartlett, "Properties of Sufficiency and Statistical Tests," Proceedings of the Royal Society of London, Series A, 160: 268-282, 1937.


More information

DIALOGUE ENHANCEMENT OF STEREO SOUND. Huawei European Research Center, Munich, Germany

DIALOGUE ENHANCEMENT OF STEREO SOUND. Huawei European Research Center, Munich, Germany DIALOGUE ENHANCEMENT OF STEREO SOUND Jürgen T. Geiger, Peter Grosche, Yesenia Lacouture Parodi juergen.geiger@huawei.com Huawei European Research Center, Munich, Germany ABSTRACT Studies show that many

More information

Midterm Examination CS 534: Computational Photography

Midterm Examination CS 534: Computational Photography Midterm Examination CS 534: Computational Photography November 3, 2015 NAME: SOLUTIONS Problem Score Max Score 1 8 2 8 3 9 4 4 5 3 6 4 7 6 8 13 9 7 10 4 11 7 12 10 13 9 14 8 Total 100 1 1. [8] What are

More information

The Subjective and Objective. Evaluation of. Room Correction Products

The Subjective and Objective. Evaluation of. Room Correction Products The Subjective and Objective 2003 Consumer Clinic Test Sedan (n=245 Untrained, n=11 trained) Evaluation of 2004 Consumer Clinic Test Sedan (n=310 Untrained, n=9 trained) Room Correction Products Text Text

More information

RECOMMENDATION ITU-R BS

RECOMMENDATION ITU-R BS Rec. ITU-R BS.1194-1 1 RECOMMENDATION ITU-R BS.1194-1 SYSTEM FOR MULTIPLEXING FREQUENCY MODULATION (FM) SOUND BROADCASTS WITH A SUB-CARRIER DATA CHANNEL HAVING A RELATIVELY LARGE TRANSMISSION CAPACITY

More information

Audio Engineering Society. Convention Paper. Presented at the 115th Convention 2003 October New York, New York

Audio Engineering Society. Convention Paper. Presented at the 115th Convention 2003 October New York, New York Audio Engineering Society Convention Paper Presented at the 115th Convention 2003 October 10 13 New York, New York This convention paper has been reproduced from the author's advance manuscript, without

More information

Discrimination of Virtual Haptic Textures Rendered with Different Update Rates

Discrimination of Virtual Haptic Textures Rendered with Different Update Rates Discrimination of Virtual Haptic Textures Rendered with Different Update Rates Seungmoon Choi and Hong Z. Tan Haptic Interface Research Laboratory Purdue University 465 Northwestern Avenue West Lafayette,

More information

University of Huddersfield Repository

University of Huddersfield Repository University of Huddersfield Repository Moore, David J. and Wakefield, Jonathan P. Surround Sound for Large Audiences: What are the Problems? Original Citation Moore, David J. and Wakefield, Jonathan P.

More information

Envelopment and Small Room Acoustics

Envelopment and Small Room Acoustics Envelopment and Small Room Acoustics David Griesinger Lexicon 3 Oak Park Bedford, MA 01730 Copyright 9/21/00 by David Griesinger Preview of results Loudness isn t everything! At least two additional perceptions:

More information

Development and Validation of an Unintrusive Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings

Development and Validation of an Unintrusive Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings Development and Validation of an Unintrusive Model for Predicting the Sensation of Envelopment Arising from Surround Sound Recordings Sunish George 1*, Slawomir Zielinski 1, Francis Rumsey 1, Philip Jackson

More information

Binaural auralization based on spherical-harmonics beamforming

Binaural auralization based on spherical-harmonics beamforming Binaural auralization based on spherical-harmonics beamforming W. Song a, W. Ellermeier b and J. Hald a a Brüel & Kjær Sound & Vibration Measurement A/S, Skodsborgvej 7, DK-28 Nærum, Denmark b Institut

More information

PSYCHOACOUSTIC EVALUATION OF DIFFERENT METHODS FOR CREATING INDIVIDUALIZED, HEADPHONE-PRESENTED VAS FROM B-FORMAT RIRS

PSYCHOACOUSTIC EVALUATION OF DIFFERENT METHODS FOR CREATING INDIVIDUALIZED, HEADPHONE-PRESENTED VAS FROM B-FORMAT RIRS 1 PSYCHOACOUSTIC EVALUATION OF DIFFERENT METHODS FOR CREATING INDIVIDUALIZED, HEADPHONE-PRESENTED VAS FROM B-FORMAT RIRS ALAN KAN, CRAIG T. JIN and ANDRÉ VAN SCHAIK Computing and Audio Research Laboratory,

More information

A binaural auditory model and applications to spatial sound evaluation

A binaural auditory model and applications to spatial sound evaluation A binaural auditory model and applications to spatial sound evaluation Ma r k o Ta k a n e n 1, Ga ë ta n Lo r h o 2, a n d Mat t i Ka r ja l a i n e n 1 1 Helsinki University of Technology, Dept. of Signal

More information

Reducing comb filtering on different musical instruments using time delay estimation

Reducing comb filtering on different musical instruments using time delay estimation Reducing comb filtering on different musical instruments using time delay estimation Alice Clifford and Josh Reiss Queen Mary, University of London alice.clifford@eecs.qmul.ac.uk Abstract Comb filtering

More information

Development of multichannel single-unit microphone using shotgun microphone array

Development of multichannel single-unit microphone using shotgun microphone array PROCEEDINGS of the 22 nd International Congress on Acoustics Electroacoustics and Audio Engineering: Paper ICA2016-155 Development of multichannel single-unit microphone using shotgun microphone array

More information

Speech Coding in the Frequency Domain

Speech Coding in the Frequency Domain Speech Coding in the Frequency Domain Speech Processing Advanced Topics Tom Bäckström Aalto University October 215 Introduction The speech production model can be used to efficiently encode speech signals.

More information

Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback

Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback Laboratory Assignment 2 Signal Sampling, Manipulation, and Playback PURPOSE This lab will introduce you to the laboratory equipment and the software that allows you to link your computer to the hardware.

More information

ONE of the most common and robust beamforming algorithms

ONE of the most common and robust beamforming algorithms TECHNICAL NOTE 1 Beamforming algorithms - beamformers Jørgen Grythe, Norsonic AS, Oslo, Norway Abstract Beamforming is the name given to a wide variety of array processing algorithms that focus or steer

More information

QuantumLogic by Dr. Gilbert Soulodre. Intro: Rob Barnicoat, Director Business Development and Global Benchmarking, Harman International

QuantumLogic by Dr. Gilbert Soulodre. Intro: Rob Barnicoat, Director Business Development and Global Benchmarking, Harman International QuantumLogic by Dr. Gilbert Soulodre Intro: Rob Barnicoat, Director Business Development and Global Benchmarking, Harman International Ref:HAR-FHRB -copyright 2013 QuantumLogic Surround Technology QuantumLogic

More information

Convention Paper 7480

Convention Paper 7480 Audio Engineering Society Convention Paper 7480 Presented at the 124th Convention 2008 May 17-20 Amsterdam, The Netherlands The papers at this Convention have been selected on the basis of a submitted

More information

AUDL GS08/GAV1 Auditory Perception. Envelope and temporal fine structure (TFS)

AUDL GS08/GAV1 Auditory Perception. Envelope and temporal fine structure (TFS) AUDL GS08/GAV1 Auditory Perception Envelope and temporal fine structure (TFS) Envelope and TFS arise from a method of decomposing waveforms The classic decomposition of waveforms Spectral analysis... Decomposes

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Architectural Acoustics Session 2aAAa: Adapting, Enhancing, and Fictionalizing

More information

International Journal of Digital Application & Contemporary research Website: (Volume 1, Issue 7, February 2013)

International Journal of Digital Application & Contemporary research Website:   (Volume 1, Issue 7, February 2013) Performance Analysis of OFDM under DWT, DCT based Image Processing Anshul Soni soni.anshulec14@gmail.com Ashok Chandra Tiwari Abstract In this paper, the performance of conventional discrete cosine transform

More information

Sound localization with multi-loudspeakers by usage of a coincident microphone array

Sound localization with multi-loudspeakers by usage of a coincident microphone array PAPER Sound localization with multi-loudspeakers by usage of a coincident microphone array Jun Aoki, Haruhide Hokari and Shoji Shimada Nagaoka University of Technology, 1603 1, Kamitomioka-machi, Nagaoka,

More information

Discrete-Time Signal Processing (DTSP) v14

Discrete-Time Signal Processing (DTSP) v14 EE 392 Laboratory 5-1 Discrete-Time Signal Processing (DTSP) v14 Safety - Voltages used here are less than 15 V and normally do not present a risk of shock. Objective: To study impulse response and the

More information