Journal of the Acoustical Society of America 88
The following article appeared in Journal of the Acoustical Society of America 88. Copyright (1990) Acoustical Society of America. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the Acoustical Society of America.
Analytical expressions for the tonotopic sensory scale

Hartmut Traunmüller

Institutionen för lingvistik, Stockholms universitet, Stockholm, Sweden

(Received 16 August 1989; accepted for publication 20 February 1990)

Accuracy and simplicity of analytical expressions for the relations between frequency and critical bandwidth as well as critical-band rate (in Bark) are assessed for the purpose of applications in speech perception research and in speech technology. The equivalent rectangular bandwidth (ERB) is seen as a measure of frequency resolution, while the classical critical-band rate is considered a measure of tonotopic position. For the conversion of frequency to critical-band rate, and vice versa, the invertible formula z = [26.81/(1 + 1960/f)] − 0.53 is proposed. Within the frequency range of the perceptually essential vowel formants (0.2–6.7 kHz), it agrees to within ±0.05 Bark with the Bark scale, originally published in the form of a table.

PACS numbers: Cq, Ar, Fe

INTRODUCTION

Two processes are generally assumed to contribute to auditory frequency resolution. First, the hearing system is capable of performing an "oscillographic" analysis of the set of neural signals originating in the cochlea. This process is limited to frequencies that can be resolved in the pattern of neural responses. While single neurons are not likely to fire more frequently than 500 times per second even at high stimulus intensities, frequencies between 0.5 and 1.5 kHz can still be handled in the temporal domain, albeit less efficiently, on the basis of the signals from a large number of neurons. The capability and limitations of a frequency analysis in the temporal domain are demonstrated vividly by cochlear implant patients whose sole auditory input is an undifferentiated electrical stimulation of the auditory nerve. The second process covers the whole auditory frequency range.
Any sound entering a normally functioning cochlea is subject to a spectral analysis, resulting in a frequency-to-place transformation. The cochlea can be regarded as a bank of filters whose outputs are ordered tonotopically, with the filters closest to the base responding maximally to the highest frequencies. The tonotopic order is known to be maintained in the structure of the neural network at higher levels in the hearing system.

The "notch-noise method" has often been used in investigations of auditory frequency selectivity. It involves the determination of the detection threshold for a sinusoid, centered in a spectral notch of a noise, as a function of the width of the notch. On the basis of results obtained with this method, auditory frequency selectivity can be described in terms of the equivalent rectangular bandwidth (ERB) as a function of center frequency (Moore and Glasberg, 1983). Since the two processes mentioned above both contribute to the detection of the sinusoid, the ERB, or ERB rate, should not be taken as a measure of the tonotopic scale as such.

A quantity related to the ERB, though not identical with it, is the classical critical bandwidth (CB) (Zwicker et al., 1957). Measurement of the CB typically involves loudness summation experiments. Different summation rules have been found to hold for auditory stimuli, depending on whether their frequency components are separated by more or less than the CB. The CB and the ERB have been found to be proportional and equivalent for center frequencies above 500 Hz. For lower frequencies, there is a discrepancy, as shown in Fig. 1. In this range, the ERB decreases with decreasing center frequency, while the CB remains close to constant. The discrepancy can be explained by the reasonable assumption that the analysis within the temporal domain is irrelevant to loudness summation as long as loudness variations are not audible as such, while it contributes substantially to frequency resolution for f < 500 Hz.
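The ERB-versus-CB comparison above can be made concrete with the Moore and Glasberg (1983) polynomial plotted in Fig. 1. The following is a minimal sketch; the function name is illustrative, not from the original:

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz, after Moore and Glasberg (1983).

    The polynomial of Fig. 1 takes the center frequency in kHz.
    """
    f = f_hz / 1000.0  # convert Hz to kHz
    return 6.23 * f ** 2 + 93.39 * f + 28.52

# The ERB keeps shrinking below 500 Hz, where the classical CB
# stays close to 100 Hz -- the discrepancy discussed above.
for f_hz in (100.0, 250.0, 500.0, 1000.0):
    print(f_hz, round(erb_hz(f_hz), 1))
```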
Consequently, the CB should not be taken as a measure of frequency resolution, but the CB rate may be taken as a measure of the tonotopic sensory scale. In the familiar CB-rate scale (see Fig. 2), the CB has been chosen to serve as a natural unit of the tonotopic sensory scale. Standard values for the relation between frequency f and CB rate z have been proposed by Zwicker (1961) in the form of a table. The CB-rate scale has been applied extensively in research on psychoacoustics and speech perception. For most of these applications, it would be more convenient to have the relation between z and f specified in the form of an equation instead of a table. Several equations that approximate the tabulated values have also been published (Tjomov, 1971; Schroeder, 1977; Zwicker and Terhardt, 1980; Traunmüller, 1983). In the following, the error functions of these equations will be compared.

Recent studies of speech sounds suggest that the tonotopic distances (CB-rate differences) between prominent peaks in their spectra are fundamental to the perception of their phonetic quality. More specifically, it has been suggested that the spectral peaks shaped by the formants and the fundamental have the same relative tonotopic locations in linguistically identical vowels uttered by speakers different in age and sex (Traunmüller, 1983, 1988; Syrdal and Gopal, 1986). While differences in speaker size appear to be reflected in a tonotopic translation of the spectral peaks, differences in vocal effort appear to be reflected in a linear tonotopic compression/expansion (Traunmüller, 1988). In order to test these hypotheses, both in theory and by means of speech synthesis, a convenient and accurate method of conversion from frequency to CB rate, and vice versa, is needed. Our requirements include that the function have a simple inverse and that it be accurate, preferably to within ±0.05 Bark, in the range of the essential vowel formant frequencies of men, women, and children. This rigorous claim for accuracy prevents the introduction of any avoidable error in addition to that inherent in the table (Zwicker, 1961). However, it should be noticed that the absolute width of the critical band, and its definition, is irrelevant to the applications we have in mind, as long as the obtained scales remain proportional.

FIG. 1. Equivalent rectangular bandwidth, according to the formula B = 6.23f² + 93.39f + 28.52 (f in kHz), given by Moore and Glasberg (1983) (curve), and critical bandwidth, according to Zwicker's (1961) table (marks), as a function of frequency.

FIG. 2. Critical-band rate z as a function of frequency f. The plus signs (+) represent data from Zwicker (1961). The curve corresponds to Eq. (6).

I. ANALYTICAL EXPRESSIONS

A. Expressions for critical-band rate

In rough approximation, the relation between f and z is linear for f < 500 Hz (z = f/100) and logarithmic for higher frequencies. Figure 3(a) shows the error functions of two logarithmic approximations to the CB scale. One of these, Eq. (1), has been suggested by Zwicker and Terhardt (1980). It gives values that agree with the tabulated ones to within ±0.25 Bark in the range 0.6 < f < 7.2 kHz. The other approximation, Eq. (2), satisfies our stricter standard of no more than ±0.05-Bark deviation at the cost of a reduction in the range of validity, to 1.0 < f < 3.6 kHz:

z = 14.2 log(f/1000) + 8.7, (1)

z = 6.6 ln(f) − 37.2. (2)

In these and in all the following equations, frequency f is to be expressed in Hz and CB rate z in CB units (Bark). A mathematical function that is linear at one extreme and logarithmic at the other extreme, the sinus-hyperbolicus function, has been used by Tjomov (1971), Eq. (3), and by Schroeder (1977), Eq. (4), to calculate CB rate. The error functions of both equations are shown in Fig. 3(b):

f = 600 sinh(z/6.7) + 20, (3)
z = 6.7 ln{[(f − 20)/600] + ([(f − 20)/600]² + 1)^(1/2)} (inverse),

f = 650 sinh(z/7), (4)
z = 7 ln{(f/650) + [(f/650)² + 1]^(1/2)} (inverse).

As compared with the tabulated values, Tjomov's equation (3) is accurate to within a few tenths of a Bark for f < 4.5 kHz and Schroeder's equation (4) to within ±0.3 Bark for f < 4.0 kHz. These equations are accurate enough for some applications in which frequency components above 4 kHz may be neglected, as they are in some systems of telephonic communication.
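Equations (1), (3), and (4) are straightforward to implement. The sketch below (function names are illustrative, not from the original) uses the identity asinh(x) = ln[x + (x² + 1)^(1/2)] for the inverses of the sinh expressions:

```python
import math

def bark_zt_log(f):
    """Eq. (1), the logarithmic approximation of Zwicker and Terhardt (1980)."""
    return 14.2 * math.log10(f / 1000.0) + 8.7

def hz_tjomov(z):
    """Eq. (3), Tjomov (1971): frequency in Hz from CB rate in Bark."""
    return 600.0 * math.sinh(z / 6.7) + 20.0

def bark_tjomov(f):
    """Inverse of Eq. (3); math.asinh(x) equals ln(x + sqrt(x**2 + 1))."""
    return 6.7 * math.asinh((f - 20.0) / 600.0)

def hz_schroeder(z):
    """Eq. (4), Schroeder (1977): frequency in Hz from CB rate in Bark."""
    return 650.0 * math.sinh(z / 7.0)

def bark_schroeder(f):
    """Inverse of Eq. (4)."""
    return 7.0 * math.asinh(f / 650.0)

# Each sinh expression and its inverse round-trip to machine precision:
for f in (100.0, 500.0, 1000.0, 3000.0):
    assert abs(hz_tjomov(bark_tjomov(f)) - f) < 1e-6
    assert abs(hz_schroeder(bark_schroeder(f)) - f) < 1e-6
```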
Approximations covering the whole auditory frequency range can be achieved in various ways by appropriate combinations of mathematical functions. For the most part, however, this yields equations that lack a simple inverse. The most accurate of the equations given by Zwicker and Terhardt (1980),

z = 13 arctan(0.00076f) + 3.5 arctan[(f/7500)²], (5)

is of this kind. It agrees with the table to within ±0.2 Bark over the whole range of auditory perception [see Fig. 3(c)]. The waviness of the error function tells us, however, that there is room for improvement. The equation also clearly falls short of our standards. If, e.g., we want to compare the tonotopic distances between two pairs of spectral peaks, we might obtain an error of up to 0.9 Bark. An approximation that has a simple inverse and meets our standards is achieved by considering z to be related to log(f) by a logistic function, also known as a "growth curve." Such an approximation, Eq. (6), has been proposed by Traunmüller (1983). Its error function is shown in Fig. 3(d):
z = [26.81f/(1960 + f)] − 0.53, (6)

f = 1960(z + 0.53)/(26.28 − z) (inverse).

FIG. 3. (a)–(d) Error functions of various approximations of the CB-rate scale. The error is defined as the difference between the calculated value and that in Zwicker's (1961) table. It is plotted in steps of 0.5 Bark for each frequency value in that table. (a) Logarithmic approximations: curve with marks, Eq. (1) [given by Zwicker and Terhardt (1980)]; curve without marks, Eq. (2). (b) Sinus-hyperbolicus approximations: lower curve, Eq. (3) [given by Tjomov (1971)]; upper curve, Eq. (4) [given by Schroeder (1977)]. (c) An overall approximation, Eq. (5), given by Zwicker and Terhardt (1980). (d) A logistic "growth-curve" approximation: lower curve with error scale at the left, Eq. (6) [given by Traunmüller (1983)]; upper curve, shown vertically displaced, with error scale at the right, Eq. (6) with corrections (7) and (8).

The values obtained with Eq. (6) deviate from the tabulated ones by less than ±0.05 Bark for 0.2 < f < 6.7 kHz. At the low-frequency end of the scale, the deviation from the table (Zwicker, 1961) sums up to 0.53 Bark for f = 0 Hz (0.26 Bark for f = 20 Hz). At least in part, this deviation is due to biased rounding of the bandwidth values in Zwicker's table. For frequencies below 400 Hz, the standard width of the critical band was set uniformly equal to 100 Hz. This appears to have been done in order to obtain the mnemonically simple relation z = f/100.
The original bandwidth data (Zwicker et al., 1957) indicate B ≈ 90 Hz for the lower frequencies in that range. The values listed in the table for f < 100 Hz are particularly questionable because they can hardly be said to be based on any reliable experimental evidence. Equation (6) may represent the tonotopic scale well enough down to the lowest frequencies for which it can be determined experimentally. The deviation at the high-frequency end of the scale remains unaccounted for. Calculating z with Eq. (6), close agreement with the table can be achieved over the whole auditory frequency range by added corrections, bending the error function straight at both ends of the scale, in the following way:

for calculated z < 2.0 Bark: z' = z + 0.15(2.0 − z), (7)

for calculated z > 20.1 Bark: z' = z + 0.22(z − 20.1). (8)

Since this is an easily inverted procedure, the calculation of f for a given z is not a problem. The error function obtained with these corrections is also shown in Fig. 3(d). The values calculated in this way agree with the table for f > 100 Hz to within ±0.05 Bark. Correction (7), however, also simulates the above-mentioned bias at low frequencies.
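The conversion of Eq. (6), its inverse, and the end corrections can be sketched as follows. Function names are illustrative, and the high-end correction constants (0.22, 20.1 Bark) follow commonly cited statements of Eq. (8):

```python
def hz_to_bark(f):
    """Critical-band rate z in Bark from frequency f in Hz, Eq. (6)."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Frequency f in Hz from critical-band rate z in Bark, inverse of Eq. (6)."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def hz_to_bark_corrected(f):
    """Eq. (6) with the corrections (7) and (8) at the ends of the scale."""
    z = hz_to_bark(f)
    if z < 2.0:
        z = z + 0.15 * (2.0 - z)   # Eq. (7), low end
    elif z > 20.1:
        z = z + 0.22 * (z - 20.1)  # Eq. (8), high end
    return z

# 1000 Hz maps to about 8.53 Bark, close to the tabulated 8.5,
# and Eq. (6) round-trips with its inverse:
assert abs(hz_to_bark(1000.0) - 8.527) < 1e-3
for f in (200.0, 1000.0, 6700.0):
    assert abs(bark_to_hz(hz_to_bark(f)) - f) < 1e-6
```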
B. Expressions for critical bandwidth

Zwicker and Terhardt (1980) proposed the equation

B = 25 + 75[1 + 1.4(f/1000)²]^0.69 (9)

to calculate critical bandwidth B as a function of center frequency f. While Eq. (9) is very accurate, it cannot easily be integrated to obtain CB rate. The authors' equation for CB rate, Eq. (5), is not compatible with Eq. (9). Proceeding from Eq. (6), critical bandwidths B can be calculated as

B = 52548/(z² − 52.56z + 690.39) (10)

for critical bands centered at z obtained by Eq. (6) without corrections. The values calculated by Eq. (10) agree with Zwicker's table to within ±6% for 0.27 < f < 5.8 kHz. Within that range, the error function is similar to that obtained by Eq. (9). The error functions of both equations are shown in Fig. 4.

FIG. 4. Error functions for critical bandwidth calculated with Eq. (9) (curve with marks) and Eq. (10) (curve without marks), as compared with Zwicker's (1961) table values (see also Fig. 1).

ACKNOWLEDGMENT

The preparation of this paper has been supported by a grant from HSFR, the Swedish Council for Research in the Humanities and Social Sciences.

Moore, B. C. J., and Glasberg, B. R. (1983). "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns," J. Acoust. Soc. Am. 74.
Schroeder, M. R. (1977). "Recognition of complex acoustic signals," in Life Sciences Research Report 5 (Dahlem Konferenzen), edited by T. H. Bullock (Abakon Verlag, Berlin).
Syrdal, A. K., and Gopal, H. S. (1986). "A perceptual model of vowel recognition based on the auditory representation of American English vowels," J. Acoust. Soc. Am. 79.
Tjomov, V. L. (1971). "A model to describe the results of psychoacoustical experiments on steady-state stimuli," in Analiz Rechevykh Signalov Chelovekom, edited by G. V. Gershuni (Nauka, Leningrad).
Traunmüller, H. (1983). "On vowels: Perception of spectral features, related aspects of production and sociophonetic dimensions," Ph.D. thesis, University of Stockholm.
Traunmüller, H. (1988). "Paralinguistic variation and invariance in the characteristic frequencies of vowels," Phonetica 45.
Zwicker, E. (1961). "Subdivision of the audible frequency range into critical bands (Frequenzgruppen)," J. Acoust. Soc. Am. 33, 248.
Zwicker, E., Flottorp, G., and Stevens, S. S. (1957). "Critical bandwidth in loudness summation," J. Acoust. Soc. Am. 29.
Zwicker, E., and Terhardt, E. (1980). "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency," J. Acoust. Soc. Am. 68.
More informationUniversity of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005
University of Washington Department of Electrical Engineering Computer Speech Processing EE516 Winter 2005 Lecture 5 Slides Jan 26 th, 2005 Outline of Today s Lecture Announcements Filter-bank analysis
More informationSOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS SUMMARY INTRODUCTION
SOUND QUALITY EVALUATION OF FAN NOISE BASED ON HEARING-RELATED PARAMETERS Roland SOTTEK, Klaus GENUIT HEAD acoustics GmbH, Ebertstr. 30a 52134 Herzogenrath, GERMANY SUMMARY Sound quality evaluation of
More informationNOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or
NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying
More informationI. INTRODUCTION J. Acoust. Soc. Am. 110 (3), Pt. 1, Sep /2001/110(3)/1628/13/$ Acoustical Society of America
On the upper cutoff frequency of the auditory critical-band envelope detectors in the context of speech perception a) Oded Ghitza Media Signal Processing Research, Agere Systems, Murray Hill, New Jersey
More informationThe source-filter model of speech production"
24.915/24.963! Linguistic Phonetics! The source-filter model of speech production" Glottal airflow Output from lips 400 200 0.1 0.2 0.3 Time (in secs) 30 20 10 0 0 1000 2000 3000 Frequency (Hz) Source
More informationEffect of bandwidth extension to telephone speech recognition in cochlear implant users
Effect of bandwidth extension to telephone speech recognition in cochlear implant users Chuping Liu Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
More informationPerceptive Speech Filters for Speech Signal Noise Reduction
International Journal of Computer Applications (975 8887) Volume 55 - No. *, October 22 Perceptive Speech Filters for Speech Signal Noise Reduction E.S. Kasthuri and A.P. James School of Computer Science
More informationSpeech, Hearing and Language: work in progress. Volume 12
Speech, Hearing and Language: work in progress Volume 12 2 Construction of a rotary vibrator and its application in human tactile communication Abbas HAYDARI and Stuart ROSEN Department of Phonetics and
More informationINTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006
1. Resonators and Filters INTRODUCTION TO ACOUSTIC PHONETICS 2 Hilary Term, week 6 22 February 2006 Different vibrating objects are tuned to specific frequencies; these frequencies at which a particular
More informationHi-Fi voice: observations on the distribution of energy in the singing voice spectrum above 5 khz
Hi-Fi voice: observations on the distribution of energy in the singing voice spectrum above 5 khz S. O Ternström Kungliga Tekniska Högskolan, Dept. of Speech, Music & Hearing, Lindstedtsvägen 24, SE-100
More informationA Silicon Model of an Auditory Neural Representation of Spectral Shape
A Silicon Model of an Auditory Neural Representation of Spectral Shape John Lazzaro 1 California Institute of Technology Pasadena, California, USA Abstract The paper describes an analog integrated circuit
More informationAudible Aliasing Distortion in Digital Audio Synthesis
56 J. SCHIMMEL, AUDIBLE ALIASING DISTORTION IN DIGITAL AUDIO SYNTHESIS Audible Aliasing Distortion in Digital Audio Synthesis Jiri SCHIMMEL Dept. of Telecommunications, Faculty of Electrical Engineering
More informationNeural Processing of Amplitude-Modulated Sounds: Joris, Schreiner and Rees, Physiol. Rev. 2004
Neural Processing of Amplitude-Modulated Sounds: Joris, Schreiner and Rees, Physiol. Rev. 2004 Richard Turner (turner@gatsby.ucl.ac.uk) Gatsby Computational Neuroscience Unit, 02/03/2006 As neuroscientists
More informationPerformance study of Text-independent Speaker identification system using MFCC & IMFCC for Telephone and Microphone Speeches
Performance study of Text-independent Speaker identification system using & I for Telephone and Microphone Speeches Ruchi Chaudhary, National Technical Research Organization Abstract: A state-of-the-art
More informationSpectral and temporal processing in the human auditory system
Spectral and temporal processing in the human auditory system To r s t e n Da u 1, Mo rt e n L. Jepsen 1, a n d St e p h a n D. Ew e r t 2 1Centre for Applied Hearing Research, Ørsted DTU, Technical University
More informationFFT 1 /n octave analysis wavelet
06/16 For most acoustic examinations, a simple sound level analysis is insufficient, as not only the overall sound pressure level, but also the frequency-dependent distribution of the level has a significant
More informationVOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL
VOICE QUALITY SYNTHESIS WITH THE BANDWIDTH ENHANCED SINUSOIDAL MODEL Narsimh Kamath Vishweshwara Rao Preeti Rao NIT Karnataka EE Dept, IIT-Bombay EE Dept, IIT-Bombay narsimh@gmail.com vishu@ee.iitb.ac.in
More informationHuman Auditory Periphery (HAP)
Human Auditory Periphery (HAP) Ray Meddis Department of Human Sciences, University of Essex Colchester, CO4 3SQ, UK. rmeddis@essex.ac.uk A demonstrator for a human auditory modelling approach. 23/11/2003
More informationINTRODUCTION. Address and author to whom correspondence should be addressed. Electronic mail:
Detection of time- and bandlimited increments and decrements in a random-level noise Michael G. Heinz Speech and Hearing Sciences Program, Division of Health Sciences and Technology, Massachusetts Institute
More informationLinguistic Phonetics. Spectral Analysis
24.963 Linguistic Phonetics Spectral Analysis 4 4 Frequency (Hz) 1 Reading for next week: Liljencrants & Lindblom 1972. Assignment: Lip-rounding assignment, due 1/15. 2 Spectral analysis techniques There
More informationABSTRACT. Title of Document: SPECTROTEMPORAL MODULATION LISTENERS. Professor, Dr.Shihab Shamma, Department of. Electrical Engineering
ABSTRACT Title of Document: SPECTROTEMPORAL MODULATION SENSITIVITY IN HEARING-IMPAIRED LISTENERS Golbarg Mehraei, Master of Science, 29 Directed By: Professor, Dr.Shihab Shamma, Department of Electrical
More informationTime-Frequency Distributions for Automatic Speech Recognition
196 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 3, MARCH 2001 Time-Frequency Distributions for Automatic Speech Recognition Alexandros Potamianos, Member, IEEE, and Petros Maragos, Fellow,
More informationLoudspeaker Distortion Measurement and Perception Part 2: Irregular distortion caused by defects
Loudspeaker Distortion Measurement and Perception Part 2: Irregular distortion caused by defects Wolfgang Klippel, Klippel GmbH, wklippel@klippel.de Robert Werner, Klippel GmbH, r.werner@klippel.de ABSTRACT
More informationAUDL Final exam page 1/7 Please answer all of the following questions.
AUDL 11 28 Final exam page 1/7 Please answer all of the following questions. 1) Consider 8 harmonics of a sawtooth wave which has a fundamental period of 1 ms and a fundamental component with a level of
More informationAspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta
Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied
More informationINFLUENCE OF FREQUENCY DISTRIBUTION ON INTENSITY FLUCTUATIONS OF NOISE
INFLUENCE OF FREQUENCY DISTRIBUTION ON INTENSITY FLUCTUATIONS OF NOISE Pierre HANNA SCRIME - LaBRI Université de Bordeaux 1 F-33405 Talence Cedex, France hanna@labriu-bordeauxfr Myriam DESAINTE-CATHERINE
More informationPerceived Pitch of Synthesized Voice with Alternate Cycles
Journal of Voice Vol. 16, No. 4, pp. 443 459 2002 The Voice Foundation Perceived Pitch of Synthesized Voice with Alternate Cycles Xuejing Sun and Yi Xu Department of Communication Sciences and Disorders,
More informationDistortion products and the perceived pitch of harmonic complex tones
Distortion products and the perceived pitch of harmonic complex tones D. Pressnitzer and R.D. Patterson Centre for the Neural Basis of Hearing, Dept. of Physiology, Downing street, Cambridge CB2 3EG, U.K.
More informationQuarterly Progress and Status Report. A note on the vocal tract wall impedance
Dept. for Speech, Music and Hearing Quarterly Progress and Status Report A note on the vocal tract wall impedance Fant, G. and Nord, L. and Branderud, P. journal: STL-QPSR volume: 17 number: 4 year: 1976
More informationImagine the cochlea unrolled
2 2 1 1 1 1 1 Cochlea & Auditory Nerve: obligatory stages of auditory processing Think of the auditory periphery as a processor of signals 2 2 1 1 1 1 1 Imagine the cochlea unrolled Basilar membrane motion
More informationBark and ERB Bilinear Transforms
Bark and ERB Bilinear Transforms Julius O. Smith III Center for Computer Research in Music and Acoustics (CCRMA), Stanford University Stanford, CA 9435 USA Jonathan S. Abel Human Factors Research Division
More informationModelling the sensation of fluctuation strength
Product Sound Quality and Multimodal Interaction: Paper ICA016-113 Modelling the sensation of fluctuation strength Alejandro Osses Vecchi (a), Rodrigo García León (a), Armin Kohlrausch (a,b) (a) Human-Technology
More informationScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech
More informationAUDL 4007 Auditory Perception. Week 1. The cochlea & auditory nerve: Obligatory stages of auditory processing
AUDL 4007 Auditory Perception Week 1 The cochlea & auditory nerve: Obligatory stages of auditory processing 1 Think of the ear as a collection of systems, transforming sounds to be sent to the brain 25
More informationPrinciples of Musical Acoustics
William M. Hartmann Principles of Musical Acoustics ^Spr inger Contents 1 Sound, Music, and Science 1 1.1 The Source 2 1.2 Transmission 3 1.3 Receiver 3 2 Vibrations 1 9 2.1 Mass and Spring 9 2.1.1 Definitions
More informationSpeech Synthesis; Pitch Detection and Vocoders
Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech
More informationPattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt
Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory
More informationDERIVATION OF TRAPS IN AUDITORY DOMAIN
DERIVATION OF TRAPS IN AUDITORY DOMAIN Petr Motlíček, Doctoral Degree Programme (4) Dept. of Computer Graphics and Multimedia, FIT, BUT E-mail: motlicek@fit.vutbr.cz Supervised by: Dr. Jan Černocký, Prof.
More informationLinguistics 401 LECTURE #2. BASIC ACOUSTIC CONCEPTS (A review)
Linguistics 401 LECTURE #2 BASIC ACOUSTIC CONCEPTS (A review) Unit of wave: CYCLE one complete wave (=one complete crest and trough) The number of cycles per second: FREQUENCY cycles per second (cps) =
More information