Improving Speech Intelligibility in Fluctuating Background Interference


Improving Speech Intelligibility in Fluctuating Background Interference

by Laura A. D'Aquila

S.B., Massachusetts Institute of Technology (2015), Electrical Engineering and Computer Science, Mathematics

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology

June 2016

© Massachusetts Institute of Technology. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 20, 2016

Certified by: Dr. Charlotte M. Reed, Senior Research Scientist, Research Laboratory of Electronics, May 20, 2016

Certified by: Professor Louis D. Braida, Henry Ellis Warren Professor of Electrical Engineering and Health Sciences and Technology, May 20, 2016

Accepted by: Dr. Christopher J. Terman, Chairman, Master of Engineering Thesis Committee

Improving Speech Intelligibility in Fluctuating Background Interference

by Laura A. D'Aquila

Submitted to the Department of Electrical Engineering and Computer Science on May 20, 2016, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science.

ABSTRACT

The masking release (MR; i.e., better speech recognition in fluctuating compared to continuous noise backgrounds) that is evident for normal-hearing (NH) listeners is generally reduced or absent in hearing-impaired (HI) listeners. In this study, a signal-processing technique was developed to improve MR in HI listeners and offer insight into the mechanisms influencing the size of MR. This technique compares short-term and long-term estimates of energy, increases the level of short-term segments whose energy is below the average energy, and normalizes the overall energy of the processed signal to be equivalent to that of the original long-term estimate. In consonant-identification tests, HI listeners achieved similar scores for processed and unprocessed stimuli in quiet and in continuous-noise backgrounds, while superior performance was obtained for the processed speech in some of the fluctuating background noises. Thus, the energy-normalized signals led to larger values of MR compared to that obtained with unprocessed signals.

ACKNOWLEDGMENTS

This research was supported by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Award Number R01 DC. I would like to extend a big thank you to my advisors, Dr. Charlotte M. Reed and Professor Louis D. Braida. From when I first began doing research in their lab as a sophomore to now as I wrap up my M.Eng. thesis, they have always made themselves available and offered much guidance, instruction, and support. Their analysis and ideas for moving forward were crucial to the success of this project. Their kindness made me look forward to coming into lab every day. I am also extremely grateful for the RA funding that they provided me with as I worked on the project.

Additionally, I would like to heartily thank Dr. Joseph G. Desloge, the signal processing mastermind of the project. During the spring of my senior year, his help was critical as I coded the different components of this project. Despite having since taken a new job on the West Coast, he still kindly spoke with me weekly on the phone throughout the year to discuss my project and offer his very valuable insight, ideas, and feedback. I could not have asked for a better group of mentors than Dr. Reed, Professor Braida, and Dr. Desloge. As part of the Sensory Communication Group, the three of them performed much of the previous work that led to this project, and this project would also not have been possible without their continued involvement.

I would lastly like to thank my family, who have provided me with countless opportunities throughout my life without which I would not be where I am today. I am very grateful for the love and confidence that they have had in me throughout it all and for their shaping me into the person I am. It is comforting to know that I can always turn to them no matter what happens.

I. BACKGROUND

Many hearing-impaired (HI) listeners with sensorineural hearing loss who are able to understand speech in quiet environments without much difficulty encounter more problems in noisy situations, such as in a cafeteria or at a social gathering. Indeed, it has been shown that these listeners require a higher speech-to-noise ratio (SNR) to achieve a given level of performance than do normal-hearing (NH) listeners (Festen and Plomp, 1990). This is the case regardless of whether the noise is temporally fluctuating, such as interfering voices in the background, or is steady-state, such as a motor. Festen and Plomp (1990) measured the SNR required for 50%-correct sentence reception in different types of background interference. Whereas HI listeners required a similar SNR regardless of the type of interfering noise, NH listeners performed better (i.e., required a lower SNR) in temporally fluctuating interference than in steady-state interference. Listeners who perform better with fluctuating interference are said to experience a release from masking. This release from masking occurs when listeners are able to perceive audible glimpses of the target speech during dips in the fluctuating noise (Cooke, 2006), and it aids in the ability to converse normally in the noisy social situations mentioned above. One possible explanation of reduced release from masking in HI listeners is based on the effects of reduced audibility in HI listeners, who are less likely to be able to receive the target speech in the noise gaps (Desloge et al., 2010).

Léger et al. (2015) looked at release from masking in greater depth, particularly with respect to consonant recognition with different types of speech processing. The processing allowed for the examination of the roles played by the signal's slowly varying component, known as its envelope (ENV), and rapidly varying component, known as its temporal fine structure (TFS), on release from masking. The consonant

speech stimuli were processed using the Hilbert Transform to convey ENV cues, TFS cues, or ENV cues recovered from TFS speech. Consonant identification was measured in the presence of steady-state and 10-Hz square-wave interrupted speech-shaped noise. The percent-correct scores were used to calculate masking release (MR) in percentage points, defined as the difference in scores in interrupted noise and in continuous noise at a given SNR. The results showed that HI listeners generally experienced MR for TFS and recovered-ENV speech but not for unprocessed or ENV speech. The study concluded that the increase in MR may be related to the way the TFS processing interacts with the interrupted noise signal, rather than to the presence of TFS itself. Under certain circumstances, the removal of amplitude-envelope variation in TFS speech may amplify the higher-SNR glimpses of the speech signal during gaps in a fluctuating noise.

Reed et al. (2016) further investigated the conclusions of Léger et al. (2015) regarding the role of reduced amplitude variation in MR. The study tested an infinite peak-clipped (IPC) speech condition, which used the sign of each sample point of the input signal to convert positive samples to +1, convert negative samples to -1, and leave zero samples unchanged. This processing thus also removed much of the amplitude variation. Speech intelligibility in noise and MR were compared for TFS, IPC, and unprocessed speech for HI listeners. Outcomes for TFS and IPC speech were very similar, leading to the conclusion that the removal of amplitude variation can indeed lead to MR. Because both the TFS and IPC speech contained fine-structure cues, however, it was still possible that TFS was responsible for the observed MR. Another condition was created in which both TFS and amplitude-variation cues were eliminated by passing an ENV signal through the TFS processing stage. Greater MR was observed for this condition than for the original ENV speech, thus lending support to the hypothesis that reduced amplitude variation can lead to improved MR in HI listeners. This MR arose as the less-intense portions of the speech stimulus,

which occurred in the noise gaps, became more audible to HI listeners when the amplitude was normalized to remove variation. These studies proved promising in understanding a potential way to improve MR in HI listeners; however, the improvement in MR was mainly due to a decreased performance in continuous noise rather than an increased performance in fluctuating noise.

To address these issues, Desloge et al. (2016) developed a signal-processing technique designed to achieve similar reductions in signal amplitude variation without suffering a loss in intelligibility in continuous background noise. Using non-real-time processing over the broadband signal, the technique compared short-term and long-term estimates of energy, increased the level of short-term segments whose energy was below the average energy, and normalized the overall energy of the processed signal to be equivalent to that of the original long-term estimate. In consonant-identification tests, HI listeners achieved similar scores for processed and unprocessed stimuli in quiet and in continuous-noise backgrounds, while superior performance was obtained for the processed speech in fluctuating background noises. Thus, the energy-normalized signals led to larger values of MR compared to that obtained with unprocessed signals. The work described in this paper builds upon Desloge et al. (2016) by implementing and evaluating a real-time, multi-band version of the signal-processing algorithm in a broader range of noises.

II. GOALS

This study investigates a novel signal processing technique, called energy equalization (EEQ), for the reduction of amplitude variation, which Reed et al. (2016) had concluded could contribute to MR in HI listeners. EEQ processing normalizes the fluctuating short-term signal energy to be equal to the long-term average signal energy. This technique is thus another way of

removing the rapid amplitude variation that occurs in speech. The goal is for this signal processing to improve the performance of HI listeners in fluctuating background noise without leading to a drop in performance in continuous background noise. This change in performance would thus result in greater MR for HI listeners.

Energy equalization is applicable in the area of hearing aid and cochlear implant processing, and it could potentially also be used to benefit NH listeners and even machine listening systems that use automatic speech recognition. Other potential applications of EEQ processing include cell-phone or teleconferencing systems where an individual is speaking in a noisy environment and speech recognition in interfering backgrounds. Thus, wherever speech reception is needed in noise, energy equalization could be used.

The short-term signal energy for speech varies at a syllabic rate as intervals fluctuate between being more intense (usually during vowels), less intense (usually during consonants), and silent. Meanwhile, the long-term signal energy remains relatively constant and reflects the overall loudness at which a speaker is talking. These overall properties of speech persist even when background noise is added to the signal. The quiet portions of the speech signal are the most troublesome ones for HI listeners and lead to reduced speech comprehension. Energy equalization is a way of combating this difficulty by amplifying the quieter parts of the signal (that may be present during gaps in the background noise) relative to the louder parts of the signal (that occur when background noise is fully present). This technique makes speech content present during the dips in background noise more audible and hence useful for speech comprehension.

III. SIGNAL PROCESSING ALGORITHM

The EEQ processing seeks to reduce short-term amplitude fluctuations of a speech-plus-noise (S+N) stimulus while operating blindly and without introducing excessive distortion. The following is a general description of the steps the EEQ processing performs in real time on an S+N signal x(t):

Form running short-term and long-term moving averages of the signal energy, E_short(t) and E_long(t): E_short(t) = AVG_short[x²(t)] and E_long(t) = AVG_long[x²(t)], where AVG is a moving-average operator that utilizes specified short and long time constants to provide an estimate of the signal's energy. In this implementation, the AVG operators are single-pole infinite impulse response (IIR) low-pass filters applied to the instantaneous signal energy, x²(t), with time constants of 5 ms and 200 ms for the short and long averages, respectively. The magnitude and phase of the square root of the ratio of the frequency response of AVG_long to the frequency response of AVG_short are shown in Figure 1, which is useful in understanding the scale factor computed in the next step of the processing.

Determine the scale factor, SC(t): SC(t) = √(E_long(t) / E_short(t)), where care is taken to avoid dividing by zero during quiet intervals. To prevent over-amplification of the noise floor, SC(t) had an upper limit of 20 dB.

To prevent attenuation of stronger signal components, SC(t) had a lower limit of 0 dB.

Apply the scale factor to the original signal: y(t) = SC(t)x(t).

Form the output z(t) by normalizing y(t) to have the same energy as x(t): z(t) = K(t)y(t), where K(t) is chosen such that AVG_long[z²(t)] = AVG_long[x²(t)].

The processing described above can be applied either to a broadband signal or independently to bandpass-filtered components. The current implementation operated on both the broadband signal (EEQ1) and a signal divided into four contiguous frequency bands (EEQ4). These conditions are described in more detail in Section IV-D. Figure 2 depicts block diagrams of the EEQ1 (Figure 2A) and EEQ4 (Figure 2B) algorithms. The EEQ algorithm that was implemented follows the outline of the steps described above to process x[n], a sampled version of the original signal x(t). The original signal is first multiplied by SC[n], as shown in Figure 2A, and the resulting EEQ signal is then multiplied by K[n] to ensure that the long-term energy of the EEQ signal is equal to the long-term energy of the original signal at every sample point. SC[n] is restricted to lie in the range of 0 dB to 20 dB. Appendix I describes a modification to the computation of the scale factor that could be used without this lower limit in place.
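The single-band processing just described can be sketched in a few lines of Python (using NumPy and SciPy). The sketch below is illustrative only: the function and variable names are not taken from the thesis code, and the one-pole smoothers, the 0-20 dB gain limits, and the final long-term energy normalization simply follow the description above. The multi-band EEQ4 condition applies the same function independently to each bandpass-filtered component before summing across bands.

import numpy as np
from scipy.signal import lfilter

def one_pole_avg(energy, tau, fs):
    # Single-pole IIR low-pass filter applied to the instantaneous energy x^2(t).
    a = np.exp(-1.0 / (tau * fs))
    return lfilter([1.0 - a], [1.0, -a], energy)

def eeq1(x, fs, tau_short=0.005, tau_long=0.200, max_gain_db=20.0):
    # Short-term and long-term running estimates of the signal energy.
    eps = 1e-12  # guards against division by zero during quiet intervals
    e_short = one_pole_avg(x ** 2, tau_short, fs)
    e_long = one_pole_avg(x ** 2, tau_long, fs)
    # Scale factor SC(t), limited to the range 0 dB to 20 dB.
    sc = np.sqrt(e_long / np.maximum(e_short, eps))
    sc = np.clip(sc, 1.0, 10.0 ** (max_gain_db / 20.0))
    y = sc * x
    # Normalize so that the long-term energy of the output equals that of the input.
    k = np.sqrt(e_long / np.maximum(one_pole_avg(y ** 2, tau_long, fs), eps))
    return k * y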

IV. METHODS

The experimental protocol for testing human subjects was approved by the internal review board of the Massachusetts Institute of Technology. All testing was conducted in compliance with regulations and ethical guidelines on experimentation with human subjects. All listeners provided informed consent and were paid for their participation in the experiments.

A. Participants

Six male and three female HI listeners with bilateral, symmetric, mild-to-severe sensorineural hearing loss participated in the experiment. They were all native speakers of American English and ranged in age from 20 to 69 years with an average age of 36.7 years. Six of the listeners were younger (33 years or less) and three were older (58-69 years). Five of the listeners had sloping high-frequency losses (HI-1, HI-2, HI-4, HI-5, and HI-7), three had relatively flat losses (HI-6, HI-8, and HI-9), and one had a cookie-bite loss (HI-3). Seven of the listeners (all but HI-1 and HI-3) were regular users of bilateral hearing aids. The five-frequency (0.25, 0.5, 1, 2, and 4 kHz) audiometric pure-tone average (PTA) ranged from 27 dB HL to 75 dB HL across listeners with an average of 45.3 dB HL. The test ear, age, and five-frequency PTA for each HI listener are listed in Table 1 along with the speech levels and SNRs employed in the experiment. The pure-tone thresholds of the HI listeners in dB SPL are shown in Figure 3. The pure-tone threshold measurements were obtained with Sennheiser HD580 headphones for 500-ms stimuli in a three-alternative forced-choice adaptive procedure which estimates the threshold level required for 70.7%-correct detection (see Léger et al., 2015).

Four NH listeners (defined as having pure-tone thresholds of 15 dB HL or better at the octave frequencies between 250 and 8000 Hz) also participated in the study. They were native speakers of American English, included three males and one female, and ranged in age from 19 to 54 years, with an average age of 30.0 years. A test ear was selected for each listener (2 left ear

and 2 right ear). The mean adaptive thresholds across test ears of the NH listeners are provided in the first panel of Figure 3.

B. Speech Stimuli

The speech materials were Vowel-Consonant-Vowel (VCV) stimuli, with C=/p t k b d g f s ʃ v z dʒ m n r l/ and V=/a/, taken from the corpus of Shannon et al. (1999). The set used for testing consisted of 64 VCV tokens (one utterance of each of the 16 disyllables by two male and two female speakers). The mean VCV duration was 945 ms with a range of 688 to 1339 ms across the 64 VCVs in the test set. The recordings were digitized with 16-bit precision at a sampling rate of 32 kHz and filtered to a bandwidth of Hz for presentation.

C. Interference Conditions

Noises from two broad categories of maskers were added to the speech stimuli prior to processing for presentation. Four background interference conditions were derived from speech-shaped noise but did not come from actual speech samples. Three additional background interference conditions, referred to as vocoded modulated noises, were derived from actual speech samples. The RMS level of each of the noises except for the baseline condition was adjusted to be equal to that of the continuous noise, whose level was set as described in Section IV-F. The maskers used in the study are shown in Figure 4 and are summarized below.

Maskers derived from randomly-generated speech-shaped noise (spectrogram shown in Figure 5) but not coming from actual speech samples. This paper refers to these as non-speech-derived noises:
o Baseline Noise (BAS): Continuous speech-shaped noise at 30 dB SPL.
o Continuous Noise (CON): Additional continuous noise added to BAS.

o Square-Wave Interrupted Noise (SQW): 10-Hz square-wave interrupted noise with a 50% duty cycle, added to BAS.
o Sinusoidal Amplitude Modulation Noise (SAM): 10-Hz sinusoidally amplitude-modulated noise added to BAS.

Maskers derived from actual speech samples (referred to as vocoded modulated noise). These maskers were designed to exhibit fluctuations characteristic of speech without the informational masking component. This paper refers to these as speech-derived noises:
o 1-Speaker Vocoded Modulated Noise (VC-1)
o 2-Speaker Vocoded Modulated Noise (VC-2)
o 4-Speaker Vocoded Modulated Noise (VC-4)

Appendix II describes the steps used to generate the vocoded modulated noises.

D. Speech Conditions

Listeners were presented with S+N signals with three different kinds of processing applied:

Unprocessed Condition (UNP): The S+N signals were presented as described above with no further processing beyond per-subject NAL-RP (Dillon, 2001) amplification.

1-band Energy Equalized Condition (EEQ1): EEQ processing was applied to the broadband S+N signal over the range of Hz. As described in Section III, the EEQ processing compared short-term and long-term estimates of S+N signal energy, increased the level of short-term segments whose energy was below the average signal energy, and normalized the overall energy of the processed signal to be equivalent to that of the original long-term estimate (see Figure 2A).

4-band Energy Equalized Condition (EEQ4): The same technique as in the EEQ1 condition was applied independently to 4 logarithmically-equal bands of the S+N signal in the range of Hz. In doing so, sixth-order (36 dB/octave) bandpass filters divided the input signals into bands with frequency ranges of Hz, Hz, Hz, and Hz, respectively, and the EEQ1 processing was applied independently to each band prior to reconstructing the signal by summing across bands (see Figure 2B).

E. Speech and Noise Signals

Figure 6 shows the waveform of one of the VCV tokens used in the experiment, APA, for UNP speech in BAS noise. The vowel components, which have more energy than the consonant component, constitute the two higher-energy sections of the speech that surround the weaker consonant component in the center. These sections of the speech are annotated at the top of the figure. Figure 7 shows the waveforms of this token in the different speech and noise conditions (Figure 7A for the UNP condition, Figure 7B for the EEQ1 condition, and Figure 7C for the recombined EEQ4 condition). In every type of interference except for BAS, the SNR is set to -4 dB. The left panels show the S+N waveforms, and the right panels show the distribution of the amplitude of the S+N signal in dB. These amplitude distributions were generated by sampling points of the S+N signal and, based on their amplitudes in dB, placing them into bins of width 1 dB in the range of -10 dB to 85 dB. The RMS value of the signal in dB is shown by the blue vertical line, and the median amplitude is shown by the green vertical line.

The gaps of the noise in the plots of the S+N waveforms make evident the reduction in short-term amplitude fluctuations by the EEQ processing. For example, a comparison of the S+N waveforms in SQW between UNP and either EEQ1 or EEQ4 shows that the lower-energy speech

components that are present during the gaps in the fluctuating interference are greater in energy in the EEQ-processed signals. The reduction in amplitude variation is also seen in the amplitude distributions in the right panels. The low-energy tails of the amplitude distributions in the UNP condition are reduced or absent in the EEQ1 and EEQ4 conditions. As a result, the median amplitudes (given by the green vertical lines) in the EEQ1 and EEQ4 conditions are shifted to the right, despite the RMS values (given by the blue vertical lines) remaining constant between UNP and EEQ (as a result of the final normalization step in the EEQ processing that sets the long-term energy of the output equal to the long-term energy of the input at every sample point). These effects are analyzed in more detail in Section VI-A of the paper.

Figure 8 shows the EEQ4 waveforms and amplitude distributions in the CON (Figure 8A) and SQW (Figure 8B) conditions on a band-by-band basis. As in Figure 7, an SNR of -4 dB is used. The boundaries of the four bands, which are logarithmically spaced, are Hz (Band 1), Hz (Band 2), Hz (Band 3), and Hz (Band 4). Band 2 has the largest RMS value, followed by, in decreasing order, Band 3, Band 4, and Band 1. The EEQ1 processing was applied independently in each band.

F. Test Procedure

Experiments were controlled by a desktop PC using MATLAB™ software. The digitized speech-plus-noise stimuli were played through a 24-bit PCI sound card (E-MU 0404 by Creative Professional) and then passed through a programmable attenuator (Tucker-Davis PA4) and a headphone buffer (Tucker-Davis HB6) before being presented monaurally to the listener in a soundproof booth via a pair of headphones (Sennheiser HD580). A monitor, keyboard, and mouse located within the soundproof booth allowed the listener to interact with the control PC.

Consonant identification was tested using a one-interval, 16-alternative, forced-choice procedure without correct-answer feedback. On each trial of a 64-trial run, one of the 64 tokens from the test set was selected randomly without replacement. Depending on the noise condition, a randomly selected noise segment equal in duration to that of the speech token was scaled to achieve the desired SNR and then added to the speech token. The resulting stimulus was either presented unprocessed (for the UNP conditions) or processed according to EEQ1 or EEQ4 before being presented to the listener for identification. The listener's task was to identify the medial consonant of the VCV token that had been presented by selecting a response (using a computer mouse) from a 4×4 visual array of orthographic representations associated with the consonant stimuli. No time limit was imposed on the listeners' responses. Each run typically lasted 3-5 minutes depending on the listener's response times. Chance performance was 6.25%-correct.

Experiment 1. NH listeners were tested using a speech level of 60 dB SPL. The SNR was set to -10 dB (selected to yield roughly 50%-correct performance for UNP speech in CON noise) for all noise conditions (except for BAS). For the HI listeners, linear-gain amplification was applied to the speech-plus-noise stimuli using the NAL-RP formula (Dillon, 2001). Each HI listener selected a comfortable speech level when listening to UNP speech in the BAS condition. For these listeners, the SNR was selected to yield roughly 50%-correct performance for UNP speech in CON noise. The speech levels and SNRs for each HI listener are listed in Table 1. The noise levels in dB are the differences between the speech levels and the SNRs. The three speech conditions were tested in the order of UNP first, followed by EEQ1 and EEQ4 in a random order. The seven noise conditions were tested in order of BAS first, followed by a randomized order of the remaining six noises (CON, SQW, SAM, VC-1, VC-2, and VC-4).

Five 64-trial runs were presented for each of the 21 conditions (3 speech types × 7 noises). The first run was considered as practice and discarded. The final four test runs were used to calculate the percent-correct score in each condition.

Experiment 2. Four of the HI listeners (HI-2, HI-4, HI-5, and HI-7) were tested at two additional values of SNR after completing Experiment 1. As shown in Table 2, one SNR was 4 dB lower than that employed in Experiment 1 and the other was 4 dB higher. This testing was conducted with UNP and EEQ1 speech in six types of noise: CON, SQW, SAM, VC-1, VC-2, and VC-4. The test order for UNP and EEQ1 speech was selected randomly for each listener. For each speech type, the two additional values of SNR were presented in random order. Within each SNR, the test order of the six types of noises was selected at random. Five 64-trial runs were presented at each condition using the tokens from the test set. The first run was discarded as practice and the final four runs were used to calculate the percent-correct score on each of the 24 additional conditions (2 speech types × 6 noises × 2 SNRs). Other than the SNR, all other experimental parameters remained the same as in Experiment 1.

G. Data Analysis

For each condition, percent-correct scores were averaged over the final 4 runs (consisting of 4 × 64 = 256 trials). Analysis of Variance (ANOVA) tests were performed on rationalized arcsine unit (RAU; Studebaker, 1985) scores to examine the effects of speech type and noise condition on these percent-correct scores. MR in percentage points was calculated as the difference between scores in fluctuating noise and in continuous noise: MR = Score in Fluctuating Noise - Score in Continuous Noise.

Additionally, as was done by Léger et al. (2015), a normalized measure of masking release (NMR) was calculated as the quotient of MR and the difference between scores in quiet and in continuous noise:

NMR = (Score in Fluctuating Noise - Score in Continuous Noise) / (Score in Quiet - Score in Continuous Noise).

NMR thus represents the fraction of baseline performance lost in continuous noise that can be recovered in interrupted noise. Listeners who perform just as well in fluctuating noise as in quiet have an NMR of 1, and listeners who do not perform any better in fluctuating noise than in continuous noise have an NMR of 0. The metric is useful for comparing performance among HI listeners whose scores in quiet are different. By using baseline performance as a reference, NMR emphasizes the differences in performance with interrupted and continuous noise as opposed to the differences due to factors such as the severity of the hearing loss of the listener or the distorting effects of the processing on the speech itself. The MR and NMR calculations in SQW and SAM noises used CON noise as the continuous noise, and the MR and NMR calculations in VC-1 and VC-2 noises used VC-4 noise as the continuous noise. These NMR formulas are listed here:

NMR_SQW = (SQW Score - CON Score) / (BAS Score - CON Score)
NMR_SAM = (SAM Score - CON Score) / (BAS Score - CON Score)
NMR_VC-1 = (VC-1 Score - VC-4 Score) / (BAS Score - VC-4 Score)
NMR_VC-2 = (VC-2 Score - VC-4 Score) / (BAS Score - VC-4 Score)
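As a concrete illustration with hypothetical numbers (not taken from the present results), a listener scoring 90% in BAS, 50% in CON, and 74% in SQW would have MR = 74 - 50 = 24 percentage points and NMR_SQW = (74 - 50) / (90 - 50) = 0.60; that is, 60% of the performance lost in continuous noise relative to the BAS baseline is recovered in the interrupted noise.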

V. RESULTS

A. Experiment 1

The scores from Experiment 1 are reported in Appendix III-A and Appendix III-B and are summarized in Figure 9, Figure 10, and Figure 11. Appendix III-A provides the scores for each NH listener in each of the seven noise conditions for UNP, EEQ1, and EEQ4 speech, and Appendix III-B provides this same information for each HI listener. In Figure 9, the scores are plotted to highlight the differences in the average scores of the NH and HI listeners across conditions. In Figure 10 and Figure 11, the scores are plotted to highlight the differences of speech types within each noise for the NH and HI listeners (Figure 10 for the average NH results and the average HI results and Figure 11 for the average NH results and the individual HI results).

First, consider average NH and HI performance, as shown in Figure 9. As expected, the performance for both groups was greatest in the BAS condition. Performance was lowest in CON (and was approximately 50%-correct by design of the experiment) and VC-4 (which was derived from samples of enough speakers to behave similarly to continuous noise). Performance was intermediate for the remaining noises. Other than in CON noise, scores were greater for NH than for HI listeners across noise conditions for all three speech types. The differences between the two groups were relatively small in the BAS condition (where the average differences between NH and HI listeners were 5.3% in UNP, 7.8% in EEQ1, and 12.2% in EEQ4), showing that the two groups diverge the most in fluctuating noise conditions where NH listeners were able to listen in the gaps, unlike HI listeners. In fact, across the five noises other than BAS and CON, NH scores were on average 17.9, 15.9, and 17.1 percentage points higher than HI scores for the UNP, EEQ1, and EEQ4 conditions, respectively. HI listeners exhibited slightly more

variability in their results than did NH listeners: the mean standard deviations across listeners (computed as the average of the standard deviations in each of the seven noises¹) in percentage points were 3.59 for UNP, 3.23 for EEQ1, and 4.38 for EEQ4 for NH listeners and 4.67 for UNP, 4.86 for EEQ1, and 4.59 for EEQ4 for HI listeners.

Next, consider NH and HI performance across the different speech types, as is shown in Figures 10 and 11. Both figures show the mean scores for the NH listeners. Figure 10 shows the mean scores for the HI listeners, whereas Figure 11 shows the scores for the individual HI listeners. Note that the data depicted here are the same as that shown in Figure 9 and are replotted to highlight differences in speech types within a given noise. In general, both NH and HI listeners scored best in UNP, followed by EEQ1, followed by EEQ4. Averaged across the different listeners and noise types, the NH scores were 78.7% in UNP, 75.5% in EEQ1, and 73.4% in EEQ4, and the HI scores were 65.3% in UNP, 63.0% in EEQ1, and 59.1% in EEQ4. By noise type, the scores of NH listeners generally followed the pattern of CON = VC-4 < VC-2 < VC-1 < SAM < SQW < BAS, and those of HI listeners generally followed the pattern of VC-4 < CON = VC-2 < VC-1 = SAM < SQW < BAS. Averaged across the different listeners and speech types, the NH scores were 98.3% in BAS, 52.1% in CON, 92.4% in SQW, 86.2% in SAM, 81.1% in VC-1, 68.6% in VC-2, and 52.4% in VC-4, and the HI scores were 89.8% in BAS, 51.5% in CON, 72.1% in SQW, 64.5% in SAM, 61.7% in VC-1, 52.1% in VC-2, and 45.7% in VC-4.

EEQ1 processing was effective in improving the scores of HI and NH listeners in SQW noise: the average NH listener and eight of the nine individual HI listeners (all but HI-3)

¹ Note that here, the standard deviation for a given noise and processing condition is calculated as √((1/n) Σ_{i=1}^{n} σ_i²), where σ_i² is the variance of the four recorded runs on listener i and n is the number of listeners.

scored higher with EEQ1 than with UNP in SQW noise. EEQ1 processing also yielded improved performance for SAM noise in six of the nine HI listeners (all but HI-3, HI-7, and HI-9). For EEQ4 processing, no improvements over UNP were seen in SQW noise for NH listeners; however, all but one HI listener (HI-3) showed an improvement. For EEQ4 in SAM noise, there was no evidence for improvements over UNP for either NH or HI listeners. For all remaining noise conditions, for both HI and NH, scores were highest with UNP and lowest with EEQ4, with EEQ1 in between.

The results obtained on each individual NH and HI listener were analyzed using a two-way ANOVA with main factors of speech type and noise condition. The ANOVAs were conducted at the significance level of 0.01 on the RAU of the 84 percent-correct scores obtained on each listener (3 speech types × 7 noises × 4 repetitions) and are reported in Table 3. All but one of the NH listeners (NH-1) and all of the HI listeners had a significant effect of speech type, and all of the NH and HI listeners had a significant effect of noise type. One of the NH listeners (NH-2) and all but three of the HI listeners (HI-2, HI-3, and HI-7) had a significant speech by noise interaction. Post-hoc Tukey-Kramer comparisons at the significance level of 0.05 were conducted for cases of significant main factor effects, and the results are listed in Table 4. By speech type, most listeners had UNP = EEQ1 > EEQ4 (NH-2, HI-2, HI-4, HI-8, HI-9) or UNP > EEQ1 = EEQ4 (NH-3, NH-4, HI-1, HI-5, HI-7). The exceptions were HI-3, who had UNP > EEQ1 > EEQ4, and HI-6, who had EEQ1 > UNP > EEQ4. By noise type, BAS, SQW, SAM, and VC-1 were greater than VC-2, VC-4, and CON. Most listeners had BAS > SQW > SAM > VC-1 (NH-2, NH-3, NH-4, HI-1, and HI-3) or BAS > SQW > SAM = VC-1 (NH-1, HI-2, HI-4, HI-5, HI-6, HI-8, and HI-9). The exception was HI-7, who had BAS > SQW = SAM = VC-1. All NH listeners had VC-2 >

VC-4 = CON, and the order of VC-2, VC-4, and CON in HI listeners varied with each listener, with five of the nine HI listeners (HI-2, HI-4, HI-5, HI-7, and HI-8) having no significant differences among the three conditions. As discussed in the preceding paragraph, the significant speech by noise interaction present in many of the HI listeners is largely due to improved performance with EEQ1 processing relative to UNP in the SQW and SAM conditions but not in the other noises.

The NMR data calculated from the scores of Experiment 1 are reported in Appendix III-C and Appendix III-D and are summarized in Figure 12. Appendix III-C provides the NMR for each NH listener in the SQW, SAM, VC-1, and VC-2 noise conditions for UNP, EEQ1, and EEQ4 speech, and Appendix III-D provides this same information for each HI listener. In Figure 12, the NMR results are plotted for the average NH listener and the individual HI listeners to highlight the differences of speech types within each noise. As shown in Figure 12, for the HI listeners, NMR was generally similar in EEQ1 and EEQ4 speech in the various noises and was greater in EEQ1 and EEQ4 than in UNP speech. Averaged over the HI listeners and the noise types, these NMR values were in UNP, in EEQ1, and in EEQ4. NMR for HI listeners by noise type was generally greatest in SQW interference, smallest in VC-2 interference, and between the two and equivalent in SAM and VC-1 interference. As such, NMR was generally greater in the non-speech-derived noises than in the speech-derived noises. Averaged over the HI listeners and speech types, these NMR values were in SQW, in SAM, in VC-1, and in VC-2. EEQ processing yielded the largest improvement in NMR for HI listeners in the SQW conditions. This improvement decreased in the SAM condition and disappeared in the VC-1 and VC-2 conditions. Averaged across HI listeners, NMR values for UNP, EEQ1, and EEQ4, respectively, were 0.320, 0.639,

and in SQW; 0.227, 0.400, and in SAM; 0.391, 0.376, and in VC-1; and 0.126, 0.125, and in VC-2. NH listeners generally achieved greater NMR than did the HI listeners with little effect of speech type. Averaged across speech type for NH listeners, NMR decreases in the order of SQW, SAM, VC-1, and VC-2. Averaged across NH listeners, NMR for UNP, EEQ1, and EEQ4, respectively, were 0.861, 0.907, and in SQW; 0.792, 0.735, and in SAM; 0.673, 0.600, and in VC-1; and 0.356, 0.351, and in VC-2. Both within and across listeners, HI listeners exhibited greater variability in their results than did NH listeners.

B. Experiment 2

The scores of Experiment 2 are reported in Appendix IV-A and are summarized in Figure 13. Appendix IV-A provides the scores for each HI listener in the non-BAS noise conditions for UNP and EEQ1 speech at each of the three SNRs that were tested. Figure 13A plots the results in non-speech-derived noises (except BAS) as a function of SNR and fits sigmoidal functions to the data, and Figure 13B does the same for the speech-derived noises. The sigmoidal fits to the psychometric functions in Figure 13 assumed a lower bound corresponding to chance performance on the consonant-identification task (6.25%-correct) and an upper bound corresponding to a given listener's score on the BAS condition for UNP or EEQ. The fitting process found the slope and midpoint values of a logistic function that minimized the error between the fit and the data points. The results of the fits are summarized in Table 5 in terms of their midpoints (SNR in dB yielding a 50%-correct score) and slopes around the midpoint (in percentage points per dB).
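A minimal sketch of this type of constrained logistic fit is given below (Python with NumPy/SciPy; the function names, the least-squares criterion, and the example data are illustrative assumptions, since the thesis does not specify the exact fitting routine). The chance floor and the listener's BAS score are held fixed, and only the midpoint and slope of the logistic are estimated.

import numpy as np
from scipy.optimize import curve_fit

def fit_psychometric(snr_db, scores, upper, lower=6.25):
    # Two-parameter logistic between a fixed chance floor and a fixed BAS ceiling.
    def logistic(snr, mid, k):
        return lower + (upper - lower) / (1.0 + np.exp(-k * (snr - mid)))
    (mid, k), _ = curve_fit(logistic, snr_db, scores, p0=[np.median(snr_db), 0.5])
    # SNR at which the fitted curve crosses 50%-correct, and the slope (%/dB) at that point.
    snr50 = mid - np.log((upper - lower) / (50.0 - lower) - 1.0) / k
    slope50 = k * (50.0 - lower) * (upper - 50.0) / (upper - lower)
    return snr50, slope50

# Hypothetical example: percent-correct scores at three SNRs for one listener and noise type.
snr50, slope50 = fit_psychometric(np.array([-14.0, -10.0, -6.0]),
                                  np.array([30.0, 52.0, 71.0]), upper=90.0)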

For the CON noise conditions, the midpoints and slopes were similar for UNP and EEQ1 signals for each of the HI listeners. In CON, averaged across listeners, midpoints were -3.9 dB and -2.8 dB for UNP and EEQ1, respectively, and slopes were 5.2 percent per dB and 4.3 percent per dB, respectively. In the two non-speech-derived fluctuating background noises, the midpoints were lower for EEQ1 than for UNP for each of the HI listeners. Averaged across listeners and for UNP and EEQ1, respectively, midpoints were dB and dB in SQW (a difference of 84.9 dB) and -7.3 dB and dB in SAM (a difference of 8.6 dB).² For the speech-derived noise conditions, the midpoints and slopes were similar for UNP and EEQ1 signals for each of the HI listeners. Averaged across listeners and for UNP and EEQ1, respectively, midpoints were -9.0 dB and dB in VC-1 (a difference of 2.0 dB), -5.8 dB and -4.5 dB in VC-2 (a difference of -1.3 dB), and -3.4 dB and -2.4 dB in VC-4 (a difference of -1.0 dB). Slopes were similar for both types of processing, where they were ordered as SQW < SAM < CON for the non-speech-derived noises and VC-1 < VC-2 < VC-4 for the speech-derived noises.

In Figure 14, MR in percentage points is plotted as a function of SNR for SQW, SAM, VC-1, and VC-2. Here, MR was computed as the difference in sigmoid fits between fluctuating and continuous noises. Note that this metric differs from NMR in that it is not normalized by the difference between scores in quiet and in continuous noise (i.e., MR is the numerator in the NMR quotient). Similarly to what was done in the NMR calculations, MR was computed with CON as the continuous noise when SQW and SAM were the fluctuating noises and with VC-4 as the continuous noise when VC-1 and VC-2 were the fluctuating noises. In SQW interference, these plots indicate greater MR with EEQ1 than with UNP for all subjects across the SNRs. For SAM interference, MR with EEQ1 generally exceeded that with UNP, although the increase was generally smaller than in SQW interference. The trend was similar with VC-1 interference,

² It should be noted that the midpoint of HI-5 ( dB) was highly deviant relative to the remaining 3 HI listeners (whose midpoints ranged from to dB). The SQW midpoint average falls to dB if HI-5 is eliminated, leading to a difference of 17.6 dB with UNP.

although the increase in MR with EEQ1 was smaller than that observed for SAM interference. With VC-2, there was no clear trend showing greater MR for either EEQ1 or UNP. These observations were generally consistent with the NMR findings discussed in the next paragraph.

NMR was calculated from the scores of Experiment 2 and is reported in Appendix IV-B and summarized in Figure 15. Appendix IV-B provides the calculated NMR data for each HI listener in each of the seven noise conditions for UNP and EEQ1 speech at each of the three SNRs that were tested. In Figure 15A, NMR for EEQ1 is plotted as a function of NMR for UNP for the individual HI listeners in SQW and SAM noise at the various SNRs, and in Figure 15B, this same information is plotted for VC-1 and VC-2 noise. In Figure 15A, every NMR data point lies above the 45-degree reference line, showing a strong tendency in HI listeners for larger NMR with EEQ1 processing in non-speech-derived noises at all SNRs tested. Additionally, NMR was greater with SQW interference than with SAM interference. In SQW noise, NMR averaged across subjects at the low, mid, and high SNRs was 0.431, 0.314, and , respectively, for UNP and 0.765, 0.657, and 0.564, respectively, for EEQ1. These same numbers in SAM noise were 0.284, 0.210, and 0.136, respectively, for UNP and 0.505, 0.501, and 0.467, respectively, for EEQ1.

As shown in Figure 15B, there was less of a difference in NMR for UNP and EEQ1 for the speech-derived noises than seen in Figure 15A for the non-speech-derived noises. However, more data points were above the reference line with VC-1 than with VC-2. Additionally, NMR with both UNP and EEQ1 was greater with VC-1 interference than with VC-2 interference. In VC-1 noise, NMR averaged across subjects for UNP was at the low SNR, at the mid SNR, and at the high SNR. These numbers for EEQ1 were at the low SNR, at

the mid SNR, and at the high SNR. These same numbers in VC-2 noise were 0.223, 0.117, and 0.019, respectively, for UNP and 0.177, 0.119, and 0.247, respectively, for EEQ1.

VI. DISCUSSION

This section discusses the results of the experiments in greater detail and analyzes potential explanations for the outcomes. Section VI-A begins by examining the effects that the EEQ processing has on the amplitude variability of the waveforms. In Section VI-B, the EEQ effect on NMR is explored. Models are introduced in Section VI-C that attempt to predict the performance benefit gained with the EEQ processing. In Section VI-D, EEQ1 is compared to EEQ4 in an attempt to understand differences in performance. Finally, in Section VI-E, different types of background interference are subjected to a glimpse analysis to explain the different effects of the EEQ processing with the speech-derived versus non-speech-derived noises.

A. Effect of EEQ on Signal Amplitude

The waveform and amplitude distribution plots of the various S+N signals in Figures 7A, 7B, and 7C are now examined in more detail to assess how EEQ achieves its goal of equalizing the energy of an S+N signal. The amplitude distribution plots depict RMS values with blue vertical lines and amplitude medians with green vertical lines. Median amplitudes were plotted because of their resilience to outliers as compared to the means. As shown in the figures, the RMS values are constant between UNP and EEQ1 and between UNP and EEQ4 within each type of interference. This is because the final step of the EEQ processing normalizes the output signal at every sample point to have a long-term energy equal to that of the input signal. Note also that the RMS value, which is determined by the levels of the speech and the noise, is equal in all types of interference except BAS. This is because, in the figure, the SNR is -4 dB in all non-BAS

conditions. However, despite the RMS values being the same within a type of interference, the median amplitudes are not the same. The median amplitude is greater with EEQ1 and EEQ4 than with UNP. For the VCV token depicted in the figure, the differences in median amplitudes in dB between EEQ1 and UNP are 2.10 for BAS, 0.42 for CON, 1.78 for SQW, 2.05 for SAM, 1.36 for VC-1, 1.11 for VC-2, and 0.80 for VC-4. Note that except in CON and VC-4, these values are smaller than the differences in mean amplitudes between EEQ1 and UNP, which are 4.98 for BAS, 0.39 for CON, 4.25 for SQW, 2.82 for SAM, 2.97 for VC-1, 1.79 for VC-2, and 0.65 for VC-4. That the differences in mean amplitudes are greater than the differences in median amplitudes highlights the fact that the UNP histograms contain tails of low-energy components that are not present with EEQ. The rightward shift of the amplitude distribution with the EEQ processing occurs because the lower-energy speech components which are present during the gaps in the noise are amplified by the processing. The movement of the tail of the amplitude distribution towards the center of the histogram corresponds to the reduction in amplitude variation in the processed speech. The waveform and amplitude distributions of EEQ1 and EEQ4 look approximately the same when examining the broadband signals. In dB, the absolute values of the differences in mean amplitudes between EEQ1 and EEQ4 are 0.21 for BAS, 0.27 for CON, 0.32 for SQW, 0.33 for SAM, 0.18 for VC-1, 0.73 for VC-2, and 0.57 for VC-4. Further discussion of the differences between EEQ1 and EEQ4 is found in Section VI-D below.

B. Effect of EEQ on NMR

As stated above, the goal of this research was to increase NMR in HI listeners by increasing performance in fluctuating interference while maintaining performance in the baseline and continuous noise conditions. For HI listeners, EEQ1 processing yielded improved performance in SQW and SAM noises (average scores increased by 7.2 and 1.6 percentage

points, respectively) but not for the speech-derived noises. For HI listeners, EEQ1 processing resulted on average in 2.4 and 6.3 percentage point reductions in performance for BAS and CON noises, respectively. As such, for HI listeners, NMR was greater in SQW and SAM noises with EEQ1 compared to UNP. This was brought about both by an increase in performance in fluctuating noise and by a greater decrease in performance in CON noise than in BAS. For HI listeners with EEQ4 processing, NMR also increased, but compared to UNP there was a bigger performance drop in all noise conditions except SQW, which had a slight performance increase. Meanwhile, for NH listeners, EEQ1 processing yielded a slight improvement in performance in SQW noise (the average score increased by 1.4 percentage points) but not in the remaining noises. The overall effect on NMR was minimal for both EEQ1 and EEQ4.

The benefits of EEQ processing for HI listeners in SQW interference are evident through the NMR results, which are shown in Figure 12 for Experiment 1 and in Figure 15 for Experiment 2. Figure 16 re-plots the results from Figure 12 to show NMR as a function of the five-frequency PTA hearing loss of each of the nine HI listeners. In the figure, NMR decreases as a function of PTA with UNP speech, which demonstrates the increasing effects of reduced audibility in the SQW noise gaps with severity of hearing loss. However, with EEQ1 and EEQ4 processing, NMR is much more constant relative to PTA, which highlights the benefits provided to HI listeners by making the speech component of the signal more audible in the SQW noise gaps. Additionally, as shown in Figure 15, the increase in NMR with EEQ1 relative to UNP for SQW and SAM holds at various SNRs: with UNP in these types of interference, NMR becomes close to zero or even negative at the high SNRs, whereas with EEQ1, NMR is always positive.

C. Modelling Psychometric Functions

Two analyses were performed to explore the percent-correct performance shown in Figure 13 for each speech type and noise as a function of SNR and to attempt to account for the changes in performance, especially the performance boost in SQW noise.

1) Local Changes in SNR

The first analysis investigated whether the performance improvement in fluctuating noises with EEQ processing can be explained solely by changes to the SNR. Specifically, for low-to-moderate SNRs and fluctuating noise, EEQ tends to amplify the higher-SNR stimulus segments present in the gaps when noise energy is low relative to the lower-SNR stimulus segments when the noise energy is high. By doing this, EEQ changes the effective SNR of the stimulus, and so it is possible that the observed increase in NMR, which depends on the observer, might be explained simply by an increase in SNR, which is independent of the observer. The first analysis addressed this question by estimating the change in SNR as a result of EEQ processing and looking at scores as a function of this changed SNR.

Although EEQ processing is nonlinear, the scale factor is applied linearly to the speech and to the noise. Thus, it is possible to determine its effect on the speech and noise components of the signal separately for a particular stimulus at a particular input SNR, thus allowing computation of the post-processing SNR for that input. The output SNR for a particular input sample (consisting of specific speech and noise samples s(t) and n(t) and a known input SNR, SNR_UNP) may be calculated as follows: (1) Compute the EEQ scale factor SC(t) applied to an input of x(t) = s(t) + n(t). (2) The EEQ output signal is given as y(t) = x(t)·SC(t) = y_s(t) + y_n(t), where y_s(t) = s(t)·SC(t) and

y_n(t) = n(t)·SC(t). (3) The post-processed SNR (in dB) for this combination of s(t), n(t), and SNR_UNP is SNR_EEQ = 10 log10( mean[y_s²(t)] / mean[y_n²(t)] ), where mean[y_s²(t)] and mean[y_n²(t)] are the mean values of y_s²(t) and y_n²(t), respectively.

Each of the 64 speech tokens used in the experiments was examined with six noise types (CON, SQW, SAM, VC-1, VC-2, and VC-4) and values of SNR_UNP ranging from -40 to +40 dB. For every combination of speech token, noise type, and SNR_UNP, 10 noise samples n(t) of length equal to s(t) were randomly generated. The above procedure was used to calculate SNR_EEQ1 as a function of SNR_UNP and noise type averaged across each of the 10 noise samples combined with each of the 64 speech tokens. These averages were used to formulate a pre-to-post-processing SNR mapping function SNR_EEQ1 = F(SNR_UNP, noise type), shown in Figure 17, where a diagonal reference line is included to show the case of SNR_UNP = SNR_EEQ1. When SNR_UNP is negative, EEQ1 processing provides an SNR boost by raising the level of the signal present in the dips in the noise. Interestingly, when SNR_UNP is positive, EEQ1 processing actually lowers the SNR because fluctuations in the signal, as opposed to the noise, drive the equalization. The CON, SQW, SAM, VC-1, VC-2, and VC-4 curves cross the reference line at SNR_UNP values of -5.8, 4.8, 3.1, 2.5, -0.6, and -2.3 dB, respectively.

Using the pre-to-post-processing SNR mapping function, the psychometric functions in Figure 13 were replotted. Scores for UNP were plotted versus SNR_UNP and scores for EEQ1 were plotted versus SNR_EEQ1. These plots can be seen in Figure 18A for the non-speech-derived noises and in Figure 18B for the speech-derived noises. If the performance boost with EEQ1 processing could be explained solely by the change in SNR, the scores for UNP and EEQ1 in a given noise type should be the same at a given SNR.
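The per-token post-processing SNR computation described in steps (1)-(3) can be sketched as follows (Python/NumPy/SciPy; the smoothing constants and gain limits mirror the EEQ1 sketch in Section III, and all names are illustrative rather than taken from the thesis code). The key point is that the same scale factor, computed blindly from the speech-plus-noise mixture, is applied separately to the speech and noise components.

import numpy as np
from scipy.signal import lfilter

def smooth_energy(x, tau, fs):
    # One-pole IIR low-pass of the instantaneous energy x^2(t), as in the Section III sketch.
    a = np.exp(-1.0 / (tau * fs))
    return lfilter([1.0 - a], [1.0, -a], x ** 2)

def post_processing_snr(s, n, fs, tau_short=0.005, tau_long=0.200, max_gain_db=20.0):
    # Scale factor SC(t) computed from the mixture x(t) = s(t) + n(t).
    x = s + n
    sc = np.sqrt(smooth_energy(x, tau_long, fs) /
                 np.maximum(smooth_energy(x, tau_short, fs), 1e-12))
    sc = np.clip(sc, 1.0, 10.0 ** (max_gain_db / 20.0))
    # Apply the same scale factor to speech and noise separately, then form SNR_EEQ in dB.
    y_s, y_n = sc * s, sc * n
    return 10.0 * np.log10(np.mean(y_s ** 2) / np.mean(y_n ** 2))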

For the non-speech-derived noises, this prediction fits well for the data of HI-2 and HI-7, especially in the SQW and SAM conditions, and for HI-5 in the SAM condition. For the speech-derived noises, this prediction fits well for the data of HI-2 and HI-7 in the VC-1, VC-2, and VC-4 conditions, for HI-4 in the VC-1 condition, and for HI-5 in the VC-2 condition. Other than these cases, the modelling of performance based on the SNR mapping function is less effective.

It should be noted that the local SNR analysis was computed using an entire VCV token, which is dominated by the vowel components in both duration and level. It is assumed that for the consonant portion alone, the SNR_EEQ1 vs. SNR_UNP curves cross the diagonal reference line at more positive SNRs than are shown in Figure 17 for the whole VCV token. This is because the low-energy consonant component is often the beneficiary of the short-term amplification provided by the EEQ processing algorithm. As such, the EEQ processing does not have a negative impact on local consonant SNR until more positive SNRs, at which point the speech is dominant enough that a slight decrease in SNR would not hurt performance.

2) Crest Factor

The second analysis explored whether the performance improvements with the EEQ processing can be explained by the changes in amplitude variation of an S+N signal. The crest factor, defined as the peak amplitude of a waveform x divided by its RMS value, gives a sense of the amplitude variation of the signal. In dB, the crest factor is given as: Crest Factor = 20 log10( x_peak / x_rms ). Because EEQ processing reduces amplitude variation, it is expected that the processing would reduce the crest factor as the maximum value of the signal moves closer to the RMS value of the signal.
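A one-function sketch of this crest-factor metric is given below (Python/NumPy; the names are illustrative, and the optional percentile argument anticipates the percentile-based variant mentioned at the end of this subsection).

import numpy as np

def crest_factor_db(x, percentile=100.0):
    # Peak (or a given percentile of |x|) relative to the RMS value, in dB.
    peak = np.percentile(np.abs(x), percentile)
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(peak / rms)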

In a manner similar to what was done for the SNR analysis above, each of the 64 speech tokens used in the experiments was examined with various noise types (CON, SQW, SAM, VC-1, VC-2, and VC-4), processing conditions (UNP and EEQ1), and values of SNR_UNP ranging from -40 to +40 dB. For every combination of speech token, processing, and SNR_UNP, 10 noise samples n(t) of length equal to s(t) were randomly generated. The average maximum value and the average RMS value across these 10 S+N samples were then recorded. Using these two average values, an average crest factor in dB was calculated for each of the 64 test syllables at each noise type, processing type, and value of SNR_UNP. Finally, the 64 crest factors (in dB) calculated for each condition were averaged to formulate a function Crest Factor = F(SNR_UNP, noise type, processing type). This function is shown in Figure 19A for the non-speech-derived noises and in Figure 19B for the speech-derived noises, where it can be seen that the EEQ1 crest-factor curves lie below the corresponding UNP curves of the same noise type. This effect is consistent with the reduced amplitude variability (and therefore a decrease in the ratio of the maximum value to the RMS value) in the EEQ1-processed signal. Note that the crest factor for speech in the speech-derived noises is more variable than that in the non-speech-derived noises, as shown by the jagged curves across the SNRs in Figure 19B as compared to Figure 19A. This behavior comes from the greater variability in the speech-derived noises in general.

In a manner similar to what was done with the SNR analysis described above, the psychometric functions in Figure 13 were plotted on a crest-factor scale. Scores for UNP were plotted versus the pre-processing crest factor and scores for EEQ1 were plotted versus the post-processing crest factor. These plots can be seen in Figure 20. The percent-correct curves still do not lie on top of each other, indicating that crest factor by itself also cannot be used to explain the performance benefits with the EEQ1 processing. In fact, because pure noise has a lower crest factor than pure speech (as seen by the crest-factor curves for CON being lower at negative SNRs than at positive SNRs), one would expect processing whose performance benefits can be

explained solely by crest-factor changes to result in signals that have higher crest factors than UNP signals. However, as stated above, EEQ1 processing lowers the crest factor, and thus the crest factor cannot be used to explain the psychometric data. This analysis was also performed by using different percentiles of signal amplitude in the numerator of the crest-factor formula (for example, by using the 95th or 99th percentile rather than the maximum value), but this mapping did not fit the data well either.

D. EEQ1 vs EEQ4 Processing

EEQ1 proved to be more effective than EEQ4 processing for HI listeners; with the average HI data, the mean difference between EEQ1 and EEQ4 scores across the seven noises was 4.0 percentage points. It had been hypothesized that processing frequency bands independently could provide benefit, particularly with speech-derived noises, for HI listeners with non-uniform losses. However, by applying different scale factors to different frequency bands, such independent-band processing may have interfered with the spectral shape, resulting in decreased effectiveness. To see if this might be the case, outputs of the three processing schemes were examined in each of the four bands used for EEQ4.

Figure 21 compares RMS values and median amplitudes for UNP, EEQ1, and EEQ4 within each of the four bands used in the EEQ4 processing as a function of SNR. As seen in Figures 21A, 21B, and 21C, the RMS values for the three different kinds of processing do not differ much on a band-by-band basis. This is because the EEQ processing normalizes the long-term RMS value of the output signal to be equal to that of the input signal. However, an obvious difference can be seen between the median values of the UNP and both EEQ processing schemes, as shown in Figures 21D, 21E, and 21F. For UNP, the median amplitudes have a generally linear decrease with an increase in SNR, whereas the slopes of the EEQ1 and EEQ4 functions level off at around

Figure 21 compares RMS values and median amplitudes for UNP, EEQ1, and EEQ4 within each of the four EEQ4 bands as a function of SNR. As seen in Figures 21A, 21B, and 21C, the RMS values for the three kinds of processing do not differ much on a band-by-band basis. This is because the EEQ processing normalizes the RMS of the output signal to equal that of the input signal. However, an obvious difference can be seen between the median values of UNP and both EEQ processing schemes, as shown in Figures 21D, 21E, and 21F. For UNP, the median amplitudes decrease roughly linearly with increasing SNR, whereas the slopes of the EEQ1 and EEQ4 functions level off at around 0 dB. This is consistent with the EEQ processing amplifying the low-energy speech components. The shapes of these functions are generally similar for EEQ1 and EEQ4. However, at the lower SNRs, bands 1 and 4, and to a lesser extent bands 2 and 3, show greater overlap with EEQ4 than with EEQ1. At the higher SNRs, bands 1 and 4 and bands 2 and 3 show greater separation for EEQ4 than for EEQ1. It is possible that these differences in spectral shape led to the overall 4.0-percentage-point decrease in performance with EEQ4 relative to EEQ1. It is also possible that other metrics besides RMS values and median amplitudes would reveal a larger difference in spectral shape between the two processing schemes, or that the additional processing involved in the multi-band scheme introduced additional distortions to the signal, which led to the observed decreases in performance with EEQ4 compared to EEQ1.

E. Glimpse Analysis of Vocoded and Non-Vocoded Noises

The EEQ processing scheme proved to be more effective, both in terms of improving scores and NMR, with the non-speech-derived noises than with the speech-derived noises. An analysis of the differences in the occurrence of noise glimpses between these two categories of noises was conducted to explore why this may be the case. This analysis was similar to one done by Cooke (2006), who examined glimpse percentages and counts for a number of competing background speakers. Cooke's analysis defined a glimpse as a connected region of the spectro-temporal excitation pattern in which the energy of the speech token exceeded that of the background by at least 3 dB in each time-frequency element. Unlike Cooke's analysis, the current analysis looked at the noise alone and examined its envelope rather than its spectro-temporal pattern. The analysis used here defines a noise glimpse as a section of the noise where the envelope drops more than 3 dB below the RMS value of the noise for at least 10 ms. Gaps present at the immediate start or end of the noise were not counted because their durations might be truncated.

The threshold of 3 dB below the RMS value was chosen to prevent steady-state noise, which has many small fluctuations in the vicinity of its RMS value, from being classified as having a significant portion of the signal spent in a glimpse. The minimum duration of 10 ms was chosen based on a study by He et al. (1999), which showed that the threshold for detecting a gap in a longer-duration noise (400 ms) is roughly 5 ms, independent of the location of the gap within the noise or whether the gap location is randomly selected on each trial. Therefore, the analysis described here chose a minimum duration of 10 ms, twice as long as the threshold at which gaps can be reliably detected.

Figure 22 depicts the waveforms and envelopes of the VC-1 (Figure 22A), VC-2 (Figure 22B), and VC-4 (Figure 22C) noises. The envelope was computed by passing the absolute value of the signal's Hilbert transform through 16 logarithmically spaced low-pass filters in the range of 80 Hz to 8020 Hz with cutoffs of 64 Hz. The red lines represent the RMS values, and the green lines represent 3 dB below the RMS values. An interval is considered to be a noise gap if the envelope (shown in blue) drops below the green threshold line for at least 10 ms. As shown in the figures, as the number of speakers in the speech-derived noises increases, the envelope hovers closer to the RMS value.
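A simplified version of this glimpse detection can be sketched as follows (a Python sketch; it uses a single broadband Hilbert envelope low-passed at 64 Hz rather than the 16-band computation described above, and the sampling rate and test signal are placeholders):

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

FS = 25000  # assumed sampling rate (Hz)

def noise_glimpses(noise, fs=FS, drop_db=3.0, min_dur=0.010):
    """Return (start, end) times of noise glimpses: stretches where the envelope stays
    more than drop_db below the overall RMS for at least min_dur seconds. Gaps touching
    the start or end of the signal are ignored, as in the analysis described above."""
    env = np.abs(hilbert(noise))
    sos = butter(4, 64.0, btype="lowpass", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)                     # smoothed broadband envelope
    thresh = np.sqrt(np.mean(noise ** 2)) * 10.0 ** (-drop_db / 20.0)
    below = env < thresh
    # find runs of consecutive samples below the threshold
    edges = np.diff(below.astype(int))
    starts = np.where(edges == 1)[0] + 1
    ends = np.where(edges == -1)[0] + 1
    if below[0]:
        starts = np.r_[0, starts]
    if below[-1]:
        ends = np.r_[ends, len(below)]
    glimpses = []
    for s, e in zip(starts, ends):
        if s == 0 or e == len(below):               # skip gaps at the very start or end
            continue
        if (e - s) / fs >= min_dur:
            glimpses.append((s / fs, e / fs))
    return glimpses

# Stand-in noise: 10-Hz sinusoidally amplitude-modulated Gaussian noise, 1.29 s long.
rng = np.random.default_rng(2)
t = np.arange(int(1.29 * FS)) / FS
sam_like = rng.standard_normal(t.size) * (1.0 + np.sin(2 * np.pi * 10.0 * t))
print(noise_glimpses(sam_like)[:3])
```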

The analysis considered six of the noises used in the experiment (eliminating only the BAS noise). It also considered speech-vocoded modulated noises derived from more than four speaker samples: VC-8, VC-16, VC-32, VC-64, VC-128, VC-256, and VC-512 were examined as well. Five hundred samples of each noise type were generated with a duration equal to that of an arbitrarily chosen VCV token, 1.29 seconds. Note that these additional noise types were generated from multiple samples of the same eight speakers used to generate VC-1, VC-2, and VC-4. Half of these samples came from combinations of female speakers and half from combinations of male speakers.

For each noise sample, the occurrences of glimpses according to the above definition were determined. This information was then used to calculate the percentage of the overall noise duration occupied by glimpses, the number of glimpses per second, and the average glimpse length. For the first two quantities, the averages over the 500 generated noise samples are shown in Figures 23 and 24: Figure 23 depicts the percentage-of-glimpses information, and Figure 24 depicts the measured glimpses-per-second information. Cooke's paper contains plots of these same quantities that are similar in shape to the results obtained with this study's slightly different metric of a noise glimpse. Figure 25 shows a histogram of the final quantity, the average glimpse duration in each of the 500 noise samples.

As shown in Figure 23, the average fraction of time spent in a glimpse decreased as the number of speakers in the vocoded modulated noise increased, approaching zero, the value in CON noise. SQW and SAM had values of and 0.419, respectively. Note that had the opening and closing gaps been counted and had the RMS value been used as the threshold instead of 3 dB below the RMS value, these numbers would have been closer to 0.5 by the symmetry of the noises. VC-1, VC-2, and VC-4, the three speech-derived noises used in the experiment, had fractions of 0.423, 0.336, and 0.257, respectively. VC-1 was therefore very similar to SQW and SAM in terms of the fraction of time spent in a gap, whereas VC-4 had more gaps than CON by the current metric. The variability of this measure was greater for the speech-derived noises than for the non-speech-derived noises and decreased as the number of speakers in the speech-derived noises increased. Standard deviations within the 500 samples were for VC-1, for VC-2, and for VC-4, whereas these values were for CON, for SQW, and for SAM.
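Given a list of detected glimpse intervals, the three summary quantities can be computed as follows (a small sketch; the interval values are for illustration only):

```python
import numpy as np

def glimpse_stats(glimpses, total_dur):
    """Summary statistics for a list of (start, end) glimpse intervals in seconds:
    fraction of the noise spent in glimpses, glimpses per second, and mean glimpse length."""
    durs = np.array([e - s for s, e in glimpses]) if glimpses else np.array([])
    frac = durs.sum() / total_dur
    rate = len(durs) / total_dur
    mean_dur = durs.mean() if len(durs) else 0.0
    return frac, rate, mean_dur

# Hypothetical intervals for a 1.29-s noise sample (values for illustration only).
frac, rate, mean_dur = glimpse_stats([(0.10, 0.15), (0.40, 0.48), (0.90, 1.02)], 1.29)
print(f"fraction {frac:.3f}, {rate:.2f} glimpses/s, mean duration {mean_dur * 1000:.0f} ms")
```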

As shown in Figure 24, the average number of glimpses per second increased from one speaker to two speakers and then decreased as the number of speakers in the vocoded noises increased further, approaching zero, the value in CON noise. SQW and SAM had glimpses-per-second values of 9.20 and 9.22, respectively, and VC-1, VC-2, and VC-4 had values of 2.57, 3.46, and 3.43, respectively. Thus, all of the speech-derived noises had fewer glimpses per second than the non-speech-derived noises. The variability in the number of glimpses per second was greater for the speech-derived noises than for the non-speech-derived noises, and it first increased and then decreased as the number of speakers in the speech-derived noises increased. VC-1, VC-2, and VC-4 had standard deviations of 1.14, 1.33, and 1.34, respectively, across the 500 samples, whereas these values were 0.000, 0.318, and for CON, SQW, and SAM, respectively.

To generate Figure 25, the 500 average glimpse durations (the average glimpse duration in each of the 500 noise samples generated for a given noise) were placed into bins of length 10 ms. As shown in the figure, the average glimpse duration varies considerably between samples of the same type of speech-derived noise, especially for the noises tested in the experiments (VC-1, VC-2, and VC-4); the histograms for these noises span many bins. Meanwhile, for the non-speech-derived noises (CON, SQW, and SAM), there is very little variability in the average glimpse duration between noise samples: the histograms for these noises occupy a single bin (for CON, the bin from 0 to 0.01 seconds, and for SQW and SAM, the bin from 0.04 to 0.05 seconds). Almost every sample of VC-1, VC-2, and VC-4 falls into a bin of greater duration than that for SQW and SAM.

Figures 23, 24, and 25 offer insight into why the EEQ processing performed better with the non-speech-derived noises than with the speech-derived noises. Although VC-1, SQW, and SAM have similar amounts of total time spent in glimpses, these times are distributed over a greater number of glimpses in SQW and SAM. With VC-2 and VC-4, there is both less total time spent in glimpses and a smaller total number of glimpses than with SQW and SAM. Additionally, the variability is much greater in the speech-derived noises than in the non-speech-derived noises. The EEQ processing performs best with short, frequent glimpses, as these give it the best opportunity to amplify speech exposed during the gaps in the noise by normalizing with the ratio of the long- and short-term energies. With VC-1, there are fewer, longer glimpses, so the listener gets only a few samples of the speech stimulus rather than small glimpses throughout. During the longer glimpses, the running long-term average would be reduced, leading to smaller changes in gain in these sections. With fewer and longer glimpses (and therefore fewer and longer non-glimpses as well), it is also possible that the entire low-energy consonant portion of the speech stimulus would be covered by noise. Thus, the EEQ processing may have less opportunity to operate effectively on the portion of the speech where HI listeners require the most amplification and could instead end up amplifying noise during these parts. Finally, the predictability of the non-speech-derived noises (as evidenced by the low standard deviations in percentage of glimpses and number of glimpses) may make it easier for HI listeners to benefit from EEQ processing with those noises. With the speech-derived noises, the standard deviations are high, and each sample of noise is quite different from the others. This unpredictable pattern makes it harder for HI listeners to benefit from EEQ processing in the speech-derived noises.

To examine the role of glimpsing in performance across the different types of noise, Figure 26 plots the mean NH and HI scores for UNP and EEQ1 as a function of the average fraction of the noise spent in a glimpse.

For both speech types and both groups of listeners, scores increased with an increase in the fraction of glimpses once this measure exceeded approximately . Below this fraction, scores were roughly constant at the level observed in CON. For both NH and HI listeners, the UNP curve lies above the EEQ1 curve at the smaller fractions of glimpses. However, as the fraction of glimpses increases, the difference between the curves becomes smaller and even reverses at the highest fractions of glimpses. These trends are consistent with the idea that the EEQ processing is most effective when many glimpses are available throughout the duration of the noise signal.

Another explanation for why EEQ processing yields less of an NMR improvement in speech-derived noises for HI listeners lies in the fact that many HI listeners begin with a greater NMR in VC-1 and VC-2 than in SQW and SAM. As shown in Figure 12, the HI listeners with the most severe hearing losses (HI-6, HI-7, HI-8, and HI-9) have almost no NMR in the UNP condition in SQW but do have a non-zero NMR in VC-1. In fact, in the UNP condition, the NMR is much more constant in VC-1 interference as the listener's PTA increases than is the case in SQW. For UNP, this implies that in VC-1 noise HI listeners were able to recover more of the performance lost in VC-4 noise than they were able to recover, in SQW, of the performance lost in CON. Thus, there is less room for NMR improvement with EEQ processing in the speech-derived noises, and it is less surprising that the increase is not as large as for the non-speech-derived noises.

VII. CONCLUSIONS

EEQ processing was effective in improving NMR for HI listeners in SQW and SAM interference. The EEQ effect on NMR was less apparent in VC-1 and VC-2 interference. These observations held across various SNRs.

NMR improvements for EEQ resulted primarily from increased performance in fluctuating noise, especially in SQW interference, although there was also a smaller decrease in performance in BAS and CON for EEQ.

EEQ processing is more effective when the fluctuating noise has regular and frequent gaps, as in SQW and SAM. VC-1 and VC-2 have gaps that are more variable in length, which limits the effectiveness of the EEQ processing in using the short and long windows to normalize energy.

EEQ1 processing was more effective than EEQ4 processing. EEQ4 may have altered the spectral shape, resulting in decreased effectiveness.

NMR decreased with increasing hearing loss for UNP speech but was roughly independent of degree of loss for EEQ speech. This resulted in a large increase in NMR for the HI listeners with the most severe hearing losses.

Although EEQ processing increases the local SNR and decreases the amplitude variation of a noisy speech signal, neither of these effects provided a complete explanation of behavioral performance with EEQ signals over a wide range of SNRs.

VIII. FUTURE WORK

This study arose out of the desire to understand the factors that influence NMR in HI listeners and to explore a signal-processing technique to improve NMR. Future work relates to these long-term goals. The work reported here evaluated the effectiveness of the EEQ processing scheme in a consonant-identification task. Future studies will explore how the EEQ processing scheme fares with other speech materials, specifically vowels and sentences.

A model to predict the differences in performance exhibited by HI listeners with UNP and EEQ processing in the different types of background noise, building on the SNR and crest-factor analyses described in this thesis, will be investigated further. The EEQ processing scheme will also continue to be analyzed for potential improvements in a broader range of fluctuating noises, most notably noises with irregular gap lengths. Additionally, the factors causing NMR to be greater in the speech-derived noises than in the non-speech-derived noises with UNP will be investigated. Ways to keep the EEQ4 processing from producing as much spectral alteration will also be explored, potentially by restricting the scale factors applied to adjacent bands to lie within a fixed range of one another. Additional signal-processing techniques will also be examined for improving NMR in HI listeners. These techniques will perhaps build on what was learned from the EEQ processing, and together they could lead to an increased understanding of the mechanisms contributing to masking release in both NH and HI listeners.

References

Cooke, M. (2006). "A glimpsing model of speech perception in noise," J. Acoust. Soc. Am. 119.

Desloge, J. G., Reed, C. M., Braida, L. D., Perez, Z. D., and D'Aquila, L. A. (2016). "Technique to improve speech intelligibility in fluctuating background noise by normalizing signal energy over time." Manuscript in preparation.

Desloge, J. G., Reed, C. M., Braida, L. D., Perez, Z. D., and Delhorne, L. A. (2010). "Speech reception by listeners with real and simulated hearing impairment: Effects of continuous and interrupted noise," J. Acoust. Soc. Am. 128.

Dillon, H. (2001). Hearing Aids (Thieme, New York).

Festen, J. M., and Plomp, R. (1990). "Effects of fluctuating noise and interfering speech on the speech reception threshold for impaired and normal hearing," J. Acoust. Soc. Am. 88.

He, N., Horwitz, A. R., Dubno, J. R., and Mills, J. H. (1999). "Psychometric functions for gap detection in noise measured from young and aged subjects," J. Acoust. Soc. Am. 106.

Léger, A. C., Reed, C. M., Desloge, J. G., Swaminathan, J., and Braida, L. D. (2015). "Consonant identification in noise using Hilbert-transform temporal fine-structure speech and recovered-envelope speech for listeners with normal and impaired hearing," J. Acoust. Soc. Am. 136.

Phatak, S., and Grant, K. W. (2014). "Phoneme recognition in vocoded maskers by normal-hearing and aided hearing-impaired listeners," J. Acoust. Soc. Am. 136.

Reed, C. M., Desloge, J. G., Braida, L. D., Léger, A. C., and Perez, Z. D. (2016). "Level variations in speech: Effect on masking release in hearing-impaired listeners." Under review for J. Acoust. Soc. Am.

Shannon, R. V., Jensvold, A., Padilla, M., Robert, M. E., and Wang, X. (1999). "Consonant recordings for speech testing," J. Acoust. Soc. Am. 106, L.

Studebaker, G. A. (1985). "A rationalized arcsine transform," J. Speech Lang. Hear. Res. 28.

Figure 1: The magnitude and phase of the square root of the ratio of the frequency response of AVGlong to the frequency response of AVGshort. AVGshort and AVGlong are the moving-average operators used by the EEQ processing in the computation of the running short- and long-term energies, respectively, of the signal. They are single-pole IIR low-pass filters applied to the instantaneous signal energy with time constants of 5 ms and 200 ms for the short and long averages, respectively.
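The quantity plotted in Figure 1 can be reproduced approximately as follows (a sketch; the sampling rate and the exact single-pole smoother coefficients are assumptions):

```python
import numpy as np
from scipy.signal import freqz

FS = 25000  # assumed sampling rate (Hz)

def one_pole_lowpass(tau, fs=FS):
    """Coefficients of a single-pole IIR low-pass smoother y[n] = (1-a) x[n] + a y[n-1]
    with time constant tau seconds (a common choice; the exact coefficients used for
    AVGshort and AVGlong are an assumption of this sketch)."""
    a = np.exp(-1.0 / (tau * fs))
    return [1.0 - a], [1.0, -a]

w, h_short = freqz(*one_pole_lowpass(0.005), worN=2048, fs=FS)
_, h_long = freqz(*one_pole_lowpass(0.200), worN=2048, fs=FS)
ratio = np.sqrt(h_long / h_short)            # square root of the response ratio (Figure 1)
mag_db = 20.0 * np.log10(np.abs(ratio))
phase_deg = np.degrees(np.angle(ratio))
print(mag_db[:5], phase_deg[:5])
```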

Figure 2: Block diagrams of the EEQ processing algorithm used in this implementation. Figure 2A outlines the steps of the EEQ1 processing. Eshort[n] and Elong[n] are computed with single-pole IIR filters applied to the instantaneous signal energy with time constants of 5 ms and 200 ms, respectively, and the scale factor SC[n] is restricted to lie in the range of 0 dB to 20 dB. Figure 2B shows the EEQ1 processing applied independently in each of the four frequency bands to yield EEQ4.

Figure 2A:

Figure 2B:
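A minimal single-band sketch of the processing described in this caption is given below, assuming the gain is the square root of the ratio of the long-term to the short-term energy (consistent with Figure 1) and that the overall output RMS is restored to that of the input; the sampling rate and the exact gain rule are assumptions of the sketch:

```python
import numpy as np
from scipy.signal import lfilter

FS = 25000  # assumed sampling rate (Hz)

def running_energy(x, tau, fs=FS):
    """Single-pole IIR smoothing of the instantaneous energy x[n]**2 with time constant tau s."""
    a = np.exp(-1.0 / (tau * fs))
    return lfilter([1.0 - a], [1.0, -a], x ** 2)

def eeq1(x, fs=FS, max_gain_db=20.0, eps=1e-12):
    """Single-band EEQ sketch: boost short-term segments whose energy falls below the
    running long-term energy, then restore the original overall RMS."""
    e_short = running_energy(x, 0.005, fs)
    e_long = running_energy(x, 0.200, fs)
    gain = np.sqrt((e_long + eps) / (e_short + eps))          # assumed gain rule
    gain = np.clip(gain, 1.0, 10.0 ** (max_gain_db / 20.0))   # restrict SC[n] to 0..20 dB
    y = x * gain
    y *= np.sqrt(np.mean(x ** 2) / (np.mean(y ** 2) + eps))   # normalize overall energy
    return y

rng = np.random.default_rng(3)
processed = eeq1(rng.standard_normal(FS))    # stand-in for a noisy speech token
```

Per Figure 2B, EEQ4 would apply the same operation independently within each of four logarithmically equal bands between 80 Hz and 8020 Hz and sum the band outputs.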

Table 1: Test ear, age, and 5-frequency pure-tone average (PTA) for each HI listener. The final two columns provide the comfortable speech presentation level chosen by each listener with NAL amplification and the SNR used in testing all speech conditions. The SNR was chosen to yield 50% correct in continuous noise.

Listener   Test Ear   Age   5-Frequency PTA (dB HL)   Speech Level (dB SPL)   SNR (dB)
HI-1       R
HI-2       R
HI-3       L
HI-4       L
HI-5       L
HI-6       L
HI-7       L
HI-8       R
HI-9       L

Figure 3: Pure-tone detection thresholds in dB SPL measured for 500-ms tones in a three-alternative forced-choice adaptive procedure. A green line representing the average thresholds of the test ears of the NH listeners is shown in the upper left box, and the thresholds for the HI listeners are shown in the remaining boxes. For the HI listeners, thresholds are shown for the right ear (red circles) and left ear (blue x's), with the points of the test ear connected using a solid line and the points of the non-test ear connected using a dashed line.

Figure 4: Waveforms of the seven background interference conditions used in testing. To make it easier to see the effects of the modulation, the same underlying noise sample was used to generate all four non-speech-derived noises in this figure.

Figure 5: The spectrogram of the randomly generated speech-shaped noise used to create the BAS, CON, SQW, and SAM interference conditions. The speech-shaped noise had a total duration of 30 seconds, and a random segment of the speech-shaped noise (of the desired interference duration) was chosen every time a sample of BAS, CON, SQW, or SAM was generated.

Figure 6: Waveform of the VCV token APA for UNP speech in the BAS noise condition. The high-energy vowel components and the low-energy consonant component are annotated at the top of the figure.

Figure 7: Waveforms and amplitude distribution plots for the VCV token APA presented with the seven different kinds of background interference (BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4) with UNP (Figure 7A), EEQ1 (Figure 7B), and EEQ4 (Figure 7C) processing. The speech is presented at a level of 65 dB SPL, and the noise (other than BAS) is presented at a level of 69 dB SPL, leading to an SNR of -4 dB. The blue line in the amplitude distribution plot represents the RMS value. The green line is the median of the amplitude distribution.

Figure 7A:

Figure 7B:

Figure 7C:

Figure 8: Waveforms and amplitude distribution plots for the VCV token APA presented with CON (Figure 8A) and SQW (Figure 8B) interference with EEQ4 processing. Each of the four rows in the figure corresponds to a different logarithmically equal band in the range of 80 Hz to 8020 Hz. The speech is presented at a level of 65 dB SPL, and the noise is presented at a level of 69 dB SPL, leading to an SNR of -4 dB. The blue line in the amplitude distribution plot represents the RMS value. The green line is the median of the amplitude distribution.

Figure 8A:

Figure 8B:

Table 2: The SNRs employed in Experiment 2 for each of the four HI listeners tested. The Mid SNR was equivalent to the one used in Experiment 1. The Low SNR was 4 dB lower than the Mid SNR, and the High SNR was 4 dB higher than the Mid SNR.

Listener   Low SNR (dB)   Mid SNR (dB)   High SNR (dB)

Figure 9: The mean percent-correct scores achieved by the four NH (green bars) and nine HI listeners (gold bars) in Experiment 1. The scores were measured with UNP (top panel), EEQ1 (middle panel), and EEQ4 (bottom panel) processing in the BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4 background interference conditions. The error bars associated with each bar are +/- 1 standard deviation.

Figure 10: The mean percent-correct scores achieved by the four NH (upper panel) and nine HI listeners (lower panel) in Experiment 1. The scores were measured with UNP (purple bars), EEQ1 (orange bars), and EEQ4 (green bars) processing in the BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4 background interference conditions.

Figure 11: The mean percent-correct scores achieved by the four NH listeners (upper left panel) and the individual percent-correct scores achieved by the nine HI listeners (other nine panels) in Experiment 1. The scores were measured with UNP (purple bars), EEQ1 (orange bars), and EEQ4 (green bars) processing in the BAS, CON, SQW, SAM, VC-1, VC-2, and VC-4 background interference conditions. The error bars associated with each bar are +/- 1 standard deviation.

Table 3: Analysis of variance results conducted on the rationalized arcsine units of the percent-correct scores for each of the four NH and nine HI listeners. The F-statistic (with degrees of freedom) and probability are shown for each listener for speech type (F(2, 63) and p), noise type (F(6, 63) and p), and the speech-by-noise interaction (F(12, 63) and p). Probabilities below the 0.01 significance level are bolded and annotated with an asterisk.
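For reference, a common statement of the rationalized arcsine transform of Studebaker (1985), which maps X correct responses out of N trials to rationalized arcsine units, is sketched below (the constants should be checked against the cited paper):

```python
import numpy as np

def rau(correct, total):
    """Rationalized arcsine units (after Studebaker, 1985) for `correct` out of `total` trials:
    theta = asin(sqrt(X/(N+1))) + asin(sqrt((X+1)/(N+1))); RAU = (146/pi) * theta - 23."""
    theta = (np.arcsin(np.sqrt(correct / (total + 1.0)))
             + np.arcsin(np.sqrt((correct + 1.0) / (total + 1.0))))
    return 146.0 / np.pi * theta - 23.0

print(rau(32, 64))   # 50% correct out of 64 trials maps to roughly 50 RAU
```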

Table 4: Tukey-Kramer post-hoc multiple-comparison results among the four NH and nine HI listeners using a significance level of The ordering is shown for each listener by speech type (1 for UNP, 2 for EEQ1, and 3 for EEQ4), noise type (1 for BAS, 2 for CON, 3 for SQW, 4 for SAM, 5 for VC-1, 6 for VC-2, and 7 for VC-4), and speech-by-noise interaction. When two conditions are not significantly different, they are listed in decreasing order of observed mean value. Note that there were some cases where conditions X and Y and conditions Y and Z were not significantly different, but conditions X and Z were significantly different. In these cases, Y was listed in the table as not significantly different from whichever of X and Z was closer to it in mean value.

Listener   Speech Type   Noise Type
NH-1       1 = 2 = 3     1 > 3 > 4 = 5 > 6 > 7 = 2
NH-2       1 = 2 > 3     1 > 3 > 4 > 5 > 6 > 7 = 2
NH-3       1 > 2 = 3     1 > 3 > 4 > 5 > 6 > 7 = 2
NH-4       1 > 2 = 3     1 > 3 > 4 > 5 > 6 > 7 = 2
HI-1       1 > 2 = 3     1 > 3 > 4 > 5 > 6 > 2 = 7
HI-2       1 = 2 > 3     1 > 3 > 4 = 5 > 6 = 7 = 2
HI-3       1 > 2 > 3     1 > 3 > 4 > 5 > 2 > 6 > 7
HI-4       1 = 2 > 3     1 > 3 > 4 = 5 > 6 = 2 = 7
HI-5       1 > 2 = 3     1 > 3 > 4 = 5 > 6 = 2 = 7
HI-6       2 > 1 > 3     1 > 3 > 4 = 5 = 2 > 6 = 7
HI-7       1 > 2 = 3     1 > 3 = 4 = 5 > 6 = 2 = 7
HI-8       1 = 2 > 3     1 > 3 > 4 = 5 > 2 = 6 = 7
HI-9       1 = 2 > 3     1 > 3 > 4 = 2 = 5 > 6 = 7

Figure 12: The mean NMR achieved by the NH listeners (first group of bars) and the individual NMR for each of the HI listeners (remaining nine groups of bars) with UNP (purple bars), EEQ1 (orange bars), and EEQ4 (green bars) processing. The NMR for the SQW (upper left panel) and SAM (lower left panel) noises was calculated relative to the CON condition, whereas the NMR for the VC-1 (upper right panel) and VC-2 (lower right panel) noises was calculated relative to the VC-4 noise condition.


MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS

MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS MODIFIED DCT BASED SPEECH ENHANCEMENT IN VEHICULAR ENVIRONMENTS 1 S.PRASANNA VENKATESH, 2 NITIN NARAYAN, 3 K.SAILESH BHARATHWAAJ, 4 M.P.ACTLIN JEEVA, 5 P.VIJAYALAKSHMI 1,2,3,4,5 SSN College of Engineering,

More information

ON WAVEFORM SELECTION IN A TIME VARYING SONAR ENVIRONMENT

ON WAVEFORM SELECTION IN A TIME VARYING SONAR ENVIRONMENT ON WAVEFORM SELECTION IN A TIME VARYING SONAR ENVIRONMENT Ashley I. Larsson 1* and Chris Gillard 1 (1) Maritime Operations Division, Defence Science and Technology Organisation, Edinburgh, Australia Abstract

More information

Modulation analysis in ArtemiS SUITE 1

Modulation analysis in ArtemiS SUITE 1 02/18 in ArtemiS SUITE 1 of ArtemiS SUITE delivers the envelope spectra of partial bands of an analyzed signal. This allows to determine the frequency, strength and change over time of amplitude modulations

More information

Data Transmission (II)

Data Transmission (II) Agenda Lecture (02) Data Transmission (II) Analog and digital signals Analog and Digital transmission Transmission impairments Channel capacity Shannon formulas Dr. Ahmed ElShafee 1 Dr. Ahmed ElShafee,

More information