Effect of bandwidth extension to telephone speech recognition in cochlear implant users

Chuping Liu
Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
chupingl@usc.edu

Qian-Jie Fu
Department of Biomedical Engineering, University of Southern California, Los Angeles, California 90089 and Department of Auditory Implants and Perception, House Ear Institute, 2100 West Third Street, Los Angeles, California 90057
qfu@hei.org

Shrikanth S. Narayanan
Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089
shri@sipi.usc.edu

Abstract: The present study investigated a bandwidth-extension method to enhance telephone speech understanding for cochlear implant (CI) users. The acoustic information above the telephone speech transmission range (i.e., 3400 Hz) was estimated based on trained models describing the relation between narrow-band and wide-band speech. The effect of the bandwidth-extension method was evaluated with IEEE sentence recognition tests in seven CI users. Results showed a relatively modest but significant improvement in speech recognition with the proposed method. The effect of the bandwidth-extension method was also observed to be highly dependent on individual CI users.

© 2009 Acoustical Society of America.
PACS numbers: 43.71.Ky, 43.64.Me, 43.60.Dh [DS]
Date Received: July 31, 2008. Date Accepted: December 8, 2008.

1. Introduction

Telephone use is still challenging for many deaf or hearing-impaired individuals, including cochlear implant (CI) users. According to a previous study (Kepler et al., 1992), there are three major contributors to the difficulties in telephone communication: the limited frequency range, the elimination of visual cues, and the reduced audibility of the telephone signal. For example, the telephone bandwidth in use today is limited to 300-3400 Hz.
Compared to speech in face-to-face conversational settings, telephone speech does not convey information above 3400 Hz, which is useful in the identification of many speech sounds, notably certain consonants such as fricatives. Since CI users generally receive frequency information up to approximately 8 kHz or even higher, narrow-band telephone speech may present an obstacle even when they can achieve fairly good wide-band speech perception.

Previous studies have assessed the capability of CI patients to communicate over telephones. While many CI patients were capable of a certain degree of communication over the phone, speech understanding was significantly worse than with broad-band speech (Milchard and Cullington, 2004; Ito et al., 1999; Fu and Galvin, 2006). For example, word discrimination scores obtained with telephone speech were 17.7% lower than those obtained with wide-band speech. Analysis of the word errors revealed that place of articulation was the predominant type of error (Milchard and Cullington, 2004). On the other hand, an investigation of telephone use among CI recipients reported that 70% of the respondents communicated via the telephone, of which 30% used cellular phones (Cray et al., 2004). Hence, improved capability to understand telephone speech using just auditory cues will increase the opportunities for the use of the telephone and will promote independent living, employment, socialization, and self-esteem in CI users.

J. Acoust. Soc. Am. 125(2), February 2009. © 2009 Acoustical Society of America.

To improve the telephone communication ability of hearing-impaired people, one solution, albeit expensive, is to change the current public switched telephone network to transmit wide-band speech and to enrich the spoken information with video. This is, however, difficult to accomplish in the near future. A more economical and near-term approach is to add external equipment to enhance the audibility of telephone speech. For example, a telephone adapter, which was used to reduce the noise level in the telephone and to record telephone speech onto a tape recorder, was found to boost speech-tracking scores in CI users (Ito et al., 1999). Yet, such auxiliary instruments may not be easy to obtain, especially in mobile communication.

Another potential approach is to improve speech processing and transmission techniques. A previous study (Terry et al., 1992) investigated frequency-selective amplification and compression via digital signal processing techniques to compensate for high-frequency hearing loss in hearing-impaired people. Nevertheless, the approach required audiometric data from individual users to achieve the best performance. On the other hand, to overcome the narrow bandwidth of telephone speech, bandwidth extension as front-end processing has been studied (e.g., Nilsson and Kleijn, 2001; Jax and Vary, 2003). For example, Jax and Vary (2003) proposed an approach to extend the telephone bandwidth to 7 kHz based on a hidden Markov model. Nilsson and Kleijn (2001) studied a bandwidth-extension approach that avoids overestimation of high-band energy. Through listening tests, the method was shown to reduce the degree of artifacts. Yet, it is not clear how much gain a bandwidth-extension method can actually bring to speech recognition by listeners, especially CI users.

In this study, we propose a bandwidth-extension method to enhance telephone speech.
A Gaussian mixture model (GMM) was used to model the spectrum distribution of narrow-band speech. The relationship between wide-band and narrow-band speech was learned a priori in a data-driven fashion and was used to recover the missing information based on the available telephone-band speech. Such an approach does not require auxiliary instruments or patient data for its implementation. We then studied the effect of the proposed bandwidth-extension method on speech recognition performance in CI users.

2. Methods

Expanding narrow-band speech to wide-band speech consists of two parts: spectral envelope extension and excitation spectrum extension, which are introduced in Secs. 2.1 and 2.2, respectively.

2.1 GMM-based spectral envelope extension

A GMM represents the distribution of the observed parameters by m mixture Gaussian components in the form of

$$p(x) = \sum_{i=1}^{m} \alpha_i N(x, \mu_i, \Sigma_i), \tag{1}$$

where $\alpha_i$ denotes the prior probability of component i ($\sum_{i=1}^{m} \alpha_i = 1$ and $\alpha_i \geq 0$) and $N(x, \mu_i, \Sigma_i)$ denotes the normal distribution of the ith component with mean vector $\mu_i$ and covariance matrix $\Sigma_i$ in the form of

$$N(x, \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{p/2} |\Sigma_i|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right], \tag{2}$$

where p is the vector dimension. The parameters of the model $(\alpha, \mu, \Sigma)$ can be estimated using the well-known expectation maximization algorithm.
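As an illustration of Eqs. (1) and (2), the following minimal sketch evaluates a GMM density and the component posteriors for a single feature vector. It assumes diagonal covariance matrices (stored as lists of variances) and uses a toy two-component model; the function names and all numerical values are hypothetical and are not taken from the study.

```python
import math

def diag_gaussian_pdf(x, mean, var):
    """Eq. (2) for a diagonal covariance matrix given by its variances."""
    p = len(x)
    det = math.prod(var)  # determinant of a diagonal matrix
    quad = sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mean, var))
    return math.exp(-0.5 * quad) / ((2 * math.pi) ** (p / 2) * math.sqrt(det))

def gmm_pdf(x, weights, means, variances):
    """Eq. (1): weighted sum of the component densities."""
    return sum(a * diag_gaussian_pdf(x, mu, var)
               for a, mu, var in zip(weights, means, variances))

def posteriors(x, weights, means, variances):
    """P(C_i | x) by Bayes' rule; the returned values sum to 1."""
    joint = [a * diag_gaussian_pdf(x, mu, var)
             for a, mu, var in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

# Toy two-component model in two dimensions (illustrative values only).
weights = [0.4, 0.6]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
post = posteriors([0.0, 0.0], weights, means, variances)
```

A vector lying on the mean of the first component is assigned almost entirely to that component, which is the behaviour the posterior weights in the conversion function rely on.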

Let $x = [x_1\ x_2\ \cdots\ x_n]$ be the sequence of n spectral vectors produced by the narrow-band telephone speech, and let $y = [y_1\ y_2\ \cdots\ y_n]$ be the time-aligned spectral vectors produced by the wide-band speech. The objective of the bandwidth-extension method was to define a conversion function $F(x_t)$ such that the total conversion error of spectral vectors

$$\varepsilon = \sum_{t=1}^{n} \left\| y_t - F(x_t) \right\|^2 \tag{3}$$

was minimized over the entire training spectral feature set, using the trained GMM that represents the feature distribution of the telephone speech. A minimum mean square error method was used to estimate the conversion function. The conversion function was (Stylianou et al., 1998; Kain and Macon, 1998)

$$F(x_t) = \sum_{i=1}^{m} P(C_i | x_t) \left[ v_i + T_i \Sigma_i^{-1} (x_t - \mu_i) \right], \tag{4}$$

where $P(C_i | x_t)$ is the posterior probability that the ith Gaussian component generates $x_t$; $v_i$ and $T_i$ are the mean wide-band spectral vector and the cross-covariance matrix of the wide-band and narrow-band spectral vectors, respectively. When a diagonal conversion is used (i.e., $T_i$ and $\Sigma_i$ are diagonal), the above optimization problem simplifies into a scalar optimization problem and the computation cost is greatly decreased.

2.2 Excitation spectrum extension

Two methods were considered for excitation spectrum extension in this study (Makhoul and Berouti, 1979): spectral folding and spectral translation. Spectral folding simply generates a mirror image of the narrow-band spectrum for the high-band spectrum. Its implementation is equivalent to upsampling the excitation signal in the time domain by zero padding, which adds almost no processing cost. Yet, the energy in the reconstructed high band is typically overestimated with this approach, and the harmonic pattern of the restored high band is a flipped version of the original narrow-band spectrum, centered around the highest frequency of the narrow-band speech. Spectral translation, on the other hand, does not have these problems but involves more expensive computation.
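The mirror-image behaviour of spectral folding can be checked numerically. The sketch below is only a demonstration under assumed values (a 1000 Hz test tone, 64 samples at 8 kHz, a naive DFT); the helper names are hypothetical. Inserting one zero after every sample doubles the sampling rate, and the tone reappears reflected about 4000 Hz, the upper edge of the narrow band.

```python
import cmath
import math

def zero_insert(x, factor=2):
    """Upsample by inserting factor-1 zeros after every sample (no filtering)."""
    y = [0.0] * (len(x) * factor)
    y[::factor] = x
    return y

def dft_magnitude(x):
    """Naive O(N^2) DFT magnitude, adequate for a short demonstration."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

fs_narrow = 8000   # narrow-band sampling rate in Hz (assumed)
n = 64             # number of narrow-band samples
tone_hz = 1000     # test tone placed on an exact DFT bin
x = [math.cos(2 * math.pi * tone_hz * t / fs_narrow) for t in range(n)]

y = zero_insert(x)             # same waveform, now at a 16 kHz sampling rate
fs_wide = 2 * fs_narrow
mag = dft_magnitude(y)
half = len(y) // 2             # bins 0..half cover 0..8000 Hz
bin_hz = fs_wide / len(y)

# The two strongest bins are the original tone and its folded image.
peaks = sorted(sorted(range(half + 1), key=lambda k: mag[k])[-2:])
peak_freqs = [k * bin_hz for k in peaks]
```

Here `peak_freqs` holds the original 1000 Hz component and its image at 7000 Hz; both lie 3000 Hz from the 4000 Hz folding point, and the image carries the same magnitude as the original, which is consistent with the overestimated high-band energy noted above.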
In spectral translation, the excitation spectrum of the narrow-band speech, obtained from Fourier transformation of the time-domain signal, is translated to the high-frequency part and padded to fill the desired whole band. A low-pass filter is applied to perform spectral whitening, such that the discontinuities between the translations are smoothed. The extended wide-band excitation in the time domain is then obtained by inverse Fourier transformation.

2.3 Speech analysis and synthesis

In this study, Mel-scaled line spectral frequency (LSF) features (18th order) and energy were extracted to model the spectral characteristics of speech in a 19-dimensional space. The spectral features of narrow-band and wide-band speech were aligned with dynamic time warping. The spectral mapping function between narrow-band and wide-band speech was trained with 200 randomly selected sentences from the IEEE database (100 sentences from a female talker and the other 100 sentences from a male talker). The excitation component between 1 and 3 kHz was used to construct the high-band excitation component because the spectrum in this range was relatively white. A low-pass Butterworth filter (first order with a cutoff frequency of 3000 Hz) was used to perform spectral whitening. The synthesized high-band speech (i.e., frequency information above 3400 Hz) was obtained by high-pass filtering the convolution of the extended excitation and the extended spectrum. It was then appended to the original telephone speech to render the reconstructed wide-band speech, which covered the frequency band from 300 to 8000 Hz.
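Once the spectral mapping has been trained, converting a narrow-band feature vector amounts to evaluating Eq. (4) of Sec. 2.1. The following sketch does so for the diagonal case, where $T_i$ and $\Sigma_i$ are stored as lists of per-dimension values and every operation is element-wise. It is a sketch under stated assumptions, not the authors' implementation: the function names, argument layout, and toy parameters are hypothetical.

```python
import math

def log_diag_gaussian(x, mean, var):
    """Log of Eq. (2) for a diagonal covariance (variances in `var`)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xj - m) ** 2 / v
                      for xj, m, v in zip(x, mean, var))

def convert(x, weights, mu, sigma, v, t_cross):
    """Eq. (4) with diagonal T_i and Sigma_i.

    weights: priors alpha_i; mu, sigma: narrow-band means and variances;
    v: wide-band means; t_cross: diagonal cross-covariances T_i.
    """
    log_joint = [math.log(a) + log_diag_gaussian(x, m, s)
                 for a, m, s in zip(weights, mu, sigma)]
    peak = max(log_joint)
    joint = [math.exp(l - peak) for l in log_joint]  # numerically stable
    total = sum(joint)
    out = [0.0] * len(x)
    for w, m, s, vi, ti in zip(joint, mu, sigma, v, t_cross):
        post = w / total  # posterior P(C_i | x_t)
        for j in range(len(x)):
            out[j] += post * (vi[j] + ti[j] / s[j] * (x[j] - m[j]))
    return out

# Toy single-component model (illustrative values only).
weights = [1.0]
mu = [[1.0, 2.0]]
sigma = [[2.0, 4.0]]
v = [[10.0, 20.0]]
t_cross = [[1.0, 2.0]]
converted = convert([3.0, 2.0], weights, mu, sigma, v, t_cross)
```

With a single component the posterior is 1, so the result reduces to $v_1 + T_1 \Sigma_1^{-1}(x - \mu_1)$; an input equal to the narrow-band mean maps exactly onto the wide-band mean.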

Fig. 1. Implementation framework of the GMM-based bandwidth-extension method.

2.4 Implementation framework of the bandwidth-extension method

Figure 1 illustrates the GMM-based bandwidth-extension method. The three major components of the model (i.e., GMM-based spectral envelope extension, excitation spectrum extension, and speech analysis/synthesis) are detailed in Secs. 2.1-2.3.

2.5 Test materials and procedures

The test materials in this study were IEEE (1969) sentences, recorded from one male talker and one female talker at the House Ear Institute with a sampling rate of 22,050 Hz. The narrow-band telephone speech was obtained by bandpass filtering the wide-band speech (ninth-order Butterworth filter, passband between 300 and 3400 Hz) and downsampling it to 8 kHz. Three conditions were tested: restored wide-band speech (carrying information up to 8 kHz), telephone speech (carrying information up to 3.4 kHz), and originally recorded wide-band speech (carrying information up to 11 kHz). All sentences were normalized to have the same long-term root mean square value. Note that the GMM training sentences (i.e., the 200 randomly selected sentences) were also bandwidth extended and included in the listening test to increase the available speech materials for the experiment.

Seven CI subjects (two women and five men) participated in this study. Table 1 lists relevant demographics for the CI subjects. All subjects were native speakers of American English and had extensive experience in speech recognition experiments. For all the listening conditions, including restored wide-band speech, telephone speech, and originally recorded wide-band speech, subjects were tested using their clinically assigned speech processor and comfortable volume/sensitivity settings.

Table 1. Subject demographics for the CI patients who participated in the present study.

Subject  Age  Gender  Etiology         Implant type  Strategy  Duration of implant use (years)
S1       55   M       Hereditary       Freedom       ACE       1
S2       62   F       Genetic          Nucleus-24    ACE       2
S3       48   M       Trauma           Nucleus-22    SPEAK     13
S4       67   M       Hereditary       Nucleus-22    SPEAK     14
S5       64   M       Trauma/unknown   Nucleus-22    SPEAK     15
S6       75   M       Noise induced    Nucleus-22    SPEAK     9
S7       72   F       Unknown          Nucleus-24    ACE       5

As shown in Table 1, the subjects used the ACE (Skinner et al., 2002) or SPEAK strategy (Seligman and McDermott, 1995). The maximum number of activated electrodes is typically 6 for the SPEAK strategy and 8 for the ACE strategy. While the number of activated electrodes is the same for both telephone speech and broad-band speech, the total number of usable electrodes is different. In general, all 20 electrodes will be used when listening to broad-band speech, while only 13 electrodes will be used for telephone speech. Once testing began, these settings were not changed. Subjects were tested while seated in a double-walled sound-treated booth (IAC). Stimuli were presented via a single loudspeaker at 65 dBA. The test order of the different conditions was randomized for each subject. No feedback was provided during the test.

3. Results and discussion

Fig. 2. Sentence recognition performance for individual CI subjects with and without the bandwidth-extension method, and with the unprocessed wide-band speech. The error bars indicate one standard deviation. [Bar chart: percent correct (%) for subjects S1-S7 and their average (Avg) under three conditions: Phone (3400), Phone+hf (8000), and Unprocessed (11025).]

The sentence recognition performance with and without the restored high-band components is shown in Fig. 2, together with the performance with the naturally recorded wide-band speech. Note that the subjects are ordered according to their performance with wide-band speech. On average, compared to the performance with the naturally recorded wide-band speech, the performance with the narrow-band telephone speech was about 16.8% lower, which was significant (paired t-test, p < 0.001). The recognition score with the bandwidth-extension method was about 3.5% higher than without the bandwidth-extension method. The improvement was small but significant (paired t-test, p = 0.050).
Yet, the performance with the bandwidth-extension method was still significantly lower than with the unprocessed wide-band speech (paired t-test, p < 0.001).

Figure 2 demonstrates substantial cross-subject variability in performance. First, cross-subject variability was observed in the performance for the same test materials. For example, subject S1 obtained over 80% correct both with and without the restored high-band components. In contrast, subject S7 obtained only about 40% on average. Second, cross-subject variability was observed in the effect of the bandwidth-extension method. For example, subject S6 achieved about a 10% improvement with the restored high-band information, while subject S3 showed about a 3% deficit in performance with the restored high-band information.

4. Discussion

The present study showed a 16.8% performance drop in CI users listening to narrow-band telephone speech relative to the originally recorded wide-band speech. This drop was similar to the performance drop reported by Milchard and Cullington (2004), although the testing materials and testing procedures were different between these two studies. In the current study, seven CI subjects were tested with IEEE (1969) sentences. In Milchard and Cullington's (2004) study, ten CI subjects were tested with 80 consonant-vowel-consonant type stimuli (e.g., BAD BAG BAT BACK) using the four alternative auditory feature test procedure. The present study confirmed the findings of previous studies that the bandwidth effect was substantial in CI listeners.

The observed cross-subject performance difference may be due to different CI device settings and different electropsychoacoustic listening patterns across subjects. For example, for those CI users whose speech processor encoded more information on the high-band speech, the potential benefit of the bandwidth-extension method may be relatively larger than for the other CI users.

In the present study, a bandwidth-extension method was proposed to improve the telephone speech recognition performance in CI listeners. Although speech recognition was significantly improved with the proposed bandwidth-extension method, the improvement was relatively small compared to the observed 16.8% performance drop from wide-band speech to telephone speech. There are four possible reasons for this marginal improvement. First, the proposed bandwidth-extension method only recovered information up to 8 kHz, while the 16.8% performance drop was the performance difference between wide-band speech (11 kHz) and narrow-band telephone speech (3.4 kHz). It was not clear how large the recognition benefit might be for the acoustic information between 8 and 11 kHz. Second, in this study, Mel-scaled LSF features were used, which placed lower resolution on the high-frequency components. The feature order used for speech analysis was the same (18th order) for both wide-band and narrow-band speech, although their frequency ranges were different.
Such signal processing procedures may not result in high accuracy in parameter estimation. Third, due to the nature of speech synthesis, it was difficult to accomplish a synthesis without perceptual distortion. The introduced artifacts may be very detrimental for CI listeners, who typically receive degraded spectrotemporal information. Finally, performance with the bandwidth-extended speech was acutely measured in CI listeners in free field; the potential benefit of the bandwidth-extension method might be underestimated since the training effect was not taken into account.

5. Conclusions

This paper studied a bandwidth-extension method to enhance telephone speech understanding in CI users. The lost high-band acoustic information was estimated based on the available narrow-band telephone speech and a pretrained relation between narrow-band and wide-band speech. The narrow-band excitation was extended to wide-band excitation by spectral translation. A source-filter model was used to synthesize estimated wide-band speech, whose high-band frequency information was filtered out and appended to the original telephone speech. The effect of the bandwidth-extension method was evaluated with IEEE (1969) sentence recognition tests in seven CI users. Results showed that CI speech recognition was significantly improved with the bandwidth-extension method, although the improvement was relatively small compared to the performance drop seen from wide-band speech to telephone speech. The benefit of the bandwidth-extension method was also highly dependent on individual CI users.

Acknowledgments

We acknowledge all the subjects who participated in this study. Research was supported in part by NIH-NIDCD.

References and links

Cray, J. W., Allen, R. L., Stuart, A., Hudson, S., Layman, E., and Givens, G. D. (2004). "An investigation of telephone use among cochlear implant recipients," Am. J. Audiol. 13, 200-212.
Fu, Q. J., and Galvin, J. J. (2006). "Recognition of simulated telephone speech by cochlear implant users," Am. J. Audiol. 15, 127-132.
IEEE (1969). IEEE Recommended Practice for Speech Quality Measurements (Institute of Electrical and Electronic Engineers, New York).

Ito, J., Nakatake, M., and Fujita, S. (1999). "Hearing ability by telephone of patients with cochlear implants," Otolaryngol.-Head Neck Surg. 121, 802-804.
Jax, P., and Vary, P. (2003). "On artificial bandwidth extension of telephone speech," Signal Process. 83, 1707-1719.
Kain, A., and Macon, M. W. (1998). "Spectral voice conversion for text-to-speech synthesis," IEEE ICASSP, pp. 285-288.
Kepler, L. J., Terry, M., and Sweetman, R. H. (1992). "Telephone usage in the hearing-impaired population," Ear Hear. 13, 311-319.
Makhoul, J., and Berouti, M. (1979). "High-frequency regeneration in speech coding systems," IEEE ICASSP, pp. 428-431.
Milchard, A. J., and Cullington, H. E. (2004). "An investigation into the effect of limiting the frequency bandwidth of speech on speech recognition in adult cochlear implant users," Int. J. Audiol. 43, 356-362.
Nilsson, M., and Kleijn, W. B. (2001). "Avoiding over-estimation in bandwidth extension of telephony speech," IEEE ICASSP, pp. 869-872.
Seligman, P. M., and McDermott, H. J. (1995). "Architecture of the Spectra-22 speech processor," Ann. Otol. Rhinol. Laryngol. Suppl. 166, 139-141.
Skinner, M. W., Arndt, P. L., and Staller, S. J. (2002). "Nucleus 24 advanced encoder conversion study: Performance versus preference," Ear Hear. 23, 2S-17S.
Stylianou, Y., Cappé, O., and Moulines, E. (1998). "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process. 6, 131-142.
Terry, M., Bright, K., Durian, M., Kepler, L., Sweetman, R., and Grim, M. (1992). "Processing the telephone speech signal for the hearing impaired," Ear Hear. 13, 70-79.