Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation


PAPER © 2007 The Acoustical Society of Japan

Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation

Hideki Banno 1, Hiroaki Hata 2, Masanori Morise 2, Toru Takahashi 2, Toshio Irino 2 and Hideki Kawahara 2
1 Faculty of Science and Technology, Meijo University, 1-501 Shiogamaguchi, Tempaku-ku, Nagoya, Japan (e-mail: banno@ccmfs.meijo-u.ac.jp)
2 Faculty of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama, Japan
(Received 31 May 2006, Accepted for publication 31 October 2006)

Abstract: A very high quality speech analysis, modification and synthesis system, STRAIGHT, has now been implemented in the C language and operates in realtime. This article first provides a brief summary of STRAIGHT components and then introduces the underlying principles that enabled realtime operation. In STRAIGHT, the built-in extended pitch synchronous analysis, which does not require analysis window alignment, plays an important role in realtime implementation. A detailed description of the processing steps, which are based on the so-called just-in-time architecture, is presented. Further, discussions on other issues related to realtime implementation and performance measures are also provided. The software will be available to researchers upon request.

Keywords: STRAIGHT speech manipulation system, Realtime, Pitch synchronous analysis, F0 extraction, Voice conversion

PACS number: Ja, Ar

1. INTRODUCTION

STRAIGHT [1] (Speech Transformation and Representation by Adaptive Interpolation of weighted spectrogram) was originally designed to investigate human speech perception in terms of auditorily meaningful parametric domains. STRAIGHT's design was motivated by the belief that nonlinear systems such as human speech perception should be investigated using their normal input signals, i.e., ecologically relevant stimuli. Although the underlying structure of STRAIGHT is similar to that of the classical channel VOCODER [2], the speech sounds reproduced and/or manipulated by STRAIGHT are sometimes indistinguishable from the original speech sounds in terms of their naturalness [3,4]. This conceptual simplicity, together with manipulation flexibility and highly natural reproduced speech quality, has made STRAIGHT a powerful tool for speech perception research [5-9]. In addition to the utility of the basic STRAIGHT system, its extensions to auditory morphing [3,4,10,11] opened up novel prospects in speech manipulation.

Realtime STRAIGHT, which is the focus of this article, will also promote other aspects of STRAIGHT's design objective that have not yet been exploited. From the beginning, STRAIGHT was designed so that it could be applied to auditory feedback research [12] in the near future, when processors would become sufficiently fast for realtime operation; auditory feedback was the research topic of one of the authors. Because of this background, all the algorithms incorporated into STRAIGHT were already designed to be compatible with realtime operation [1]. However, the current STRAIGHT implementation in Matlab (henceforth referred to as "Matlab STRAIGHT") does not support realtime processing; it was instead designed so that users can manipulate STRAIGHT parameters easily. This article reports the first attempt at testing how the original design objective works with current technologies. When STRAIGHT is implemented in realtime, it will also be useful in various types of applications such as voice conversion, text-to-speech synthesis, musical performances, and speech style conversion.
2. INTRODUCTION TO STRAIGHT [1]

STRAIGHT has been evolving through investigations of the following topics:
(1) Periodic excitation in voiced sounds can be interpreted as a two-dimensional sampling operation on the smooth time-frequency surface that represents articulatory information [1,13].
(2) Group delay manipulation of the excitation source [13].
(3) F0 estimation that does not require a priori knowledge for designing a relevant analysis window [1].
(4) Extended pitch synchronous analysis that does not require alignment with pitch marks [1].
(5) F0 extraction based on fixed-point analysis of a mapping from the carrier frequencies of the analyzing wavelet to the instantaneous frequencies of the corresponding wavelet transform [14].
(6) Acoustic event extraction based on fixed-point analysis of the window center location with respect to the centroid of the windowed signal, and minimum-phase group delay based compensation [15].
(7) Auditory morphing [3,10].
(8) Algorithm AMALGAM [16], which can seamlessly morph different speech processing algorithms such as waveform-based synthesis, sinusoidal models and STRAIGHT.
(9) A nearly defect-free F0 extractor using multiple F0 cues and post-processing, suitable for offline and quality-sensitive applications [17].

These studies roughly trace the course of STRAIGHT's evolution. The components developed in the first and third topics were replaced by their counterparts developed in the fourth and fifth topics, respectively, and the former no longer exist. The modules developed in the sixth and eighth topics have not yet been integrated into STRAIGHT. At this stage, the most important topic for realtime STRAIGHT implementation is the fourth one, the extended pitch synchronous spectral analysis. The topic of secondary importance is that of F0 extraction.

2.1. Architecture of STRAIGHT

Figure 1 shows the schematic diagram of Matlab STRAIGHT. STRAIGHT is basically a channel VOCODER with enhanced parameter modification capabilities and very high quality. The manipulated parameters are (a) the smoothed spectrogram, (b) the fundamental frequency, and (c) the time-frequency periodicity map. The frequency resolution of the periodicity map is set to one ERB_N rate by smoothing along a nonlinear frequency axis (ERB: equivalent rectangular bandwidth). STRAIGHT offers a graphical interface for analysis, modification, and synthesis, and it also allows direct access to the Matlab functions. The central feature of STRAIGHT is the extended pitch synchronous spectral analysis, which provides a smooth, artifact-free time-frequency representation of the spectral envelope of the speech signal.

Fig. 1 Schematic structure of Matlab STRAIGHT. (Block diagram: input speech feeds an F0-adaptive interference-free spectral extractor and a source extractor; spectral and source modification stages drive synthesis through a minimum phase filter excited by a mixed mode excitation source with group delay manipulation, yielding output speech.)

2.2. Extended Pitch Synchronous Analysis

The most unique feature of STRAIGHT is its extended pitch synchronous analysis. Unlike other pitch synchronous procedures, STRAIGHT does not require its analysis frame to be aligned with pitch marks placed on the waveform under study. This analysis employs a compensatory set of windows. The primary window is an effectively isometric Gaussian window convoluted with a pitch-adaptive Bartlett window $h(t)$. With the fundamental period represented as $t_0$,

$$w_p(t) = e^{-\pi\left(\frac{t}{\eta t_0}\right)^2} * h(t/t_0), \tag{1}$$

$$h(t) = \begin{cases} 1 - |t|, & |t| < 1 \\ 0, & \text{otherwise,} \end{cases}$$

where $\eta$ represents a temporal stretching factor for slightly improving the frequency resolution, and the operator $*$ represents convolution. This convoluted window is pitch synchronized, and it yields temporally constant spectral peaks at harmonic frequencies.
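For concreteness, Eq. (1) can be implemented as a direct discrete convolution. The following C sketch is our illustration only, not the released source; the sampling frequency, F0 and the stretching factor eta are arbitrary example values.

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Bartlett window h(t) of Eq. (1): 1 - |t| for |t| < 1, 0 otherwise. */
static double bartlett(double t)
{
    return fabs(t) < 1.0 ? 1.0 - fabs(t) : 0.0;
}

/* Fill wp[0..n-1] with w_p(t): a Gaussian stretched by eta, convolved
 * with the pitch-adaptive Bartlett window h(t/t0).  Time is measured
 * in seconds relative to the window centre. */
static void primary_window(double *wp, int n, double fs, double f0, double eta)
{
    double t0 = 1.0 / f0;                  /* fundamental period */
    for (int i = 0; i < n; i++) {
        double t = (i - n / 2) / fs;
        double acc = 0.0;
        for (int j = 0; j < n; j++) {      /* discrete convolution */
            double tau = (j - n / 2) / fs;
            acc += exp(-M_PI * pow((t - tau) / (eta * t0), 2.0))
                 * bartlett(tau / t0) / fs;
        }
        wp[i] = acc;
    }
}

int main(void)
{
    const double fs = 16000.0, f0 = 200.0, eta = 1.2;  /* illustrative */
    const int n = (int)(6.0 * fs / f0);                /* six periods  */
    double *wp = malloc(n * sizeof(double));
    primary_window(wp, n, fs, f0, eta);
    for (int i = 0; i < n; i += n / 8)
        printf("wp[%4d] = %9.6f\n", i, wp[i]);
    free(wp);
    return 0;
}
```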
However, periodic zeros (the period is the fundamental period $t_0$) between the harmonic components still remain. Modulating this window by a sinusoid of frequency $f_0/2$ yields a compensating window that produces spectral peaks at the positions where the zeros were located in the original spectra:

$$w_c(t) = w_p(t) \sin\left(\frac{\pi t}{t_0}\right). \tag{2}$$

A temporally stable composite spectrum $P_r(\omega, t; \eta)$ is represented as a weighted squared sum of the power spectra $P_o^2(\omega, t; \eta)$ and $P_c^2(\omega, t; \eta)$ obtained using the original time window and the compensatory window, respectively:

$$P_r(\omega, t; \eta) = \sqrt{P_o^2(\omega, t; \eta) + \xi(\eta)\, P_c^2(\omega, t; \eta)}, \tag{3}$$

where the mixing coefficient $\xi(\eta)$ is numerically optimized to minimize temporal variations both in the peaks and the valleys. This provides pitch synchronous spectral analysis without the need for pitch marking. This pitch-marking independence is crucial for realtime STRAIGHT implementation. In addition, STRAIGHT introduces spline-based F0-adaptive spectral smoothing that eliminates only the interferences due to periodic excitation, finally yielding a smoothed spectrum [1], the so-called STRAIGHT spectrum.
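A companion sketch of Eqs. (2) and (3), under the same caveat: the function names are ours, and the mixing coefficient xi, which the paper optimizes numerically as $\xi(\eta)$, is passed in here as a plain parameter.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Eq. (2): wc(t) = wp(t) sin(pi t / t0), i.e. modulation by a sinusoid
 * of frequency f0/2.  wp[] is the primary window of Eq. (1), sampled
 * at fs and centred at index n/2. */
void compensatory_window(const double *wp, double *wc, int n,
                         double fs, double f0)
{
    double t0 = 1.0 / f0;
    for (int i = 0; i < n; i++) {
        double t = (i - n / 2) / fs;
        wc[i] = wp[i] * sin(M_PI * t / t0);
    }
}

/* Eq. (3): Pr = sqrt(Po^2 + xi * Pc^2) per frequency bin, where po[]
 * and pc[] are the magnitude spectra obtained with wp and wc. */
void composite_spectrum(const double *po, const double *pc,
                        double xi, int nbins, double *pr)
{
    for (int k = 0; k < nbins; k++)
        pr[k] = sqrt(po[k] * po[k] + xi * pc[k] * pc[k]);
}
```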

2.3. Instantaneous Frequency Based F0 Extraction

The quality of source extraction, namely F0 extraction, is critical for the quality of resynthesized speech. The F0 trajectory has to be band limited so that it does not contain any traces of F0 jumps, because such discontinuities introduce additional temporal jitter when F0 and/or the temporal axis are manipulated. An instantaneous frequency based source extractor was developed to meet these requirements [14].

2.4. Resynthesis and Group Delay Manipulation

Group delay manipulation was introduced to enable an F0 control that is finer than the resolution determined by the sampling interval. It was also used to reduce the buzzy sound usually found in synthetic speech. Based on prior listening tests and considerations of ecological validity, the minimum phase impulse response was used. All these operations are implemented as group delay manipulations and integrated.

3. REALTIME STRAIGHT

Realtime STRAIGHT, named Herium, is multi-platform software currently available on Windows XP Home/Professional and Mac OS X 10.2 or later. This section introduces the issues involved in converting Matlab STRAIGHT into a C-based realtime executable system.

3.1. Architecture

Although STRAIGHT was designed to be compatible with realtime applications, the implementation of Matlab STRAIGHT is not compatible with realtime processing, because the Matlab version was designed for interactive exploration and offline batch processing. Therefore, a complete reconfiguration of the program architecture was inevitable.

Realtime speech processing systems, realtime STRAIGHT being one of them, require an audio interface that is capable of full-duplex recording and playback. To handle this duplex data stream between the audio interface and the PC, realtime systems must be equipped with speech input and speech output buffers. In addition, the constituent processes should communicate based on the status of the data in each buffer.
3.1.1. Buffer-based processing steps

Figure 2 illustrates how the constituent tasks of realtime STRAIGHT are coordinated by sharing two buffers, and how the reference pointer for synthesis is synchronized. The figure also shows how the audio interface acquires contiguous output audio data seamlessly. The following steps outline how realtime STRAIGHT processing takes place (see the C sketch after this list).
(1) Initialize the input and output buffer contents to zero.
(2) Fetch the input speech data into a portion located at the end of the input buffer. The length of this portion is referred to as the buffer shift length.
(3) Estimate F0 using the speech data stored in the input buffer. (In the current implementation, a single F0 value is calculated using all the data in the input buffer.)
(4) Read a data segment from where a pointer (RPS: reference point for synthesis) points. Next, extract the STRAIGHT spectrum from the read data segment. Finally, transform the STRAIGHT spectrum into an impulse response.
(5) Add the impulse response to the output buffer at the RPS of the output buffer. Then, add the fundamental period (the reciprocal of F0) for synthesis to the RPS position to advance the pointer. The fundamental period is modified if F0 conversion is applied to the input speech data.
(6) Repeat the previous step until the RPS location surpasses the buffer shift length from the beginning of the buffer.
(7) Transmit the buffer-shift-length portion of the output buffer, from the beginning of the buffer, to the audio interface (see Fig. 2).
(8) Shift the contents of both buffers backward by a length equal to the buffer shift. This operation creates an empty space of buffer shift length at the end of each buffer. The RPS locations are also shifted backward by subtracting the buffer shift length from their location counters. Then, the second step is repeated.

This is a pipeline process. In the current implementation, the buffer size is 32 ms and the buffer shift length is 8 ms. The intrinsic delay due to this buffering, 24 ms in this case, is added to the internal processing delays and the latency of the audio interface.

Fig. 2 The input buffer (left) and output buffer (right). The plot shows the status just after the sixth step (RPS: reference point for synthesis).
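The eight steps above can be condensed into a compilable skeleton. The following C sketch is an illustrative reconstruction, not the Herium source: all analysis/synthesis internals are reduced to hypothetical stubs so that the buffer and RPS bookkeeping stand out.

```c
#include <stdio.h>
#include <string.h>

#define FS        16000               /* sampling frequency [Hz]       */
#define BUF_LEN   (32 * FS / 1000)    /* 32 ms buffer                  */
#define SHIFT_LEN (8 * FS / 1000)     /* 8 ms buffer shift             */

/* (1) the static buffers start zero-initialized */
static float in_buf[BUF_LEN], out_buf[BUF_LEN];
static int rps;                       /* reference point for synthesis */

/* Stubs standing in for the real modules (hypothetical names). */
static void  fetch_input(float *dst, int len) { memset(dst, 0, len * sizeof(float)); }
static float estimate_f0(const float *buf, int len) { (void)buf; (void)len; return 200.0f; }
static void  add_impulse_response(const float *in, int at, float f0, float *out)
{ (void)in; (void)f0; out[at] += 1.0f; /* place-holder: spectrum -> response */ }
static void  emit_output(const float *src, int len) { (void)src; (void)len; }

static void process_one_shift(float f0_ratio)
{
    /* (2) fetch new input into the tail of the input buffer           */
    fetch_input(in_buf + BUF_LEN - SHIFT_LEN, SHIFT_LEN);

    /* (3) one F0 value from all the data currently in the buffer,
     * modified here when F0 conversion is applied                     */
    float f0 = estimate_f0(in_buf, BUF_LEN) * f0_ratio;

    /* (4)-(6) one impulse response per pitch period until the RPS
     * passes the buffer shift boundary (voiced frames assumed)        */
    while (rps < SHIFT_LEN && f0 > 0.0f) {
        add_impulse_response(in_buf, rps, f0, out_buf);
        rps += (int)(FS / f0 + 0.5f);   /* advance by one period       */
    }

    /* (7) transmit the finished head of the output buffer             */
    emit_output(out_buf, SHIFT_LEN);

    /* (8) shift both buffers (and the RPS) backward by one shift      */
    memmove(in_buf, in_buf + SHIFT_LEN, (BUF_LEN - SHIFT_LEN) * sizeof(float));
    memmove(out_buf, out_buf + SHIFT_LEN, (BUF_LEN - SHIFT_LEN) * sizeof(float));
    memset(out_buf + BUF_LEN - SHIFT_LEN, 0, SHIFT_LEN * sizeof(float));
    rps -= SHIFT_LEN;
}

int main(void)
{
    for (int i = 0; i < 10; i++)      /* ten 8-ms shifts = 80 ms       */
        process_one_shift(1.0f);
    puts("processed 80 ms of (silent) audio");
    return 0;
}
```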

3.2. F0 extraction

The STRAIGHT F0 extractor has not been included in the current realtime STRAIGHT, because the F0 extraction algorithm in Matlab STRAIGHT must be slightly modified before it can be implemented in realtime. Therefore, instead of the STRAIGHT F0 extractor, a conventional Cepstrum-based F0 extractor was implemented using the same input buffer structure (32-ms window length with an 8-ms frame shift). The quefrency of the maximum peak of the Cepstrum is used to determine the fundamental period, which is then converted into F0. A voiced/unvoiced decision is made based on the peak height. The effects of this F0 extractor replacement are discussed in the evaluation section.
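As a concrete illustration of this extractor (ours, not the authors' code), the sketch below computes the real cepstrum of one 32-ms frame with a naive O(N^2) DFT in place of a fast FFT library, picks the maximum peak in a plausible quefrency range, and applies an arbitrary peak-height threshold for the voiced/unvoiced decision.

```c
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define N 512    /* 32 ms at a 16 kHz sampling frequency */

/* Real cepstrum c[q] = IDFT(log|DFT(x)|), computed with a naive O(N^2)
 * transform; a Hann window is applied to the frame first. */
static void real_cepstrum(const double *x, double *c)
{
    static double logmag[N];
    for (int k = 0; k < N; k++) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; n++) {
            double xn = x[n] * (0.5 - 0.5 * cos(2.0 * M_PI * n / (N - 1)));
            double w = -2.0 * M_PI * k * n / N;
            re += xn * cos(w);
            im += xn * sin(w);
        }
        logmag[k] = log(sqrt(re * re + im * im) + 1e-12);
    }
    /* log|X| is symmetric, so the inverse transform reduces to cosines */
    for (int q = 0; q < N; q++) {
        double acc = 0.0;
        for (int k = 0; k < N; k++)
            acc += logmag[k] * cos(2.0 * M_PI * k * q / N);
        c[q] = acc / N;
    }
}

/* Return F0 in Hz from the quefrency of the maximum cepstral peak, or
 * 0 for unvoiced frames (decided from the peak height). */
double cepstrum_f0(const double *frame, double fs)
{
    static double c[N];
    real_cepstrum(frame, c);
    int qlo = (int)(fs / 400.0), qhi = (int)(fs / 60.0); /* 60-400 Hz */
    int qmax = qlo;
    for (int q = qlo; q <= qhi && q < N; q++)
        if (c[q] > c[qmax]) qmax = q;
    if (c[qmax] < 0.1)      /* crude, arbitrary voicing threshold */
        return 0.0;
    return fs / (double)qmax;
}

int main(void)
{
    double fs = 16000.0, frame[N];
    for (int n = 0; n < N; n++)  /* toy input: 200 Hz sawtooth */
        frame[n] = fmod(n * 200.0 / fs, 1.0) - 0.5;
    printf("estimated F0 = %.1f Hz\n", cepstrum_f0(frame, fs));
    return 0;
}
```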
3.3. Spectral estimation

The spectral extraction of Matlab STRAIGHT is excessively redundant, because its main target application is the interactive exploration of speech attributes and perceptual effects. The current default analysis frame rate of Matlab STRAIGHT is 1 ms. However, thanks to the compensating windowing, the current version does not require such fine temporal resolution in the spectral analysis. Realtime STRAIGHT analyzes a spectrum only when it is required, in other words, when each excitation pulse is generated. This corresponds to a just-in-time analysis. This implementation is also useful in offline processing, because it enables a huge reduction in the storage necessary for STRAIGHT parameters. The F0-adaptive analysis time window length is updated only once in a single frame. This design was selected because, based on our preliminary tests, minor F0 errors do not degrade STRAIGHT spectra and have negligible effects on the reproduction quality. The spectral analysis in realtime STRAIGHT can be switched between the Cepstrum-based method [18] and the STRAIGHT analysis. The effects of this difference between the spectral analyses are discussed in the evaluation section.

3.4. Synthesis from STRAIGHT spectra

The following functions are eliminated in the current realtime STRAIGHT:
- the minimum phase impulse response;
- group delay manipulation in the higher frequency region to reduce the buzzyness of the artificial excitation source;
- an F0 control finer than the sampling pulse resolution using group delay manipulation.
This elimination can slightly degrade the naturalness of the resynthesized speech. The only difference between realtime STRAIGHT and a conventional analysis/synthesis system is then the spectral representation.

3.5. Implementation

Realtime STRAIGHT is a multi-platform system. It is built on splibs [19], a multi-platform audio processing library set also developed by the first author. The current implementation uses spbase, the basic library; spaudio, the audio input and output library; spcomponent, the GUI library; and splib, the signal processing library. The portability of splibs makes realtime STRAIGHT easily portable. It should be noted that the processing time of realtime STRAIGHT is significantly reduced by using the fast FFT libraries distributed by chip manufacturers. In our applications, these FFTs were two to four times faster than the usual mathematical function libraries.

In STRAIGHT, the analysis window size depends on the sampling frequency. This dependency introduces a super-linear increase in the number of numerical operations and imposes an upper limit on the sampling frequency for realtime operation. Based on tests using various types of PCs, it is safe to state that normal realtime operation at a sampling frequency of 32 kHz is possible on PCs with processors running at around 1 GHz. The latency without an audible gap was less than 100 ms, although this value depends heavily on the operating system, the audio interface, and so on.

3.6. Evaluation

A preliminary evaluation of the effects of the modifications introduced for realizing realtime STRAIGHT was conducted by replacing Matlab STRAIGHT's components with modules that implement the algorithms used in realtime STRAIGHT.

Figure 3 shows the F0 trajectories extracted using the conventional Cepstrum pitch extractor and the STRAIGHT F0 extractor. The speech sample is recorded female speech sampled at 16 kHz. The conventional frame-based analysis and the coarse temporal resolution due to sampling introduce staircase features in the Cepstrum-based trajectory. The perceptual effects of this jaggedness were not prominent for male speech; however, the degradation was noticeable for female speech, especially for high-pitched subjects. This degradation is easily audible when headphones are used instead of loudspeakers. The voiced/unvoiced decision errors observed around 100 ms and 200 ms in the Cepstrum case can also cause noticeable degradation.

Fig. 3 F0 trajectories extracted by the Cepstrum-based method (upper plot) and the instantaneous frequency based method used in Matlab STRAIGHT (lower plot).

Figure 4 shows the spectrograms extracted using the conventional fixed-frame Cepstrum analysis and the STRAIGHT spectral analysis. The fixed frame rate of the conventional method fails to capture the fine and smooth spectral structures found in the STRAIGHT spectrogram. The degradation in the resynthesized speech when the Cepstrum-based spectrogram was used could be easily perceived. This degradation is mainly due to the inappropriate (usually too strong) spectral smoothing in the conventional method.

Fig. 4 Smoothed spectrogram extracted by the Cepstrum-based method (upper plot) and the extended pitch synchronous method used in Matlab STRAIGHT (lower plot).

Further, this degradation is audible in the current realtime STRAIGHT, which has an option for switching the spectral analysis between the STRAIGHT-based method and the Cepstrum-based method. It is interesting to note that when the STRAIGHT-based F0 extraction was used with the Cepstrum analysis/synthesis system, a noticeable improvement in speech quality was observed.

A subjective evaluation was also performed. Ten listeners with normal hearing participated in this experiment. The speech materials were ten sentences uttered by four males and four females (80 utterances in total). The sampling frequency of the materials is kHz. Figure 5 shows the mean opinion scores (MOS) of the subjective experiment. "Org" denotes the original; "STR", Matlab STRAIGHT; "RT-STR", realtime STRAIGHT; and "CepVoc", the conventional Cepstrum analysis/synthesis system.

Fig. 5 Results of the subjective experiment (MOS for male, female and total materials under the conditions CepVoc, RT-STR, STR and Org). The error bars are ±1 standard deviation.

The MOS of realtime STRAIGHT is approximately one point better than that of the Cepstrum-based system. This indicates that the quality of realtime STRAIGHT is better than that of the Cepstrum-based system. In comparison with Matlab STRAIGHT, realtime STRAIGHT performs worse, especially for female speech. This is caused by voiced/unvoiced decision errors in both males and females, and by the jaggedness of the F0 trajectories, mainly in females.

4. DEMONSTRATIONS

Figure 6 shows a frame from a movie demonstrating realtime STRAIGHT running on a tablet PC (Sony VAIO-U). The figure also shows the GUI of realtime STRAIGHT. In addition to sliders for F0 conversion and frequency axis stretching, a gender control slider is installed. This slider changes the frequency stretching factor in proportion to the cubic root of the F0 conversion factor. This relation, $\alpha = \beta^{1/3}$ (for example, doubling F0, $\beta = 2$, gives a stretching factor $\alpha = 2^{1/3} \approx 1.26$), was determined in our preliminary tests, and it was found to be consistent with a number of reports in the speech perception literature. The source code of this software is available upon request to researchers who already have Matlab STRAIGHT.

Fig. 6 STRAIGHT in realtime operation on a tablet PC (Sony VAIO-U), the black unit on top of the silver unit (AD/DA converter, Roland UA-25). The graphical user interface (GUI) on the PC's display is shown. The left panel in the images shows the GUI of realtime STRAIGHT. Four sliders control the round trip delay, the F0 conversion ratio, the frequency axis stretching coefficient and the gender of the resynthesized speech.

5. CONCLUSION

The first implementation of realtime STRAIGHT has been introduced. Currently, it runs faster than realtime on PCs with processors running at around 1 GHz. However, this performance was achieved by eliminating some important components and replacing others with conventional procedures. Our next step is to upgrade the downgraded components to Matlab STRAIGHT's latest algorithms by developing computationally efficient implementations.

ACKNOWLEDGMENTS

The authors would like to thank Professor Fumitada Itakura for his valuable suggestions and discussions. This research was partly supported by the MEXT leading project e-Society, a Grant-in-Aid for Scientific Research for Young Researchers (B), and ERATO by JST.

REFERENCES

[1] H. Kawahara, I. Masuda-Katsuse and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction," Speech Commun., 27 (1999).
[2] H. Dudley, "Remaking speech," J. Acoust. Soc. Am., 11 (1939).
[3] H. Matsui and H. Kawahara, "Investigation of emotionally morphed speech perception and its structure using a high quality speech manipulation system," Proc. Eurospeech 2003, Geneva (2003).
[4] T. Yonezawa, N. Suzuki, K. Mase and K. Kogure, "Gradually changing expression of singing voice based on morphing," Proc. Interspeech 2005, Lisboa (2005).
[5] T. Irino and R. D. Patterson, "Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The Stabilised Wavelet-Mellin Transform," Speech Commun., 36 (2002).
[6] D. R. R. Smith, R. D. Patterson, R. Turner, H. Kawahara and T. Irino, "The processing and perception of size in speech sounds," J. Acoust. Soc. Am., 117 (2005).
[7] J. Jin, H. Banno, H. Kawahara and T. Irino, "Intelligibility of degraded speech from smeared STRAIGHT spectrum," Proc. ICSLP 2004, Vol. 4, Jeju (2004).
[8] C. Liu and D. Kewley-Port, "Vowel formant discrimination for high-fidelity speech," J. Acoust. Soc. Am., 116 (2004).
[9] P. F. Assmann and W. F. Katz, "Synthesis fidelity and time-varying spectral change in vowels," J. Acoust. Soc. Am., 117 (2005).
[10] H. Kawahara and H. Matsui, "Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation," Proc. ICASSP 2003, Vol. 1, Hong Kong (2003).
[11] Y. Sogabe, K. Kakehi and H. Kawahara, "Psychological evaluation of emotional speech using a new morphing method," Proc. ICCS 2003, Vol. 2, Melbourne (2003).
[12] H. Kawahara and J. C. Williams, "Effects of auditory feedback on voice pitch trajectories," in Vocal Fold Physiology, P. J. Davis and N. H. Fletcher, Eds. (Singular Publishing Group, San Diego, 1996), Chap. 18.
[13] H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited," Proc. ICASSP 97, Vol. 2, Munich (1997).
[14] H. Kawahara, H. Katayose, A. de Cheveigné and R. D. Patterson, "Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity," Proc. Eurospeech 99, Vol. 6, Budapest (1999).

[15] H. Kawahara, Y. Atake and P. Zolfaghari, "Accurate vocal event detection method based on a fixed-point analysis of mapping from time to weighted average group delay," Proc. ICSLP 2000, Beijing (2000).
[16] H. Kawahara, H. Banno, T. Irino and P. Zolfaghari, "Algorithm AMALGAM: Morphing waveform based methods, sinusoidal models and STRAIGHT," Proc. ICASSP 2004, Vol. 1, Montreal (2004).
[17] H. Kawahara, A. de Cheveigné, H. Banno, T. Takahashi and T. Irino, "Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT," Proc. Interspeech 2005, Lisboa (2005).
[18] A. V. Oppenheim, "Speech analysis-synthesis system based on homomorphic filtering," J. Acoust. Soc. Am., 45 (1969).
[19]

Hideki Banno was born in Toukai, Japan on January 22. He received B.E., M.E., and Ph.D. degrees from Nagoya University in 1996, the Nara Institute of Science and Technology in 1998, and Nagoya University in 2003, respectively. From 2001 to 2003, he was a research assistant at the Graduate School of Economics, Nagoya University. From 2003 to 2005, he was a research assistant at the Faculty of Systems Engineering, Wakayama University. From 2004 to 2005, he was also an invited researcher at ATR Spoken Language Communication Research Laboratories. He is currently a lecturer at the Faculty of Science and Technology, Meijo University. His research interests include speech analysis and synthesis, acoustic signal processing and auditory perception. He is a member of ASJ and IEICE.

Hiroaki Hata received the B.E. degree in Systems Engineering from Wakayama University. He is currently a master's student at the Graduate School of Systems Engineering, Wakayama University. His research interests include the implementation of speech analysis-synthesis systems.

Masanori Morise received the B.E. and M.E. degrees in Systems Engineering from Wakayama University in 2004 and 2006, respectively. In 2006, he became a Research Fellow (DC1) of JSPS. He is currently a doctoral candidate at the Graduate School of Systems Engineering of Wakayama University. His research interests include acoustic signal processing and electro-acoustic systems. He is a member of ASJ and IEICE.

Toru Takahashi was born in Akita, Japan on June 8. He received a B.E. in computer science, and M.E. and Ph.D. degrees in electrical and electronic engineering from the Nagoya Institute of Technology in 1996, 1998 and 2004, respectively. He has been a research assistant in the Faculty of Systems Engineering at Wakayama University. His research interests include speech analysis and synthesis, acoustic signal processing, and auditory perception. He is a member of ASJ, IEICE and IPSJ.

Toshio Irino was born in Yokohama, Japan. He received the B.S., M.S., and Ph.D. degrees in electrical and electronic engineering from the Tokyo Institute of Technology in 1982, 1984, and 1987, respectively. From 1987 to 1997, he was a research scientist at NTT Basic Research Laboratories. From 1993 to 1994, he was a visiting researcher at the Medical Research Council Applied Psychology Unit (MRC-APU, currently CBU) in Cambridge, UK. From 1997 to 2000, he was a senior researcher at ATR Human Information Processing Research Laboratories (ATR HIP). From 2000 to 2002, he was a senior research scientist at NTT Communication Science Laboratories. Since 2002, he has been a professor of the Faculty of Systems Engineering, Wakayama University. He is also a visiting professor at the Institute of Statistical Mathematics. The focus of his current research is a computational theory of the auditory system. Dr. Irino is a member of the Acoustical Society of America (ASA), the Acoustical Society of Japan (ASJ), and the Institute of Electronics, Information and Communication Engineers (IEICE).

Hideki Kawahara received the B.E., M.E. and Dr.Eng. degrees in Electrical Engineering from Hokkaido University, Sapporo, Japan in 1972, 1974 and 1977, respectively. In 1977, he joined the Electrical Communications Laboratories of the Nippon Telephone and Telegraph public corporation. In 1992, he joined ATR Human Information Processing Research Laboratories, Japan, as a department head. In 1997, he became an invited researcher of ATR. He is currently a professor of the Faculty of Systems Engineering, Wakayama University. He received the Sato Award from the ASJ in 1998, the TELECOM System Technology Prize from the Telecommunications Advancement Foundation, Japan, in 1998, and the EURASIP best paper award for his contribution to the Speech Communication journal. His research interests include auditory signal processing models, speech analysis and synthesis, electro-acoustic systems and auditory perception. He is a member of ASA, ASJ, IEEE, IPSJ, ISCA, IEICE and JNNS.


More information

Psychology of Language

Psychology of Language PSYCH 150 / LIN 155 UCI COGNITIVE SCIENCES syn lab Psychology of Language Prof. Jon Sprouse 01.10.13: The Mental Representation of Speech Sounds 1 A logical organization For clarity s sake, we ll organize

More information

THE HUMANISATION OF STOCHASTIC PROCESSES FOR THE MODELLING OF F0 DRIFT IN SINGING

THE HUMANISATION OF STOCHASTIC PROCESSES FOR THE MODELLING OF F0 DRIFT IN SINGING THE HUMANISATION OF STOCHASTIC PROCESSES FOR THE MODELLING OF F0 DRIFT IN SINGING Ryan Stables [1], Dr. Jamie Bullock [2], Dr. Cham Athwal [3] [1] Institute of Digital Experience, Birmingham City University,

More information

An objective method for evaluating data hiding in pitch gain and pitch delay parameters of the AMR codec

An objective method for evaluating data hiding in pitch gain and pitch delay parameters of the AMR codec An objective method for evaluating data hiding in pitch gain and pitch delay parameters of the AMR codec Akira Nishimura 1 1 Department of Media and Cultural Studies, Tokyo University of Information Sciences,

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

Speech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065

Speech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);

More information

Cepstrum alanysis of speech signals

Cepstrum alanysis of speech signals Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and

More information

Digital Speech Processing and Coding

Digital Speech Processing and Coding ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/

More information

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals

Vocoder (LPC) Analysis by Variation of Input Parameters and Signals ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of

More information

Final Exam Study Guide: Introduction to Computer Music Course Staff April 24, 2015

Final Exam Study Guide: Introduction to Computer Music Course Staff April 24, 2015 Final Exam Study Guide: 15-322 Introduction to Computer Music Course Staff April 24, 2015 This document is intended to help you identify and master the main concepts of 15-322, which is also what we intend

More information

Lab 8. ANALYSIS OF COMPLEX SOUNDS AND SPEECH ANALYSIS Amplitude, loudness, and decibels

Lab 8. ANALYSIS OF COMPLEX SOUNDS AND SPEECH ANALYSIS Amplitude, loudness, and decibels Lab 8. ANALYSIS OF COMPLEX SOUNDS AND SPEECH ANALYSIS Amplitude, loudness, and decibels A complex sound with particular frequency can be analyzed and quantified by its Fourier spectrum: the relative amplitudes

More information

NOISE ESTIMATION IN A SINGLE CHANNEL

NOISE ESTIMATION IN A SINGLE CHANNEL SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina

More information

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION American Journal of Engineering and Technology Research Vol. 3, No., 03 YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION Yinan Kong Department of Electronic Engineering, Macquarie University

More information

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter

Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information