Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation
PAPER #2007 The Acoustical Society of Japan

Hideki Banno 1, Hiroaki Hata 2, Masanori Morise 2, Toru Takahashi 2, Toshio Irino 2 and Hideki Kawahara 2
1 Faculty of Science and Technology, Meijo University, 1-501 Shiogamaguchi, Tempaku-ku, Nagoya, Japan
2 Faculty of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama, Japan
(Received 31 May 2006, Accepted for publication 31 October 2006)

Abstract: A very high quality speech analysis, modification and synthesis system, STRAIGHT, has now been implemented in the C language and operates in realtime. This article first provides a brief summary of STRAIGHT components and then introduces the underlying principles that enabled realtime operation. In STRAIGHT, the built-in extended pitch synchronous analysis, which does not require analysis window alignment, plays an important role in realtime implementation. A detailed description of the processing steps, which are based on the so-called just-in-time architecture, is presented. Further, discussions on other issues related to realtime implementation and performance measures are also provided. The software will be available to researchers upon request.

Keywords: STRAIGHT speech manipulation system, Realtime, Pitch synchronous analysis, F0 extraction, Voice conversion

PACS number: Ja, Ar [doi: /ast]

1. INTRODUCTION

STRAIGHT [1] (Speech Transformation and Representation by Adaptive Interpolation of weighted spectrogram) was originally designed to investigate human speech perception in terms of auditorily meaningful parametric domains. STRAIGHT's design was motivated by the belief that nonlinear systems such as human speech perception should be investigated using their normal input signals, i.e., ecologically relevant stimuli.
Although the underlying structure of STRAIGHT is similar to that of the classical channel VOCODER [2], the speech sounds reproduced and/or manipulated by STRAIGHT are sometimes indistinguishable from the original speech sounds in terms of their naturalness [3,4]. This conceptual simplicity, together with the manipulation flexibility and the highly natural quality of the reproduced speech, has made STRAIGHT a powerful tool for speech perception research [5-9]. In addition to the utility of the basic STRAIGHT system, its extensions to auditory morphing [3,4,10,11] opened up novel prospects in speech manipulation. Realtime STRAIGHT, which is the focus of this article, will also promote other aspects of STRAIGHT's design objective that have not been exploited further. (e-mail: banno@ccmfs.meijo-u.ac.jp)

From the beginning, STRAIGHT was designed so that it could be applied to auditory feedback research [12] in the near future, when processors would become sufficiently fast for realtime operation; this was because auditory feedback was the research topic of one of the authors. Because of this background, all the algorithms incorporated into STRAIGHT were already designed to be compatible with realtime operation [1]. However, the current STRAIGHT implementation in Matlab (henceforth referred to as Matlab STRAIGHT) does not consider realtime processing, so that users can operate on STRAIGHT parameters easily. This article reports the first attempt at testing how the original design objective works with current technologies. When STRAIGHT is implemented in realtime, it will also be useful in various types of applications such as voice conversion, text-to-speech synthesis, musical performance, and speech style conversion.

2. INTRODUCTION TO STRAIGHT [1]

STRAIGHT has been evolving through investigations with regard to the following topics:
(1) Periodic excitation in voiced sounds can be interpreted as a two-dimensional sampling operation on the smooth time-frequency surface that represents articulatory information [1,13].
(2) Group delay manipulation of the excitation source [13].
(3) F0 estimation that does not require a priori knowledge for designing a relevant analysis window [1].
(4) Extended pitch synchronous analysis that does not require alignment with pitch marks [1].
(5) F0 extraction based on fixed-point analysis of a mapping from the carrier frequencies of the analyzing wavelet to the instantaneous frequencies of the corresponding wavelet transform [14].
(6) Acoustic event extraction based on fixed-point analysis of the window center location with respect to the centroid of the windowed signal, and minimum-phase group delay based compensation [15].
(7) Auditory morphing [3,10].
(8) Algorithm AMALGAM [16], which can seamlessly morph between different speech processing algorithms such as waveform-based synthesis, sinusoidal models and STRAIGHT.
(9) A nearly defect-free F0 extractor using multiple F0 cues and post-processing, suitable for offline and quality-sensitive applications [17].

These studies roughly trace the course of STRAIGHT's evolution. The components developed in the first and third topics were replaced by their counterparts developed in the fourth and fifth topics, respectively, and the former no longer exist. The modules developed in the sixth and eighth topics have not yet been integrated into STRAIGHT. At this stage, the most important topic among those presented for realtime STRAIGHT implementation is the fourth one, the extended pitch synchronous spectral analysis. The topic of secondary importance is F0 extraction.

2.1. Architecture of STRAIGHT
Figure 1 shows the schematic diagram of Matlab STRAIGHT. STRAIGHT is basically a channel VOCODER with enhanced parameter modification capabilities and a very high quality.
The parameters manipulated are (a) the smoothed spectrogram, (b) the fundamental frequency, and (c) a time-frequency periodicity map. The frequency resolution of the periodicity map is set to one ERB_N rate (ERB: equivalent rectangular bandwidth) by smoothing along a nonlinear frequency axis. STRAIGHT offers a graphical interface for analysis, modification, and synthesis, and it also allows direct access to the Matlab functions. The central feature of STRAIGHT is the extended pitch synchronous spectral analysis that provides a smooth, artifact-free time-frequency representation of the spectral envelope of the speech signal.

2.2. Extended Pitch Synchronous Analysis
The most unique feature of STRAIGHT is its extended pitch synchronous analysis. Unlike other pitch synchronous procedures, STRAIGHT does not require its analysis frame to be aligned with pitch marks placed on the waveform under study. The analysis employs a compensatory set of windows. The primary window w_p(t) is an effectively isometric Gaussian window convolved with a pitch adaptive Bartlett window h(t). With the fundamental period denoted t_0,

  w_p(t) = e^{-\pi (t / (\eta t_0))^2} \ast h(t / (\eta t_0)),   (1)

  h(t) = 1 - |t| for |t| < 1, and 0 otherwise,

where \eta represents a temporal stretching factor for slightly improving the frequency resolution, and the operator \ast represents convolution. This convolved window is pitch synchronized, and it yields temporally constant spectral peaks at the harmonic frequencies. However, periodic zeros (with period equal to the fundamental period t_0) between the harmonic components still remain. Modulating this window by a sinusoid of frequency f_0/2 yields a compensating window that produces spectral peaks at the positions where the zeros were located in the original spectra.
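As a concrete illustration, the window pair can be sketched in a few lines of Python/NumPy. This is our sketch, not STRAIGHT's code: the stretching factor \eta = 1 and the three-period support are assumptions, and the function name is ours.

```python
import numpy as np

def straight_windows(f0, fs, eta=1.0):
    """Sketch of the primary window (Gaussian convolved with a
    pitch-adaptive Bartlett window) and the f0/2-modulated
    compensating window.  `eta` is the temporal stretching factor;
    the +/-3-period support is our assumption."""
    t0 = 1.0 / f0                          # fundamental period [s]
    half = int(3 * eta * t0 * fs)          # samples on each side of t = 0
    t = np.arange(-half, half + 1) / fs

    # Gaussian main lobe: exp(-pi (t / (eta t0))^2)
    gauss = np.exp(-np.pi * (t / (eta * t0)) ** 2)

    # Pitch-adaptive Bartlett window h(t / (eta t0))
    x = t / (eta * t0)
    bartlett = np.where(np.abs(x) < 1.0, 1.0 - np.abs(x), 0.0)

    # Primary window w_p: convolution of the two
    wp = np.convolve(gauss, bartlett, mode="same")
    wp /= wp.max()

    # Compensating window w_c: modulation by a sinusoid of frequency f0/2
    wc = wp * np.sin(np.pi * t / t0)
    return t, wp, wc
```

The modulating sinusoid sin(\pi t / t_0) has frequency 1/(2 t_0) = f_0/2, which shifts the spectral peaks of w_p by half a harmonic spacing, onto the zeros of the primary window's spectrum.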
The compensating window is

  w_c(t) = w_p(t) \sin(\pi t / t_0).   (2)

Fig. 1 Schematic structure of Matlab STRAIGHT: input speech is analyzed by an F0-adaptive interference-free spectral extractor and a source (F0) extractor; after spectral and source modification, output speech is synthesized by a minimum phase filter driven by a mixed mode excitation source with group delay manipulation.

A temporally stable composite spectrum P_r(\omega, t; \eta) is represented as the square root of a weighted sum of the power spectra P_o^2(\omega, t; \eta) and P_c^2(\omega, t; \eta) obtained using the original time window and the compensatory window, respectively:

  P_r(\omega, t; \eta) = \sqrt{ P_o^2(\omega, t; \eta) + \xi(\eta) P_c^2(\omega, t; \eta) },   (3)

where the mixing coefficient \xi(\eta) is numerically optimized to minimize the temporal variations in both the peaks and the valleys. This provides pitch synchronous spectral analysis without the need for pitch marking. This pitch marking independence is crucial for the realtime STRAIGHT implementation. In addition, STRAIGHT introduces spline-based F0 adaptive spectral smoothing that eliminates only the interferences due to periodic excitation, finally yielding a smoothed spectrum [1], the so-called STRAIGHT spectrum.

2.3. Instantaneous Frequency Based F0 Extraction
The quality of source extraction, namely F0 extraction, is critical for the quality of resynthesized speech. The F0 trajectory has to be band limited so that it does not contain any traces of F0 jumps, because such discontinuities introduce additional temporal jitter when F0 and/or the temporal axis are manipulated. An instantaneous frequency based source extractor was developed to meet these requirements [14].

2.4. Resynthesis and Group Delay Manipulation
Group delay manipulation was introduced to enable an F0 control that is finer than the resolution determined by the sampling interval. It was also used to reduce the buzzy sound usually found in synthetic speech. Based on prior listening tests and considerations of ecological validity, the minimum phase impulse response was adopted. All these operations are implemented as group delay manipulations and integrated.

3. REALTIME STRAIGHT

Realtime STRAIGHT, named Herium, is multi-platform software currently available on Windows XP Home/Professional and Mac OS X 10.2 or later versions. This section introduces the issues involved in converting Matlab STRAIGHT into a C-based realtime executable system.

3.1. Architecture
Although STRAIGHT was designed to be compatible with realtime applications, the implementation of Matlab STRAIGHT is not compatible with realtime processing, because the Matlab version was designed for interactive exploration and offline batch processing. Therefore, a complete reconfiguration of the program architecture was inevitable.
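For reference, the minimum phase impulse response used in resynthesis (Sect. 2.4) can be derived from a smoothed power spectrum by the standard homomorphic construction, i.e., folding the real cepstrum. The sketch below illustrates that textbook technique only; the actual STRAIGHT routine may differ in detail.

```python
import numpy as np

def minimum_phase_response(power_spectrum, n_fft):
    """Derive a minimum-phase impulse response from a (smoothed) power
    spectrum by cepstral folding.  Standard homomorphic construction;
    not necessarily the exact routine used in STRAIGHT."""
    # half log-magnitude spectrum (length n_fft // 2 + 1)
    log_mag = 0.5 * np.log(np.maximum(power_spectrum, 1e-12))
    c = np.fft.irfft(log_mag, n_fft)          # real cepstrum
    # fold the anti-causal part onto the causal part
    c_min = np.zeros_like(c)
    c_min[0] = c[0]
    c_min[1:n_fft // 2] = 2.0 * c[1:n_fft // 2]
    c_min[n_fft // 2] = c[n_fft // 2]
    # exponentiate back to a spectrum and invert to get the response
    min_phase_spectrum = np.exp(np.fft.rfft(c_min))
    return np.fft.irfft(min_phase_spectrum, n_fft)
```

A flat power spectrum yields a unit impulse at time zero, which is a quick sanity check on the construction.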
Realtime speech processing systems, realtime STRAIGHT being one of them, require an audio interface that is capable of full-duplex recording and playback operations. To handle this duplex data stream between the audio interface and the PC, realtime systems must be equipped with speech input and speech output buffers. In addition, the constituent processes should communicate based on the status of the data in each buffer.

3.1.1. Buffer-based processing steps
Figure 2 illustrates how the constituent tasks of realtime STRAIGHT are coordinated by sharing two buffers and how each reference pointer for synthesis is synchronized. The figure also shows how the audio interface acquires contiguous output audio data seamlessly. The following steps outline how realtime STRAIGHT processing takes place.
(1) Initialize the input and output buffer contents to zero.
(2) Fetch the input speech data into a portion located at the end of the input buffer. The length of this portion is referred to as the buffer shift length.
(3) Estimate F0 using the speech data stored in the input buffer. (In the current implementation, a single F0 value is calculated from all the data in the input buffer.)
(4) Read a data segment from the location indicated by a pointer (RPS: reference point for synthesis). Next, extract the STRAIGHT spectrum from this segment. Finally, transform the STRAIGHT spectrum into an impulse response.
(5) Add the impulse response to the output buffer at the RPS of the output buffer. Then, advance the pointer by adding the fundamental period (the reciprocal of F0) for synthesis to the RPS position. The fundamental period is modified if F0 conversion is applied to the input speech data.
(6) Repeat the previous step until the RPS location surpasses the buffer shift length from the beginning of the buffer.
(7) Transmit the buffer-shift-length portion at the beginning of the output buffer to the audio interface (see Fig. 2).
(8) Shift the contents of both buffers backward by a length equal to the buffer shift. This operation generates an empty space of buffer shift length at the end of each buffer. The RPS locations are also shifted backward by subtracting the buffer shift length from their location counters. Then, the second step is repeated.

Fig. 2 The input buffer (left) and output buffer (right). This plot shows the status just after the sixth step (RPS: reference point for synthesis).

This is a pipeline process. In the current implementation, the buffer size is 32 ms and the buffer shift length is 8 ms. The intrinsic delay due to this buffering, 24 ms in this case, is added to the internal processing delays and the latency of the audio interface.
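The steps above can be sketched as a minimal Python loop. The buffer geometry follows the text (32-ms buffers, 8-ms shift); the sampling frequency and the callables `estimate_f0` and `make_response` are hypothetical stand-ins for the F0 extractor and the spectrum-to-impulse-response stage, which are omitted here.

```python
import numpy as np

FS = 16000                      # sampling frequency (our assumption)
BUF_LEN = int(0.032 * FS)       # 32-ms buffers, as in the text
SHIFT = int(0.008 * FS)         # 8-ms buffer shift length

in_buf = np.zeros(BUF_LEN)      # (1) initialize buffers to zero
out_buf = np.zeros(2 * BUF_LEN)     # extra room for responses near the end
rps = 0                             # reference point for synthesis

def process_block(new_samples, estimate_f0, make_response):
    """One pipeline iteration, steps (2)-(8)."""
    global in_buf, out_buf, rps
    in_buf[-SHIFT:] = new_samples                  # (2) fetch input
    f0 = estimate_f0(in_buf)                       # (3) one F0 per buffer
    period = max(1, int(FS / f0))
    while rps < SHIFT:                             # (4)-(6) pitch-synchronous
        resp = make_response(in_buf, rps)          #     overlap-add of
        out_buf[rps:rps + len(resp)] += resp       #     impulse responses
        rps += period                              # advance RPS by t0
    out = out_buf[:SHIFT].copy()                   # (7) emit one shift length
    in_buf = np.roll(in_buf, -SHIFT)               # (8) shift both buffers
    in_buf[-SHIFT:] = 0.0
    out_buf = np.roll(out_buf, -SHIFT)
    out_buf[-SHIFT:] = 0.0
    rps -= SHIFT                                   #     and rewind the RPS
    return out
```

With this geometry, an F0 conversion simply scales `period` before the overlap-add loop.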
3.2. F0 extraction
The STRAIGHT F0 extractor has not been implemented in the current realtime STRAIGHT, because the F0 extraction algorithm in Matlab STRAIGHT must be slightly modified for realtime implementation. Therefore, instead of the STRAIGHT F0 extractor, the conventional Cepstrum-based F0 extractor was implemented using the same input buffer structure (32-ms window length with 8-ms frame shift). The quefrency of the maximum peak of the Cepstrum determines the fundamental period, which is then converted into F0. A voiced/unvoiced decision is made based on the peak height. The effects of this F0 extractor replacement are discussed in the evaluation section.

3.3. Spectral estimation
The spectral extraction of Matlab STRAIGHT is excessively redundant because its main target application is the interactive exploration of speech attributes and perceptual effects. The current default analysis frame rate of Matlab STRAIGHT is 1 ms. However, thanks to the compensating windowing, the current version does not require a fine temporal resolution in the spectral analysis. Realtime STRAIGHT analyzes the spectrum only when it is required, in other words, when each excitation pulse is generated. This corresponds to a just-in-time analysis. This implementation is also useful in offline processing because it enables a huge reduction in the storage necessary for STRAIGHT parameters. The F0 adaptive analysis time window length is updated only once in a single frame. This design was selected because, based on our preliminary tests, minor F0 errors do not degrade STRAIGHT spectra and have negligible effects on the reproduction quality. The spectral analysis in realtime STRAIGHT can be switched between the Cepstrum-based method [18] and the STRAIGHT analysis.
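The conventional Cepstrum-based F0 extractor described above can be sketched as follows. The pitch search range and voicing threshold are illustrative assumptions, not the values used in Herium.

```python
import numpy as np

def cepstrum_f0(frame, fs, f0_min=60.0, f0_max=400.0, vuv_threshold=0.1):
    """Cepstrum-based F0 extraction: the quefrency of the maximum
    cepstral peak gives the fundamental period, and the peak height
    drives the voiced/unvoiced decision.  Returns 0.0 for unvoiced."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    # real cepstrum: inverse FFT of the log magnitude spectrum
    cep = np.fft.irfft(np.log(np.maximum(spectrum, 1e-12)))
    # restrict the peak search to quefrencies of plausible pitch periods
    q_lo = int(fs / f0_max)
    q_hi = int(fs / f0_min)
    peak_q = q_lo + int(np.argmax(cep[q_lo:q_hi]))
    voiced = cep[peak_q] > vuv_threshold
    return (fs / peak_q) if voiced else 0.0
```

For a 32-ms frame at 16 kHz containing a pulse train with a 10-ms period, the cepstral peak falls at quefrency 160 samples, giving F0 = 100 Hz.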
The effects of this difference between the spectral analyses are discussed in the evaluation section.

3.4. Synthesis from STRAIGHT spectra
The following functions are eliminated in the current realtime STRAIGHT:
- the minimum phase impulse response;
- the group delay manipulation in the higher frequency region that reduces the buzzyness of the artificial excitation source;
- the F0 control finer than the sampling pulse resolution using group delay manipulation.
This elimination can slightly degrade the naturalness of resynthesized speech. The only remaining difference between realtime STRAIGHT and a conventional analysis/synthesis system is then the spectral representation.

3.5. Implementation
Realtime STRAIGHT is a multi-platform system. It is built on splibs [19], a multi-platform audio processing library set also developed by the first author. The current implementation uses spbase, the basic library; spaudio, the audio input and output library; spcomponent, the GUI library; and splib, the signal processing library. The portability of splibs makes realtime STRAIGHT easily portable. It should be noted that the processing time of realtime STRAIGHT is significantly reduced by using the fast FFT libraries distributed by chip manufacturers. In our applications, these FFTs were two to four times faster than the usual mathematical function libraries. In STRAIGHT, the analysis window size depends on the sampling frequency. This dependency introduces a superlinear increase in the number of numerical operations and imposes an upper limit on the sampling frequency for realtime operation. Based on tests using various types of PCs, it is safe to state that normal realtime operation at a sampling frequency of 32 kHz is possible on PCs with processors running at around 1 GHz. The latency without an audible gap was less than 100 ms, although this value strongly depends on the operating system, the audio interface, and so on.

3.6. Evaluation
The preliminary evaluation of the effects of the modifications introduced for realizing realtime STRAIGHT was conducted by replacing Matlab STRAIGHT's components with modules implementing the algorithms used in realtime STRAIGHT. Figure 3 shows the F0 trajectories extracted using the conventional Cepstrum pitch extractor and the STRAIGHT F0 extractor. The speech sample is recorded female speech sampled at 16 kHz. The conventional frame-based analysis and the coarse temporal resolution due to sampling introduce staircase features in the Cepstrum-based trajectory. The perceptual effects of this jaggedness were not prominent in the case of male speech; however, the degradation was noticeable in female speech, especially in the case of high-pitched subjects. This degradation is easily audible when headphones are used instead of loudspeakers. The voiced/unvoiced decision errors observed around 100 ms and 200 ms in the Cepstrum case can cause noticeable degradation. Figure 4 shows the spectrograms extracted using the conventional fixed-frame Cepstrum analysis and the STRAIGHT spectral analysis. The fixed frame rate of the conventional method failed to capture the fine and smooth spectral structures found in the STRAIGHT spectrogram. The degradation in the resynthesized speech when the Cepstrum-based spectrogram was used could be easily perceived. This degradation is mainly due to inappropriate
(usually too strong) spectral smoothing in the conventional method. This degradation is also audible in the current realtime STRAIGHT, which has an option for switching the spectral analysis between the STRAIGHT-based method and the Cepstrum-based method. It is interesting to note that when the STRAIGHT-based F0 extraction was used with the Cepstrum analysis/synthesis system, a noticeable improvement in speech quality was observed.

Fig. 3 F0 trajectories extracted by the Cepstrum-based method (upper plot) and the instantaneous frequency based method used in Matlab STRAIGHT (lower plot).

Fig. 4 Smoothed spectrograms extracted by the Cepstrum-based method (upper plot) and the extended pitch synchronous method used in Matlab STRAIGHT (lower plot).

A subjective evaluation has also been performed. Ten listeners with normal hearing participated in this experiment. The speech materials used were ten sentences uttered by four males and four females (80 utterances in total). The sampling frequency of the materials is kHz. Figure 5 shows the mean opinion scores (MOS) of the subjective experiment, where Org denotes the original; STR, Matlab STRAIGHT; RT-STR, realtime STRAIGHT; and CepVoc, the conventional Cepstrum analysis/synthesis system. The MOS of realtime STRAIGHT is approximately one point better than that of the Cepstrum-based system. This indicates that the quality of realtime STRAIGHT is better than that of the Cepstrum-based system. In comparison with Matlab STRAIGHT, realtime STRAIGHT has poorer performance, especially in the case of female speech. This is caused by voiced/unvoiced decision errors in both males and females, and by the jaggedness of the F0 trajectories, mainly in females.

Fig. 5 Results of the subjective experiment. The error bars indicate ±1 standard deviation.

4. DEMONSTRATIONS

Figure 6 shows a frame from a movie demonstrating realtime STRAIGHT running on a tablet PC (Sony VAIO-U). The figure also shows the GUI of realtime STRAIGHT. In addition to sliders for F0 conversion and frequency axis stretching, a gender control slider is also installed. This slider changes the frequency stretching factor in proportion to the cubic root of the F0 conversion factor. This cube-root relation was determined in our preliminary tests and was found to be consistent with a number of reports in the speech perception literature. The source code of this software is available upon request to researchers who already have Matlab STRAIGHT.

Fig. 6 STRAIGHT in realtime operation on a tablet PC (Sony VAIO-U), the black unit on top of the silver unit (AD/DA converter, Roland UA-25). The graphical user interface (GUI) on the PC's display is shown. The left panel shows the GUI of realtime STRAIGHT. Four sliders control the round trip delay, F0 conversion ratio, frequency axis stretching coefficient, and gender of the resynthesized speech.

5. CONCLUSION

The first implementation of realtime STRAIGHT has been introduced. Currently, it runs faster than realtime on PCs with processors running at around 1 GHz. However, this performance was achieved by eliminating some important components and replacing some components with conventional procedures. Our next step is to upgrade the downgraded components to Matlab STRAIGHT's latest algorithms by developing computationally efficient implementations.

ACKNOWLEDGMENTS

The authors would like to thank Professor Fumitada Itakura for his valuable suggestions and discussions. This research was partly supported by the MEXT leading project e-Society, a Grant-in-Aid for Scientific Research for young researchers (B), and ERATO by JST.

REFERENCES

[1] H. Kawahara, I. Masuda-Katsuse and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction," Speech Commun., 27 (1999).
[2] H. Dudley, "Remaking speech," J. Acoust. Soc. Am., 11 (1939).
[3] H. Matsui and H. Kawahara, "Investigation of emotionally morphed speech perception and its structure using a high quality speech manipulation system," Proc. Eurospeech 2003, Geneva (2003).
[4] T. Yonezawa, N. Suzuki, K. Mase and K. Kogure, "Gradually changing expression of singing voice based on morphing," Proc. Interspeech 2005, Lisboa (2005).
[5] T. Irino and R. D. Patterson, "Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The Stabilised Wavelet-Mellin Transform," Speech Commun., 36 (2002).
[6] D. R. R. Smith, R. D. Patterson, R. Turner, H. Kawahara and T. Irino, "The processing and perception of size in speech sounds," J. Acoust. Soc. Am., 117 (2005).
[7] J. Jin, H. Banno, H. Kawahara and T. Irino, "Intelligibility of degraded speech from smeared STRAIGHT spectrum," Proc. ICSLP 2004, Vol. 4, Jeju (2004).
[8] C. Liu and D. Kewley-Port, "Vowel formant discrimination for high-fidelity speech," J. Acoust. Soc. Am., 116 (2004).
[9] P. F. Assmann and W. F. Katz, "Synthesis fidelity and time-varying spectral change in vowels," J. Acoust. Soc. Am., 117 (2005).
[10] H. Kawahara and H. Matsui, "Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation," Proc. ICASSP 2003, Vol. 1, Hong Kong (2003).
[11] Y. Sogabe, K. Kakehi and H. Kawahara, "Psychological evaluation of emotional speech using a new morphing method," Proc. ICCS 2003, Vol. 2, Melbourne (2003).
[12] H. Kawahara and J. C. Williams, "Effects of auditory feedback on voice pitch trajectories," in Vocal Fold Physiology, P. J. Davis and N. H. Fletcher, Eds. (Singular Publishing Group, San Diego, 1996), Chap. 18.
[13] H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited," Proc. ICASSP 97, Vol. 2, Munich (1997).
[14] H. Kawahara, H. Katayose, A. de Cheveigné and R. D. Patterson, "Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity," Proc. Eurospeech 99, Vol. 6, Budapest (1999).
[15] H. Kawahara, Y. Atake and P. Zolfaghari, "Accurate vocal event detection method based on a fixed-point analysis of mapping from time to weighted average group delay," Proc. ICSLP 2000, Beijing (2000).
[16] H. Kawahara, H. Banno, T. Irino and P. Zolfaghari, "Algorithm AMALGAM: Morphing waveform based methods, sinusoidal models and STRAIGHT," Proc. ICASSP 2004, Vol. 1, Montreal (2004).
[17] H. Kawahara, A. de Cheveigné, H. Banno, T. Takahashi and T. Irino, "Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT," Proc. Interspeech 2005, Lisboa (2005).
[18] A. V. Oppenheim, "Speech analysis-synthesis system based on homomorphic filtering," J. Acoust. Soc. Am., 45 (1969).
[19] splibs: a multi-platform audio processing library set developed by the first author.

Hideki Banno was born in Toukai, Japan, on January 22. He received B.E., M.E., and Ph.D. degrees from Nagoya University in 1996, the Nara Institute of Science and Technology in 1998, and Nagoya University in 2003, respectively. From 2001 to 2003, he was a research assistant at the Graduate School of Economics, Nagoya University. From 2003 to 2005, he was a research assistant at the Faculty of Systems Engineering, Wakayama University. From 2004 to 2005, he was also an invited researcher at ATR Spoken Language Communication Research Laboratories. He is currently a lecturer at the Faculty of Science and Technology, Meijo University. His research interests include speech analysis and synthesis, acoustic signal processing and auditory perception. He is a member of ASJ and IEICE.

Hiroaki Hata received the B.E. degree in Systems Engineering from Wakayama University. He is currently a master's student at the Graduate School of Systems Engineering, Wakayama University. His research interests include the implementation of speech analysis-synthesis systems.

Masanori Morise received the B.E. and M.E. degrees in Systems Engineering from Wakayama University in 2004 and 2006, respectively. In 2006, he became a Research Fellow (DC1) of JSPS.
He is currently a doctoral candidate at the Graduate School of Systems Engineering of Wakayama University. His research interests include acoustic signal processing and electro-acoustic systems. He is a member of ASJ and IEICE.

Toru Takahashi was born in Akita, Japan, on June 8. He received a B.E. in computer science, and M.E. and Ph.D. degrees in electrical and electronic engineering from the Nagoya Institute of Technology in 1996, 1998 and 2004, respectively. He has been a research assistant in the Faculty of Systems Engineering at Wakayama University. His research interests include speech analysis and synthesis, acoustic signal processing, and auditory perception. He is a member of ASJ, IEICE and IPSJ.

Toshio Irino was born in Yokohama, Japan. He received the B.S., M.S., and Ph.D. degrees in electrical and electronic engineering from the Tokyo Institute of Technology in 1982, 1984, and 1987, respectively. From 1987 to 1997, he was a research scientist at NTT Basic Research Laboratories. From 1993 to 1994, he was a visiting researcher at the Medical Research Council Applied Psychology Unit (MRC-APU, currently CBU) in Cambridge, UK. From 1997 to 2000, he was a senior researcher at ATR Human Information Processing Research Laboratories (ATR HIP). From 2000 to 2002, he was a senior research scientist at NTT Communication Science Laboratories. Since 2002, he has been a professor at the Faculty of Systems Engineering, Wakayama University. He is also a visiting professor at the Institute of Statistical Mathematics. The focus of his current research is a computational theory of the auditory system. Dr. Irino is a member of the Acoustical Society of America (ASA), the Acoustical Society of Japan (ASJ), and the Institute of Electronics, Information and Communication Engineers (IEICE).

Hideki Kawahara received the B.E., M.E. and Dr.Eng. degrees in Electrical Engineering from Hokkaido University, Sapporo, Japan, in 1972, 1974 and 1977, respectively.
In 1977, he joined the Electrical Communications Laboratories of the Nippon Telegraph and Telephone Public Corporation. In 1992, he joined ATR Human Information Processing Research Laboratories, Japan, as a department head. In 1997, he became an invited researcher of ATR. He is currently a professor at the Faculty of Systems Engineering, Wakayama University. He received the Sato Prize from the ASJ in 1998, the TELECOM System Technology Prize from the Telecommunications Advancement Foundation, Japan, in 1998, and the EURASIP best paper award for his contribution to the Speech Communication journal. His research interests include auditory signal processing models, speech analysis and synthesis, electro-acoustic systems and auditory perception. He is a member of ASA, ASJ, IEEE, IPSJ, ISCA, IEICE and JNNS.
More informationQuantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation
Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University
More informationFundamental frequency estimation of speech signals using MUSIC algorithm
Acoust. Sci. & Tech. 22, 4 (2) TECHNICAL REPORT Fundamental frequency estimation of speech signals using MUSIC algorithm Takahiro Murakami and Yoshihisa Ishida School of Science and Technology, Meiji University,,
More informationProject 0: Part 2 A second hands-on lab on Speech Processing Frequency-domain processing
Project : Part 2 A second hands-on lab on Speech Processing Frequency-domain processing February 24, 217 During this lab, you will have a first contact on frequency domain analysis of speech signals. You
More informationIMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey
Workshop on Spoken Language Processing - 2003, TIFR, Mumbai, India, January 9-11, 2003 149 IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES P. K. Lehana and P. C. Pandey Department of Electrical
More informationPerception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner.
Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb 2008. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum,
More informationAUDL GS08/GAV1 Auditory Perception. Envelope and temporal fine structure (TFS)
AUDL GS08/GAV1 Auditory Perception Envelope and temporal fine structure (TFS) Envelope and TFS arise from a method of decomposing waveforms The classic decomposition of waveforms Spectral analysis... Decomposes
More informationSynchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech
INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,
More informationBetween physics and perception signal models for high level audio processing. Axel Röbel. Analysis / synthesis team, IRCAM. DAFx 2010 iem Graz
Between physics and perception signal models for high level audio processing Axel Röbel Analysis / synthesis team, IRCAM DAFx 2010 iem Graz Overview Introduction High level control of signal transformation
More informationScienceDirect. Unsupervised Speech Segregation Using Pitch Information and Time Frequency Masking
Available online at www.sciencedirect.com ScienceDirect Procedia Computer Science 46 (2015 ) 122 126 International Conference on Information and Communication Technologies (ICICT 2014) Unsupervised Speech
More informationspeech signal S(n). This involves a transformation of S(n) into another signal or a set of signals
16 3. SPEECH ANALYSIS 3.1 INTRODUCTION TO SPEECH ANALYSIS Many speech processing [22] applications exploits speech production and perception to accomplish speech analysis. By speech analysis we extract
More informationSGN Audio and Speech Processing
Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations
More informationWARPED FILTER DESIGN FOR THE BODY MODELING AND SOUND SYNTHESIS OF STRING INSTRUMENTS
NORDIC ACOUSTICAL MEETING 12-14 JUNE 1996 HELSINKI WARPED FILTER DESIGN FOR THE BODY MODELING AND SOUND SYNTHESIS OF STRING INSTRUMENTS Helsinki University of Technology Laboratory of Acoustics and Audio
More informationHungarian Speech Synthesis Using a Phase Exact HNM Approach
Hungarian Speech Synthesis Using a Phase Exact HNM Approach Kornél Kovács 1, András Kocsor 2, and László Tóth 3 Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University
More informationA Comparative Performance of Various Speech Analysis-Synthesis Techniques
International Journal of Signal Processing Systems Vol. 2, No. 1 June 2014 A Comparative Performance of Various Speech Analysis-Synthesis Techniques Ankita N. Chadha, Jagannath H. Nirmal, and Pramod Kachare
More informationHARMONIC INSTABILITY OF DIGITAL SOFT CLIPPING ALGORITHMS
HARMONIC INSTABILITY OF DIGITAL SOFT CLIPPING ALGORITHMS Sean Enderby and Zlatko Baracskai Department of Digital Media Technology Birmingham City University Birmingham, UK ABSTRACT In this paper several
More informationWaveform generation based on signal reshaping. statistical parametric speech synthesis
INTERSPEECH 2016 September 8 12, 2016, San Francisco, USA Waveform generation based on signal reshaping for statistical parametric speech synthesis Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu,
More informationSpeech Synthesis; Pitch Detection and Vocoders
Speech Synthesis; Pitch Detection and Vocoders Tai-Shih Chi ( 冀泰石 ) Department of Communication Engineering National Chiao Tung University May. 29, 2008 Speech Synthesis Basic components of the text-to-speech
More informationAudio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands
Audio Engineering Society Convention Paper Presented at the th Convention May 5 Amsterdam, The Netherlands This convention paper has been reproduced from the author's advance manuscript, without editing,
More informationSlovak University of Technology and Planned Research in Voice De-Identification. Anna Pribilova
Slovak University of Technology and Planned Research in Voice De-Identification Anna Pribilova SLOVAK UNIVERSITY OF TECHNOLOGY IN BRATISLAVA the oldest and the largest university of technology in Slovakia
More informationEE482: Digital Signal Processing Applications
Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/
More informationEpoch Extraction From Emotional Speech
Epoch Extraction From al Speech D Govind and S R M Prasanna Department of Electronics and Electrical Engineering Indian Institute of Technology Guwahati Email:{dgovind,prasanna}@iitg.ernet.in Abstract
More informationDifferent Approaches of Spectral Subtraction Method for Speech Enhancement
ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches
More informationAdvanced audio analysis. Martin Gasser
Advanced audio analysis Martin Gasser Motivation Which methods are common in MIR research? How can we parameterize audio signals? Interesting dimensions of audio: Spectral/ time/melody structure, high
More informationPR No. 119 DIGITAL SIGNAL PROCESSING XVIII. Academic Research Staff. Prof. Alan V. Oppenheim Prof. James H. McClellan.
XVIII. DIGITAL SIGNAL PROCESSING Academic Research Staff Prof. Alan V. Oppenheim Prof. James H. McClellan Graduate Students Bir Bhanu Gary E. Kopec Thomas F. Quatieri, Jr. Patrick W. Bosshart Jae S. Lim
More informationSPEECH AND SPECTRAL ANALYSIS
SPEECH AND SPECTRAL ANALYSIS 1 Sound waves: production in general: acoustic interference vibration (carried by some propagation medium) variations in air pressure speech: actions of the articulatory organs
More informationADDITIVE SYNTHESIS BASED ON THE CONTINUOUS WAVELET TRANSFORM: A SINUSOIDAL PLUS TRANSIENT MODEL
ADDITIVE SYNTHESIS BASED ON THE CONTINUOUS WAVELET TRANSFORM: A SINUSOIDAL PLUS TRANSIENT MODEL José R. Beltrán and Fernando Beltrán Department of Electronic Engineering and Communications University of
More informationSGN Audio and Speech Processing
SGN 14006 Audio and Speech Processing Introduction 1 Course goals Introduction 2! Learn basics of audio signal processing Basic operations and their underlying ideas and principles Give basic skills although
More informationEnhanced Waveform Interpolative Coding at 4 kbps
Enhanced Waveform Interpolative Coding at 4 kbps Oded Gottesman, and Allen Gersho Signal Compression Lab. University of California, Santa Barbara E-mail: [oded, gersho]@scl.ece.ucsb.edu Signal Compression
More informationHigh-Pitch Formant Estimation by Exploiting Temporal Change of Pitch
High-Pitch Formant Estimation by Exploiting Temporal Change of Pitch The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation As Published
More informationPattern Recognition. Part 6: Bandwidth Extension. Gerhard Schmidt
Pattern Recognition Part 6: Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Institute of Electrical and Information Engineering Digital Signal Processing and System Theory
More informationSpectral analysis based synthesis and transformation of digital sound: the ATSH program
Spectral analysis based synthesis and transformation of digital sound: the ATSH program Oscar Pablo Di Liscia 1, Juan Pampin 2 1 Carrera de Composición con Medios Electroacústicos, Universidad Nacional
More informationSynthesis Algorithms and Validation
Chapter 5 Synthesis Algorithms and Validation An essential step in the study of pathological voices is re-synthesis; clear and immediate evidence of the success and accuracy of modeling efforts is provided
More informationComplex Sounds. Reading: Yost Ch. 4
Complex Sounds Reading: Yost Ch. 4 Natural Sounds Most sounds in our everyday lives are not simple sinusoidal sounds, but are complex sounds, consisting of a sum of many sinusoids. The amplitude and frequency
More informationNon-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassignment
Non-stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase Reassignment Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou, Analysis/Synthesis Team, 1, pl. Igor Stravinsky,
More informationPossible application of velvet noise and its variant in psychology and physiology of hearing
velvet noise 64-851 93 61-1197 13-6 468-85 51 4-851 4-4-37 441-858 1-1 E-mail: {kawahara,irino}@sys.wakayama-u.ac.jp, minoru.tsuzaki@kcua.ac.jp, banno@meijo-u.ac.jp, mmorise@yamanashi.ac.jp, tmatsui@cs.tut.ac.jp
More informationKONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM
KONKANI SPEECH RECOGNITION USING HILBERT-HUANG TRANSFORM Shruthi S Prabhu 1, Nayana C G 2, Ashwini B N 3, Dr. Parameshachari B D 4 Assistant Professor, Department of Telecommunication Engineering, GSSSIETW,
More informationRASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991
RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response
More informationDetermination of instants of significant excitation in speech using Hilbert envelope and group delay function
Determination of instants of significant excitation in speech using Hilbert envelope and group delay function by K. Sreenivasa Rao, S. R. M. Prasanna, B.Yegnanarayana in IEEE Signal Processing Letters,
More informationNOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or
NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS: The copyright law of the United States (title 17, U.S. Code) governs the making of photocopies or other reproductions of copyrighted material. Any copying
More informationMonaural and Binaural Speech Separation
Monaural and Binaural Speech Separation DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction CASA approach to sound separation Ideal binary mask as
More informationIII. Publication III. c 2005 Toni Hirvonen.
III Publication III Hirvonen, T., Segregation of Two Simultaneously Arriving Narrowband Noise Signals as a Function of Spatial and Frequency Separation, in Proceedings of th International Conference on
More informationReading: Johnson Ch , Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday.
L105/205 Phonetics Scarborough Handout 7 10/18/05 Reading: Johnson Ch.2.3.3-2.3.6, Ch.5.5 (today); Liljencrants & Lindblom; Stevens (Tues) reminder: no class on Thursday Spectral Analysis 1. There are
More informationA Parametric Model for Spectral Sound Synthesis of Musical Sounds
A Parametric Model for Spectral Sound Synthesis of Musical Sounds Cornelia Kreutzer University of Limerick ECE Department Limerick, Ireland cornelia.kreutzer@ul.ie Jacqueline Walker University of Limerick
More informationAccurate Delay Measurement of Coded Speech Signals with Subsample Resolution
PAGE 433 Accurate Delay Measurement of Coded Speech Signals with Subsample Resolution Wenliang Lu, D. Sen, and Shuai Wang School of Electrical Engineering & Telecommunications University of New South Wales,
More informationSOURCE-filter modeling of speech is based on exciting. Glottal Spectral Separation for Speech Synthesis
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING 1 Glottal Spectral Separation for Speech Synthesis João P. Cabral, Korin Richmond, Member, IEEE, Junichi Yamagishi, Member, IEEE, and Steve Renals,
More informationSpeech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter
Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,
More informationEnhancing 3D Audio Using Blind Bandwidth Extension
Enhancing 3D Audio Using Blind Bandwidth Extension (PREPRINT) Tim Habigt, Marko Ðurković, Martin Rothbucher, and Klaus Diepold Institute for Data Processing, Technische Universität München, 829 München,
More informationSpatial Audio Transmission Technology for Multi-point Mobile Voice Chat
Audio Transmission Technology for Multi-point Mobile Voice Chat Voice Chat Multi-channel Coding Binaural Signal Processing Audio Transmission Technology for Multi-point Mobile Voice Chat We have developed
More informationEXTRACTING a desired speech signal from noisy speech
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 47, NO. 3, MARCH 1999 665 An Adaptive Noise Canceller with Low Signal Distortion for Speech Codecs Shigeji Ikeda and Akihiko Sugiyama, Member, IEEE Abstract
More informationAPPLICATIONS OF DSP OBJECTIVES
APPLICATIONS OF DSP OBJECTIVES This lecture will discuss the following: Introduce analog and digital waveform coding Introduce Pulse Coded Modulation Consider speech-coding principles Introduce the channel
More informationModulation Domain Spectral Subtraction for Speech Enhancement
Modulation Domain Spectral Subtraction for Speech Enhancement Author Paliwal, Kuldip, Schwerin, Belinda, Wojcicki, Kamil Published 9 Conference Title Proceedings of Interspeech 9 Copyright Statement 9
More informationSINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum
SINOLA: A New Analysis/Synthesis Method using Spectrum Peak Shape Distortion, Phase Reassigned Spectrum Geoffroy Peeters, Xavier Rodet Ircam - Centre Georges-Pompidou Analysis/Synthesis Team, 1, pl. Igor
More informationRobust Low-Resource Sound Localization in Correlated Noise
INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem
More informationAspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta
Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied
More informationVIBRATO DETECTING ALGORITHM IN REAL TIME. Minhao Zhang, Xinzhao Liu. University of Rochester Department of Electrical and Computer Engineering
VIBRATO DETECTING ALGORITHM IN REAL TIME Minhao Zhang, Xinzhao Liu University of Rochester Department of Electrical and Computer Engineering ABSTRACT Vibrato is a fundamental expressive attribute in music,
More informationExcitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals
Excitation source design for high-quality speech manipulation systems based on a temporally static group delay representation of periodic signals Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno,
More informationSpeech Enhancement using Wiener filtering
Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing
More informationEffects of Reverberation on Pitch, Onset/Offset, and Binaural Cues
Effects of Reverberation on Pitch, Onset/Offset, and Binaural Cues DeLiang Wang Perception & Neurodynamics Lab The Ohio State University Outline of presentation Introduction Human performance Reverberation
More informationSpeech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm
International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Speech Enhancement Based On Spectral Subtraction For Speech Recognition System With Dpcm A.T. Rajamanickam, N.P.Subiramaniyam, A.Balamurugan*,
More informationComparison of Spectral Analysis Methods for Automatic Speech Recognition
INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering
More informationDetection, Interpolation and Cancellation Algorithms for GSM burst Removal for Forensic Audio
>Bitzer and Rademacher (Paper Nr. 21)< 1 Detection, Interpolation and Cancellation Algorithms for GSM burst Removal for Forensic Audio Joerg Bitzer and Jan Rademacher Abstract One increasing problem for
More informationBEAT DETECTION BY DYNAMIC PROGRAMMING. Racquel Ivy Awuor
BEAT DETECTION BY DYNAMIC PROGRAMMING Racquel Ivy Awuor University of Rochester Department of Electrical and Computer Engineering Rochester, NY 14627 rawuor@ur.rochester.edu ABSTRACT A beat is a salient
More informationSPEECH TO SINGING SYNTHESIS SYSTEM. Mingqing Yun, Yoon mo Yang, Yufei Zhang. Department of Electrical and Computer Engineering University of Rochester
SPEECH TO SINGING SYNTHESIS SYSTEM Mingqing Yun, Yoon mo Yang, Yufei Zhang Department of Electrical and Computer Engineering University of Rochester ABSTRACT This paper describes a speech-to-singing synthesis
More informationThe psychoacoustics of reverberation
The psychoacoustics of reverberation Steven van de Par Steven.van.de.Par@uni-oldenburg.de July 19, 2016 Thanks to Julian Grosse and Andreas Häußler 2016 AES International Conference on Sound Field Control
More informationMulti-Band Excitation Vocoder
Multi-Band Excitation Vocoder RLE Technical Report No. 524 March 1987 Daniel W. Griffin Research Laboratory of Electronics Massachusetts Institute of Technology Cambridge, MA 02139 USA This work has been
More informationSYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE
SYNTHETIC SPEECH DETECTION USING TEMPORAL MODULATION FEATURE Zhizheng Wu 1,2, Xiong Xiao 2, Eng Siong Chng 1,2, Haizhou Li 1,2,3 1 School of Computer Engineering, Nanyang Technological University (NTU),
More informationPsychology of Language
PSYCH 150 / LIN 155 UCI COGNITIVE SCIENCES syn lab Psychology of Language Prof. Jon Sprouse 01.10.13: The Mental Representation of Speech Sounds 1 A logical organization For clarity s sake, we ll organize
More informationTHE HUMANISATION OF STOCHASTIC PROCESSES FOR THE MODELLING OF F0 DRIFT IN SINGING
THE HUMANISATION OF STOCHASTIC PROCESSES FOR THE MODELLING OF F0 DRIFT IN SINGING Ryan Stables [1], Dr. Jamie Bullock [2], Dr. Cham Athwal [3] [1] Institute of Digital Experience, Birmingham City University,
More informationAn objective method for evaluating data hiding in pitch gain and pitch delay parameters of the AMR codec
An objective method for evaluating data hiding in pitch gain and pitch delay parameters of the AMR codec Akira Nishimura 1 1 Department of Media and Cultural Studies, Tokyo University of Information Sciences,
More informationAuditory modelling for speech processing in the perceptual domain
ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract
More informationSpeech Processing. Undergraduate course code: LASC10061 Postgraduate course code: LASC11065
Speech Processing Undergraduate course code: LASC10061 Postgraduate course code: LASC11065 All course materials and handouts are the same for both versions. Differences: credits (20 for UG, 10 for PG);
More informationCepstrum alanysis of speech signals
Cepstrum alanysis of speech signals ELEC-E5520 Speech and language processing methods Spring 2016 Mikko Kurimo 1 /48 Contents Literature and other material Idea and history of cepstrum Cepstrum and LP
More informationAudio Signal Compression using DCT and LPC Techniques
Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,
More informationDimension Reduction of the Modulation Spectrogram for Speaker Verification
Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland Kong Aik Lee and
More informationDigital Speech Processing and Coding
ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/
More informationVocoder (LPC) Analysis by Variation of Input Parameters and Signals
ISCA Journal of Engineering Sciences ISCA J. Engineering Sci. Vocoder (LPC) Analysis by Variation of Input Parameters and Signals Abstract Gupta Rajani, Mehta Alok K. and Tiwari Vebhav Truba College of
More informationFinal Exam Study Guide: Introduction to Computer Music Course Staff April 24, 2015
Final Exam Study Guide: 15-322 Introduction to Computer Music Course Staff April 24, 2015 This document is intended to help you identify and master the main concepts of 15-322, which is also what we intend
More informationLab 8. ANALYSIS OF COMPLEX SOUNDS AND SPEECH ANALYSIS Amplitude, loudness, and decibels
Lab 8. ANALYSIS OF COMPLEX SOUNDS AND SPEECH ANALYSIS Amplitude, loudness, and decibels A complex sound with particular frequency can be analyzed and quantified by its Fourier spectrum: the relative amplitudes
More informationNOISE ESTIMATION IN A SINGLE CHANNEL
SPEECH ENHANCEMENT FOR CROSS-TALK INTERFERENCE by Levent M. Arslan and John H.L. Hansen Robust Speech Processing Laboratory Department of Electrical Engineering Box 99 Duke University Durham, North Carolina
More informationYOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION
American Journal of Engineering and Technology Research Vol. 3, No., 03 YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION Yinan Kong Department of Electronic Engineering, Macquarie University
More informationReduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter
Reduction of Musical Residual Noise Using Harmonic- Adapted-Median Filter Ching-Ta Lu, Kun-Fu Tseng 2, Chih-Tsung Chen 2 Department of Information Communication, Asia University, Taichung, Taiwan, ROC
More informationInternational Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015
International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha
More information