Sound analysis, processing and synthesis tools for music research and production

Xavier Rodet
Analysis-Synthesis team, Ircam, 1 place Stravinsky, Paris, France

Abstract

We present a set of analysis, processing and synthesis tools developed at Ircam and the efforts pursued to keep such programs alive through the years, to develop new ones, and to allow different programs from a given institution or from various institutions to easily manage, exchange and maintain analysis and synthesis data. The analysis/synthesis (a/s) methods and programs presented here are signal models: Formant Wave-Form (FOF) synthesis, Resonance Modeling analysis, Spectral Envelope a/s, Sinusoidal+Residual Additive a/s and Pitch Synchronous Overlap-Add (PSOLA) a/s. One of the key elements for the ease of use, success and longevity of all these programs is the Sound Description Interchange Format (SDIF), invented and implemented in collaboration with other institutions to allow programs to easily handle and exchange a/s data. Finally, examples of musical applications and pieces produced with our programs are given.

1 Introduction

Since its birth in the 1950s, the domain of analysis, processing and synthesis of musical sound signals has seen a large number of computer programs developed for research and production. Such developments have been carried out at the Institut de Recherche et de Coordination Acoustique/Musique (Ircam) since the end of the 1970s. However, it is too often the case that the possibilities offered by these programs, however good they may be, are available only for a short lifetime (even for researchers, who may be more tolerant of the weaknesses of the tools) or, at worst, that these programs never reach the point where they are used by composers, musicians or researchers. There are several reasons for this loss, and it is not the purpose of this paper to list and analyze all of them. But we describe here some of the efforts pursued in the Analysis-Synthesis team at Ircam to keep such programs alive through the years, to develop new ones and to allow different programs from a given institution or from various institutions to easily manage, exchange and maintain analysis and synthesis data.

The analysis/synthesis (a/s) methods and programs that will be presented are signal models, as opposed to physical models, which are also developed in our team but are not presented here. These signal models are: Formant Wave-Form (FOF) synthesis, Resonance Modeling analysis, Spectral Envelope a/s, Sinusoidal+Residual Additive a/s and Pitch Synchronous Overlap-Add (PSOLA) a/s. In our institution, such programs are usually developed first in a research version under UNIX; only when they are found to be of interest to musicians and composers are they ported to the Macintosh platform and to FTS-jMax, Ircam's real-time system, for use by composers and musicians. Porting and keeping alive all these libraries and programs on several flavors of Unix, Linux and Macintosh is a huge task. Consequently, a great deal of effort has been made to improve software development and to keep unique sources for all platforms under the version management system CVS. On Macintosh, the environment which has been most developed for signal models (even though it is also well adapted to physical models) is Diphone Studio, which will be presented in section 3. One of the key elements for the ease of use, success and durability of all these programs is the Sound Description Interchange Format (SDIF), which will be presented in section 4. SDIF, which we have invented and implemented in collaboration with other institutions, allows programs to easily handle and exchange a/s data and facilitates management and maintenance of the data, as well as exchange with other music research centers. Finally, examples of musical applications and pieces produced with our programs will be given in section 5, not forgetting other applications such as post-production and acoustics research.

2 Analysis/synthesis methods and programs

2.1 Chant synthesis

The Chant [1],[2] synthesis technique, based on Formant-Wave-Forms (FOFs) and filters, has been implemented in various environments [3], but a general and portable library was not available. Therefore, a Chant library for UNIX and Macintosh has been designed at IRCAM by X. Rodet and F. Iovino, and written by F. Iovino, with modifications by G. Eckel and C. Rodgers for fast synthesis. It defines a small set of module classes: Formant-Wave-Form (FOF) bank, Filter bank, Noise generator, Sound File and Channel for sound output. Instances of these modules can be arbitrarily patched to build synthesizers offering polyphony and multi-channel output. The library has recently been improved and documented [4] by D. Virolle. In particular, an SDIF format has been defined for patches and for the time-varying parameters corresponding to a patch. A general synthesis program, named Chant, reads patch and parameter files and computes the corresponding sound file. This Chant library and synthesizer have been ported to Diphone Studio on Macintosh by A. Lefèvre. As for the other synthesis methods, the computation of a Chant sequence results in a Chant parameter file, and the corresponding sound file is computed by the Chant synthesizer. This new extension offers interesting possibilities of FOF and source-filter synthesis using all the features and facilities of the Diphone program.

2.2 Resonance model analysis and synthesis

Resonance Modeling [5] is a technique suited to percussive-like sounds, or sounds which resemble the free response of a linear system (it has, however, recently been applied to sustained sounds and has produced remarkable results). Such sounds can be considered as a sum of exponentially damped sinusoids, which are the responses of the modes of musical instruments or, more generally, of linear systems. The principle of Resonance Modeling is to look for sinusoids in two Fourier transforms: one applied to a window located at the beginning of the sound, where the resonances are of maximum amplitude, the other further into the sound, where the resonances have damped. A resonance, i.e. a damped sinusoid, appears as a peak in both transforms, with similar frequency. Therefore, peaks in one transform are matched to peaks in the other, and pairs with close enough frequencies are considered a resonance. The amplitude and the decay rate of such a resonance are easily deduced from the amplitude of the peak in each transform. The process is done iteratively with various window sizes and positions to improve the detection and estimation of the resonances. The final result is a Resonance Model of the sound, i.e. a set of resonances, each characterized by its frequency, its amplitude and its decay rate. One can view the sound as the sum of exponentially damped sinusoids, or as the impulse response of a bank of second-order filters implementing the resonances. In both cases, synthesis can be done with a Chant synthesizer: the first case is what comes out of a bank of FOF generators triggered once; the second case, a bank of second-order filters, is also implemented in the Chant synthesizer mentioned in section 2.1. After the first version in 1985, a C version was written in 1989 by P.-F. Baisnée [6] and ported to UNIX and MPW. Recently, this code has been improved and ported to various UNIX platforms at Ircam by Francesc Marti [7], and ported to Macintosh in Diphone Studio by A. Lefèvre under the name ModRes. In particular, SDIF i/o has been added for storage of Resonance Models in terms of FOFs or filters. Therefore, Resonance Models can be used in Diphone Studio through the Chant synthesis plugin.

Let us also mention two real-time implementations of FOF synthesis. It has been ported to Ircam's real-time system FTS-jMax and extended under the name FOG by G. Eckel and F. Iovino [8]. It has also been ported to Max-MSP by F. Iovino and R. Dudas.
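To make the model concrete, here is a minimal sketch of the synthesis side in the first case above, a bank of damped sinusoids triggered once. It is written in Python with NumPy rather than with the Chant C library, and the three resonances are invented values standing in for an analyzed Resonance Model.

```python
import numpy as np

def synthesize_resonances(freqs, amps, decays, duration, sr=44100):
    """Render a resonance model as a sum of exponentially damped sinusoids.

    freqs  : resonance frequencies in Hz
    amps   : initial amplitudes (linear)
    decays : decay rates in 1/s (amplitude falls as exp(-decay * t))
    """
    t = np.arange(int(duration * sr)) / sr
    out = np.zeros_like(t)
    for f, a, d in zip(freqs, amps, decays):
        out += a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
    return out

# Hypothetical model of a bell-like sound with three modes.
signal = synthesize_resonances(
    freqs=[220.0, 563.0, 1195.0],
    amps=[1.0, 0.6, 0.3],
    decays=[3.0, 5.0, 9.0],
    duration=2.0,
)
```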
2.3 Spectral envelopes

A spectral envelope is a curve in the frequency-magnitude plane which envelopes the short-time spectrum of a signal, e.g. connecting the peaks which represent sinusoidal partials, or modeling the spectral density of a noise signal. It describes the perceptually pertinent distribution of energy over frequency, which determines a large part of the timbre of instruments, and the type of vowel for speech. Because of the importance of spectral envelopes for sound synthesis, a high-level approach to their handling has been taken [9]. A C library and applications [10] using the SDIF standard have been developed by D. Schwarz for the analysis, representation, manipulation, and synthesis of spectral envelopes.

Spectral envelopes can be estimated by linear prediction, cepstrum or discrete cepstrum. In [11], the strong and weak points of each are discussed relative to the requirements for estimation, such as robustness and regularity. Improvements to discrete cepstrum estimation (regularization, statistical smoothing, logarithmic frequency scale, adding control points) have been added. For speech signals, a composite envelope is shown to be advantageous [12]. It is estimated from the sinusoidal partials and from the noise part above the maximum partial frequency.

The representation of spectral envelopes is the central point of their handling. A good representation is crucial for the ease and flexibility with which they can be manipulated. Several requirements should be fulfilled, such as stability, locality, and flexibility. As a consequence of these requirements, several representations (filter coefficients, sampled, break-point functions, splines, formants) are available in the library. The notion of fuzzy formants, based on formant regions, has also been introduced. Some general forms of manipulation and morphing are offered. For morphing between two or more spectral envelopes over time, linear interpolation, and formant shifting which preserves valid vocal-tract characteristics, are considered. For synthesis, spectral envelopes are applied to sinusoidal additive synthesis and are also used for filtering the residual noise component. This is especially easy and efficient for both components in the FFT-1 method (see section 2.4).

Some features remain to be implemented. For instance, in additive analysis, spectral envelopes can be generalized to apply not only to magnitude but also to frequency and phase, while keeping the same representation. The frequency envelope expresses the harmonicity of partials over frequency, and the phase envelope expresses phase relations between harmonic partials. With this high-level approach to spectral envelopes, additive synthesis can avoid the dilemma of how to control hundreds of partials, and the residual noise part can be treated by the same manipulations as the sinusoidal part by using the same representation. Also, high-quality singing voice synthesis can use morphing between sampled spectral envelopes and formants to combine natural-sounding transitions with a precisely modeled sustained part.

The SpecEnv library and applications are used in various real-time and non-real-time programs on UNIX and Macintosh. On UNIX, the estimate application [13] computes spectral envelopes from signals and from additive data, while the modformat and filnor programs allow modification and application of spectral envelopes to additive+residual representations. These programs have been written by D. Schwarz, G. Poirot and S. Roux and have been ported to Diphone Studio by A. Lefèvre. Spectral envelopes for the additive representation and for the residual have been ported to FTS-jMax by N. Schnell.
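As a concrete illustration of the break-point-function representation, the following sketch (Python/NumPy, not the SpecEnv C library itself) builds an envelope by linear interpolation between the (frequency, amplitude) peaks of sinusoidal partials; the partial values are made up.

```python
import numpy as np

def envelope_from_partials(freqs, amps):
    """Break-point-function spectral envelope from partial peaks.

    Returns a function mapping frequency (Hz) to linear amplitude,
    linearly interpolated between peaks and held constant outside them.
    """
    order = np.argsort(freqs)
    f, a = np.asarray(freqs)[order], np.asarray(amps)[order]
    return lambda x: np.interp(x, f, a)

# Hypothetical partials of a harmonic sound (f0 = 200 Hz).
env = envelope_from_partials(
    freqs=[200, 400, 600, 800, 1000],
    amps=[1.0, 0.5, 0.7, 0.2, 0.1],
)
print(env(500.0))  # envelope amplitude between partials 2 and 3 -> 0.6
```

Such a sampled envelope can then be applied multiplicatively to a residual spectrum, or evaluated at new partial frequencies after a pitch shift.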
2.4 Additive+residual method

Additive synthesis, that is, the summation of time-varying sinusoidal components [14], [15], is accepted as perhaps the most powerful and flexible synthesis method. To take non-sinusoidal signals into account, a residual or noise component is added to the sinusoids [16]. Additive+residual synthesis allows the pitch and length of sounds to be varied independently [17]. Furthermore, because independent control of every component is available in additive synthesis, it is possible to implement models of perceptually significant features of sound such as inharmonicity and roughness, and fine control over the sound spectrum for timbre manipulation, such as continuous changes in its detailed structure. Another important aspect of additive synthesis is the simplicity of the mapping of frequency and amplitude parameters into the human perceptual space. These parameters are meaningful and easily understood by musicians.

Recently, a new additive synthesis method based on spectral envelopes and the Fast Fourier Transform has been developed [18]. Use of the inverse FFT reduces the computation cost by a factor on the order of 15 compared to oscillators. Furthermore, noise signals of any spectral density are also easily synthesized by this method: it is sufficient to add the desired spectral density to the spectrum prior to the inverse FFT. This technique, which we name FFT-1 additive, makes possible the design of low-cost real-time synthesizers allowing processing of recorded and live sounds, synthesis of instruments, and synthesis of speech and the singing voice.

The IRCAM Additive Analysis/Synthesis software, developed by X. Rodet, Ph. Depalle, G. Garcia and R. Woehrman, has been used at IRCAM for several years by researchers [19] and by musicians. It has been rewritten by G. Garcia as a library named Pm [20] and has been ported to Macintosh by A. Lefèvre. SDIF i/o facilities have been added to Pm by D. Schwarz.

The analysis of harmonic sounds relies on the selection of peaks of the short-time Fourier transform which are close to a multiple (harmonic) of the local fundamental frequency. The analysis of inharmonic sounds relies on building trajectories of sinusoids with statistically optimal properties of continuity of parameter values (frequency, amplitude and phase) and/or of their slopes [21]. This is implemented in a program named hmm (Hidden Markov Model), after the method it uses. hmm [22] was written by G. Garcia and Ph. Depalle and recently improved by P. Chose, who added SDIF i/o and a graphical user interface named sview [23]. In the case of harmonic sounds, the first analysis step is a fundamental frequency estimation done by IRCAM's f0 program [24] and stored in an f0 file. In the second step, a sliding-window FFT analysis is applied to the sound file and produces an FFT file. Then, the peaks of each FFT are detected and their frequency, amplitude and phase are estimated by polynomial interpolation, resulting in a peaks file. Finally, peaks of successive windows are grouped into sinusoidal partial trajectories. Trajectories are stored in a partials file which contains, for each of the successive windows, the frequency, amplitude and phase of each partial. All these data can be stored in, or retrieved from, SDIF files (see section 4).

The FFT-1 additive method mentioned above has been written for the real-time system FTS-jMax by N. Schnell [25]. It provides the synthesis of hundreds of arbitrary sinusoidal partials and of noise with any required spectral density at a very low cost.
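The harmonic peak-selection step can be sketched as follows. This is a simplified stand-in for the actual f0/FFT/peaks pipeline (Python/NumPy), with parabolic interpolation of the dB spectrum and an assumed relative frequency tolerance tol.

```python
import numpy as np

def harmonic_peaks(frame, sr, f0, n_harm=20, tol=0.03):
    """Pick spectral peaks near multiples of f0 in one analysis frame.

    Returns (frequency, amplitude) pairs, one per detected harmonic,
    refined by parabolic interpolation on the dB spectrum.
    """
    n = len(frame)
    db = 20 * np.log10(np.abs(np.fft.rfft(frame * np.hanning(n))) + 1e-12)
    bin_hz = sr / n
    peaks = []
    for h in range(1, n_harm + 1):
        lo = int(h * f0 * (1 - tol) / bin_hz)
        hi = int(h * f0 * (1 + tol) / bin_hz) + 1
        if lo < 1 or hi >= len(db) - 1:
            break  # harmonic search window out of range: stop
        k = lo + int(np.argmax(db[lo:hi]))
        # Parabolic interpolation around the maximum bin k.
        a, b, c = db[k - 1], db[k], db[k + 1]
        den = a - 2 * b + c
        d = 0.5 * (a - c) / den if den != 0 else 0.0  # vertex offset in bins
        peaks.append(((k + d) * bin_hz, 10 ** ((b - 0.25 * (a - c) * d) / 20)))
    return peaks
```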
2.5 PSOLA analysis and synthesis

PSOLA (Pitch-Synchronous OverLap-Add) [26] is a method based on the decomposition of a signal into a series of elementary waveforms, such that each waveform represents one of the successive pitch periods of the signal and their sum (overlap-add) reconstitutes the signal [27]. PSOLA works directly on the signal waveform; it is therefore inexpensive and loses no detail of the signal. But unlike usual sampling, PSOLA allows independent control of the pitch, duration and formants of the signal.

PSOLA analysis consists of positioning markers [28] pitch-synchronously, i.e. such that the interval between two successive markers is equal to the local fundamental period and each marker is close to the local maximum of the signal energy. The signal is then decomposed into a series of elementary waveforms by applying analysis windows centered on the markers. PSOLA synthesis proceeds by overlap-adding the waveforms, re-positioned on time instants calculated so that the interval between two successive waveforms is equal to the desired local pitch period. In the usual PSOLA method, time stretching or compression is obtained by repeating or skipping waveforms. However, in the case of strong time-stretching, repetition produces audible signal discontinuities. This is why TDI-PSOLA and FDI-PSOLA (Time-Domain and Frequency-Domain Interpolation PSOLA) have been proposed [28], where the waveforms to be overlap-added are interpolated between two successive waveforms of the analyzed signal.

By its definition, the PSOLA method allows modification only of the periodic parts of the signal. For the portions of the signal which are not periodic but random-like, the processing differs [29], in order to preserve the randomness and avoid introducing an artificial correlation in these parts, which would otherwise be perceived as tones (a "flanging" effect). It is thus necessary to estimate which parts of the signal are periodic, which are non-periodic and which are transient. In the case of the voice, the periodic part of the signal is produced by the vibration of the vocal cords and is called voiced. In our analysis algorithm, we extend this notion to any signal: at each time instant t, a voicing coefficient v(t) is estimated. This coefficient is obtained with a Phase-Derived Sinusoidality measure [29]. For each time/frequency region, the instantaneous frequency is compared to the frequency measured from spectral peaks. If they match, the time/frequency region is said to be sinusoidal. If, for a specific time, most regions of the spectrum are sinusoidal, the time frame is said to be voiced and is therefore processed by the PSOLA algorithm. Otherwise it is considered random.

Two main advantages of the PSOLA method are the preservation of phase even when the length of the sound is modified, and the preservation of the spectral envelope (formant positions) even when the pitch is shifted. High-quality transformations of signals can be obtained at very low computational cost. For modification of the spectral envelope independently of pitch, a Frequency-Shifting method (FS-PSOLA [29]) has been proposed. The PSOLA method can also be viewed as close to granular synthesis, in which each grain corresponds to one pitch period, or as close to Chant-FOF synthesis, since PSOLA elementary waveforms can be considered an approximation of Formant Waveforms, but without explicit estimation of source and filter parameters.

G. Peeters has developed a PSOLA analysis and synthesis package on UNIX named psolab. The synthesis algorithm has been ported to the real-time system FTS-jMax by N. Schnell. A musical application synthesizing a choir in real time, as well as other applications using PSOLA, are described in section 5. It should be noted that overlap-add (OLA) does not require the signal to be sinusoidal and therefore gives good results for portions of the signal where additive analysis fails, for example in fast transients or random portions. This is why G. Peeters has developed an a/s scheme named SINOLA [29], which combines additive sinusoids and OLA.
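A minimal sketch of the synthesis step, assuming the pitch-synchronous markers have already been placed by the analysis: two-period, Hann-windowed waveforms are overlap-added at intervals of the desired pitch period, repeating or skipping waveforms as in basic TD-PSOLA. This is Python/NumPy for illustration only; psolab itself is more elaborate (unvoiced parts, interpolation, etc.).

```python
import numpy as np

def td_psola(x, marks, pitch_factor):
    """Simplified TD-PSOLA pitch shift (duration preserved).

    x            : input signal (1-D float array)
    marks        : pitch-synchronous analysis markers (strictly ascending
                   sample indices, one per pitch period)
    pitch_factor : > 1 raises the pitch, < 1 lowers it
    """
    marks = np.asarray(marks)
    out = np.zeros(len(x))
    t = float(marks[0])
    while t < marks[-2]:
        # Analysis marker nearest to the synthesis instant; waveforms
        # are repeated or skipped as needed, as in basic PSOLA.
        i = min(int(np.argmin(np.abs(marks - t))), len(marks) - 2)
        period = int(marks[i + 1] - marks[i])  # local pitch period
        lo_in, hi_in = marks[i] - period, marks[i] + period
        lo_out = int(t) - period
        if (lo_in >= 0 and hi_in <= len(x)
                and lo_out >= 0 and lo_out + 2 * period <= len(out)):
            # Two-period, Hann-windowed elementary waveform, overlap-added.
            out[lo_out:lo_out + 2 * period] += (
                x[lo_in:hi_in] * np.hanning(2 * period))
        # Advance by the desired (scaled) pitch period.
        t += period / pitch_factor
    return out
```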
3 Diphone Studio

Diphone Studio, developed at Ircam by A. Lefèvre and co-workers, is the package in which the above-mentioned analysis and synthesis programs are ported to Macintosh, to be called by the Diphone control program itself [30]. Let us first explain the principles of diphone control. Diphone control is a powerful means of building a musical phrase from dictionaries of sound segments by concatenating and articulating them. Musical phrases can be built from any sound segment (not only instrumental sounds, but any recorded or synthetic sounds) stored in the form of time-varying control parameters of any synthesis method, such as additive synthesis or a physical model. A musical sentence is obtained by concatenating the parameters of each segment in the sequence of segments representing the desired sentence. Between two successive segments, parameter values are interpolated (a sketch of this scheme is given at the end of this section). The stream of parameter values so obtained is fed into a synthesis engine which computes the resulting sound signal.

Dictionaries of sound segments coded into parameters are built as follows. First, sound recordings are analyzed by a method such as additive analysis, resulting in a parameter file, e.g. time-varying frequencies and amplitudes of sinusoidal partials. In the second step, the parameter file is cut into segments representing units of sound suited to the intended musical usage. Finally, these segments are recorded in dictionaries. The analysis methods mentioned above are implemented in separate programs with specific GUIs. For example, additive analysis is implemented in the AddAn program and resonance modeling in the ModRes program. In AddAn, for instance, the different steps of the additive analysis are specified and triggered through an analysis panel and an analysis script file. The panel allows the user to set any of the analysis parameters, such as window size, window step, etc. All the settings are also accessible through an ASCII console and can be stored into, or restored from, an analysis script file.

The rest, i.e. control and synthesis, is achieved through the Diphone program itself. Since Generalized Diphone Control is aimed at providing control parameters for any synthesis method, and in order to facilitate extension to any synthesis engine, the program has been cut into separate modules using shared libraries and plugins. The central part is in charge of sequence concatenation, independently of the synthesis model in use. Peripheral plugins are in charge of computing parameter streams for specific synthesis methods and of the synthesis engines themselves, such as additive synthesis or Chant synthesis. Analysis and synthesis engines are implemented as shared libraries with specific parsers accepting the exact UNIX command lines, for best compatibility.

The Diphone control program also has a powerful GUI. Its first component is a dictionary browser which provides graphical facilities for inspecting dictionaries, displaying their contents, modifying them, etc. The second component is a sequence editor. It allows the building of sequences of segments and the tuning of segment parameter values such as position on the time axis, loudness or interpolation zones. Parameter evolutions, as stored in segments or as calculated from a sequence, are represented by break-point functions (BPFs). The third component is a BPF editor which allows the user to create BPFs or to modify them, in particular the complicated data obtained from the analysis programs, e.g. frequencies and amplitudes of sinusoidal partials. The Diphone program constitutes a very flexible and inspiring tool for composition and synthesis, which musicians are using at IRCAM and elsewhere for music creation.
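The concatenation-with-interpolation principle mentioned above can be sketched as follows. This is illustrative only, not Diphone's actual implementation; the segment contents and the overlap length are assumptions.

```python
import numpy as np

def concatenate_segments(segments, overlap):
    """Concatenate parameter segments with linear cross-interpolation.

    segments : list of 2-D arrays of shape (frames, n_params); each row
               is one time frame of synthesis parameters (e.g. partial
               frequencies or amplitudes)
    overlap  : number of frames interpolated between successive segments
    """
    out = segments[0]
    for seg in segments[1:]:
        w = np.linspace(0.0, 1.0, overlap)[:, None]
        # Cross-fade the parameter values over the interpolation zone.
        zone = (1 - w) * out[-overlap:] + w * seg[:overlap]
        out = np.vstack([out[:-overlap], zone, seg[overlap:]])
    return out
```

The resulting frame stream would then be fed to a synthesis engine such as the additive or Chant synthesizer.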

4 The Sound Description Interchange Format (SDIF)

The Sound Description Interchange Format (SDIF) [31] is a recently adopted standard that can store a variety of sound representations, including spectral, time-domain, and higher-level models. SDIF consists of a specified data format and a set of standard sound descriptions with their official representations. SDIF is flexible in that new sound descriptions can be represented, and new kinds of data can be added to existing sound descriptions, facilitating innovation and research. This standard is developed in collaboration by several research centers, notably Ircam, CNMAT and the Audiovisual Institute of Pompeu Fabra University.

The main goal of SDIF is to promote interchange by providing a common format for a variety of sound descriptions. For example, at a panel on various analysis/synthesis techniques at ICMC 2000 in Berlin, the analysis/synthesis data are shared, between seven different institutions, exclusively in SDIF format. The SDIF specification is open and publicly available, as are C and C++ libraries, available at no charge for multiple platforms including SGI IRIX, DEC Alpha OSF, Apple MacOS, Windows, and Linux. SDIF has standardized formats to support common extant sound descriptions. SDIF allows custom versions of these representations that include all the standard data in the standard places, plus extra fields. Entirely new representations can also be added. SDIF was designed to support file storage and Internet streaming of aggregates of various sound descriptions. We hope that SDIF will encourage and facilitate the development of new tools for manipulating sound in spectral and other domains, promote the use of interesting sound descriptions in general, and facilitate the sharing of work within the community.

The body of an SDIF file is a sequence of time-tagged frames modeled after chunks in the IFF/AIFF/RIFF formats. The time tag is an eight-byte floating-point number that indicates the time to which the given frame applies. By allowing any kind of frame in the same file, SDIF is also an aggregate or archive format. Data in a frame are stored in one or more 2D matrices. Matrix columns correspond to parameters like frequency or amplitude; each row represents an object like a filter, sinusoid, or noise band. Among the many SDIF sound description types (frames), the following are used by the libraries and programs quoted above: fundamental frequency, signal windows, discrete short-time Fourier transform, picked spectral peaks, sinusoidal tracks, harmonic sinusoidal tracks, FOFs, resonance models, white noise, signal samples, PSOLA markers, voicing coefficient, time-domain envelope, spectral envelope, cepstral coefficients, autocorrelation coefficients, autoregressive coefficients, reflection coefficients.

A number of utility programs have been written at Ircam by D. Schwarz, D. Virolle and P. Tisserand, among which are: an invertible SDIF-to-text transformation; sdifextract, which extracts data from an SDIF file according to time interval, frame, matrix, row, column, etc.; various conversion tools to other textual and binary formats; a drag-and-drop conversion tool for Macintosh; a reader/writer for Matlab; a real-time parameter reader/writer for jMax patches; and a reader/writer for OpenMusic. Other utility programs have been written at other institutions, such as a merger and a reader/writer for Max-MSP at CNMAT.
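The frame/matrix data model can be pictured as follows. This sketch (Python) shows the logical structure only, not the exact byte layout defined by the SDIF specification, and the example 1TRC values are invented.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class SdifMatrix:
    signature: str    # 4-character matrix type, e.g. "1TRC"
    data: np.ndarray  # 2-D: columns = parameters, rows = objects

@dataclass
class SdifFrame:
    signature: str                  # 4-character frame type
    time: float                     # time tag in seconds (float64)
    stream_id: int                  # stream this frame belongs to
    matrices: List[SdifMatrix] = field(default_factory=list)

# One frame of sinusoidal-track data at t = 0.5 s: columns are
# index, frequency, amplitude, phase; one row per partial.
frame = SdifFrame("1TRC", 0.5, 0, [
    SdifMatrix("1TRC", np.array([
        [1, 440.0, 0.8, 0.0],
        [2, 880.0, 0.4, 1.3],
    ])),
])
```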
5 Musical and other usage

5.1 Musical creation at Ircam

Various pieces have been created at Ircam using the tools described above. Let us quote a few: M by P. Leroux, Voile by D. Cohen, Epitafios by A. Vinao and Mountain Language by J. Wood have been made with Diphone, ModRes and AddAn.

5.2 A PSOLA virtual choir

Composer P. Manoury is working on a new opera, K, based on F. Kafka's novel Der Prozess, with the musical assistance of S. Lemouton [32]. For several scenes of this opera, the composer has expressed the need for choral voices evoking the notion of a crowd, and for sounds unusual or impossible for a real choir. It was decided that the virtual choir would be the superposition of multiple, sufficiently distinguishable voices. The PSOLA method described above was found to be a proper way to create the different voices with individual differences and to allow a wide range of transformations [32]. A whole group of voices is derived from a single recorded voice using the real-time PSOLA implementation in FTS-jMax. Each individual voice differs from the others by the pitch deviation and duration change allowed by PSOLA. Other modifications are possible, such as suppressing the vibrato in the recorded voice and imposing a new and different vibrato for the individual voices. Interesting effects can be obtained using the voicing coefficient. For example, voiced segments can be stretched more than unvoiced ones. Similarly, vowels and consonants can be independently processed and spatialized. Sound examples will be played at the conference.

5.3 Post-processing

A few years ago, our team was asked to create the voice of a castrato for the movie Farinelli [33]. This was done essentially by morphing real soprano and counter-tenor voices into a virtual castrato voice with our Super-VP phase vocoder. Recently, another request came from the movie industry: to improve the English pronunciation of a French actor in an English-language movie. PSOLA was found adequate to correct the prosody by changing the pitch evolution and the duration and energy of phonemes. Similarly, we have been asked to work on recordings of an actor who has a noticeable German accent in French. The task is to diminish his accent and possibly to shift it in the direction of a neutral accent, or even an Italian one. We found this possible with a combination of PSOLA for the prosody and spectral envelope modification to change the timbre of the voice as well. These different works have been done by S. Rossignol, G. Peeters and A. Lithaud.

Finally, let us mention a non-musical application. Additive+residual analysis/synthesis has been used to create new samples in a psychoacoustic research project on the sound of air-conditioning devices. These devices produce sounds with well-defined sinusoidal partials and a random component heard as noise. Our programs have been used by I. Perry at Ircam to create new air-conditioning sounds located between existing ones, by interpolating the sinusoidal partials on the one hand and the noise spectral envelope on the other. The application of such an interpolation to musical sounds can be heard in [34], where, for instance, hybrids of trombone and flute with varying factors provide a continuous timbre change from one instrument to the other.

6 Conclusion

We have presented in this paper a whole set of libraries and programs for sound analysis, processing and synthesis using signal models. A great deal of effort has been spent to ensure the development, maintenance and ease of use of these tools for musicians. We have shown that a key component in guaranteeing such facilities is the Sound Description Interchange Format, developed at Ircam in collaboration with CNMAT and other research centers. SDIF has gained ground as a standard and has demonstrated its utility sufficiently that it will be used for future projects; we hope to see it adopted by many other groups as well.

7 References

[1] Rodet, X., "Time Domain Formant-Wave-Function Synthesis", Computer Music Journal, Vol. 8, No. 3, 1984.
[2] d'Alessandro, C. and Rodet, X. (1989), "Synthèse et analyse-synthèse par fonctions d'ondes formantiques", J. Acoustique, 2.
[3] Clarke, J. M., Manning, P. D., Berry, R. and Purvis, A., "VOCEL: New implementations of the FOF synthesis method", Proc. Int. Comp. Music Conf., ICMC88, Cologne, 1988.
[4]
[5] Potard, Y., Baisnée, P.-F. and Barrière, J.-B. (1986), "Experimenting with Models of Resonance Produced by a New Technique for the Analysis of Impulsive Sounds", Proc. 1986 International Computer Music Conference, The Hague. Berkeley: Computer Music Association.
[6] Barrière, J.-B., Baisnée, P.-F., Freed, A. and Baudot, M.-D., "A Digital Signal Multiprocessor and its Musical Application", Proc. 15th International Computer Music Conference, Ohio State University, 1989. Computer Music Association.
[7] anamod.html
[8] Eckel, G., Iturbide, M. R. and Becker, B., "The development of GiST, a Granular Synthesis Tool Kit Based on an Extension of the FOF Generator", Proc. Int. Comp. Music Conf., ICMC95, Banff, 1995.
[9] Schwarz, D. and Rodet, X., "Spectral envelope estimation, representation, and morphing for sound analysis, transformation, and synthesis", Proc. Int. Comp. Music Conf., ICMC99, Beijing, Oct. 1999.
[10]
[11] Rodet, X. and Schwarz, D., "Spectral envelopes and additive+residual analysis-synthesis", in J. Beauchamp (ed.), The Sound of Music, Springer, New York, to be published.
[12] Oudot, M., "Estimation of the spectral envelope of mixed spectrum signals using a penalized likelihood criterion", IEEE Trans. Speech and Audio Processing, June.
[13] estimate.html
[14] McAulay, R. J. and Quatieri, T. F., "Speech analysis/synthesis based on a sinusoidal representation", IEEE Trans. on Acoust., Speech and Signal Proc., Vol. ASSP-34, 1986.
[15] Serra, X., A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition, Ph.D. dissertation, Stanford University, Oct. 1989.
[16] Serra, X., "Musical Sound Modeling with Sinusoids plus Noise", in G. De Poli, A. Picialli, S. T. Pope and C. Roads (eds.), Musical Signal Processing, Swets & Zeitlinger.
[17] Quatieri, T. F. and McAulay, R. J., "Shape Invariant Time-Scale and Pitch Modification of Speech", IEEE Trans. on Signal Processing, Vol. 40, No. 3, March 1992.
[18] Depalle, P. and Rodet, X., "A new additive synthesis method using inverse Fourier transform", Proc. Int. Comp. Music Conf., San Jose, Oct. 1992.
[19] index-e.html
[20]
[21] Depalle, P., García, G. and Rodet, X., "Tracking of partials for additive sound synthesis using hidden Markov models", Proc. IEEE ICASSP-93, Minneapolis, Minn., Apr. 1993.
[22] dochmm.html
[23] docsview/
[24]
[25] Déchelle, F., Borghesi, R., De Cecco, M., Maggi, E., Rovan, B. and Schnell, N., "jMax: An environment for real-time musical applications", Computer Music Journal, 23(3), 1999.
[26] Moulines, E. and Charpentier, F. (1990), "Pitch-Synchronous Waveform Processing Techniques for Text-To-Speech Synthesis using Diphones", Speech Communication, 9.
[27]
[28] Peeters, G. (1998), "Analyse-synthèse des sons musicaux par la méthode PSOLA", Proc. Journées d'Informatique Musicale, Agelonde, France.
[29] Peeters, G. and Rodet, X. (1999), "Non-Stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum", Proc. Int. Congr. Signal Proc. Applic. and Tech., Orlando, USA.
[30] Rodet, X. and Lefèvre, A., "The Diphone program: New features, new synthesis methods and experience of musical use", Proc. Int. Comp. Music Conf., Thessaloniki, 1997.
[31] Wright, M., Chaudhary, A., Freed, A., Wessel, D., Rodet, X., Virolle, D., Woehrmann, R. and Serra, X., "New Applications of the Sound Description Interchange Format", Proc. Int. Comp. Music Conf., ICMC98, Ann Arbor, Michigan, USA, Oct. 1998.
[32] Schnell, N., Peeters, G., Lemouton, S., Manoury, P. and Rodet, X., "Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA)", Proc. Int. Comp. Music Conf., ICMC 2000, Berlin, Sep. 2000.
[33] Depalle, P., García, G. and Rodet, X., "A virtual Castrato (!?)", Proc. Int. Computer Music Conf., Aarhus, Denmark, Oct. 1994.
[Schwarz 98] Schwarz, D., "Spectral Envelopes in Sound Analysis and Synthesis", ICMC 1998.
[34]


More information

Identification of Nonstationary Audio Signals Using the FFT, with Application to Analysis-based Synthesis of Sound

Identification of Nonstationary Audio Signals Using the FFT, with Application to Analysis-based Synthesis of Sound Identification of Nonstationary Audio Signals Using the FFT, with Application to Analysis-based Synthesis of Sound Paul Masri, Prof. Andrew Bateman Digital Music Research Group, University of Bristol 1.4

More information

COMBINING ADVANCED SINUSOIDAL AND WAVEFORM MATCHING MODELS FOR PARAMETRIC AUDIO/SPEECH CODING

COMBINING ADVANCED SINUSOIDAL AND WAVEFORM MATCHING MODELS FOR PARAMETRIC AUDIO/SPEECH CODING 17th European Signal Processing Conference (EUSIPCO 29) Glasgow, Scotland, August 24-28, 29 COMBINING ADVANCED SINUSOIDAL AND WAVEFORM MATCHING MODELS FOR PARAMETRIC AUDIO/SPEECH CODING Alexey Petrovsky

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION

SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION M.Tech. Credit Seminar Report, Electronic Systems Group, EE Dept, IIT Bombay, submitted November 04 SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION G. Gidda Reddy (Roll no. 04307046)

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

Time-Frequency Distributions for Automatic Speech Recognition

Time-Frequency Distributions for Automatic Speech Recognition 196 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 3, MARCH 2001 Time-Frequency Distributions for Automatic Speech Recognition Alexandros Potamianos, Member, IEEE, and Petros Maragos, Fellow,

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

Monophony/Polyphony Classification System using Fourier of Fourier Transform

Monophony/Polyphony Classification System using Fourier of Fourier Transform International Journal of Electronics Engineering, 2 (2), 2010, pp. 299 303 Monophony/Polyphony Classification System using Fourier of Fourier Transform Kalyani Akant 1, Rajesh Pande 2, and S.S. Limaye

More information

A GENERALIZED POLYNOMIAL AND SINUSOIDAL MODEL FOR PARTIAL TRACKING AND TIME STRETCHING. Martin Raspaud, Sylvain Marchand, and Laurent Girin

A GENERALIZED POLYNOMIAL AND SINUSOIDAL MODEL FOR PARTIAL TRACKING AND TIME STRETCHING. Martin Raspaud, Sylvain Marchand, and Laurent Girin Proc. of the 8 th Int. Conference on Digital Audio Effects (DAFx 5), Madrid, Spain, September 2-22, 25 A GENERALIZED POLYNOMIAL AND SINUSOIDAL MODEL FOR PARTIAL TRACKING AND TIME STRETCHING Martin Raspaud,

More information

INTRODUCTION TO COMPUTER MUSIC. Roger B. Dannenberg Professor of Computer Science, Art, and Music. Copyright by Roger B.

INTRODUCTION TO COMPUTER MUSIC. Roger B. Dannenberg Professor of Computer Science, Art, and Music. Copyright by Roger B. INTRODUCTION TO COMPUTER MUSIC FM SYNTHESIS A classic synthesis algorithm Roger B. Dannenberg Professor of Computer Science, Art, and Music ICM Week 4 Copyright 2002-2013 by Roger B. Dannenberg 1 Frequency

More information

Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling

Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling Mikko Parviainen 1 and Tuomas Virtanen 2 Institute of Signal Processing Tampere University

More information

RECENTLY, there has been an increasing interest in noisy

RECENTLY, there has been an increasing interest in noisy IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In

More information

Developing a Versatile Audio Synthesizer TJHSST Senior Research Project Computer Systems Lab

Developing a Versatile Audio Synthesizer TJHSST Senior Research Project Computer Systems Lab Developing a Versatile Audio Synthesizer TJHSST Senior Research Project Computer Systems Lab 2009-2010 Victor Shepardson June 7, 2010 Abstract A software audio synthesizer is being implemented in C++,

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information

Sinusoidal Modelling in Speech Synthesis, A Survey.

Sinusoidal Modelling in Speech Synthesis, A Survey. Sinusoidal Modelling in Speech Synthesis, A Survey. A.S. Visagie, J.A. du Preez Dept. of Electrical and Electronic Engineering University of Stellenbosch, 7600, Stellenbosch avisagie@dsp.sun.ac.za, dupreez@dsp.sun.ac.za

More information

Introduction of Audio and Music

Introduction of Audio and Music 1 Introduction of Audio and Music Wei-Ta Chu 2009/12/3 Outline 2 Introduction of Audio Signals Introduction of Music 3 Introduction of Audio Signals Wei-Ta Chu 2009/12/3 Li and Drew, Fundamentals of Multimedia,

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Pitch Period of Speech Signals Preface, Determination and Transformation

Pitch Period of Speech Signals Preface, Determination and Transformation Pitch Period of Speech Signals Preface, Determination and Transformation Mohammad Hossein Saeidinezhad 1, Bahareh Karamsichani 2, Ehsan Movahedi 3 1 Islamic Azad university, Najafabad Branch, Saidinezhad@yahoo.com

More information

HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS

HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS ARCHIVES OF ACOUSTICS 29, 1, 1 21 (2004) HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS M. DZIUBIŃSKI and B. KOSTEK Multimedia Systems Department Gdańsk University of Technology Narutowicza

More information

UNIT-4 POWER QUALITY MONITORING

UNIT-4 POWER QUALITY MONITORING UNIT-4 POWER QUALITY MONITORING Terms and Definitions Spectrum analyzer Swept heterodyne technique FFT (or) digital technique tracking generator harmonic analyzer An instrument used for the analysis and

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Frequency slope estimation and its application for non-stationary sinusoidal parameter estimation

Frequency slope estimation and its application for non-stationary sinusoidal parameter estimation Frequency slope estimation and its application for non-stationary sinusoidal parameter estimation Preprint final article appeared in: Computer Music Journal, 32:2, pp. 68-79, 2008 copyright Massachusetts

More information

8.3 Basic Parameters for Audio

8.3 Basic Parameters for Audio 8.3 Basic Parameters for Audio Analysis Physical audio signal: simple one-dimensional amplitude = loudness frequency = pitch Psycho-acoustic features: complex A real-life tone arises from a complex superposition

More information

Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks

Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks C. S. Blackburn and S. J. Young Cambridge University Engineering Department (CUED), England email: csb@eng.cam.ac.uk

More information

Localized Robust Audio Watermarking in Regions of Interest

Localized Robust Audio Watermarking in Regions of Interest Localized Robust Audio Watermarking in Regions of Interest W Li; X Y Xue; X Q Li Department of Computer Science and Engineering University of Fudan, Shanghai 200433, P. R. China E-mail: weili_fd@yahoo.com

More information

THE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES

THE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES J. Rauhala, The beating equalizer and its application to the synthesis and modification of piano tones, in Proceedings of the 1th International Conference on Digital Audio Effects, Bordeaux, France, 27,

More information

Dept. of Computer Science, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark

Dept. of Computer Science, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark NORDIC ACOUSTICAL MEETING 12-14 JUNE 1996 HELSINKI Dept. of Computer Science, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark krist@diku.dk 1 INTRODUCTION Acoustical instruments

More information

Adaptive Filters Application of Linear Prediction

Adaptive Filters Application of Linear Prediction Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information