Sound analysis, processing and synthesis tools for music research and production

Xavier Rodet
Analysis-Synthesis team, Ircam, 1 place Stravinsky, Paris, France

Abstract

We present a set of analysis, processing and synthesis tools developed at Ircam and the efforts pursued to keep such programs alive through the years, to develop new ones, and to allow different programs from a given institution or from various institutions to easily manage, exchange and maintain analysis and synthesis data. The analysis/synthesis (a/s) methods and programs presented here are signal models: Formant Wave-Form (FOF) synthesis, Resonance Modeling analysis, Spectral Envelope a/s, Sinusoidal+Residual Additive a/s and Pitch Synchronous Overlap-Add (PSOLA) a/s. One of the key elements for the ease of use, success and longevity of all these programs is the Sound Description Interchange Format (SDIF), invented and implemented in collaboration with other institutions to allow programs to easily handle and exchange a/s data. Finally, examples of musical applications and pieces produced with our programs are given.

1 Introduction

Since its birth in the 1950s, the domain of analysis, processing and synthesis of musical sound signals has seen a large number of computer programs developed for research and production. Such developments have been carried out at the Institut de Recherche et de Coordination Acoustique/Musique (Ircam) since the end of the 1970s. However, it is too often the case that the possibilities offered by these programs, however good they may be, are available only for a short lifetime (even for researchers, who may be more tolerant of the weaknesses of the tools) or, at worst, that these programs never reach the point where they are used by composers, musicians or researchers. There are several reasons for this loss, and it is not the purpose of this paper to list and analyze all of them. But we describe here some of the efforts pursued in the Analysis-Synthesis team at Ircam to keep such programs alive through the years, to develop new ones and to allow different programs from a given institution or from various institutions to easily manage, exchange and maintain analysis and synthesis data.

The analysis/synthesis (a/s) methods and programs that will be presented are signal models, as opposed to physical models, which are also developed in our team but are not presented here. These signal models are: Formant Wave-Form (FOF) synthesis, Resonance Modeling analysis, Spectral Envelope a/s, Sinusoidal+Residual Additive a/s and Pitch Synchronous Overlap-Add (PSOLA) a/s. In our institution, such programs are usually developed first in a research version under UNIX; only when they are found to be of interest to musicians and composers are they ported to the Macintosh platform and to FTS-jMax, Ircam's real-time system, for use by composers and musicians. Porting and keeping alive all these libraries and programs on several flavors of Unix, Linux and Macintosh is a huge task. Consequently, a great deal of effort has been made to improve software development and to keep unique sources for all platforms under the version management system CVS. On Macintosh, the environment which has been most developed for signal models (even though it is also well adapted to physical models) is Diphone Studio, which will be presented in section 3. One of the key elements for the ease of use, success and durability of all these programs is the Sound Description Interchange Format (SDIF), which will be presented in section 4. SDIF, which we have invented and implemented in collaboration with other institutions, allows programs to easily handle and exchange a/s data and facilitates management and maintenance of the data, as well as exchange with other music research centers. Finally, examples of musical applications and pieces produced with our programs will be given in section 5, not forgetting other applications such as post-production and acoustics research.

2 Analysis/synthesis methods and programs

2.1 Chant synthesis

The Chant [1],[2] synthesis technique, based on Formant-Wave-Forms (FOFs) and filters, has been implemented in various environments [3], but a general and portable library was not available. Therefore, a Chant library for UNIX and Macintosh has been designed at IRCAM by X. Rodet and F. Iovino, and written by F. Iovino, with modifications by G. Eckel and C. Rodgers for fast synthesis. It defines a small set of module classes: Formant-Wave-Form (FOF) bank, Filter bank, Noise generator, Sound File and Channel for sound output. Instances of these modules can be arbitrarily patched to build synthesizers offering polyphony and multi-channel output. The library has recently been improved and documented [4] by D. Virolle. In particular, an SDIF format has been defined for patches and for the time-varying parameters corresponding to a patch. A general synthesis program, named Chant, reads patch and parameter files and computes the corresponding sound file. This Chant library and synthesizer have been ported to Diphone Studio on Macintosh by A. Lefèvre. As for the other synthesis methods, the computation of a Chant sequence results in a Chant parameter file, and the corresponding sound file is computed by the Chant synthesizer. This new extension offers interesting possibilities of FOF and source-filter synthesis using all the features and facilities of the Diphone program.

2.2 Resonance model analysis and synthesis

Resonance Modeling [5] is a technique suited to percussive-like sounds, or sounds which resemble the free response of a linear system (it has, however, recently been applied to sustained sounds and has produced remarkable results). Such sounds can be considered as a sum of exponentially damped sinusoids, which are the responses of the modes of musical instruments or, more generally, of linear systems. The principle of Resonance Modeling is to look for sinusoids in two Fourier transforms: one applied to a window located at the beginning of the sound, where the resonances are of maximum amplitude, the other further into the sound, where the resonances have damped. A resonance, i.e. a damped sinusoid, appears as a peak in both transforms, with similar frequency. Therefore, peaks in one transform are matched to peaks in the other, and pairs with close enough frequencies are considered a resonance. The amplitude and the decay rate of such a resonance are easily deduced from the amplitude of the peak in each transform. The process is done iteratively with various window sizes and positions to improve the detection and estimation of the resonances. The final result is a Resonance Model of the sound, i.e. a set of resonances, each characterized by its frequency, its amplitude and its decay rate. One can view the sound as the sum of exponentially damped sinusoids, or as the impulse response of a bank of second-order filters implementing the resonances. In both cases, synthesis can be done with a Chant synthesizer: the first case is what comes out of a bank of FOF generators triggered once; the second case, a bank of second-order filters, is also implemented in the Chant synthesizer mentioned in section 2.1. After the first version in 1985, a C version was written in 1989 by P.-F. Baisnée [6] and ported to UNIX and MPW. Recently, this code has been improved and ported to various UNIX platforms at Ircam by Francesc Marti [7], and ported to Macintosh in Diphone Studio by A. Lefèvre under the name ModRes. In particular, SDIF i/o has been added for storage of Resonance Models in terms of FOFs or filters. Therefore, Resonance Models can be used in Diphone Studio through the Chant synthesis plugin.

Let us also mention two real-time implementations of FOF synthesis. It has been ported to Ircam's real-time system FTS-jMax and extended under the name FOG by G. Eckel and F. Iovino [8]. It has also been ported to Max-MSP by F. Iovino and R. Dudas.
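To make the model concrete, here is a minimal sketch of the synthesis side in the first case above, a bank of damped sinusoids triggered once. It is written in Python with NumPy rather than with the Chant C library, and the three resonances are invented values standing in for an analyzed Resonance Model.

```python
import numpy as np

def synthesize_resonances(freqs, amps, decays, duration, sr=44100):
    """Render a resonance model as a sum of exponentially damped sinusoids.

    freqs  : resonance frequencies in Hz
    amps   : initial amplitudes (linear)
    decays : decay rates in 1/s (amplitude falls as exp(-decay * t))
    """
    t = np.arange(int(duration * sr)) / sr
    out = np.zeros_like(t)
    for f, a, d in zip(freqs, amps, decays):
        out += a * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
    return out

# Hypothetical model of a bell-like sound with three modes.
signal = synthesize_resonances(
    freqs=[220.0, 563.0, 1195.0],
    amps=[1.0, 0.6, 0.3],
    decays=[3.0, 5.0, 9.0],
    duration=2.0,
)
```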
2.3 Spectral envelopes

A spectral envelope is a curve in the frequency-magnitude plane which envelopes the short-time spectrum of a signal, e.g. connecting the peaks which represent sinusoidal partials, or modeling the spectral density of a noise signal. It describes the perceptually pertinent distribution of energy over frequency, which determines a large part of the timbre of instruments, and the type of vowel for speech. Because of the importance of spectral envelopes for sound synthesis, a high-level approach to their handling has been taken [9]. A C library and applications [10] using the SDIF standard have been developed by D. Schwarz for the analysis, representation, manipulation, and synthesis of spectral envelopes.

Spectral envelopes can be estimated by linear prediction, cepstrum or discrete cepstrum. In [11], the strong and weak points of each are discussed relative to the requirements for estimation, such as robustness and regularity. Improvements to discrete cepstrum estimation (regularization, statistical smoothing, logarithmic frequency scale, adding control points) have been added. For speech signals, a composite envelope is shown to be advantageous [12]. It is estimated from the sinusoidal partials and from the noise part above the maximum partial frequency.

The representation of spectral envelopes is the central point of their handling. A good representation is crucial for the ease and flexibility with which they can be manipulated. Several requirements should be fulfilled, such as stability, locality, and flexibility. As a consequence of these requirements, several representations (filter coefficients, sampled, break-point functions, splines, formants) are available in the library. The notion of fuzzy formants, based on formant regions, has also been introduced. Some general forms of manipulation and morphing are offered. For morphing between two or more spectral envelopes over time, linear interpolation, and formant shifting which preserves valid vocal-tract characteristics, are considered. For synthesis, spectral envelopes are applied to sinusoidal additive synthesis and are also used for filtering the residual noise component. This is especially easy and efficient for both components in the FFT-1 method (see section 2.4).

Some features remain to be implemented. For instance, in additive analysis, spectral envelopes can be generalized to apply not only to magnitude but also to frequency and phase, while keeping the same representation. The frequency envelope expresses the harmonicity of partials over frequency, and the phase envelope expresses phase relations between harmonic partials. With this high-level approach to spectral envelopes, additive synthesis can avoid the dilemma of how to control hundreds of partials, and the residual noise part can be treated by the same manipulations as the sinusoidal part by using the same representation. Also, high-quality singing voice synthesis can use morphing between sampled spectral envelopes and formants to combine natural-sounding transitions with a precisely modeled sustained part.

The SpecEnv library and applications are used in various real-time and non-real-time programs on UNIX and Macintosh. On UNIX, the estimate application [13] computes spectral envelopes from signals and from additive data, while the modformat and filnor programs allow modification and application of spectral envelopes to additive+residual representations. These programs have been written by D. Schwarz, G. Poirot and S. Roux and have been ported to Diphone Studio by A. Lefèvre. Spectral envelopes for the additive representation and for the residual have been ported to FTS-jMax by N. Schnell.
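As a concrete illustration of the break-point-function representation, the following sketch (Python/NumPy, not the SpecEnv C library itself) builds an envelope by linear interpolation between the (frequency, amplitude) peaks of sinusoidal partials; the partial values are made up.

```python
import numpy as np

def envelope_from_partials(freqs, amps):
    """Break-point-function spectral envelope from partial peaks.

    Returns a function mapping frequency (Hz) to linear amplitude,
    linearly interpolated between peaks and held constant outside them.
    """
    order = np.argsort(freqs)
    f, a = np.asarray(freqs)[order], np.asarray(amps)[order]
    return lambda x: np.interp(x, f, a)

# Hypothetical partials of a harmonic sound (f0 = 200 Hz).
env = envelope_from_partials(
    freqs=[200, 400, 600, 800, 1000],
    amps=[1.0, 0.5, 0.7, 0.2, 0.1],
)
print(env(500.0))  # envelope amplitude between partials 2 and 3 -> 0.6
```

Such a sampled envelope can then be applied multiplicatively to a residual spectrum, or evaluated at new partial frequencies after a pitch shift.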
2.4 Additive+residual method

Additive synthesis, that is, the summation of time-varying sinusoidal components [14], [15], is accepted as perhaps the most powerful and flexible synthesis method. To take non-sinusoidal signals into account, a residual or noise component is added to the sinusoids [16]. Additive+residual synthesis allows the pitch and length of sounds to be varied independently [17]. Furthermore, because independent control of every component is available in additive synthesis, it is possible to implement models of perceptually significant features of sound such as inharmonicity and roughness, and fine control over the sound spectrum for timbre manipulation, such as continuous changes in its detailed structure. Another important aspect of additive synthesis is the simplicity of the mapping of frequency and amplitude parameters into the human perceptual space. These parameters are meaningful and easily understood by musicians.

Recently, a new additive synthesis method based on spectral envelopes and the Fast Fourier Transform has been developed [18]. Use of the inverse FFT reduces the computation cost by a factor on the order of 15 compared to oscillators. Furthermore, noise signals of any spectral density are also easily synthesized by this method: it is sufficient to add the desired spectral density to the spectrum prior to the inverse FFT. This technique, which we name FFT-1 additive, makes possible the design of low-cost real-time synthesizers allowing processing of recorded and live sounds, synthesis of instruments, and synthesis of speech and the singing voice.

The IRCAM Additive Analysis/Synthesis software, developed by X. Rodet, Ph. Depalle, G. Garcia and R. Woehrman, has been used at IRCAM for several years by researchers [19] and by musicians. It has been rewritten by G. Garcia as a library named Pm [20] and has been ported to Macintosh by A. Lefèvre. SDIF i/o facilities have been added to Pm by D. Schwarz.

The analysis of harmonic sounds relies on the selection of peaks of the short-time Fourier transform which are close to a multiple (harmonic) of the local fundamental frequency. The analysis of inharmonic sounds relies on building trajectories of sinusoids with statistically optimal properties of continuity of parameter values (frequency, amplitude and phase) and/or of their slopes [21]. This is implemented in a program named hmm (Hidden Markov Model), after the method it uses. hmm [22] was written by G. Garcia and Ph. Depalle and recently improved by P. Chose, who added SDIF i/o and a graphical user interface named sview [23]. In the case of harmonic sounds, the first analysis step is a fundamental frequency estimation done by IRCAM's f0 program [24] and stored in an f0 file. In the second step, a sliding-window FFT analysis is applied to the sound file and produces an FFT file. Then, the peaks of each FFT are detected and their frequency, amplitude and phase are estimated by polynomial interpolation, resulting in a peaks file. Finally, peaks of successive windows are grouped into sinusoidal partial trajectories. Trajectories are stored in a partials file which contains, for each of the successive windows, the frequency, amplitude and phase of each partial. All these data can be stored in, or retrieved from, SDIF files (see section 4).

The FFT-1 additive method mentioned above has been written for the real-time system FTS-jMax by N. Schnell [25]. It provides the synthesis of hundreds of arbitrary sinusoidal partials and of noise with any required spectral density at a very low cost.
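The harmonic peak-selection step can be sketched as follows. This is a simplified stand-in for the actual f0/FFT/peaks pipeline (Python/NumPy), with parabolic interpolation of the dB spectrum and an assumed relative frequency tolerance tol.

```python
import numpy as np

def harmonic_peaks(frame, sr, f0, n_harm=20, tol=0.03):
    """Pick spectral peaks near multiples of f0 in one analysis frame.

    Returns (frequency, amplitude) pairs, one per detected harmonic,
    refined by parabolic interpolation on the dB spectrum.
    """
    n = len(frame)
    db = 20 * np.log10(np.abs(np.fft.rfft(frame * np.hanning(n))) + 1e-12)
    bin_hz = sr / n
    peaks = []
    for h in range(1, n_harm + 1):
        lo = int(h * f0 * (1 - tol) / bin_hz)
        hi = int(h * f0 * (1 + tol) / bin_hz) + 1
        if lo < 1 or hi >= len(db) - 1:
            break  # harmonic search window out of range: stop
        k = lo + int(np.argmax(db[lo:hi]))
        # Parabolic interpolation around the maximum bin k.
        a, b, c = db[k - 1], db[k], db[k + 1]
        den = a - 2 * b + c
        d = 0.5 * (a - c) / den if den != 0 else 0.0  # vertex offset in bins
        peaks.append(((k + d) * bin_hz, 10 ** ((b - 0.25 * (a - c) * d) / 20)))
    return peaks
```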
2.5 PSOLA analysis and synthesis

PSOLA (Pitch-Synchronous OverLap-Add) [26] is a method based on the decomposition of a signal into a series of elementary waveforms, such that each waveform represents one of the successive pitch periods of the signal and their sum (overlap-add) reconstitutes the signal [27]. PSOLA works directly on the signal waveform; it is therefore inexpensive and loses no detail of the signal. But unlike usual sampling, PSOLA allows independent control of the pitch, duration and formants of the signal.

PSOLA analysis consists of positioning markers [28] pitch-synchronously, i.e. such that the interval between two successive markers is equal to the local fundamental period and each marker is close to the local maximum of the signal energy. The signal is then decomposed into a series of elementary waveforms by applying analysis windows centered on the markers. PSOLA synthesis proceeds by overlap-adding the waveforms, re-positioned on time instants calculated so that the interval between two successive waveforms is equal to the desired local pitch period. In the usual PSOLA method, time stretching or compression is obtained by repeating or skipping waveforms. However, in the case of strong time-stretching, repetition produces audible signal discontinuities. This is why TDI-PSOLA and FDI-PSOLA (Time-Domain and Frequency-Domain Interpolation PSOLA) have been proposed [28], where the waveforms to be overlap-added are interpolated between two successive waveforms of the analyzed signal.

By its definition, the PSOLA method allows modification only of the periodic parts of the signal. For the portions of the signal which are not periodic but random-like, the processing differs [29], in order to preserve the randomness and avoid introducing an artificial correlation in these parts, which would otherwise be perceived as tones (a "flanging" effect). It is thus necessary to estimate which parts of the signal are periodic, which are non-periodic and which are transient. In the case of the voice, the periodic part of the signal is produced by the vibration of the vocal cords and is called voiced. In our analysis algorithm, we extend this notion to any signal: at each time instant t, a voicing coefficient v(t) is estimated. This coefficient is obtained with a Phase-Derived Sinusoidality measure [29]. For each time/frequency region, the instantaneous frequency is compared to the frequency measured from spectral peaks. If they match, the time/frequency region is said to be sinusoidal. If, for a specific time, most regions of the spectrum are sinusoidal, the time frame is said to be voiced and is therefore processed by the PSOLA algorithm. Otherwise it is considered random.

Two main advantages of the PSOLA method are the preservation of phase even when the length of the sound is modified, and the preservation of the spectral envelope (formant positions) even when the pitch is shifted. High-quality transformations of signals can be obtained at very low computational cost. For modification of the spectral envelope independently of pitch, a Frequency-Shifting method (FS-PSOLA [29]) has been proposed. The PSOLA method can also be viewed as close to granular synthesis, in which each grain corresponds to one pitch period, or as close to Chant-FOF synthesis, since PSOLA elementary waveforms can be considered an approximation of Formant Waveforms, but without explicit estimation of source and filter parameters.

G. Peeters has developed a PSOLA analysis and synthesis package on UNIX named psolab. The synthesis algorithm has been ported to the real-time system FTS-jMax by N. Schnell. A musical application synthesizing a choir in real time, as well as other applications using PSOLA, are described in section 5. It should be noted that overlap-add (OLA) does not require the signal to be sinusoidal and therefore gives good results for portions of the signal where additive analysis fails, for example in fast transients or random portions. This is why G. Peeters has developed an a/s scheme named SINOLA [29], which combines additive sinusoids and OLA.
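A minimal sketch of the synthesis step, assuming the pitch-synchronous markers have already been placed by the analysis: two-period, Hann-windowed waveforms are overlap-added at intervals of the desired pitch period, repeating or skipping waveforms as in basic TD-PSOLA. This is Python/NumPy for illustration only; psolab itself is more elaborate (unvoiced parts, interpolation, etc.).

```python
import numpy as np

def td_psola(x, marks, pitch_factor):
    """Simplified TD-PSOLA pitch shift (duration preserved).

    x            : input signal (1-D float array)
    marks        : pitch-synchronous analysis markers (strictly ascending
                   sample indices, one per pitch period)
    pitch_factor : > 1 raises the pitch, < 1 lowers it
    """
    marks = np.asarray(marks)
    out = np.zeros(len(x))
    t = float(marks[0])
    while t < marks[-2]:
        # Analysis marker nearest to the synthesis instant; waveforms
        # are repeated or skipped as needed, as in basic PSOLA.
        i = min(int(np.argmin(np.abs(marks - t))), len(marks) - 2)
        period = int(marks[i + 1] - marks[i])  # local pitch period
        lo_in, hi_in = marks[i] - period, marks[i] + period
        lo_out = int(t) - period
        if (lo_in >= 0 and hi_in <= len(x)
                and lo_out >= 0 and lo_out + 2 * period <= len(out)):
            # Two-period, Hann-windowed elementary waveform, overlap-added.
            out[lo_out:lo_out + 2 * period] += (
                x[lo_in:hi_in] * np.hanning(2 * period))
        # Advance by the desired (scaled) pitch period.
        t += period / pitch_factor
    return out
```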
3 Diphone Studio

Diphone Studio, developed at Ircam by A. Lefèvre and co-workers, is the package in which the above-mentioned analysis and synthesis programs are ported to Macintosh, to be called by the Diphone control program itself [30]. Let us first explain the principles of diphone control. Diphone control is a powerful means of building a musical phrase from dictionaries of sound segments by concatenating and articulating them. Musical phrases can be built from any sound segment (not only instrumental sounds, but any recorded or synthetic sounds) stored in the form of time-varying control parameters of any synthesis method, such as additive synthesis or a physical model. A musical sentence is obtained by concatenating the parameters of each segment in the sequence of segments representing the desired sentence. Between two successive segments, parameter values are interpolated (a sketch of this scheme is given at the end of this section). The stream of parameter values so obtained is fed into a synthesis engine which computes the resulting sound signal.

Dictionaries of sound segments coded into parameters are built as follows. First, sound recordings are analyzed by a method such as additive analysis, resulting in a parameter file, e.g. time-varying frequencies and amplitudes of sinusoidal partials. In the second step, the parameter file is cut into segments representing units of sound suited to the intended musical usage. Finally, these segments are recorded in dictionaries. The analysis methods mentioned above are implemented in separate programs with specific GUIs. For example, additive analysis is implemented in the AddAn program and resonance modeling in the ModRes program. In AddAn, for instance, the different steps of the additive analysis are specified and triggered through an analysis panel and an analysis script file. The panel allows the user to set any of the analysis parameters, such as window size, window step, etc. All the settings are also accessible through an ASCII console and can be stored into, or restored from, an analysis script file.

The rest, i.e. control and synthesis, is achieved through the Diphone program itself. Since Generalized Diphone Control is aimed at providing control parameters for any synthesis method, and in order to facilitate extension to any synthesis engine, the program has been cut into separate modules using shared libraries and plugins. The central part is in charge of sequence concatenation, independently of the synthesis model in use. Peripheral plugins are in charge of computing parameter streams for specific synthesis methods and of the synthesis engines themselves, such as additive synthesis or Chant synthesis. Analysis and synthesis engines are implemented as shared libraries with specific parsers accepting the exact UNIX command lines, for best compatibility.

The Diphone control program also has a powerful GUI. Its first component is a dictionary browser which provides graphical facilities for inspecting dictionaries, displaying their contents, modifying them, etc. The second component is a sequence editor. It allows the building of sequences of segments and the tuning of segment parameter values such as position on the time axis, loudness or interpolation zones. Parameter evolutions, as stored in segments or as calculated from a sequence, are represented by break-point functions (BPFs). The third component is a BPF editor which allows the user to create BPFs or to modify them, in particular the complicated data obtained from the analysis programs, e.g. frequencies and amplitudes of sinusoidal partials. The Diphone program constitutes a very flexible and inspiring tool for composition and synthesis, which musicians are using at IRCAM and elsewhere for music creation.
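The concatenation-with-interpolation principle mentioned above can be sketched as follows. This is illustrative only, not Diphone's actual implementation; the segment contents and the overlap length are assumptions.

```python
import numpy as np

def concatenate_segments(segments, overlap):
    """Concatenate parameter segments with linear cross-interpolation.

    segments : list of 2-D arrays of shape (frames, n_params); each row
               is one time frame of synthesis parameters (e.g. partial
               frequencies or amplitudes)
    overlap  : number of frames interpolated between successive segments
    """
    out = segments[0]
    for seg in segments[1:]:
        w = np.linspace(0.0, 1.0, overlap)[:, None]
        # Cross-fade the parameter values over the interpolation zone.
        zone = (1 - w) * out[-overlap:] + w * seg[:overlap]
        out = np.vstack([out[:-overlap], zone, seg[overlap:]])
    return out
```

The resulting frame stream would then be fed to a synthesis engine such as the additive or Chant synthesizer.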

4 The Sound Description Interchange Format (SDIF)

The Sound Description Interchange Format (SDIF) [31] is a recently adopted standard that can store a variety of sound representations, including spectral, time-domain, and higher-level models. SDIF consists of a specified data format and a set of standard sound descriptions with their official representations. SDIF is flexible in that new sound descriptions can be represented, and new kinds of data can be added to existing sound descriptions, facilitating innovation and research. This standard is developed in collaboration by several research centers, notably Ircam, CNMAT and the Audiovisual Institute of Pompeu Fabra University.

The main goal of SDIF is to promote interchange by providing a common format for a variety of sound descriptions. For example, at a panel on various analysis/synthesis techniques at ICMC 2000 in Berlin, the analysis/synthesis data are shared, between seven different institutions, exclusively in SDIF format. The SDIF specification is open and publicly available, as are C and C++ libraries, available at no charge for multiple platforms including SGI IRIX, DEC Alpha OSF, Apple MacOS, Windows, and Linux. SDIF has standardized formats to support common extant sound descriptions. SDIF allows custom versions of these representations that include all the standard data in the standard places, plus extra fields. Entirely new representations can also be added. SDIF was designed to support file storage and Internet streaming of aggregates of various sound descriptions. We hope that SDIF will encourage and facilitate the development of new tools for manipulating sound in spectral and other domains, promote the use of interesting sound descriptions in general, and facilitate the sharing of work within the community.

The body of an SDIF file is a sequence of time-tagged frames modeled after chunks in the IFF/AIFF/RIFF formats. The time tag is an eight-byte floating-point number that indicates the time to which the given frame applies. By allowing any kind of frame in the same file, SDIF is also an aggregate or archive format. Data in a frame are stored in one or more 2D matrices. Matrix columns correspond to parameters like frequency or amplitude; each row represents an object like a filter, sinusoid, or noise band. Among the many SDIF sound description types (frames), the following are used by the libraries and programs quoted above: fundamental frequency, signal windows, discrete short-time Fourier transform, picked spectral peaks, sinusoidal tracks, harmonic sinusoidal tracks, FOFs, resonance models, white noise, signal samples, PSOLA markers, voicing coefficient, time-domain envelope, spectral envelope, cepstral coefficients, autocorrelation coefficients, autoregressive coefficients, reflection coefficients.

A number of utility programs have been written at Ircam by D. Schwarz, D. Virolle and P. Tisserand, among which are: an invertible SDIF-to-text transformation; sdifextract, which extracts data from an SDIF file according to time interval, frame, matrix, row, column, etc.; various conversion tools to other textual and binary formats; a drag-and-drop conversion tool for Macintosh; a reader/writer for Matlab; a real-time parameter reader/writer for jMax patches; and a reader/writer for OpenMusic. Other utility programs have been written at other institutions, such as a merger and a reader/writer for Max-MSP at CNMAT.
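The frame/matrix data model can be pictured as follows. This sketch (Python) shows the logical structure only, not the exact byte layout defined by the SDIF specification, and the example 1TRC values are invented.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class SdifMatrix:
    signature: str    # 4-character matrix type, e.g. "1TRC"
    data: np.ndarray  # 2-D: columns = parameters, rows = objects

@dataclass
class SdifFrame:
    signature: str                  # 4-character frame type
    time: float                     # time tag in seconds (float64)
    stream_id: int                  # stream this frame belongs to
    matrices: List[SdifMatrix] = field(default_factory=list)

# One frame of sinusoidal-track data at t = 0.5 s: columns are
# index, frequency, amplitude, phase; one row per partial.
frame = SdifFrame("1TRC", 0.5, 0, [
    SdifMatrix("1TRC", np.array([
        [1, 440.0, 0.8, 0.0],
        [2, 880.0, 0.4, 1.3],
    ])),
])
```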
5 Musical and other usage

5.1 Musical creation at Ircam

Various pieces have been created at Ircam using the tools described above. Let us quote a few: M by P. Leroux, Voile by D. Cohen, Epitafios by A. Vinao and Mountain Language by J. Wood have been made with Diphone, ModRes and AddAn.

5.2 A PSOLA virtual choir

Composer P. Manoury is working on a new opera, K, based on F. Kafka's novel Der Prozess, with the musical assistance of S. Lemouton [32]. For several scenes of this opera, the composer has expressed the need for choral voices evoking the notion of a crowd, and for sounds unusual or impossible for a real choir. It was decided that the virtual choir would be the superposition of multiple, sufficiently distinguishable voices. The PSOLA method described above was found to be a proper way to create the different voices with individual differences and to allow a wide range of transformations [32]. A whole group of voices is derived from a single recorded voice using the real-time PSOLA implementation in FTS-jMax. Each individual voice differs from the others by the pitch deviation and duration change allowed by PSOLA. Other modifications are possible, such as suppressing the vibrato in the recorded voice and imposing a new and different vibrato for the individual voices. Interesting effects can be obtained using the voicing coefficient. For example, voiced segments can be stretched more than unvoiced ones. Similarly, vowels and consonants can be independently processed and spatialized. Sound examples will be played at the conference.

5.3 Post-processing

A few years ago, our team was asked to create the voice of a castrato for the movie Farinelli [33]. This was done essentially by morphing real soprano and counter-tenor voices into a virtual castrato voice with our Super-VP phase vocoder. Recently, another request came from the movie industry: to improve the English pronunciation of a French actor in an English-language movie. PSOLA was found adequate to correct the prosody by changing the pitch evolution and the duration and energy of phonemes. Similarly, we have been asked to work on recordings of an actor who has a noticeable German accent in French. The task is to diminish his accent and possibly to shift it in the direction of a neutral accent, or even an Italian one. We found this possible with a combination of PSOLA for the prosody and spectral envelope modification to change the timbre of the voice as well. These different works have been done by S. Rossignol, G. Peeters and A. Lithaud.

Finally, let us mention a non-musical application. Additive+residual analysis/synthesis has been used to create new samples in a psychoacoustic research project on the sound of air-conditioning devices. These devices produce sounds with well-defined sinusoidal partials and a random component heard as noise. Our programs have been used by I. Perry at Ircam to create new air-conditioning sounds located between existing ones, by interpolating the sinusoidal partials on the one hand and the noise spectral envelope on the other. The application of such an interpolation to musical sounds can be heard in [34], where, for instance, hybrids of trombone and flute with varying factors provide a continuous timbre change from one instrument to the other.

6 Conclusion

We have presented in this paper a whole set of libraries and programs for sound analysis, processing and synthesis using signal models. A great deal of effort has been spent to ensure the development, maintenance and ease of use of these tools for musicians. We have shown that a key component in guaranteeing such facilities is the Sound Description Interchange Format, developed at Ircam in collaboration with CNMAT and other research centers. SDIF has gained ground as a standard and has demonstrated its utility sufficiently that it will be used for future projects; we hope to see it adopted by many other groups as well.

7 References

[1] Rodet, X., "Time Domain Formant-Wave-Function Synthesis", Computer Music Journal, Vol. 8, No. 3, 1984.
[2] d'Alessandro, C. and Rodet, X. (1989), "Synthèse et analyse-synthèse par fonctions d'ondes formantiques", J. Acoustique, 2.
[3] Clarke, J. M., Manning, P. D., Berry, R. and Purvis, A., "VOCEL: New implementations of the FOF synthesis method", Proc. Int. Comp. Music Conf., ICMC88, Cologne, 1988.
[4]
[5] Potard, Y., Baisnée, P.-F. and Barrière, J.-B. (1986), "Experimenting with Models of Resonance Produced by a New Technique for the Analysis of Impulsive Sounds", Proc. 1986 International Computer Music Conference, The Hague. Berkeley: Computer Music Association.
[6] Barrière, J.-B., Baisnée, P.-F., Freed, A. and Baudot, M.-D., "A Digital Signal Multiprocessor and its Musical Application", Proc. 15th International Computer Music Conference, Ohio State University, 1989. Computer Music Association.
[7] anamod.html
[8] Eckel, G., Iturbide, M. R. and Becker, B., "The development of GiST, a Granular Synthesis Tool Kit Based on an Extension of the FOF Generator", Proc. Int. Comp. Music Conf., ICMC95, Banff, 1995.
[9] Schwarz, D. and Rodet, X., "Spectral envelope estimation, representation, and morphing for sound analysis, transformation, and synthesis", Proc. Int. Comp. Music Conf., ICMC99, Beijing, Oct. 1999.
[10]
[11] Rodet, X. and Schwarz, D., "Spectral envelopes and additive+residual analysis-synthesis", in J. Beauchamp (ed.), The Sound of Music, Springer, New York, to be published.
[12] Oudot, M., "Estimation of the spectral envelope of mixed spectrum signals using a penalized likelihood criterion", IEEE Trans. Speech and Audio Processing, June.
[13] estimate.html
[14] McAulay, R. J. and Quatieri, T. F., "Speech analysis/synthesis based on a sinusoidal representation", IEEE Trans. on Acoust., Speech and Signal Proc., Vol. ASSP-34, 1986.
[15] Serra, X., A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition, Ph.D. dissertation, Stanford University, Oct. 1989.
[16] Serra, X., "Musical Sound Modeling with Sinusoids plus Noise", in G. De Poli, A. Picialli, S. T. Pope and C. Roads (eds.), Musical Signal Processing, Swets & Zeitlinger.
[17] Quatieri, T. F. and McAulay, R. J., "Shape Invariant Time-Scale and Pitch Modification of Speech", IEEE Trans. on Signal Processing, Vol. 40, No. 3, March 1992.
[18] Depalle, P. and Rodet, X., "A new additive synthesis method using inverse Fourier transform", Proc. Int. Comp. Music Conf., San Jose, Oct. 1992.
[19] index-e.html
[20]
[21] Depalle, P., García, G. and Rodet, X., "Tracking of partials for additive sound synthesis using hidden Markov models", Proc. IEEE ICASSP-93, Minneapolis, Minn., Apr. 1993.
[22] dochmm.html
[23] docsview/
[24]
[25] Déchelle, F., Borghesi, R., De Cecco, M., Maggi, E., Rovan, B. and Schnell, N., "jMax: An environment for real-time musical applications", Computer Music Journal, 23(3), 1999.
[26] Moulines, E. and Charpentier, F. (1990), "Pitch-Synchronous Waveform Processing Techniques for Text-To-Speech Synthesis using Diphones", Speech Communication, 9.
[27]
[28] Peeters, G. (1998), "Analyse-synthèse des sons musicaux par la méthode PSOLA", Proc. Journées d'Informatique Musicale, Agelonde, France.
[29] Peeters, G. and Rodet, X. (1999), "Non-Stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum", Proc. Int. Congr. Signal Proc. Applic. and Tech., Orlando, USA.
[30] Rodet, X. and Lefèvre, A., "The Diphone program: New features, new synthesis methods and experience of musical use", Proc. Int. Comp. Music Conf., Thessaloniki, 1997.
[31] Wright, M., Chaudhary, A., Freed, A., Wessel, D., Rodet, X., Virolle, D., Woehrmann, R. and Serra, X., "New Applications of the Sound Description Interchange Format", Proc. Int. Comp. Music Conf., ICMC98, Ann Arbor, Michigan, USA, Oct. 1998.
[32] Schnell, N., Peeters, G., Lemouton, S., Manoury, P. and Rodet, X., "Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA)", Proc. Int. Comp. Music Conf., ICMC 2000, Berlin, Sep. 2000.
[33] Depalle, P., García, G. and Rodet, X., "A virtual Castrato (!?)", Proc. Int. Computer Music Conf., Aarhus, Denmark, Oct. 1994.
[Schwarz 98] Schwarz, D., "Spectral Envelopes in Sound Analysis and Synthesis", ICMC 1998.
[34]


More information

Identification of Nonstationary Audio Signals Using the FFT, with Application to Analysis-based Synthesis of Sound

Identification of Nonstationary Audio Signals Using the FFT, with Application to Analysis-based Synthesis of Sound Identification of Nonstationary Audio Signals Using the FFT, with Application to Analysis-based Synthesis of Sound Paul Masri, Prof. Andrew Bateman Digital Music Research Group, University of Bristol 1.4

More information

COMBINING ADVANCED SINUSOIDAL AND WAVEFORM MATCHING MODELS FOR PARAMETRIC AUDIO/SPEECH CODING

COMBINING ADVANCED SINUSOIDAL AND WAVEFORM MATCHING MODELS FOR PARAMETRIC AUDIO/SPEECH CODING 17th European Signal Processing Conference (EUSIPCO 29) Glasgow, Scotland, August 24-28, 29 COMBINING ADVANCED SINUSOIDAL AND WAVEFORM MATCHING MODELS FOR PARAMETRIC AUDIO/SPEECH CODING Alexey Petrovsky

More information

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation

Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Quantification of glottal and voiced speech harmonicsto-noise ratios using cepstral-based estimation Peter J. Murphy and Olatunji O. Akande, Department of Electronic and Computer Engineering University

More information

SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION

SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION M.Tech. Credit Seminar Report, Electronic Systems Group, EE Dept, IIT Bombay, submitted November 04 SPEECH ANALYSIS-SYNTHESIS FOR SPEAKER CHARACTERISTIC MODIFICATION G. Gidda Reddy (Roll no. 04307046)

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Robust Voice Activity Detection Based on Discrete Wavelet. Transform

Robust Voice Activity Detection Based on Discrete Wavelet. Transform Robust Voice Activity Detection Based on Discrete Wavelet Transform Kun-Ching Wang Department of Information Technology & Communication Shin Chien University kunching@mail.kh.usc.edu.tw Abstract This paper

More information

Time-Frequency Distributions for Automatic Speech Recognition

Time-Frequency Distributions for Automatic Speech Recognition 196 IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 3, MARCH 2001 Time-Frequency Distributions for Automatic Speech Recognition Alexandros Potamianos, Member, IEEE, and Petros Maragos, Fellow,

More information

Single Channel Speaker Segregation using Sinusoidal Residual Modeling

Single Channel Speaker Segregation using Sinusoidal Residual Modeling NCC 2009, January 16-18, IIT Guwahati 294 Single Channel Speaker Segregation using Sinusoidal Residual Modeling Rajesh M Hegde and A. Srinivas Dept. of Electrical Engineering Indian Institute of Technology

More information

Monophony/Polyphony Classification System using Fourier of Fourier Transform

Monophony/Polyphony Classification System using Fourier of Fourier Transform International Journal of Electronics Engineering, 2 (2), 2010, pp. 299 303 Monophony/Polyphony Classification System using Fourier of Fourier Transform Kalyani Akant 1, Rajesh Pande 2, and S.S. Limaye

More information

A GENERALIZED POLYNOMIAL AND SINUSOIDAL MODEL FOR PARTIAL TRACKING AND TIME STRETCHING. Martin Raspaud, Sylvain Marchand, and Laurent Girin

A GENERALIZED POLYNOMIAL AND SINUSOIDAL MODEL FOR PARTIAL TRACKING AND TIME STRETCHING. Martin Raspaud, Sylvain Marchand, and Laurent Girin Proc. of the 8 th Int. Conference on Digital Audio Effects (DAFx 5), Madrid, Spain, September 2-22, 25 A GENERALIZED POLYNOMIAL AND SINUSOIDAL MODEL FOR PARTIAL TRACKING AND TIME STRETCHING Martin Raspaud,

More information

INTRODUCTION TO COMPUTER MUSIC. Roger B. Dannenberg Professor of Computer Science, Art, and Music. Copyright by Roger B.

INTRODUCTION TO COMPUTER MUSIC. Roger B. Dannenberg Professor of Computer Science, Art, and Music. Copyright by Roger B. INTRODUCTION TO COMPUTER MUSIC FM SYNTHESIS A classic synthesis algorithm Roger B. Dannenberg Professor of Computer Science, Art, and Music ICM Week 4 Copyright 2002-2013 by Roger B. Dannenberg 1 Frequency

More information

Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling

Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling Mikko Parviainen 1 and Tuomas Virtanen 2 Institute of Signal Processing Tampere University

More information

RECENTLY, there has been an increasing interest in noisy

RECENTLY, there has been an increasing interest in noisy IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 52, NO. 9, SEPTEMBER 2005 535 Warped Discrete Cosine Transform-Based Noisy Speech Enhancement Joon-Hyuk Chang, Member, IEEE Abstract In

More information

Developing a Versatile Audio Synthesizer TJHSST Senior Research Project Computer Systems Lab

Developing a Versatile Audio Synthesizer TJHSST Senior Research Project Computer Systems Lab Developing a Versatile Audio Synthesizer TJHSST Senior Research Project Computer Systems Lab 2009-2010 Victor Shepardson June 7, 2010 Abstract A software audio synthesizer is being implemented in C++,

More information

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

International Journal of Modern Trends in Engineering and Research   e-issn No.: , Date: 2-4 July, 2015 International Journal of Modern Trends in Engineering and Research www.ijmter.com e-issn No.:2349-9745, Date: 2-4 July, 2015 Analysis of Speech Signal Using Graphic User Interface Solly Joy 1, Savitha

More information

Sinusoidal Modelling in Speech Synthesis, A Survey.

Sinusoidal Modelling in Speech Synthesis, A Survey. Sinusoidal Modelling in Speech Synthesis, A Survey. A.S. Visagie, J.A. du Preez Dept. of Electrical and Electronic Engineering University of Stellenbosch, 7600, Stellenbosch avisagie@dsp.sun.ac.za, dupreez@dsp.sun.ac.za

More information

Introduction of Audio and Music

Introduction of Audio and Music 1 Introduction of Audio and Music Wei-Ta Chu 2009/12/3 Outline 2 Introduction of Audio Signals Introduction of Music 3 Introduction of Audio Signals Wei-Ta Chu 2009/12/3 Li and Drew, Fundamentals of Multimedia,

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Pitch Period of Speech Signals Preface, Determination and Transformation

Pitch Period of Speech Signals Preface, Determination and Transformation Pitch Period of Speech Signals Preface, Determination and Transformation Mohammad Hossein Saeidinezhad 1, Bahareh Karamsichani 2, Ehsan Movahedi 3 1 Islamic Azad university, Najafabad Branch, Saidinezhad@yahoo.com

More information

HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS

HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS ARCHIVES OF ACOUSTICS 29, 1, 1 21 (2004) HIGH ACCURACY AND OCTAVE ERROR IMMUNE PITCH DETECTION ALGORITHMS M. DZIUBIŃSKI and B. KOSTEK Multimedia Systems Department Gdańsk University of Technology Narutowicza

More information

UNIT-4 POWER QUALITY MONITORING

UNIT-4 POWER QUALITY MONITORING UNIT-4 POWER QUALITY MONITORING Terms and Definitions Spectrum analyzer Swept heterodyne technique FFT (or) digital technique tracking generator harmonic analyzer An instrument used for the analysis and

More information

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter

Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter Speech Enhancement in Presence of Noise using Spectral Subtraction and Wiener Filter 1 Gupteswar Sahu, 2 D. Arun Kumar, 3 M. Bala Krishna and 4 Jami Venkata Suman Assistant Professor, Department of ECE,

More information

Frequency slope estimation and its application for non-stationary sinusoidal parameter estimation

Frequency slope estimation and its application for non-stationary sinusoidal parameter estimation Frequency slope estimation and its application for non-stationary sinusoidal parameter estimation Preprint final article appeared in: Computer Music Journal, 32:2, pp. 68-79, 2008 copyright Massachusetts

More information

8.3 Basic Parameters for Audio

8.3 Basic Parameters for Audio 8.3 Basic Parameters for Audio Analysis Physical audio signal: simple one-dimensional amplitude = loudness frequency = pitch Psycho-acoustic features: complex A real-life tone arises from a complex superposition

More information

Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks

Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks Learning New Articulator Trajectories for a Speech Production Model using Artificial Neural Networks C. S. Blackburn and S. J. Young Cambridge University Engineering Department (CUED), England email: csb@eng.cam.ac.uk

More information

Localized Robust Audio Watermarking in Regions of Interest

Localized Robust Audio Watermarking in Regions of Interest Localized Robust Audio Watermarking in Regions of Interest W Li; X Y Xue; X Q Li Department of Computer Science and Engineering University of Fudan, Shanghai 200433, P. R. China E-mail: weili_fd@yahoo.com

More information

THE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES

THE BEATING EQUALIZER AND ITS APPLICATION TO THE SYNTHESIS AND MODIFICATION OF PIANO TONES J. Rauhala, The beating equalizer and its application to the synthesis and modification of piano tones, in Proceedings of the 1th International Conference on Digital Audio Effects, Bordeaux, France, 27,

More information

Dept. of Computer Science, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark

Dept. of Computer Science, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark NORDIC ACOUSTICAL MEETING 12-14 JUNE 1996 HELSINKI Dept. of Computer Science, University of Copenhagen Universitetsparken 1, DK-2100 Copenhagen Ø, Denmark krist@diku.dk 1 INTRODUCTION Acoustical instruments

More information

Adaptive Filters Application of Linear Prediction

Adaptive Filters Application of Linear Prediction Adaptive Filters Application of Linear Prediction Gerhard Schmidt Christian-Albrechts-Universität zu Kiel Faculty of Engineering Electrical Engineering and Information Technology Digital Signal Processing

More information

Audio Signal Compression using DCT and LPC Techniques

Audio Signal Compression using DCT and LPC Techniques Audio Signal Compression using DCT and LPC Techniques P. Sandhya Rani#1, D.Nanaji#2, V.Ramesh#3,K.V.S. Kiran#4 #Student, Department of ECE, Lendi Institute Of Engineering And Technology, Vizianagaram,

More information