Sound analysis, processing and synthesis tools for music research and production
Xavier Rodet
Analysis-Synthesis team, Ircam, 1 place Stravinsky, Paris, France

Abstract
We present a set of analysis, processing and synthesis tools developed at Ircam and the efforts pursued to keep such programs alive through the years, to develop new ones, and to allow different programs from a given institution or from various institutions to easily manage, exchange and maintain analysis and synthesis data. The analysis/synthesis (a/s) methods and programs presented are signal models: Formant Wave-form (FOF) synthesis, Resonance Modeling analysis, Spectral Envelope a/s, Sinusoidal+Residual Additive a/s and Pitch Synchronous Overlap Add (PSOLA) a/s. One of the key elements for the easy usage, success and longevity of all these programs is the Sound Description Interchange Format (SDIF), designed and implemented in collaboration with other institutions to allow programs to easily handle and exchange a/s data. Finally, examples of musical applications and pieces produced with our programs are given.

1 Introduction
Since its birth in the 1950s, the domain of analysis, processing and synthesis of musical sound signals has seen a large number of computer programs developed for research and production. Such developments have been carried out at the Institut de Recherche et de Coordination Acoustique/Musique (Ircam) since the end of the 1970s. However, it is too often the case that the possibilities offered by these programs, however good they may be, are available for only a short lifetime (even for researchers, who may be more tolerant of the weaknesses of the tools) or, at worst, that these programs never reach the point where they are used by composers, musicians or researchers. There are several reasons for this loss, and it is not the purpose of this paper to list and analyze all of them.
But we describe here some of the efforts pursued in the Analysis-Synthesis team at Ircam to keep such programs alive through the years, to develop new ones and to allow different programs from a given institution or from various institutions to easily manage, exchange and maintain analysis and synthesis data. The analysis/synthesis (a/s) methods and programs that will be presented are signal models, as opposed to physical models, which are also developed in our team but are not presented here. These signal models are: Formant Wave-form (FOF) synthesis, Resonance Modeling analysis, Spectral Envelope a/s, Sinusoidal+Residual Additive a/s and Pitch Synchronous Overlap Add (PSOLA) a/s. In our institution, such programs are usually developed first in a research version under UNIX; only when they are found to be of interest to musicians and composers are they ported to the Macintosh platform and to FTS-jMax, Ircam's real-time system, for use by composers and musicians. Porting and keeping alive all these libraries and programs on several flavors of Unix, Linux and Macintosh is a huge task. Consequently, a great deal of effort has been made to improve software development and to maintain a single source tree for all platforms under the version management system CVS. On Macintosh, the environment which has been most developed for signal models (even though it is also well adapted to physical models) is Diphone Studio, which will be presented in section 3. One of the key elements for the easy usage, success and durability of all these programs is the Sound Description Interchange Format (SDIF), which will be presented in section 4. SDIF, which we have designed and implemented in collaboration with other institutions, allows programs to easily handle and exchange a/s data and facilitates management and maintenance of the data as well as exchange with other music research centers.
Finally, examples of musical applications and pieces produced with our programs will be given in section 5, not forgetting other applications such as post-production or acoustics research.
2 Analysis/synthesis methods and programs

2.1 Chant synthesis
The Chant [1],[2] synthesis technique, Formant-Wave-Forms (FOFs) and filters, has been implemented in various environments [3], but a general and portable library was not available. Therefore, a Chant library for UNIX and Macintosh has been designed at Ircam by X. Rodet and F. Iovino, and written by F. Iovino, with modifications by G. Eckel and C. Rodgers for fast synthesis. It defines a small set of module classes: Formant-Wave-Form (FOF) bank, Filter bank, Noise generator, Sound File and Channel for sound output. Instances of these modules can be arbitrarily patched to build synthesizers offering polyphony and multi-channel output. It has recently been improved and documented [4] by D. Virolle. In particular, an SDIF format has been defined for patches and for the time-varying parameters corresponding to a patch. A general synthesis program, named Chant, reads patch and parameter files and computes the corresponding sound file. This Chant library and synthesizer have been ported to Diphone Studio on Macintosh by A. Lefèvre. As with the other synthesis methods, the computation of a Chant sequence results in a Chant parameter file, and the corresponding sound file is computed by the Chant synthesizer. This new extension offers interesting possibilities of FOF and source-filter synthesis by using all the features and facilities of the Diphone program.

2.2 Resonance model analysis and synthesis
Resonance Modeling [5] is a technique suited for percussive-like sounds or sounds that resemble the free response of a linear system (it has, however, recently been applied to sustained sounds and has produced remarkable results). Such sounds can be considered as a sum of exponentially damped sinusoids, which are the responses of the modes of musical instruments or, more generally, of linear systems.
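As an illustration of this view of the model, the following Python sketch renders a small resonance model as a sum of exponentially damped sinusoids. The frequencies, amplitudes and decay rates are invented for the example; they are not taken from an actual ModRes analysis, and the parameter layout is not the ModRes file format.

```python
import math

def synthesize_resonances(resonances, sr=8000, dur=0.25):
    """Render a resonance model as a sum of exponentially damped sinusoids.

    Each resonance is a (frequency_hz, amplitude, decay_per_second) triple;
    names and units here are illustrative only.
    """
    n = int(sr * dur)
    out = [0.0] * n
    for freq, amp, decay in resonances:
        for i in range(n):
            t = i / sr
            out[i] += amp * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
    return out

# A toy bell-like model with three resonances.
model = [(440.0, 1.0, 6.0), (1100.0, 0.5, 9.0), (2300.0, 0.25, 14.0)]
signal = synthesize_resonances(model)
```

As noted in the text, the same model could equivalently be realized as the impulse response of a bank of second-order filters.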
The principle of Resonance Modeling is to look for sinusoids in two Fourier transforms, one applied to a window located at the beginning of the sound, where the resonances are of maximum amplitude, the other further into the sound, where the resonances have damped. A resonance, i.e. a damped sinusoid, appears as a peak in both transforms, with similar frequency. Therefore peaks in one transform are matched to peaks in the other, and pairs with close enough frequencies are considered a resonance. The amplitude and the decay rate of such a resonance are easily deduced from the amplitude of the peak in each transform. The process is done iteratively with various window sizes and positions to improve the detection and the estimation of the resonances. The final result is a Resonance Model of the sound, i.e. a set of resonances, each one characterized by its frequency, its amplitude and its decay rate. One can view the sound as the sum of exponentially damped sinusoids, or as the response to an impulse of a bank of second-order filters implementing the resonances. In both cases, synthesis can be done with a Chant synthesizer. The first case is what comes out of a bank of FOF generators triggered once. The second case, a bank of second-order filters, is also implemented in the Chant synthesizer mentioned in section 2.1. After the first version in 1985, a C version was written in 1989 by P.-F. Baisnée [6] and ported to UNIX and MPW. Recently, this code has been improved and ported to various UNIX platforms at Ircam by Francesc Marti [7], and ported to Macintosh in Diphone Studio by A. Lefèvre under the name ModRes. In particular, SDIF i/o has been added for storage of Resonance Models in terms of FOFs or filters. Therefore, Resonance Models can be used in Diphone Studio using the Chant synthesis plugin. Let us also mention two real-time implementations of FOF synthesis. It has been ported to Ircam's real-time system FTS-jMax and extended under the name FOG by G.
Eckel and F. Iovino [8]. It has also been ported to Max-MSP by F. Iovino and R. Dudas.

2.3 Spectral envelopes
A spectral envelope is a curve in the frequency-magnitude plane which envelopes the short-time spectrum of a signal, e.g. connecting the peaks which represent sinusoidal partials, or modeling the spectral density of a noise signal. It describes the perceptually pertinent distribution of energy over frequency, which determines a large part of timbre for instruments, and the type of vowel for speech. Because of the importance of using spectral envelopes for sound synthesis, a high-level approach to their handling has been taken [9]. A C library and applications [10] using the SDIF standard have been developed by D. Schwarz, dealing with spectral envelopes for analysis, representation, manipulation and synthesis. Spectral envelopes can be estimated by linear
prediction, cepstrum or discrete cepstrum. In [11] the strong and weak points of each are discussed relative to the requirements for estimation, such as robustness and regularity. Improvements to discrete cepstrum estimation (regularization, statistical smoothing, logarithmic frequency scale, adding control points) have been added. For speech signals, a composite envelope is shown to be advantageous [12]. It is estimated from the sinusoidal partials and from the noise part above the maximum partial frequency. The representation of spectral envelopes is the central point for their handling. A good representation is crucial for the ease and flexibility with which they can be manipulated. Several requirements should be fulfilled, such as stability, locality and flexibility. As a consequence of these requirements, several representations (filter coefficients, sampled, break-point functions, splines, formants) are available in the library. The notion of fuzzy formants, based on formant regions, has also been introduced. Some general forms of manipulation and morphing are offered. For morphing between two or more spectral envelopes over time, linear interpolation, and formant shifting which preserves valid vocal tract characteristics, are considered. For synthesis, spectral envelopes are applied to sinusoidal additive synthesis and are also used for filtering the residual noise component. This is especially easy and efficient for both components in the FFT-1 method (see section 2.4). Some features remain to be implemented. For instance, in additive analysis, spectral envelopes can be generalized to apply not only to magnitude, but also to frequency and phase, while keeping the same representation. The frequency envelope expresses harmonicity of partials over frequency and the phase envelope expresses phase relations between harmonic partials.
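The simplest of the morphing operations mentioned above, linear interpolation between two sampled spectral envelopes, can be sketched in a few lines of Python. The envelope values and frequency grid are invented for illustration; formant shifting and the SpecEnv API itself are not shown.

```python
def morph_envelopes(env_a, env_b, alpha):
    """Linearly interpolate two sampled spectral envelopes defined on the
    same frequency grid: alpha = 0 gives env_a, alpha = 1 gives env_b."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(env_a, env_b)]

# Two toy envelopes as magnitudes in dB on a common frequency grid.
env_a = [0.0, -6.0, -12.0, -20.0]
env_b = [-10.0, -3.0, -8.0, -30.0]
halfway = morph_envelopes(env_a, env_b, 0.5)   # [-5.0, -4.5, -10.0, -25.0]
```

Whether to interpolate linear or dB magnitudes is a design choice: interpolating dB values corresponds to geometric interpolation of the amplitudes.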
With this high-level approach to spectral envelopes, additive synthesis can avoid the dilemma of how to control hundreds of partials, and the residual noise part can be treated with the same manipulations as the sinusoidal part by using the same representation. Also, high-quality singing voice synthesis can use morphing between sampled spectral envelopes and formants to combine natural-sounding transitions with a precisely modeled sustained part. The SpecEnv library and applications are used in various real-time and non-real-time programs on UNIX and Macintosh. On UNIX, the estimate application [13] computes spectral envelopes from signals and from additive data, while the modformat and filnor programs allow modification and application of spectral envelopes to additive+residual representations. These programs have been written by D. Schwarz, G. Poirot and S. Roux and have been ported to Diphone Studio by A. Lefèvre. Spectral envelopes for the additive representation and for the residual have been ported to FTS-jMax by N. Schnell.

2.4 Additive+residual method
Additive synthesis, that is, the summation of time-varying sinusoidal components [14], [15], is accepted as perhaps the most powerful and flexible synthesis method. To take non-sinusoidal signals into account, a residual or noise is also added to the sinusoids [16]. Additive+residual synthesis allows the pitch and length of sounds to be varied independently [17]. Furthermore, because independent control of every component is available in additive synthesis, it is possible to implement models of perceptually significant features of sound such as inharmonicity, roughness and fine control over the sound spectrum for timbre manipulations, such as continuous changes in its detailed structure. Another important aspect of additive synthesis is the simplicity of the mapping of frequency and amplitude parameters into the human perceptual space. These parameters are meaningful and easily understood by musicians.
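The principle of the underlying oscillator bank, before any efficiency refinement, can be sketched as follows. Frequencies and amplitudes are held constant here to keep the sketch short, whereas real additive synthesis interpolates time-varying values frame by frame; the partial values are invented.

```python
import math

def additive_synthesis(partials, sr, dur):
    """Oscillator-bank sketch: sum sinusoidal partials given as
    (frequency_hz, amplitude) pairs, held constant over the note."""
    n = int(sr * dur)
    return [sum(a * math.sin(2 * math.pi * f * i / sr) for f, a in partials)
            for i in range(n)]

# Three harmonics of 220 Hz with 1/k amplitudes.
tone = additive_synthesis([(220.0, 1.0), (440.0, 0.5), (660.0, 1.0 / 3.0)],
                          sr=8000, dur=0.1)
```

Computing every partial sample by sample like this is exactly the cost that the FFT-1 method described next avoids.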
Recently, a new additive synthesis method based on spectral envelopes and the Fast Fourier Transform has been developed [18]. Use of the inverse FFT reduces the computation cost by a factor on the order of 15 compared to oscillators. Furthermore, noise signals of any spectral density are also easily synthesized by this method: it is sufficient to add the desired spectral density to the spectrum prior to the inverse FFT. This technique, which we name fft-1 additive, makes possible the design of low-cost real-time synthesizers allowing processing of recorded and live sounds, synthesis of instruments and synthesis of speech and the singing voice. The IRCAM Additive Analysis/Synthesis software, developed by X. Rodet, Ph. Depalle, G. Garcia and R. Woehrman, has been used at IRCAM for several years by researchers [19] and by musicians. It has been rewritten by G. Garcia as a library named Pm [20] and has been ported to Macintosh by A. Lefèvre. SDIF i/o facilities have been added to Pm by D. Schwarz. The analysis of harmonic sounds relies on the
selection of peaks of the Short Time Fourier Transform which are close to a multiple (harmonic) of the local fundamental frequency. The analysis of inharmonic sounds relies on the building of trajectories of sinusoids with statistically optimal properties of continuity of parameter values (frequency, amplitude and phase) and/or of their slopes [21]. This is implemented in a program named hmm (Hidden Markov Model) after the method it uses. hmm [22] has been written by G. Garcia and Ph. Depalle and recently improved by P. Chose, who added SDIF i/o and a graphic user interface named sview [23]. In the case of harmonic sounds, the first analysis step is a fundamental frequency estimation done by IRCAM's f0 program [24] and stored in an f0 file. In the second step, a sliding-window FFT analysis is applied to the sound file and produces an FFT file. Then the peaks of each FFT are detected, and their frequency, amplitude and phase are estimated by polynomial interpolation, resulting in a peaks file. Finally, peaks of successive windows are grouped into sinusoidal partial trajectories. Trajectories are stored in a partial file which contains, for each of the successive windows, the frequency, amplitude and phase of each partial. All these data can be stored in, or retrieved from, SDIF files (see section 4). The fft-1 additive method mentioned above has been written for the real-time system FTS-jMax by N. Schnell [25]. It provides the synthesis of hundreds of arbitrary sinusoidal partials and of noise with any required spectral density at a very low cost.

2.5 PSOLA analysis and synthesis
PSOLA (Pitch Synchronous OverLap-Add) [26] is a method based on the decomposition of a signal into a series of elementary waveforms, so that each waveform represents one of the successive pitch periods of the signal and the sum (overlap-add) of them reconstitutes the signal [27].
PSOLA works directly with the signal waveform and therefore is not expensive while not losing any detail of the signal. But in opposition to usual sampling, PSOLA allows independent control of pitch, duration and formants of the signal. PSOLA analysis consists of positioning markers [28] pitch-synchronously, i.e. the interval between two successive markers is equal to the local fundamental period, and such that each marker is close to the local maximum of the signal energy. The signal is then decomposed into a series of elementary waveforms by applying analysis windows centered on the markers. PSOLA synthesis proceeds by overlap-add of the waveforms re-positioned on time instants calculated so that the interval between two successive waveforms is equal to the desired local pitch period. In the usual PSOLA method, time stretching or compression is obtained by repeating or skipping waveforms. However, in case of strong time-stretching, repetition produces audible signal discontinuities. This is the reason why a TDI- and a FDI PSOLA (Time Domain and Frequency Domain Interpolation PSOLA) have been proposed [28], where the waveforms to be overlap-added are interpolated between two successive waveforms of the analyzed signal. By its definition, the PSOLA method allows only modification of the periodic parts of the signal. For the portions of the signal which are not periodic but random-like, the processing differs [29] in order to preserve the randomness and avoid introducing an artificial correlation in these parts, which would then be perceived as tones ( flanging effect ). It is thus necessary to estimate which parts of the signal are periodic, which are non-periodic and which are transient. In the case of the voice, the periodic part of the signal is produced by the vibration of the vocal chords and is called voiced. In our analysis algorithm, we extend this notion to any signal: at each time instant t, a voicing coefficient v(t) is estimated. 
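A heavily simplified time-stretch along these lines can be sketched as follows. The sketch assumes a constant pitch period (equally spaced marks), uses Hann-windowed two-period grains, and repeats or skips grains rather than interpolating them; mark positions and the toy signal are invented, and real PSOLA (as in psolab) uses per-mark periods and the TDI/FDI refinements described above.

```python
import math

def hann(n):
    """Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def psola_time_stretch(signal, marks, factor):
    """Sketch of PSOLA time stretching at unchanged pitch: two-period grains
    centred on analysis pitch marks are overlap-added at synthesis marks
    spaced by the same period; grains are repeated or skipped as the
    synthesis time axis advances by `factor` relative to the analysis axis.
    """
    period = marks[1] - marks[0]          # constant pitch period assumed
    win = hann(2 * period)
    out_len = int(len(signal) * factor)
    out = [0.0] * (out_len + 2 * period)  # padded to hold the last grain
    t = marks[0]
    while t < out_len:
        # analysis mark whose stretched position is closest to synthesis time t
        src = min(marks, key=lambda m: abs(m * factor - t))
        for i in range(-period, period):
            if 0 <= src + i < len(signal):
                out[t + i + period] += win[i + period] * signal[src + i]
        t += period   # synthesis marks keep the analysis period: pitch unchanged
    return out[period:period + out_len]

# Toy periodic signal with an 8-sample period, stretched to twice its length.
sig = [math.sin(2 * math.pi * i / 8) for i in range(64)]
stretched = psola_time_stretch(sig, marks=list(range(8, 64, 8)), factor=2.0)
```

Changing pitch instead of duration would amount to changing the spacing of the synthesis marks while mapping them back to the analysis marks in the same way.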
This coefficient is obtained by use of a Phase-Derived Sinusoidality measure [29]. For each time/frequency region, the instantaneous frequency is compared to the frequency measured from spectral peaks. If they match, the time/frequency region is said to be sinusoidal. If, for a specific time, most regions of the spectrum are sinusoidal, this time frame is said to be voiced and is therefore processed by the PSOLA algorithm. Otherwise it is considered random. Two main advantages of the PSOLA method are preservation of phase even when the length of the sound is modified, and preservation of the spectral envelope (formant positions) even when pitch is shifted. High-quality transformations of signals can be obtained at very low computational cost. For modification of the spectral envelope independently of pitch, a Frequency-Shifting (FS-PSOLA [29]) method has been proposed. The PSOLA method can also be viewed as close to granular synthesis, in which each grain corresponds to one pitch period, or close to Chant-FOF synthesis, since PSOLA elementary waveforms can be considered an approximation of Formant Waveforms but without explicit estimation of source and filter parameters. G. Peeters has developed a PSOLA analysis and synthesis package on UNIX named psolab. The synthesis algorithm has been ported to the real-time system FTS-jMax by N. Schnell. A musical application for synthesizing a choir in real time, as well as other applications using PSOLA, are described in section 5. It should be noted that overlap-add (OLA) does not require the signal to be sinusoidal and therefore gives good results for portions of the signal where additive analysis fails, for example in fast transients or random portions. This is why G. Peeters has developed an a/s scheme, named sinola [29], which combines additive sinusoids and OLA.

3 Diphone Studio
Diphone Studio, developed at Ircam by A. Lefèvre and co-workers, is the package in which the above-mentioned analysis and synthesis programs are ported to Macintosh to be called by the Diphone control program itself [30]. Let us first explain the principles of diphone control. Diphone control is a powerful means of building a musical phrase from dictionaries of sound segments by concatenating and articulating them. Musical phrases can be built from any sound segment (not only instrumental sounds but any recorded or synthetic sounds) stored in the form of time-varying control parameters of any synthesis method, such as additive synthesis or a physical model. A musical sentence is obtained by the concatenation of parameters from each segment of the sequence of segments representing the desired sentence. Between two successive segments, parameter values are interpolated. The stream of parameter values so obtained is fed into a synthesis engine which computes the resulting sound signal. Dictionaries of sound segments coded into parameters are built as follows.
First, sound recordings are analyzed with a method such as additive analysis, resulting in a parameter file, e.g. the time-varying frequency and amplitude of sinusoidal partials. In the second step, the parameter file is cut into segments representing units of sound suited to the intended musical usage. Finally, these segments are recorded in dictionaries. The analysis methods mentioned above are implemented in separate programs with specific GUIs. For example, additive analysis is implemented in the AddAn program and resonance modeling in the ModRes program. In AddAn, for instance, the different steps of the additive analysis are specified and triggered through an analysis panel and an analysis script file. The panel allows the user to set any of the analysis parameters, such as window size, window step, etc. All the settings are also accessible through an ASCII console and can be stored into, or restored from, an analysis script file. Control and synthesis themselves are achieved through the Diphone program. Since Generalized Diphone Control is aimed at providing control parameters for any synthesis method, and in order to facilitate extension to any synthesis engine, the program has been cut into separate modules using shared libraries and plugins. The central part is in charge of sequence concatenation, independently of the synthesis model in use. Peripheral plugins are in charge of the computation of parameter streams for specific synthesis methods and of the synthesis engines themselves, such as additive synthesis or Chant synthesis. Analysis and synthesis engines are implemented as shared libraries with specific parsers accepting the exact UNIX command lines for best compatibility. The Diphone control program also has a powerful GUI. The first component of the GUI is a dictionary browser which provides graphical facilities for looking at dictionaries, displaying their contents, modifying them, etc.
The second component is a sequence editor. It allows for the building of sequences of segments and the tuning of segment parameter values such as position on the time axis, loudness or interpolation zones. Parameter evolutions, as stored in segments or as calculated from a sequence, are represented by break-point functions (BPFs). The third component is a BPF editor which allows the user to create BPFs or to modify them, in particular the complicated data obtained from the analysis programs, for instance the frequency and amplitude of sinusoidal partials. The Diphone program constitutes a very flexible and inspiring tool for composition and synthesis, which musicians are using at IRCAM and outside for music creation.

4 The Sound Description Interchange Format (SDIF)
The Sound Description Interchange Format (SDIF) [31] is a recently adopted standard that can store a
variety of sound representations including spectral, time-domain, and higher-level models. SDIF consists of a specified data format and a set of standard sound descriptions and their official representation. SDIF is flexible in that new sound descriptions can be represented, and new kinds of data can be added to existing sound descriptions, facilitating innovation and research. This standard is developed in collaboration by several research centers, notably Ircam, CNMAT and the Audiovisual Institute of the Pompeu Fabra University. The main goal of SDIF is to promote interchange by providing a common format for a variety of sound descriptions. For example, at ICMC 2000 in Berlin, a panel on various analysis/synthesis techniques shares its analysis/synthesis data, between seven different institutions, exclusively in SDIF format. The SDIF specification is open and publicly available, as are C and C++ libraries, available at no charge for multiple platforms, including SGI IRIX, DEC Alpha OSF, Apple MacOS, Windows, and Linux. SDIF has standardized formats to support common extant sound descriptions. SDIF allows custom versions of these representations that include all the standard data in the standard places, plus extra fields. Entirely new representations can also be added. SDIF was designed to support file storage and Internet streaming of aggregates of various sound descriptions. We hope that SDIF will encourage and facilitate the development of new tools for manipulating sound in spectral and other domains, promote the use of interesting sound descriptions in general, and facilitate sharing of work within the community. The body of an SDIF file is a sequence of time-tagged frames modeled after chunks in the IFF/AIFF/RIFF formats. The time tag is an eight-byte floating-point number that indicates the time to which the given frame applies. By allowing any kind of frame in the same file, SDIF is also an aggregate or archive format.
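The frame/matrix organization of SDIF can be modeled with a few lines of Python. This is a data-model sketch only, not the actual SDIF byte layout or the C library API; the "1TRC" sinusoidal-track signature and the column names are used for illustration, and the partial values are invented.

```python
from dataclasses import dataclass, field

@dataclass
class SdifMatrix:
    """One 2D matrix: columns are parameters, rows are objects
    (here, one row per sinusoidal partial)."""
    signature: str   # 4-character matrix type tag
    columns: list    # parameter (column) names
    rows: list       # list of equal-length rows

@dataclass
class SdifFrame:
    """One time-tagged frame holding one or more matrices; in the real
    format the time tag is an eight-byte float, as described above."""
    signature: str
    time: float
    matrices: list = field(default_factory=list)

# A sinusoidal-track frame at t = 0.25 s describing two partials.
frame = SdifFrame("1TRC", 0.25, [
    SdifMatrix("1TRC", ["index", "frequency", "amplitude", "phase"],
               [[1, 440.0, 0.8, 0.0],
                [2, 880.0, 0.4, 0.0]]),
])
```

An SDIF stream is then simply a time-ordered sequence of such frames, possibly of mixed types, which is what makes it usable as an aggregate or archive format.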
Data in a frame are stored in one or more 2D matrices. Matrix columns correspond to parameters like frequency or amplitude; each row represents an object like a filter, sinusoid or noise band. Among the many SDIF sound description types (frames), the following are used by the libraries and programs quoted above: fundamental frequency, signal windows, discrete short-time Fourier transform, picked spectral peaks, sinusoidal tracks, harmonic sinusoidal tracks, FOFs, resonance models, white noise, signal samples, PSOLA markers, voicing coefficient, time-domain envelope, spectral envelope, cepstral coefficients, autocorrelation coefficients, autoregressive coefficients and reflection coefficients. A number of utility programs have been written at Ircam by D. Schwarz, D. Virolle and P. Tisserand, among which are: an invertible SDIF-to-text transformation; sdifextract, which extracts data from an SDIF file according to time interval, frame, matrix, row, column, etc.; various conversion tools to other textual and binary formats; a drag-and-drop conversion tool for Macintosh; a reader/writer for Matlab; a real-time parameter reader/writer for jMax patches; and a reader/writer for OpenMusic. Other utility programs have been written at other institutions, such as a merger and a reader/writer for Max-MSP at CNMAT.

5 Musical and other usage

5.1 Musical creation at Ircam
Various pieces have been created at Ircam using the tools which we have described above. Let us quote a few: M by P. Leroux, Voile by D. Cohen, Epitafios by A. Vinao and Mountain Language by J. Wood have been done with Diphone, ModRes and AddAn.

5.2 A PSOLA virtual choir
Composer P. Manoury is working on a new opera, K, based on F. Kafka's novel Der Prozess, with the musical assistance of S. Lemouton [32]. For several scenes of this opera, the composer has expressed the need for choral voices evoking the notion of a crowd, and for sounds unusual or impossible for a real choir.
It was decided that the virtual choir would be the superposition of multiple, sufficiently distinguishable voices. The PSOLA method described above was found to be an appropriate way to create the different voices with individual differences and to allow a wide range of transformations [32]. A whole group of voices is derived from a single recorded voice by using the real-time PSOLA implementation in FTS-jMax. Each individual voice differs from the others by the pitch deviation and duration change allowed by PSOLA. Other modifications are possible, such as suppressing the vibrato in the recorded voice and imposing a new and different vibrato for the
individual voices. Interesting effects can be obtained when using the voicing coefficient. For example, voiced segments can be stretched more than unvoiced ones. Similarly, vowels and consonants can be independently processed and spatialized. Sound examples will be played at the conference.

5.3 Post-processing
A few years ago, our team was asked to create the voice of a castrato for the movie Farinelli [33]. This was done essentially by morphing real soprano and counter-tenor voices into a virtual castrato voice with our Super-VP phase vocoder. Recently, another request came from the movie industry: to improve the English pronunciation of a French actor in an English movie. PSOLA was found adequate to correct the prosody by changing the pitch evolution and the duration and energy of phonemes. Similarly, we have been asked to work on recordings of an actor who has a noticeable German accent in French. The task is to diminish his accent and possibly to change it in the direction of a neutral accent or even an Italian accent. We found this possible with a combination of PSOLA for the prosody and Spectral Envelope modification to change the timbre of the voice as well. These different works have been done by S. Rossignol, G. Peeters and A. Lithaud. Finally, let us mention a non-musical application. Additive+residual has been used to create new samples in a psychoacoustic research project on the sound of air-conditioning devices. These devices produce sounds with well-defined sinusoidal partials and a random component heard as a noise. Our programs have been used by I. Perry at Ircam to create new air-conditioning sounds located between existing ones, by interpolating sinusoidal partials on the one hand and the noise spectral envelope on the other. The application of such an interpolation to musical sounds can be heard in [34], where, for instance, hybrids of trombone and flute with varying factors provide a continuous timbre change from one instrument to the other.
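The partial-interpolation step of such hybrids can be sketched as follows. The partial values below are invented, both analyses are assumed to yield the same number of matched partials, and the companion interpolation of the noise spectral envelopes is omitted.

```python
def hybridize(partials_a, partials_b, alpha):
    """Interpolate two sounds' (frequency, amplitude) partials pairwise:
    alpha = 0 gives sound A, alpha = 1 gives sound B, values in between
    give intermediate hybrid timbres."""
    return [((1 - alpha) * fa + alpha * fb, (1 - alpha) * aa + alpha * ab)
            for (fa, aa), (fb, ab) in zip(partials_a, partials_b)]

# Illustrative (invented) partials for two instrument tones.
tone_a = [(233.0, 1.0), (466.0, 0.75), (699.0, 0.5)]
tone_b = [(262.0, 1.0), (524.0, 0.25), (786.0, 0.125)]
halfway = hybridize(tone_a, tone_b, 0.5)
# -> [(247.5, 1.0), (495.0, 0.5), (742.5, 0.3125)]
```

Sweeping alpha over time is what produces the continuous timbre change from one instrument to the other described above.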
6 Conclusion
We have presented in this paper a whole set of libraries and programs for sound analysis, processing and synthesis using signal models. A great deal of effort has been spent to ensure the development, maintenance and easy usage of these tools for musicians. We have shown that a key component to guarantee such facilities is the Sound Description Interchange Format, developed at Ircam in collaboration with CNMAT and other research centers. It appears that SDIF has gained ground as a standard. SDIF has demonstrated its utility sufficiently that it will be used for future projects, and we hope to see SDIF adopted by many other groups as well.

7 References
[1] Rodet, X., "Time Domain Formant-Wave-Function Synthesis", Computer Music Journal, Vol. 8, No. 3, Cambridge, Massachusetts.
[2] d'Alessandro, C. and Rodet, X. (1989). Synthèse et analyse-synthèse par fonctions d'ondes formantiques. J. Acoustique, (2).
[3] Clarke, J. M., Manning, P. D., Berry, R. and Purvis, A., VOCEL: New implementations of the FOF synthesis method. Proc. Int. Comp. Music Conf., ICMC88, Cologne, 1988.
[4]
[5] Potard, Y., Baisnée, P.-F. and Barrière, J.-B. (1986). Experimenting with Models of Resonance Produced by a New Technique for the Analysis of Impulsive Sounds. Proceedings of the 1986 International Computer Music Conference, La Haye. Berkeley: Computer Music Association.
[6] Barrière, J.-B., Baisnée, P.-F., Freed, A. and Baudot, M.-D., "A Digital Signal Multiprocessor and its Musical Application", Proceedings of the 15th International Computer Music Conference, Ohio State University, 1989, Computer Music Association.
[7] anamod.html
[8] Eckel, G., Iturbide, M. R. and Becker, B., The development of GiST, a Granular Synthesis Tool Kit Based on an Extension of the FOF Generator. Proc. Int. Comp.
Music Conf., ICMC95, Banff, 1995.
[9] Schwarz, D. and Rodet, X., Spectral envelope estimation, representation, and morphing for sound analysis, transformation, and synthesis. Proc. Int. Comp. Music Conf., ICMC99, Beijing, Oct. 1999.
[10]
[11] Rodet, X. and Schwarz, D., Spectral envelopes and additive+residual analysis-synthesis, in J. Beauchamp, ed., The Sound of Music, Springer, N.Y., to be published.
8 8 [12] Oudot M., "Estimation of the spectral envelope of mixed spectrum signals using a penalized likelihood criterion", IEEE Trans. Speech and Audio Processing, Juin [13] estimate.html [14] McAulay RJ, Quatieri ThF. Speech analysis/synthesis based on a sinusoidal representation. In: IEEE Trans. on Acoust., Speech and Signal Proc., vol ASSP pp [15] Serra X. A system for sound analysis/transformation/synthesis based on a deterministic plus stochastic decomposition. Philosophy Dissertation, Stanford University, Oct [16] Serra, Xavier. "Musical Sound Modeling with Sinusoids plus Noise," in G. D. Poli, A. Picialli, S. T. Pope, and C. Roads, editors, Musical Signal Processing. Swets & Zeitlinger Publishers, [17] Quatieri ThF, McAulay RJ. Shape Invariant Time-Scale and Pitch Modification of Speech. IEEE Trans. on Signal Processing, Vol. 40 No. 3, March [18] Depalle, P., Rodet, X, A new additive synthesis method using inverse Fourier transform, Int. Comp. Music Conf., San-Jose, Oct. 92. [19] index-e.html [20] [21] P. Depalle, G. García & X. Rodet, Tracking of partials for additive sound synthesis using hidden Markov models, IEEE ICASSP-93, Minneapolis, Min., Apr [22] dochmm.html [23] docsview/ [24] [25] Déchelle et al. 1999a ICMC Déchelle, F., Borghesi, R., Cecco, M. D., Maggi, E., Rovan, B. and Scnell, N., jmax: An environment for real-tme musical applications, Comp. Music J., 23(3): [26] Moulines, E. and Charpentier, F. (1990). Pitch- Synchronous Waveform Processing Techniques for Text-To-Speech Synthesis using Diphones. Speech Communication, (9): [27] [28] Peeters, G. (1998). Analyse-Synthèse des sons musicaux par la méthode PSOLA. In proc. Journées Informatique Musicale, Agelonde, France. [29] Peeters, G. and Rodet, X. (1999). Non- Stationary Analysis/Synthesis using Spectrum Peak Shape Distortion, Phase and Reassigned Spectrum. In Proc. Int. Congr. Signal Proc. Applic. and Tech., Orlando, USA. [30] X. Rodet, A. 
Lefèvre: The Diphone program: New features, new synthesis methods and experience of musical use, proc. Int. Comp. Music Conference, Thessaloniki, 1997 [31] Wright, M., Chaudhary, A, Freed, A., Wessel, D., Rodet, X., Virolle, D., Woehrmann, R., Serra, X., New Applications of the Sound Description Interchange Format, proc. Int. Comp. Music Conf. ICMC98, Ann Arbor, Michigan, USA, Oct. 1998, pp [32] Schnell, N., Peeters, G., Lemouton, S., Manoury, P. and Rodet, X., Synthesizing a choir in real-time using Pitch Synchronous Overlap Add (PSOLA), proc. Int. Comp. Music Conf. ICMC 2000, Berlin, Sep [33] P. Depalle, G. García & X. Rodet, A virtual Castrato (!?), Proc. Int. Computer Music Conference, Aarhus, Denmark, Oct [Schwarz 98] Schwarz, D., "Spectral Envelopes in Sound Analysis and Synthesis," ICMC [34] 8