
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. XX, NO. Y, MONTH 1999

Synthesis of sinusoids via non-overlapping inverse Fourier transform

Jean Laroche
Jean Laroche is with the Joint Creative/Emu Technology Center, Scotts Valley, CA, USA. jeanl@emu.com

Abstract: Additive synthesis is a powerful tool for the analysis/modification/synthesis of complex audio or speech signals. However, the cost of wavetable sinusoidal synthesis can become prohibitive for large numbers of sinusoids (more than a few hundred). In that case, techniques based on the inverse Fourier transform offer an attractive alternative, being 200% to 300% more efficient than wavetable synthesis depending on the number of sinusoids. This paper presents an improved technique based on the concatenation of short-term signals obtained by inverse Fourier transforms. By contrast with the standard overlap-add technique, the new algorithm requires synthesizing sinusoids in the frequency domain whose time-domain amplitudes vary linearly within the synthesis frame. The technique is shown to achieve higher quality than the standard overlap-add technique, at the cost of a small increase in computation.

I. Introduction

MANY musical and audio applications require the calculation of large numbers of sinusoids to synthesize a signal. Examples include sinusoidal analysis/synthesis [1], [2], [3], [4] and sinusoidal coding [5], [6]. In most cases, the sinusoids have time-varying parameters to account for the non-stationary nature of the signal to be modeled, and their frequencies, amplitudes and phases are controlled over time.

A traditional way of synthesizing sinusoids consists of using a "wavetable" synthesis technique [7], [8] in which one cycle of the sinusoid (or one half-cycle) is pre-calculated and stored in a table. At synthesis time, the signal is read from the table at a rate which depends on the desired frequency, and linear interpolation is optionally used to improve the accuracy of the synthesized signal. The technique is easy to implement and lends itself very well to VLSI applications, but it is only moderately efficient in terms of computation. Another sinusoidal synthesis technique consists of using a highly resonant second-order filter excited by a very short impulse-like signal [9], [10], [11]. While this technique has the additional advantage of enabling source-filter synthesis (in which an audio signal is fed into the resonant filter), it is usually much more costly than wavetable synthesis and is therefore limited to the synthesis of small numbers of sinusoids.

When large numbers of sinusoids must be synthesized (for example a few thousand), the computational cost of wavetable synthesis becomes prohibitive too, and alternative techniques must be used. One such technique uses an inverse Fourier transform to generate an N-point frame containing a sum of several sinusoids [12], [13], [14], [15], [16]. The technique consists of synthesizing each sinusoid's trace in the frequency domain, then taking an inverse Fast Fourier Transform (FFT) of the result, and normalizing and overlapping the resulting frame with the preceding frame in a process known as overlap-add synthesis. This is advantageous because a large number of samples of one sinusoid can be relatively accurately represented by a few points in the frequency domain (typically four to nine points). As a result, when large numbers of sinusoids are needed, the overhead of the inverse-FFT/overlap-add stages is easily offset by the low cost of synthesizing a few points per sinusoid in the frequency domain.
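As a quick sanity check of this last claim, the short NumPy sketch below (an editorial illustration, not an experiment from the paper; the window, frequency and bin count are arbitrary) reconstructs a windowed sinusoid from only nine FFT bins around its spectral peak and reports the relative error.

# Quick numerical check: a windowed sinusoid of N samples is captured almost
# entirely by a handful of FFT bins around its spectral peak.
# All values here are illustrative and not taken from the paper.
import numpy as np

N = 1024
n = np.arange(N)
x = np.hanning(N) * np.cos(2 * np.pi * 0.07 * n + 0.3)   # windowed sinusoid

X = np.fft.rfft(x)
peak = int(np.argmax(np.abs(X)))
X_few = np.zeros_like(X)
X_few[peak - 4:peak + 5] = X[peak - 4:peak + 5]          # keep 9 bins around the peak
x_few = np.fft.irfft(X_few, N)

err = np.linalg.norm(x - x_few) / np.linalg.norm(x)
print(f"relative error keeping 9 bins: {err:.2e}")       # small; depends on the window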
The technique allows the synthesis of a few thousand sinusoids with time-varying parameters on current general-purpose processors, whereas wavetable synthesis would only allow the synthesis of a few hundred. The cost saving increases with the size of the Fourier transform, as the overhead of the inverse-FFT/overlap-add stages is distributed among a larger number of output samples. The signals synthesized by overlap-add inverse-FFT techniques are usually not perfectly sinusoidal: the fact that only a few points are synthesized in the frequency domain generates distortions which can be minimized (but not eliminated) by an appropriate choice of the window template. In addition, for sinusoids with time-varying frequencies, in order to avoid amplitude modulation resulting from incoherence during the overlap-add stage, one needs to synthesize sweeping sinusoids, i.e., sinusoids with a frequency that varies linearly within an FFT frame, or better yet, sinusoids with two sweeping rates within one FFT frame [15]. This results in a more complex, more expensive algorithm because the frequency-domain trace of the sinusoid is less simple.

In this paper, we present an alternative technique based on non-overlapping inverse-FFT synthesis, in which each synthesized frame is simply concatenated with the preceding one instead of overlap-added. Getting rid of the overlap-add stage eliminates the modulation problems mentioned above. The drawback of this technique is that the linearly varying amplitude of the sinusoid must be synthesized in the frequency domain, which makes the algorithm slightly more complex. The first section briefly describes the standard overlap-add technique. In the second section, the non-overlapping technique is introduced and various quality/efficiency tradeoffs are discussed.

II. Standard overlap-add inverse-FFT synthesis of sinusoids

In this section, we briefly present the standard overlap-add inverse-FFT synthesis technique [12], [13], [14], [15], [16] and discuss the modulation problems arising during the overlap-add stage.

A. Technique

The standard overlap-add inverse-FFT synthesis requires the choice of what is called a "spectral motif" H(Ω), the representation of a sinusoid in the frequency domain, given a time-domain window h(n). Specifically, we have

H(Ω) = Σ_{i=-∞}^{+∞} h(i) e^{-jΩi}.

The window h(n) is chosen for its spectral properties: its main lobe must be as narrow as possible and its side lobes as low as possible, so it can be represented with as few points as possible. Typical choices include the Blackman-Harris family, the Kaiser window and so on [17]. For real-time synthesis purposes, the spectral motif H(Ω) is precalculated, sampled and stored in a table. Because this spectral motif will later be positioned arbitrarily in the frequency domain, depending on the frequency of the sinusoid to be synthesized, it should be sampled finely enough to make the corresponding frequency inaccuracy inaudible. The size of the spectral motif table does not need to be very large though, since only the positive-frequency half of the main lobe and optionally the first side lobe need to be stored.

In order to synthesize a sinusoid with an amplitude A, a phase φ at the center of the FFT frame, and a constant frequency ω, the spectral motif is copied at the right frequency, with the appropriate amplitude and phase. Denoting Y(Ω_k) the frequency-domain N-point FFT frame, with Ω_k = 2πk/N, we have

Y(Ω_k) = (A/2) e^{jφ} H(Ω_k - ω)    for |Ω_k - ω| < 2πK/N    (1)

where K controls the number of FFT channels that are defined for each sinusoid; K typically ranges from 3 to 4. This process is repeated for each sinusoid, and the resulting frequency-domain frame Y(Ω_k) is then inverse Fourier transformed, yielding the time-domain signal y_u(n) at frame u. This short-term signal is then normalized by the inverse of the analysis window 1/h(n) and multiplied by a triangular window g(n). This ensures that the amplitude of each sinusoid will have a triangular shape within the FFT frame, and therefore that the amplitude of each sinusoid after the overlap-add will vary linearly from one frame to the next. This stage is performed in one step, the ratio g(n)/h(n) being stored in a table. Care must be taken for g(n)/h(n) not to take large values at the beginning and the end of the FFT frame (i.e., g(n) must decrease faster than h(n)). Finally, successive normalized frames are overlap-added with a 50% overlap, yielding the output synthesized signal.
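The overlap-add pipeline just described can be sketched in a few lines of Python/NumPy. The sketch below is an illustrative outline only: the Hamming analysis window, the triangular g(n), K = 3 and all function names are assumptions, and the motif is evaluated by a direct sum where a real-time implementation would use the precomputed table mentioned above.

# Rough sketch of standard overlap-add inverse-FFT synthesis (Eq. (1)).
import numpy as np

N, K = 1024, 3
n = np.arange(N) - N // 2                       # time index, frame centered at n = 0
h = 0.54 + 0.46 * np.cos(2 * np.pi * n / N)     # analysis window, nonzero at the edges
g = 1.0 - np.abs(n) / (N / 2)                   # triangular synthesis window g(n)
norm = g / h                                    # g(n)/h(n), tabulated once

def motif(offset):
    """Spectral motif H(Omega) evaluated at Omega = offset (radians/sample)."""
    return np.sum(h * np.exp(-1j * offset * n))

def synthesize_frame(sines):
    """sines: list of (A, phi, omega); returns one normalized short-term frame."""
    Y = np.zeros(N, dtype=complex)
    for A, phi, omega in sines:
        kc = int(round(omega * N / (2 * np.pi)))            # nearest FFT bin
        for k in range(max(0, kc - K), min(N // 2, kc + K) + 1):
            Y[k] += 0.5 * A * np.exp(1j * phi) * motif(2 * np.pi * k / N - omega)  # Eq. (1)
    y = np.fft.fftshift(2.0 * np.real(np.fft.ifft(Y)))      # factor 2: only positive bins filled
    return y * norm                                          # normalize by h(n), shape by g(n)

# Two frames, 50% overlap-add; the phase advances by omega*N/2 between frame centers.
omega = 2 * np.pi * 0.02
out = np.zeros(3 * N // 2)
out[:N] += synthesize_frame([(1.0, 0.0, omega)])
out[N // 2:] += synthesize_frame([(1.0, omega * N / 2, omega)])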
B. Problems in the overlap-add stage

In the simplest method described above, a sinusoid with a time-varying frequency is synthesized by overlap-adding short-term sinusoids with constant frequencies within each FFT frame. This means that in the region where frames u and u + 1 overlap, a sinusoid with frequency ω_u is added to a sinusoid with a frequency ω_{u+1} close but not exactly equal to ω_u. As a result, regardless of the difference of phases at the centers of frames u and u + 1, some degree of amplitude modulation will occur, due to the beating between the two close-frequency sinusoids. The solution proposed in [13] to minimize this problem consists of making the phase of the sinusoid in frame u match that of the sinusoid in frame u + 1 halfway into the overlap region. This only minimizes the artifact, but it does not require any additional computation. A more accurate solution consists of synthesizing a sinusoid with a frequency that varies linearly within an FFT frame, with two different slopes in the first and second half-frames [15]. The slope of the second half of frame u is constrained to match that of the first half of frame u + 1, and as a result, the instantaneous frequencies of the sinusoids in the overlap region coincide perfectly. With the right choice of phases at the centers of the FFT frames, the two sinusoids overlap-add perfectly (i.e., with no phase-mismatch amplitude modulation). Unfortunately, synthesizing sinusoids with two different sweep rates within an FFT frame adds significantly to the complexity of the algorithm: several spectral motifs must be stored for a series of sweep rates, the synthesis of each sinusoid's frequency-domain pattern requires more calculations, and each frame must be synthesized by two half-size inverse Fourier transforms, undermining the cost saving of the overall algorithm. The algorithm presented below eliminates the overlap-add stage by simply allowing successive frames to be concatenated. Overlap-add modulation artifacts are therefore eliminated.

III. Non-overlapped inverse-FFT synthesis of sinusoids

A. Underlying idea

If the successive frames obtained by inverse Fourier transforms are concatenated instead of overlap-added, one must make sure that amplitudes and phases are "continuous" from frame to frame. Because amplitudes generally vary from frame to frame, ensuring amplitude continuity requires synthesizing sinusoids with time-varying amplitudes within the FFT frame, instead of constant amplitudes as in the overlap-add synthesis technique. In practice, the easiest way to achieve this is to synthesize sinusoids with a linear amplitude variation within the FFT frame, which can be done relatively easily in the frequency domain. Note that it is still possible to use sinusoids with constant frequencies within the FFT frame, because frequency jumps from frame to frame are not too objectionable if the frame rate is high enough. Another important point is that because only a few frequency points are synthesized for each sinusoid, the time-domain short-term signal exhibits distortion, especially near the beginning and the end of the frame. As a result, a few boundary samples must be discarded before the concatenation. The following sections present the details of the algorithm; a simple time-domain prototype of the idea is sketched below.
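The sketch is purely illustrative (the actual algorithm generates each frame with an inverse FFT): it concatenates frames with linear amplitude ramps and a phase carried across each boundary, which is exactly the continuity that the following subsections enforce in the frequency domain. Frame length, frequencies and amplitude breakpoints are arbitrary values.

# Time-domain prototype of the non-overlapping idea (illustration only).
import numpy as np

L = 820                                       # samples kept per frame (N - 2D in the paper)
amps = [0.0, 1.0, 0.4, 0.8]                   # amplitude breakpoints at the frame boundaries
freqs = [0.020, 0.021, 0.022]                 # constant frequency per frame (cycles/sample)

frames, phase = [], 0.0
for u in range(len(freqs)):
    n = np.arange(L)
    a = amps[u] + (amps[u + 1] - amps[u]) * n / L        # linear amplitude within the frame
    frames.append(a * np.cos(2 * np.pi * freqs[u] * n + phase))
    phase += 2 * np.pi * freqs[u] * L                    # phase continuity at the boundary
out = np.concatenate(frames)                  # no overlap-add: frames are simply concatenated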

B. Synthesizing linear-amplitude sinusoids

The spectral motif of a sinusoid with a linearly varying amplitude (more precisely, an affine amplitude) includes two terms, one corresponding to a constant amplitude and one corresponding to a linear amplitude. It is easier to derive the exact shape of the spectral motif for a null-frequency sinusoid: given an analysis window h(n), the Fourier transform of a weighted affine function x(n) = h(n)(A + Δn) is easily obtained from standard Fourier transform results. If u(n) <-> U(Ω) (<-> indicates that U(Ω) is the Fourier transform of u(n)), then n u(n) <-> j dU(Ω)/dΩ [18]. As a result, the Fourier transform of x(n) is

X(Ω) = A H(Ω) + jΔ dH(Ω)/dΩ    (2)

where H(Ω) is the Fourier transform of h(n). We see from Eq. (2) that the spectral motif is simply a linear combination of two parts, and therefore we now need to store two spectral motif tables, one corresponding to H(Ω) and another one corresponding to its frequency derivative

H'(Ω) = dH(Ω)/dΩ.

It is useful to notice that since h(n) usually is a real symmetrical window, H(Ω) is real symmetrical and H'(Ω) is real odd-symmetrical, and therefore only the positive-frequency halves (Ω ≥ 0) must be stored.

In order to synthesize a sinusoid of frequency ω, with a phase φ, a constant amplitude A and a linear-amplitude slope Δ at the center of the FFT frame, one needs to generate a sinusoidal spectral pattern from the stored spectral motif tables:

Y(Ω_k) = e^{jφ} [ (A/2) H(Ω_k - ω) + j(Δ/2) H'(Ω_k - ω) ]    for |Ω_k - ω| ≤ 2πK/N    (3)

in which N is the size of the Fourier transform and Ω_k = 2πk/N. K controls how many frequency bins are synthesized in the frequency domain around the sinusoidal peak; more will be said later on how K should be selected. The 1/2 terms in Eq. (3) come from the fact that a sinusoid with a strictly positive frequency is represented by two half-amplitude spectral motifs at ±ω. In practice, the signals to be synthesized are real, and the inverse Fast Fourier Transform algorithm only uses the positive-frequency half spectrum, so only one of the two spectral motifs must be synthesized. If ω < 2πK/N, some of the bins Ω_k required by Eq. (3) become negative, and the corresponding part of the motif should be reflected back into the positive frequencies with a complex conjugation, to account for the fact that the (not synthesized) negative-frequency spectral motif spills into the positive-frequency axis. More specifically, denoting

G(Ω_k, ω) = e^{jφ} [ (A/2) H(Ω_k - ω) + j(Δ/2) H'(Ω_k - ω) ],

Eq. (3) must be replaced by

Y(Ω_k) = 2 Re{G(0, ω)}               for Ω_k = 0,
Y(Ω_k) = G(Ω_k, ω) + G*(-Ω_k, ω)     for 0 < Ω_k ≤ -ω + 2πK/N,    (4)
Y(Ω_k) = G(Ω_k, ω)                   for -ω + 2πK/N < Ω_k ≤ ω + 2πK/N,

where Re{z} denotes the real part of the complex number z and z* its complex conjugate.

As in the overlap-add synthesis technique, H(Ω) and H'(Ω) are precalculated, sampled and stored in tables. Because the motif is sampled, it cannot be placed at arbitrary frequencies ω: if H(Ω) and H'(Ω) are sampled on a grid of spacing 2π/M, then the motif can only be placed at discrete frequencies ω̂ = 2πn/M. If M is large enough, the small frequency error can be confined within a range where it is inaudible. The quantized value ω̂ of the frequency ω must be used, however, during the calculation of the frame-to-frame phase increment (see below). The above procedure is repeated for every sinusoid to be synthesized, the spectral patterns being added together where they overlap.
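The construction of the spectral pattern can be illustrated as follows. In this sketch, the Hann window, K = 3, the direct evaluation of H and H' (instead of table lookups) and the chosen frequency (high enough that the reflection of Eq. (4) is not needed) are all illustrative assumptions, not the paper's implementation.

# Sketch of Eq. (3): frequency-domain pattern of one linear-amplitude sinusoid.
import numpy as np

N, K = 1024, 3
n = np.arange(N) - N // 2                      # frame centered at n = 0
h = 0.5 + 0.5 * np.cos(2 * np.pi * n / N)      # analysis window h(n)

def H(offset):                                 # spectral motif H(Omega)
    return np.sum(h * np.exp(-1j * offset * n))

def Hp(offset):                                # frequency derivative H'(Omega) = dH/dOmega
    return np.sum(-1j * n * h * np.exp(-1j * offset * n))

def spectral_pattern(A, delta, phi, omega):
    """Y(Omega_k) for h(n)(A + delta*n)cos(omega*n + phi); assumes omega > 2*pi*K/N,
    so the negative-frequency reflection of Eq. (4) is not needed."""
    Y = np.zeros(N, dtype=complex)
    kc = int(round(omega * N / (2 * np.pi)))
    for k in range(kc - K, kc + K + 1):
        off = 2 * np.pi * k / N - omega
        Y[k] = np.exp(1j * phi) * (0.5 * A * H(off) + 0.5j * delta * Hp(off))   # Eq. (3)
    return Y

# One frame whose amplitude ramps from 0.5 to about 1.5 across the N samples:
Y = spectral_pattern(A=1.0, delta=1.0 / N, phi=0.0, omega=2 * np.pi * 0.05)
y = 2.0 * np.real(np.fft.fftshift(np.fft.ifft(Y)))     # windowed linear-amplitude sinusoid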
C. Obtaining the short-term frame

Once the spectral patterns corresponding to all the sinusoids have been synthesized, the inverse Fourier transform of the frequency-domain frame Y(Ω_k) is calculated, usually by means of a Fast Fourier Transform (FFT) algorithm. The result is a short-term signal y_u(n), where the subscript u indicates the frame index. In y_u(n), the sinusoids have an affine amplitude multiplied by the analysis window h(n) corresponding to the spectral motif H(Ω). To restore the desired affine amplitude, y_u(n) is normalized by the inverse of h(n), which requires h(n) to never take null values:

y'_u(n) = y_u(n) / h(n).    (5)

The distortions introduced by the limited number of points in the spectral patterns are amplified by the normalization of Eq. (5), especially where h(n) takes on small values (typically h(n) is a tapered window whose extremities can become quite small). As a result, the samples at the beginning and at the end of the short-term signal exhibit a higher level of distortion than samples located around the middle of the frame. For that reason, D samples are discarded from the beginning and the end of each short-term frame before the frames are concatenated. More will be said below on how D should be chosen, but a typical value is one tenth of the frame size.
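A minimal sketch of these post-processing steps is given below, under illustrative assumptions (Hann window, K = 3, D = N/10, constant-amplitude patterns, and names chosen here): it inverse-transforms two consecutive frequency-domain frames, removes the analysis window as in Eq. (5), trims D samples from each end and concatenates the results.

# Sketch of the post-processing described above: inverse FFT, normalization,
# trimming of D boundary samples, and concatenation.
import numpy as np

N, K, D = 1024, 3, 102                          # frame size, motif half-width, trim (about N/10)
n = np.arange(N) - N // 2
h = 0.5 + 0.5 * np.cos(2 * np.pi * n / N)       # analysis window h(n)

def pattern(A, phi, omega):
    """Constant-amplitude pattern, i.e. Eq. (3) with delta = 0."""
    Y = np.zeros(N, dtype=complex)
    kc = int(round(omega * N / (2 * np.pi)))
    for k in range(kc - K, kc + K + 1):
        off = 2 * np.pi * k / N - omega
        Y[k] = 0.5 * A * np.exp(1j * phi) * np.sum(h * np.exp(-1j * off * n))
    return Y

def short_term_frame(Y):
    y = np.fft.fftshift(2.0 * np.real(np.fft.ifft(Y)))   # y_u(n): window still applied
    keep = slice(D, N - D)                                # drop the most distorted edge samples
    return y[keep] / h[keep]                              # Eq. (5): remove the analysis window

# Two consecutive frames of one sinusoid; the phase advances by omega*(N - 2D)
# between frame centers so the concatenation is continuous.
omega = 2 * np.pi * 0.05
out = np.concatenate([short_term_frame(pattern(1.0, 0.0, omega)),
                      short_term_frame(pattern(1.0, omega * (N - 2 * D), omega))])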

D. Ensuring amplitude and phase coherence

Each short-term frame is concatenated with the preceding one, once D boundary samples have been discarded from both ends of the frame. As a result, the centers of two consecutive frames are N - 2D samples apart, and this must be taken into account when calculating the phase and amplitude terms A and Δ for each sinusoid. Specifically, the phase φ_{u+1} at the center of frame u + 1 must be equal to

φ_{u+1} = φ_u + ω̂_u (N - 2D)/2 + ω̂_{u+1} (N - 2D)/2    (6)

where ω̂_u is the quantized value of the sinusoid's frequency in frame u. Eq. (6) accounts for the fact that the frequency in the second half of frame u is ω̂_u while that in the first half of frame u + 1 is ω̂_{u+1}. Similarly, the amplitude a^e_u of the sinusoid at the end of frame u must match the amplitude a^b_{u+1} of the same sinusoid at the beginning of frame u + 1 (the superscripts e and b respectively denote the end and the beginning of the frame). A_u and Δ_u in Eq. (3) can be derived from a^b_u and a^e_u by

A_u = (a^e_u + a^b_u) / 2    and    Δ_u = (a^e_u - a^b_u) / (N - 2D).    (7)

If the above continuity equations hold, the signal resulting from the concatenation of the short-term frames will be free of frame-synchronous artifacts. In the following section, we investigate how the various parameters of the algorithm should be set for optimal results.

IV. Choosing the right parameters

The quality of the signal synthesized by the algorithm described above greatly depends on the choice of the analysis window h(n) and of its length N, on the frequency spacing at which the spectral motifs H(Ω) and H'(Ω) are sampled, on the number 2K + 1 of frequency points synthesized per sinusoid, and on the number D of samples discarded at frame boundaries prior to the concatenation stage.

A. Choosing the frame length

The choice of the frame length N is mainly a tradeoff between computational cost and the rate at which the sinusoidal frequencies are allowed to vary. Since the frequencies of the sinusoids are constant during each frame, and since N - 2D samples are synthesized per frame, the sinusoidal frequencies cannot change faster than every N - 2D samples. If the frequencies vary slowly in time, large values of N can be used without generating audible artifacts. By contrast, for rapidly varying sinusoids, using large frame lengths would yield noticeable frequency "jumps". Unfortunately, there is no precise psychoacoustical data available on the perceptual detectability of frequency "discontinuities" in sinusoids with piecewise constant frequencies. Informal tests have shown that for speech signals, the frequencies should be updated at least every 10 to 20 ms, which imposes an upper bound on how large N can be. Large values of N are desirable, because the computational cost related to the inverse Fourier transform is nearly independent of N: the cost of one length-N inverse Fourier transform is roughly proportional to N log2 N, and each inverse FFT yields N - 2D samples. The cost of the inverse Fourier transform per output sample is therefore roughly proportional to N log2 N / (N - 2D), which increases very slowly with N for standard values (from N = 1024 to N = 4096 the cost increases by about 20% for D = N/10). While the cost of the FFT is nearly independent of N, the cost per output sample of synthesizing the spectral patterns is roughly inversely proportional to N, since N - 2D samples are synthesized at each frame. As a result, for efficiency, N should be chosen as large as the rate of frequency variations allows.
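The 20% figure can be verified with a short computation; the snippet below (the proportionality constant is simply dropped) evaluates N log2 N / (N - 2D) for N = 1024 and N = 4096 with D = N/10.

# Rough per-output-sample cost of the inverse FFT, taken as proportional to
# N*log2(N)/(N - 2D), with D = N/10.
import math

def fft_cost_per_sample(N, D):
    return N * math.log2(N) / (N - 2 * D)

c1 = fft_cost_per_sample(1024, 1024 // 10)
c2 = fft_cost_per_sample(4096, 4096 // 10)
print(c1, c2, c2 / c1 - 1.0)                  # the ratio minus 1 is about 0.20, i.e. ~20%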
B. Choosing the spectral motif, K and D

The choice of the spectral motif, or equivalently of the analysis window h(n), is crucial in that it can strongly influence the quality of the synthesized signal. Its influence is tightly tied to the choice of the number 2K + 1 of synthesized frequency bins and the number D of samples discarded at frame boundaries. To better understand the roles played by these parameters, it is useful to examine the kind of distortion introduced by the fact that only 2K + 1 points are synthesized per sinusoid in the frequency domain. The easiest way to analyze the distortion consists of considering that a continuous-frequency, band-limited spectral motif H̃(Ω) is used:

H̃(Ω) = H(Ω) for |Ω| ≤ 2πK/N,    H̃(Ω) = 0 for |Ω| > 2πK/N.

Because H̃(Ω) is band-limited, it corresponds to a non-time-limited analysis window h̃(n). Synthesizing the spectral pattern of a sinusoid consists of shifting H̃(Ω) to the proper center frequency ω̂ and then sampling it in the frequency domain at frequencies Ω_k = 2πk/N (as is done implicitly in the FFT). Shifting H̃(Ω) corresponds to modulating h̃(n) by an infinite-duration complex exponential of frequency ω̂. Sampling it at frequencies Ω_k = 2πk/N causes the modulated h̃(n) to be replicated, or imaged, every N samples, and because h̃(n) is not time-limited, this time-domain imaging causes what is called time-aliasing: portions of the images leak into the "useful" part of the window, as shown in Fig. 1. Because of this time-aliasing, the output of the inverse Fourier transform is not h̃(n) modulated by a complex exponential but includes a sum of (low-amplitude) undesirable components modulated at the same frequency ω̂. Unfortunately, how these undesirable components add up depends on ω̂, and therefore the distortion is a function of the sinusoidal frequency. This means that it is not possible to simply cancel it during the normalization stage of Eq. (5) by normalizing by an appropriate window. An important point is that these additive aliased components can be severely amplified during the normalization stage, wherever h(n) becomes small. This is the reason why D samples are discarded from the edges of each frame (where h(n) usually becomes small). The additive time-aliased signals cause the signal in the "useful" frame -N/2 < n ≤ N/2 to be amplitude-modulated (since all the time-aliased signals have the same frequency but different phases). Even though relatively large levels of amplitude modulation can pass unnoticed when the modulation is "continuous" (does not exhibit jumps), the situation here is more stringent because the successive frames are concatenated.

As a result, sudden discontinuities in the amplitude can occur at the boundary between two frames with different patterns of modulation. These discontinuities must be kept below a very small level to remain unnoticeable.

Fig. 1. Time-aliasing resulting from "band-limiting" the spectral motif. Stage 1 (top) shows the continuous-frequency spectral motif (left) and its corresponding time-limited signal (right). Band-limiting the spectral motif in stage 2 yields a time-domain signal which extends outside of -N/2 ≤ n ≤ N/2. Sampling the spectral motif in the frequency domain in stage 3 corresponds to imaging the time-domain signal, yielding time-aliasing.

The choice of h(n), K and D influences the amount of distortion in the following manner: h(n) should be chosen such that h̃(n) (which results from band-limiting H(Ω)) has as small an amplitude as possible outside of -N/2 < n ≤ N/2 (so time-aliasing is minimized) and as large an amplitude as possible inside of -N/2 < n ≤ N/2 (so the normalizing stage amplifies the aliased components as little as possible). Of course, these are conflicting requirements! K must be as small as possible (to reduce the cost of synthesizing each sinusoidal spectral pattern), but the smaller K, the larger the time-aliasing and therefore the larger the distortion. D should also be chosen as small as possible (discarding samples means wasting computations) while keeping the amplification as small as possible in the normalizing stage of Eq. (5). Note that the above discussion also applies to H'(Ω): since its band-limited version H̃'(Ω) corresponds in the time domain to n h̃(n), it is clear that time-aliasing is even more severe, because the multiplicative term n gets larger outside of -N/2 < n ≤ N/2. Because of the large number of parameters, there does not seem to be an easy way to obtain an optimal choice of h(n), K and D. In practice, it is convenient to pick a value of K, for example K = 3. Then one can choose one of various standard candidate windows h(n). Finally, D can be picked to keep the distortion below a desired level. The following section presents an example of such a procedure.

C. Examples of parameter settings

In this section, we compare three standard windows from the point of view of the output distortion. The windows are the Hanning window, the Kaiser window with a parameter of 9, and a Chebyshev equiripple window with a -80 dB stopband level. To measure the amount of distortion, we measure the level of the components that time-alias back between -N/2 < n ≤ N/2. Specifically, given a window h(n), H(Ω) is calculated by use of a highly oversampled Fourier transform: if h(n) is an N-point window, we use a Fourier transform of size M = ON to calculate H(Ω) (O can range from 8 to 32). Then H̃(Ω) is obtained by zeroing H(Ω) for |Ω| > 2πK/N, and h̃(n) is calculated as the inverse Fourier transform of H̃(Ω). Now, contrary to h(n), h̃(n) is not null outside of [-N/2, N/2], which is the reason why time-aliasing occurs. The amount of time-aliasing in [-N/2, N/2] is calculated by summing the absolute values of the images falling into [-N/2, N/2] and normalizing by h̃(n):

E(n) = [ Σ_{k=-O/2+1, k≠0}^{O/2-1} |h̃(n - kN)| ] / |h̃(n)|.    (8)

The absolute value in Eq. (8) yields a conservative upper bound for the aliasing distortion. Fig. 2 displays E(n) in dB for the three windows, with N = 1024 and K = 3.

Fig. 2. E(n) in dB for a Hanning window (solid line), a Kaiser window (dash-dotted line) and a Chebyshev window (dashed line) as a function of the sample index. Only the first half of the frame is represented.
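The measurement procedure described above is straightforward to reproduce. The sketch below uses an illustrative Hanning window with N = 1024, K = 3 and O = 16; the Kaiser and Chebyshev windows of the comparison could be substituted (for instance from scipy.signal). It band-limits the oversampled window spectrum and evaluates E(n) of Eq. (8); the resulting levels depend on the window and on K, as discussed next.

# Sketch of the aliasing measure E(n) of Eq. (8) for one candidate window.
import numpy as np

N, K, O = 1024, 3, 16
M = O * N
h = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)     # Hanning window, peak at index N/2

# Oversampled spectrum H() of h(n), with the window re-centered at time 0.
hp = np.zeros(M)
hp[:N // 2] = h[N // 2:]
hp[-(N // 2):] = h[:N // 2]
H = np.fft.fft(hp)

# Band-limit: keep only |Omega| <= 2*pi*K/N, i.e. K*O bins on either side of DC.
H_tilde = np.zeros(M, dtype=complex)
H_tilde[:K * O + 1] = H[:K * O + 1]
H_tilde[-(K * O):] = H[-(K * O):]
h_tilde = np.fft.fftshift(np.real(np.fft.ifft(H_tilde))) # band-limited window; index m <-> time m - M/2

def E(n):
    """Aliasing level at sample n (-N/2 < n < N/2), Eq. (8)."""
    c = M // 2
    images = sum(abs(h_tilde[c + n - k * N])
                 for k in range(-O // 2 + 1, O // 2) if k != 0)
    return images / abs(h_tilde[c + n])

E_db = [20 * np.log10(E(n)) for n in range(-N // 2 + 1, 0)]   # first half of the frame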
Based on Fig. 2, one can set the number of discarded samples D so that the level of aliasing is guaranteed to never exceed a given level. The Hanning window is shown to yield the worst results: to keep E(n) under a given level, more samples must be discarded for the Hanning window than for either of the two other windows. This can be explained by the fact that the Hanning window has fairly high side lobes in the frequency domain, and discarding them by picking a small value for K yields a significant amount of time-aliasing. In practice, it is sufficient to ensure that E(n) is below -40 dB to avoid any audible artifact. For the Kaiser window and the Chebyshev window, this requires discarding D = 90 samples from both ends of the frame. To achieve -50 dB or better, the Kaiser window would require discarding 170 samples while the Chebyshev window would require discarding 110 samples. Note that for a larger FFT size, these results can simply be scaled proportionally.

V. Discussion and conclusion

To conclude this paper, we discuss the relative merits of both the standard overlap-add inverse Fourier transform synthesis technique and the new, non-overlapping technique with regard to computation cost, signal distortion, and ease of implementation.

A. Computational cost

As was mentioned above, for the non-overlapping technique, if the Kaiser window is used, one can use D = N/10 and obtain high-quality results.

This means that at each frame, N - 2D = 0.8N samples are synthesized. For the overlapping technique, N/2 samples are synthesized at each frame (a 50% overlap must be used). As a result, for a given output length, only about 62% as many frames need to be calculated in the non-overlapping technique as in the overlapping technique. Counteracting this saving is the fact that each frame requires more calculations in the non-overlapping technique because the linear amplitude must be synthesized in the frequency domain, requiring the additional term H'(Ω) in Eq. (3). This amounts to roughly twice as many operations per sinusoid as in the overlapping technique, where only the first half of the equation is used. If the cost of the synthesis is dominated by the cost related to the sinusoidal pattern synthesis (Eq. (3)), as opposed to the cost of the inverse Fourier transform, then each frame in the non-overlapping technique requires about twice as many computations as in the overlapping technique. In that case, the new technique ends up being about 25% more costly than the overlap-add technique.

B. Signal distortion

Because the short-term signals are concatenated rather than overlap-added, the new technique is free of the undesirable amplitude modulation that results from overlap-adding sinusoids with different frequencies. This is a significant advantage over the overlapping technique. Fig. 3 compares the outputs of the two algorithms when synthesizing a constant-amplitude complex sinusoid with a sweeping frequency. The normalized frequency started at 2π x 0.015 radian/sample and increased at a rate of 2π x 5.10^-6 radian/sample^2. The FFT size was N = 1024, a Kaiser window with parameter 9 was used in both techniques, and the parameters were K = 3 and D = 100. The figure shows the magnitude of the output complex sinusoid as a function of time. The overlap-add technique exhibits a very large amount of amplitude modulation while the non-overlapping technique shows a very minimal amount.

Fig. 3. Comparison of the overlapping (top) and non-overlapping (bottom) synthesis techniques for the synthesis of one complex sinusoid with a sweeping frequency. The magnitude of the complex output is shown as a function of the sample index.

Note that, as mentioned above, the overlapping technique can be improved to reduce such modulation problems [15], but the computational cost roughly doubles and the algorithm becomes much more complex. An important point is that, contrary to the overlapping technique, the non-overlapping technique requires the sinusoid amplitudes to start from 0 when the sinusoid first appears and to end at 0 when the sinusoid disappears, to avoid any signal discontinuity. The overlapping technique relies on the overlap-add stage to achieve that, and consequently sinusoids can disappear from or appear in a frame without any adverse consequences.
C. Ease of implementation

The non-overlapping technique, if anything, is easier to implement than the overlapping technique, because the overlap-add stage is discarded. The additional term in Eq. (3), which codes the linear amplitude in the frequency domain, is very similar to the first one and does not pose any specific implementation problems, other than requiring one additional table for H'(Ω). By contrast, the technique described in [15] to solve the amplitude modulation problems of the overlapping technique is significantly more complex, requiring several tables to be stored for various frequency-sweep factors, and an additional linear interpolation stage.

D. Conclusion

The technique presented in this paper differs from the standard technique in that the short-term signals are concatenated rather than overlap-added, and in that sinusoids with linear amplitudes rather than constant amplitudes are synthesized in each frame. The technique offers a significant quality improvement over the standard technique, as it eliminates the amplitude modulation problems arising from overlap-adding sinusoids with close frequencies. The new technique is slightly more costly (at most 25%) than the standard technique, but in high-quality synthesis applications the quality improvement can justify this small additional cost.

References

[1] L. B. Almeida and F. M. Silva, "Variable-frequency synthesis: an improved harmonic coding scheme," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1984.
[2] R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 4, pp. 744-754, Aug. 1986.
[3] E. B. George and M. J. T. Smith, "Analysis-by-synthesis/overlap-add sinusoidal modeling applied to the analysis and synthesis of musical tones," J. Audio Eng. Soc., vol. 40, no. 6, pp. 497-516, 1992.
[4] X. Serra and J. Smith, "Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition," Computer Music J., vol. 14, no. 4, pp. 12-24, Winter 1990.
[5] R. J. McAulay and T. F. Quatieri, "Low-rate speech coding based on the sinusoidal model," in Advances in Speech Signal Processing, S. Furui and M. Sondhi, Eds., chapter 6, pp. 165-208, Marcel Dekker.
[6] J. S. Marques, L. B. Almeida, and J. M. Tribolet, "Harmonic coding at 4.8 kb/s," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 1990, vol. 1, pp. 17-20.
[7] M. Kahrs and K. Brandenburg, Applications of Digital Signal Processing to Audio and Acoustics, Kluwer, Norwell, MA.
[8] J. Snell, "Design of a digital oscillator which will generate up to 256 low distortion sine waves in real time," Computer Music J., vol. 1, no. 2, pp. 4-25, 1977.
[9] Y. Potard, P. F. Baisnée, and J. B. Barrière, "Experimenting with models of resonance produced by a new technique for the analysis of impulsive sounds," in Proc. Intern. Computer Music Conf.
[10] J. Laroche and J. L. Meillier, "Multichannel excitation/filter modeling of percussive sounds with application to the piano," IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, pp. 329-344, Apr. 1994.
[11] J. Laroche, "Using resonant filters for the synthesis of time-varying sinusoids," Proc. 105th A.E.S. Convention, San Francisco, 1998, preprint 4782 (F-6).
[12] X. Rodet and P. Depalle, "A new additive synthesis method using inverse Fourier transform and spectral envelope," in Proc. Intern. Computer Music Conf., San Jose, CA, 1992.
[13] X. Rodet and P. Depalle, "Spectral envelopes and inverse FFT synthesis," Proc. 93rd A.E.S. Convention, San Francisco, 1992, preprint 3393 (H-3).
[14] M. Goodwin and X. Rodet, "Efficient Fourier synthesis of nonstationary sinusoids," in Proc. Intern. Computer Music Conf., San Francisco, CA.
[15] M. Goodwin and A. Kogon, "Overlap-add synthesis of nonstationary sinusoids," in Proc. Intern. Computer Music Conf., Banff, Canada.
[16] M. Goodwin, Adaptive Signal Models: Theory, Algorithms and Audio Applications, Kluwer International Series in Engineering and Computer Science, Norwell, MA.
[17] F. J. Harris, "On the use of windows for harmonic analysis with the discrete Fourier transform," Proc. IEEE, vol. 66, no. 1, pp. 51-83, Jan. 1978.
[18] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, Englewood Cliffs, New Jersey, 1989.


Laboratory Assignment 4. Fourier Sound Synthesis Laboratory Assignment 4 Fourier Sound Synthesis PURPOSE This lab investigates how to use a computer to evaluate the Fourier series for periodic signals and to synthesize audio signals from Fourier series

More information

Topic 6. The Digital Fourier Transform. (Based, in part, on The Scientist and Engineer's Guide to Digital Signal Processing by Steven Smith)

Topic 6. The Digital Fourier Transform. (Based, in part, on The Scientist and Engineer's Guide to Digital Signal Processing by Steven Smith) Topic 6 The Digital Fourier Transform (Based, in part, on The Scientist and Engineer's Guide to Digital Signal Processing by Steven Smith) 10 20 30 40 50 60 70 80 90 100 0-1 -0.8-0.6-0.4-0.2 0 0.2 0.4

More information

Auditory modelling for speech processing in the perceptual domain

Auditory modelling for speech processing in the perceptual domain ANZIAM J. 45 (E) ppc964 C980, 2004 C964 Auditory modelling for speech processing in the perceptual domain L. Lin E. Ambikairajah W. H. Holmes (Received 8 August 2003; revised 28 January 2004) Abstract

More information

8.3 Basic Parameters for Audio

8.3 Basic Parameters for Audio 8.3 Basic Parameters for Audio Analysis Physical audio signal: simple one-dimensional amplitude = loudness frequency = pitch Psycho-acoustic features: complex A real-life tone arises from a complex superposition

More information

Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling

Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling Mikko Parviainen 1 and Tuomas Virtanen 2 Institute of Signal Processing Tampere University

More information

METHODS FOR SEPARATION OF AMPLITUDE AND FREQUENCY MODULATION IN FOURIER TRANSFORMED SIGNALS

METHODS FOR SEPARATION OF AMPLITUDE AND FREQUENCY MODULATION IN FOURIER TRANSFORMED SIGNALS METHODS FOR SEPARATION OF AMPLITUDE AND FREQUENCY MODULATION IN FOURIER TRANSFORMED SIGNALS Jeremy J. Wells Audio Lab, Department of Electronics, University of York, YO10 5DD York, UK jjw100@ohm.york.ac.uk

More information

SGN Audio and Speech Processing

SGN Audio and Speech Processing Introduction 1 Course goals Introduction 2 SGN 14006 Audio and Speech Processing Lectures, Fall 2014 Anssi Klapuri Tampere University of Technology! Learn basics of audio signal processing Basic operations

More information

Linear Frequency Modulation (FM) Chirp Signal. Chirp Signal cont. CMPT 468: Lecture 7 Frequency Modulation (FM) Synthesis

Linear Frequency Modulation (FM) Chirp Signal. Chirp Signal cont. CMPT 468: Lecture 7 Frequency Modulation (FM) Synthesis Linear Frequency Modulation (FM) CMPT 468: Lecture 7 Frequency Modulation (FM) Synthesis Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University January 26, 29 Till now we

More information

6 Sampling. Sampling. The principles of sampling, especially the benefits of coherent sampling

6 Sampling. Sampling. The principles of sampling, especially the benefits of coherent sampling Note: Printed Manuals 6 are not in Color Objectives This chapter explains the following: The principles of sampling, especially the benefits of coherent sampling How to apply sampling principles in a test

More information

Chapter Three. The Discrete Fourier Transform

Chapter Three. The Discrete Fourier Transform Chapter Three. The Discrete Fourier Transform The discrete Fourier transform (DFT) is one of the two most common, and powerful, procedures encountered in the field of digital signal processing. (Digital

More information

ADAPTIVE NOISE LEVEL ESTIMATION

ADAPTIVE NOISE LEVEL ESTIMATION Proc. of the 9 th Int. Conference on Digital Audio Effects (DAFx-6), Montreal, Canada, September 18-2, 26 ADAPTIVE NOISE LEVEL ESTIMATION Chunghsin Yeh Analysis/Synthesis team IRCAM/CNRS-STMS, Paris, France

More information

Butterworth Window for Power Spectral Density Estimation

Butterworth Window for Power Spectral Density Estimation Butterworth Window for Power Spectral Density Estimation Tae Hyun Yoon and Eon Kyeong Joo The power spectral density of a signal can be estimated most accurately by using a window with a narrow bandwidth

More information

Monophony/Polyphony Classification System using Fourier of Fourier Transform

Monophony/Polyphony Classification System using Fourier of Fourier Transform International Journal of Electronics Engineering, 2 (2), 2010, pp. 299 303 Monophony/Polyphony Classification System using Fourier of Fourier Transform Kalyani Akant 1, Rajesh Pande 2, and S.S. Limaye

More information

United Codec. 1. Motivation/Background. 2. Overview. Mofei Zhu, Hugo Guo, Deepak Music 422 Winter 09 Stanford University.

United Codec. 1. Motivation/Background. 2. Overview. Mofei Zhu, Hugo Guo, Deepak Music 422 Winter 09 Stanford University. United Codec Mofei Zhu, Hugo Guo, Deepak Music 422 Winter 09 Stanford University March 13, 2009 1. Motivation/Background The goal of this project is to build a perceptual audio coder for reducing the data

More information

VIBRATO DETECTING ALGORITHM IN REAL TIME. Minhao Zhang, Xinzhao Liu. University of Rochester Department of Electrical and Computer Engineering

VIBRATO DETECTING ALGORITHM IN REAL TIME. Minhao Zhang, Xinzhao Liu. University of Rochester Department of Electrical and Computer Engineering VIBRATO DETECTING ALGORITHM IN REAL TIME Minhao Zhang, Xinzhao Liu University of Rochester Department of Electrical and Computer Engineering ABSTRACT Vibrato is a fundamental expressive attribute in music,

More information

Initial Vectors (random) Filter (Fault-simulation Based Compaction) Yes. done? predict/construct future vectors; append to test set.

Initial Vectors (random) Filter (Fault-simulation Based Compaction) Yes. done? predict/construct future vectors; append to test set. Ecient Spectral Techniques for Sequential ATPG Ashish Giani y, Shuo Sheng y, Michael S. Hsiao y, and Vishwani D. Agrawal z y Department of Electrical and Computer Engineering, Rutgers University, Piscataway,

More information

Design of FIR Filters

Design of FIR Filters Design of FIR Filters Elena Punskaya www-sigproc.eng.cam.ac.uk/~op205 Some material adapted from courses by Prof. Simon Godsill, Dr. Arnaud Doucet, Dr. Malcolm Macleod and Prof. Peter Rayner 1 FIR as a

More information

Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms. Armein Z. R. Langi

Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms. Armein Z. R. Langi International Journal on Electrical Engineering and Informatics - Volume 3, Number 2, 211 Finite Word Length Effects on Two Integer Discrete Wavelet Transform Algorithms Armein Z. R. Langi ITB Research

More information

Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm

Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Carrier Frequency Offset Estimation in WCDMA Systems Using a Modified FFT-Based Algorithm Seare H. Rezenom and Anthony D. Broadhurst, Member, IEEE Abstract-- Wideband Code Division Multiple Access (WCDMA)

More information

Since the advent of the sine wave oscillator

Since the advent of the sine wave oscillator Advanced Distortion Analysis Methods Discover modern test equipment that has the memory and post-processing capability to analyze complex signals and ascertain real-world performance. By Dan Foley European

More information

CMPT 468: Frequency Modulation (FM) Synthesis

CMPT 468: Frequency Modulation (FM) Synthesis CMPT 468: Frequency Modulation (FM) Synthesis Tamara Smyth, tamaras@cs.sfu.ca School of Computing Science, Simon Fraser University October 6, 23 Linear Frequency Modulation (FM) Till now we ve seen signals

More information

Formant Synthesis of Haegeum: A Sound Analysis/Synthesis System using Cpestral Envelope

Formant Synthesis of Haegeum: A Sound Analysis/Synthesis System using Cpestral Envelope Formant Synthesis of Haegeum: A Sound Analysis/Synthesis System using Cpestral Envelope Myeongsu Kang School of Computer Engineering and Information Technology Ulsan, South Korea ilmareboy@ulsan.ac.kr

More information

IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey

IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES. P. K. Lehana and P. C. Pandey Workshop on Spoken Language Processing - 2003, TIFR, Mumbai, India, January 9-11, 2003 149 IMPROVING QUALITY OF SPEECH SYNTHESIS IN INDIAN LANGUAGES P. K. Lehana and P. C. Pandey Department of Electrical

More information

The Discrete Fourier Transform. Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido

The Discrete Fourier Transform. Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido The Discrete Fourier Transform Claudia Feregrino-Uribe, Alicia Morales-Reyes Original material: Dr. René Cumplido CCC-INAOE Autumn 2015 The Discrete Fourier Transform Fourier analysis is a family of mathematical

More information

FIR Filter Design by Frequency Sampling or Interpolation *

FIR Filter Design by Frequency Sampling or Interpolation * OpenStax-CX module: m689 FIR Filter Design by Frequency Sampling or Interpolation * C. Sidney Burrus This work is produced by OpenStax-CX and licensed under the Creative Commons Attribution License 2.

More information

Signal Characterization in terms of Sinusoidal and Non-Sinusoidal Components

Signal Characterization in terms of Sinusoidal and Non-Sinusoidal Components Signal Characterization in terms of Sinusoidal and Non-Sinusoidal Components Geoffroy Peeters, avier Rodet To cite this version: Geoffroy Peeters, avier Rodet. Signal Characterization in terms of Sinusoidal

More information

L19: Prosodic modification of speech

L19: Prosodic modification of speech L19: Prosodic modification of speech Time-domain pitch synchronous overlap add (TD-PSOLA) Linear-prediction PSOLA Frequency-domain PSOLA Sinusoidal models Harmonic + noise models STRAIGHT This lecture

More information

SINUSOIDAL MODELING. EE6641 Analysis and Synthesis of Audio Signals. Yi-Wen Liu Nov 3, 2015

SINUSOIDAL MODELING. EE6641 Analysis and Synthesis of Audio Signals. Yi-Wen Liu Nov 3, 2015 1 SINUSOIDAL MODELING EE6641 Analysis and Synthesis of Audio Signals Yi-Wen Liu Nov 3, 2015 2 Last time: Spectral Estimation Resolution Scenario: multiple peaks in the spectrum Choice of window type and

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

NMF, WOLA, And Binary Filtering: Avoiding the Curse of Time-Aliasing James A. Moorer

NMF, WOLA, And Binary Filtering: Avoiding the Curse of Time-Aliasing James A. Moorer NMF, WOLA, And Binary Filtering: Avoiding the Curse of Time-Aliasing James A. Moorer Please be notified that everything in this discussion is completely obvious and follows directly from principles everyone

More information

Application of Fourier Transform in Signal Processing

Application of Fourier Transform in Signal Processing 1 Application of Fourier Transform in Signal Processing Lina Sun,Derong You,Daoyun Qi Information Engineering College, Yantai University of Technology, Shandong, China Abstract: Fourier transform is a

More information