Synthesis Techniques. Juan P Bello


Synthesis. Synthesis is the artificial construction of a complex body by combining its elements. Complex body: an acoustic signal (sound). Elements: parameters and/or basic signals. Diagram: a synthesis model (representation) produces the sound. Motivations: to reproduce existing sounds; to reproduce the physical process of sound generation; to generate new, pleasant sounds; to control and explore timbre.

Oscillators. Oscillators are used to generate a raw, repeating signal (waveform). Diagram: an oscillator (osc) reading from a wavetable, with frequency and amplitude inputs and an output signal.

Oscillators. An oscillator can use any waveform stored in a memory list (a wavetable), which is read back whenever necessary (table look-up). Scanning the table repeatedly, at a variable phase increment, produces a pitched sound. Over-sampling and interpolation are combined to make the most of the stored table.
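A minimal table look-up oscillator can be sketched as follows. It assumes a single-cycle sine table, a phase increment of freq × table_size / fs entries per output sample, and linear interpolation between neighbouring entries; the names `make_sine_table` and `wavetable_osc` are illustrative, not taken from any library.

```python
import math

def make_sine_table(size):
    """Fill a wavetable with one cycle of a sine wave."""
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def wavetable_osc(table, freq, amp, fs, n_samples):
    """Table look-up oscillator: scan `table` with a phase increment
    proportional to `freq`, interpolating linearly between entries."""
    size = len(table)
    increment = freq * size / fs      # table entries advanced per output sample
    phase = 0.0
    out = []
    for _ in range(n_samples):
        i = int(phase)                # integer part: current table index
        frac = phase - i              # fractional part: interpolation weight
        a = table[i]
        b = table[(i + 1) % size]     # wrap around at the end of the table
        out.append(amp * (a + frac * (b - a)))
        phase = (phase + increment) % size
    return out

table = make_sine_table(512)
y = wavetable_osc(table, freq=440.0, amp=0.5, fs=44100, n_samples=1000)
```

Changing `freq` only changes the increment, so the same stored table serves every pitch; a larger table (over-sampling) reduces the interpolation error.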

How can I generate new sounds? Diagram: trigger, envelope, pitch vibrato and cutoff-frequency controls driving an oscillator → filter → gain chain that outputs the sound. Networks of such basic elements make up synthesis techniques, which come in two main types: linear and non-linear.

Additive Synthesis. Additive synthesis is based on the idea that complex waveforms can be created by adding simpler ones. It is a linear technique, i.e. it does not create frequency components that were not explicitly contained in the original waveforms. Commonly, these simpler signals are sinusoids (sines or cosines) with time-varying parameters, following Fourier's theory. Diagram: N oscillators with inputs Amp_1(t), Freq_1(t) through Amp_N(t), Freq_N(t), summed to give
$$s(t) = \sum_{i=1}^{N} A_i \sin(2\pi f_i t + \phi_i)$$
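The sum above translates directly into code. This is a sketch assuming constant amplitude, frequency and phase for each partial; the three-partial tone is a made-up example.

```python
import math

def additive(partials, fs, n_samples):
    """Sum of sinusoids A_i * sin(2*pi*f_i*t + phi_i).
    partials: list of (A_i, f_i, phi_i) tuples."""
    out = []
    for n in range(n_samples):
        t = n / fs
        out.append(sum(a * math.sin(2 * math.pi * f * t + phi) for a, f, phi in partials))
    return out

# Hypothetical tone: three harmonics of 220 Hz, the third with a 90-degree phase
partials = [(1.0, 220.0, 0.0), (0.5, 440.0, 0.0), (0.25, 660.0, math.pi / 2)]
s = additive(partials, fs=8000, n_samples=800)
```

Because the technique is linear, synthesising each partial on its own and adding the results gives the same signal, and no frequencies appear other than 220, 440 and 660 Hz.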

Additive Synthesis: A Pipe Organ

Additive Synthesis. Square wave: only odd harmonics, with the amplitude of the nth harmonic equal to 1/n.
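A quick numerical check of this recipe, as a sketch: summing sin(2πnf_0t)/n over odd n converges to a square wave of height ±π/4 (multiply by 4/π for ±1). At a quarter period the sum becomes the alternating series 1 - 1/3 + 1/5 - ..., which tends to π/4.

```python
import math

def square_partial_sum(t, f0, n_odd):
    """Additive approximation of a square wave using the first `n_odd`
    odd harmonics (n = 1, 3, 5, ...), each with amplitude 1/n."""
    return sum(math.sin(2 * math.pi * n * f0 * t) / n
               for n in range(1, 2 * n_odd, 2))

f0 = 100.0
quarter = square_partial_sum(1 / (4 * f0), f0, 200)   # close to pi/4
```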

Time-varying sounds. According to Fourier, all sounds can be described and reproduced with additive synthesis. Even impulse-like components can be represented by using a short-lived sinusoid with infinite amplitude. Diagram: N oscillators, each driven by its own amplitude and frequency envelopes Amp_i(t) and Freq_i(t), summed (Σ) to form the output. Additive synthesis is very general (perhaps the most versatile technique), but it is control-data hungry: a large number of parameters is required to reproduce realistic sounds.
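With time-varying parameters, each partial is best given its own running phase, so that frequency changes stay continuous. The sketch below assumes the envelopes Amp_i(t) and Freq_i(t) are plain Python functions of time; the decaying and gliding envelopes are hypothetical examples.

```python
import math

def additive_tv(amp_fns, freq_fns, fs, n_samples):
    """Additive synthesis with time-varying parameters.
    amp_fns[i](t) gives Amp_i(t) and freq_fns[i](t) gives Freq_i(t) in Hz;
    each partial integrates its frequency into its own phase."""
    phases = [0.0] * len(amp_fns)
    out = []
    for n in range(n_samples):
        t = n / fs
        sample = 0.0
        for i, (amp, freq) in enumerate(zip(amp_fns, freq_fns)):
            sample += amp(t) * math.sin(phases[i])
            phases[i] += 2 * math.pi * freq(t) / fs   # phase advance for this sample
        out.append(sample)
    return out

# Hypothetical envelopes: partial 1 decays, partial 2 decays faster and glides up 10%/s
amps = [lambda t: math.exp(-3 * t), lambda t: 0.5 * math.exp(-6 * t)]
freqs = [lambda t: 220.0, lambda t: 440.0 * (1 + 0.1 * t)]
y = additive_tv(amps, freqs, fs=8000, n_samples=8000)
```

Even this two-partial toy needs four envelope functions; a realistic instrument needs dozens, which is the "control-data hungry" problem.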

Analysis/Resynthesis. A family of techniques that employ a lossy parameterisation of a sound to facilitate its manipulation and reproduction. The concept applies to additive synthesis, subtractive synthesis, combinations of the two, etc. Examples include the phase vocoder and sinusoidal modelling.

FFT/IFFT

Sinusoidal + Noise model (1). Diagram: sinusoidal modelling extracts the tonal part of the signal; subtracting it from the signal leaves the residual (mostly transients).
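A toy, single-frame version of the tonal/residual split, as a sketch only: it keeps just the strongest DFT peak as the "tonal part" and assumes the sinusoid sits exactly on a DFT bin with a rectangular window. Real sinusoidal modelling (e.g. SMS) uses windowing, peak interpolation and partial tracking across many frames, none of which is shown here. The test signal is synthetic.

```python
import cmath
import math
import random

def dft_bin(x, k):
    """Single DFT coefficient X[k] of the frame x."""
    N = len(x)
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

def sines_plus_residual(x):
    """Keep the strongest DFT bin as the tonal part; residual = x - tonal."""
    N = len(x)
    mags = [abs(dft_bin(x, k)) for k in range(1, N // 2)]
    k0 = 1 + mags.index(max(mags))       # peak bin (DC excluded)
    X = dft_bin(x, k0)
    amp = 2 * abs(X) / N                 # amplitude of a cosine centred on bin k0
    phase = cmath.phase(X)
    tonal = [amp * math.cos(2 * math.pi * k0 * n / N + phase) for n in range(N)]
    residual = [a - b for a, b in zip(x, tonal)]
    return tonal, residual, k0, amp

random.seed(0)
N = 256
noise = [0.02 * random.uniform(-1, 1) for _ in range(N)]
x = [0.8 * math.cos(2 * math.pi * 10 * n / N + 0.3) + e for n, e in enumerate(noise)]
tonal, residual, k0, amp = sines_plus_residual(x)
```

Here the residual comes out close to the added noise, which is the part a sinusoids-plus-noise model would then describe stochastically.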

Sinusoidal + Noise model (2)

SMS Transformations. Filtering with arbitrary resolution; partial-dependent frequency scaling.

SMS Transformations. Frequency stretching, where the ith partial is moved to $f_i' = f_i \cdot f_{\text{stretch}}^{\,(i-1)}$; spectral shape shift.
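Applied to a list of partial frequencies, the stretching rule leaves the first partial in place and moves each higher partial progressively further. A minimal sketch, assuming partials are indexed from i = 1 in ascending order; `stretch_partials` is a hypothetical helper name.

```python
def stretch_partials(freqs, stretch):
    """Frequency stretching: f_i' = f_i * stretch**(i - 1) for i = 1, 2, ...
    (enumerate counts from 0, which is exactly i - 1)."""
    return [f * stretch ** i for i, f in enumerate(freqs)]

stretched = stretch_partials([100.0, 200.0, 300.0], 1.01)
```

With stretch = 1.01 the partials 100, 200 and 300 Hz move to 100, 202 and 306.03 Hz, so a harmonic series becomes slightly inharmonic, somewhat like a stiff piano string.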

Examples (audio): original, deterministic, stochastic and SMS-transformed versions of flute, guitar and water sounds.

SMS Effects. Content-dependent processing, e.g. real-time singing-voice conversion (Vocaloid).

Summary: additive synthesis. Probably the most versatile synthesis method, as any sound (old or new) can be represented. Unusually accurate: even small variations can be reproduced. It needs too much control data, and only changes to large amounts of this data bring perceptually relevant modifications to the sound. It therefore requires analysis/resynthesis methodologies (e.g. phase vocoder, SMS) to simplify control. It is not well suited to stochastic (impulse-like) components and highly transient signals.

Subtractive Synthesis. Subtractive synthesis is another linear technique, based on the idea that sounds can be generated by subtracting (filtering out) components from a very rich signal (e.g. noise, a square wave). Diagram: complex waveform (frequency f) → filter (parameters) → amplifier (gain A) → sound. Its simplicity made it very popular in the design of analog synthesisers (e.g. Moog).
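A minimal subtractive chain, as a sketch: a naive (non-band-limited) sawtooth as the rich source, followed by a one-pole low-pass filter. The one-pole filter is a stand-in chosen for brevity; analog synthesisers such as the Moog use steeper, resonant filters. `component_magnitude` is a hypothetical helper that evaluates the DFT at one frequency, used to compare the 10th harmonic with the fundamental before and after filtering.

```python
import cmath
import math

def naive_sawtooth(f0, fs, n_samples):
    """Rich source: a sawtooth ramp in [-1, 1), not band-limited."""
    return [2.0 * ((n * f0 / fs) % 1.0) - 1.0 for n in range(n_samples)]

def one_pole_lowpass(x, cutoff, fs):
    """One-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    y, prev = [], 0.0
    for sample in x:
        prev = prev + a * (sample - prev)
        y.append(prev)
    return y

def component_magnitude(x, freq, fs):
    """Magnitude of the DFT of x evaluated at `freq` (Hz)."""
    return abs(sum(v * cmath.exp(-2j * math.pi * freq * n / fs) for n, v in enumerate(x)))

fs, f0 = 8000, 200.0
src = naive_sawtooth(f0, fs, 4000)
out = one_pole_lowpass(src, cutoff=500.0, fs=fs)

# Last 400 samples = 10 whole periods, well past the filter's start-up transient
tail_src, tail_out = src[-400:], out[-400:]
ratio_src = component_magnitude(tail_src, 2000.0, fs) / component_magnitude(tail_src, 200.0, fs)
ratio_out = component_magnitude(tail_out, 2000.0, fs) / component_magnitude(tail_out, 200.0, fs)
```

The pitch is still set by `f0`, while the cutoff alone controls how much of the upper spectrum survives, which is why subtractive parameters map so directly onto pitch and brightness.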

The human speech system. The vocal cords act as an oscillator, and the mouth/nose cavities, tongue and throat act as filters. We can shape a tonal sound ("oooh" vs "aaah"), we can whiten the signal ("sssshhh"), and we can produce pink noise by removing high frequencies.

Source-Filter model. Subtractive synthesis can be seen as an excitation-resonator, or source-filter, model. The resonator (filter) shapes the spectrum, i.e. it defines the spectral envelope.

What is the spectral envelope? It is a smoothing of the spectrum that preserves its general form while neglecting its spectral line structure

Source-Filter model. Diagram: analysis (envelope estimation and whitening of the signal), processing (transformations) and synthesis.

Linear Predictive Coding (LPC). The input sample x(n) is extrapolated, i.e. approximated by a linear combination of past samples of the input signal:
$$\hat{x}(n) = \sum_{k=1}^{p} a_k\, x(n-k)$$
where, in the z-domain, the predictor is $P(z) = \sum_{k=1}^{p} a_k z^{-k}$. The prediction error e(n) is
$$e(n) = x(n) - \hat{x}(n) = x(n) - \sum_{k=1}^{p} a_k\, x(n-k), \qquad E(z) = X(z)\,[1 - P(z)]$$
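Applying 1 - P(z) to x(n) gives the prediction error, i.e. the whitening step of the source-filter model. A minimal sketch, assuming the coefficients a_k are already known and that samples before n = 0 are zero; the test signal is a hypothetical all-pole impulse response, so its prediction error should collapse to a single impulse.

```python
def prediction_error(x, a):
    """e(n) = x(n) - sum_{k=1}^{p} a_k x(n-k), with a = [a_1, ..., a_p].
    Samples before the start of x are taken as zero."""
    p = len(a)
    e = []
    for n in range(len(x)):
        x_hat = sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
        e.append(x[n] - x_hat)
    return e

# Hypothetical signal: impulse response of x(n) = 1.3 x(n-1) - 0.8 x(n-2) + delta(n)
x = [1.0, 1.3]
for n in range(2, 50):
    x.append(1.3 * x[n - 1] - 0.8 * x[n - 2])
e = prediction_error(x, [1.3, -0.8])
```

Everything the predictor can explain from the past is removed; what remains (here a single impulse) is the excitation.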

Linear Predictive Coding (LPC). For synthesis, we simply invert the process:
$$H(z) = \frac{G}{1 - P(z)}$$
H(z) is an IIR filter, known as the LPC filter, which represents the spectral model of x(n).
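In the time domain, H(z) = G / (1 - P(z)) is the recursion y(n) = G u(n) + Σ a_k y(n-k), where u(n) is the excitation. A minimal sketch under that reading; `lpc_synthesize` is a hypothetical name, and the one-pole example is chosen so the output is easy to check.

```python
def lpc_synthesize(excitation, a, gain=1.0):
    """All-pole synthesis filter H(z) = G / (1 - P(z)):
    y(n) = G * u(n) + sum_{k=1}^{p} a_k y(n-k), with a = [a_1, ..., a_p]."""
    p = len(a)
    y = []
    for n, u in enumerate(excitation):
        acc = gain * u
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += a[k - 1] * y[n - k]   # feedback from past outputs (IIR)
        y.append(acc)
    return y

impulse = [1.0] + [0.0] * 19
y = lpc_synthesize(impulse, [0.9])          # impulse response 0.9**n
```

Driving the same filter with a pulse train or with noise instead of a single impulse gives the classic voiced and unvoiced excitations of a source-filter speech model.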

Linear Predictive Coding (LPC). With optimal coefficients, the prediction error energy is minimised. The higher the order p, the more closely the LPC spectral model approximates X(k). Thus the problem of linear prediction becomes the estimation of the set of coefficients a_k from the input signal x(n), which can be solved efficiently using the Yule-Walker equations.
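The Yule-Walker equations are commonly solved with the Levinson-Durbin recursion on the autocorrelation sequence (the autocorrelation method). Below is a minimal pure-Python sketch of that approach, using the same sign convention as above (x̂(n) = Σ a_k x(n-k)). The test signal is a hypothetical second-order all-pole impulse response, so the recovered coefficients should be close to [1.3, -0.8].

```python
def autocorrelation(x, max_lag):
    """r(l) = sum_n x(n) x(n+l) for l = 0..max_lag."""
    N = len(x)
    return [sum(x[n] * x[n + lag] for n in range(N - lag)) for lag in range(max_lag + 1)]

def levinson_durbin(r, p):
    """Solve the Yule-Walker equations for [a_1, ..., a_p].
    Returns the coefficients and the final prediction error energy."""
    a = []          # a[j] holds a_{j+1}
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))
        k = acc / err                                   # reflection coefficient
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k                              # error energy shrinks each order
    return a, err

x = [1.0, 1.3]
for n in range(2, 400):
    x.append(1.3 * x[n - 1] - 0.8 * x[n - 2])
coeffs, err = levinson_durbin(autocorrelation(x, 2), 2)
```

The recursion costs O(p²) operations instead of the O(p³) of a general linear solver, because it exploits the Toeplitz structure of the autocorrelation matrix.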

Summary: subtractive synthesis. Low-order filtering is very intuitive, hence easy to use. Most parameters map directly to psychophysical concepts (e.g. the frequency of the oscillator to pitch, the filter shaping to timbre). Our ears are used to sounds generated following these principles, as this is the working principle of speech. This also imposes limitations on the versatility of the approach. As with additive synthesis, it requires analysis/resynthesis techniques to control accurate sound simulations with a few parameters.

Amplitude modulation. A non-linear technique, i.e. it results in the creation of frequencies that are not produced by the oscillators. In AM, the amplitude of the carrier wave is varied in direct proportion to that of a modulating signal. Diagram: a modulator (Amp_m(t), Freq_m(t)) controls the amplitude of a carrier (Amp_c(t), Freq_c(t)). A bipolar modulator gives ring modulation; a unipolar modulator gives amplitude modulation.

Ring Modulation. Let us define the carrier signal as $c(t) = A_c \cos(\omega_c t)$ and the (bipolar) modulator signal as $m(t) = A_m \cos(\omega_m t)$. The ring-modulated signal can be expressed as
$$s(t) = A_c \cos(\omega_c t) \cdot A_m \cos(\omega_m t)$$
which can be rewritten as
$$s(t) = \frac{A_c A_m}{2}\left[\cos\big((\omega_c - \omega_m)t\big) + \cos\big((\omega_c + \omega_m)t\big)\right]$$
s(t) presents two sidebands, at frequencies ω_c - ω_m and ω_c + ω_m.
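The product-to-sum step can be checked numerically. A minimal sketch with arbitrary example values (A_c = 1, f_c = 440 Hz, A_m = 0.8, f_m = 110 Hz): the product form and the two-sideband form agree sample by sample, and neither contains a component at f_c.

```python
import math

def ring_mod(t, ac, fc, am, fm):
    """Carrier times a bipolar modulator."""
    return ac * math.cos(2 * math.pi * fc * t) * am * math.cos(2 * math.pi * fm * t)

def ring_mod_sidebands(t, ac, fc, am, fm):
    """The same signal written as two sidebands at fc - fm and fc + fm."""
    return 0.5 * ac * am * (math.cos(2 * math.pi * (fc - fm) * t)
                            + math.cos(2 * math.pi * (fc + fm) * t))

fs = 8000
product = [ring_mod(n / fs, 1.0, 440.0, 0.8, 110.0) for n in range(1000)]
sidebands = [ring_mod_sidebands(n / fs, 1.0, 440.0, 0.8, 110.0) for n in range(1000)]
```

With these values the output holds only 330 Hz and 550 Hz, each at amplitude 0.4.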

Ring Modulation (figure): amplitude spectrum with the two sidebands at f_c - f_m and f_c + f_m, placed symmetrically around f_c.

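Single-sideband modulation. Diagram: m(t) multiplied by cos ω_c t gives s_1(t); m(t) is also passed through a 90° phase shifter and multiplied by sin ω_c t to give s_2(t); s_1(t) and s_2(t) are combined into s(t). The phase shifter has
$$H(j\omega) = \begin{cases} -j, & \omega > 0 \\ +j, & \omega < 0 \end{cases}$$
i.e. |H(jω)| = 1, with a phase of -π/2 for ω > 0 and π/2 for ω < 0. Subtracting s_2(t) from s_1(t) keeps the upper sideband; adding them keeps the lower sideband.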

Single-sideband modulation (figure: spectra M(ω), S_1(ω), S_2(ω) and S(ω) around ±ω_c). When ω_c changes, the spectrum of m(t) is shifted accordingly, so SSB modulation is also known as frequency shifting.
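A sketch of upper-sideband frequency shifting for a modulator made of known cosines. The simplifying assumption is that the 90° phase-shifted (Hilbert-transformed) modulator is written down analytically (each cos becomes sin); a general-purpose shifter would need a Hilbert-transform filter instead. The two-partial modulator and the 50 Hz shift are hypothetical values.

```python
import math

def ssb_shift(t, partials, fc):
    """Upper-sideband modulation of m(t) = sum A_k cos(2 pi f_k t).
    For this m(t), the -90 degree shifted version is m_hat(t) = sum A_k sin(2 pi f_k t).
    s(t) = m(t) cos(w_c t) - m_hat(t) sin(w_c t)."""
    wc = 2 * math.pi * fc
    m = sum(a * math.cos(2 * math.pi * f * t) for a, f in partials)
    m_hat = sum(a * math.sin(2 * math.pi * f * t) for a, f in partials)
    return m * math.cos(wc * t) - m_hat * math.sin(wc * t)

partials = [(1.0, 200.0), (0.5, 400.0)]      # hypothetical harmonic modulator
fs = 8000
shifted = [ssb_shift(n / fs, partials, 50.0) for n in range(1000)]
```

Every component moves up by the same 50 Hz, to 250 and 450 Hz. Unlike pitch shifting, this breaks the 1:2 harmonic relation of the input, which is what gives frequency shifting its characteristic inharmonic sound.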

Amplitude Modulation. Let us define the carrier signal as $c(t) = \cos(\omega_c t)$ and the (unipolar) modulator signal as $m(t) = A_c + A_m \cos(\omega_m t)$. The amplitude-modulated signal can be expressed as
$$s(t) = \left[A_c + A_m \cos(\omega_m t)\right] \cos(\omega_c t)$$
which can be rewritten as
$$s(t) = A_c \cos(\omega_c t) + \frac{A_m}{2}\left[\cos\big((\omega_c - \omega_m)t\big) + \cos\big((\omega_c + \omega_m)t\big)\right]$$
s(t) presents components at frequencies ω_c, ω_c - ω_m and ω_c + ω_m.

Modulation index. In modulation techniques, a modulation index is usually defined to indicate how much the modulated variable varies around its original value. For AM this quantity is also known as the modulation depth:
$$\beta = \frac{A_m}{A_c}$$
If β = 0.5, the carrier's amplitude varies by 50% around its unmodulated level; for β = 1 it varies by 100%. β > 1 causes distortion and is usually avoided.
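A sketch of AM parameterised by the modulation depth β, so that A_m = β A_c. The helper `am_components` lists the three spectral lines implied by the expansion of s(t) (carrier at A_c, sidebands at β A_c / 2), and the check rebuilds the signal from them. The 1 kHz carrier and 100 Hz modulator are example values.

```python
import math

def am(t, ac, beta, fc, fm):
    """s(t) = [A_c + A_m cos(w_m t)] cos(w_c t), with A_m = beta * A_c."""
    a_m = beta * ac
    return (ac + a_m * math.cos(2 * math.pi * fm * t)) * math.cos(2 * math.pi * fc * t)

def am_components(ac, beta, fc, fm):
    """(frequency, amplitude) of the three components of s(t)."""
    return [(fc - fm, beta * ac / 2), (fc, ac), (fc + fm, beta * ac / 2)]

fs = 8000
s = [am(n / fs, 1.0, 0.5, 1000.0, 100.0) for n in range(800)]
comps = am_components(1.0, 0.5, 1000.0, 100.0)
```

With β = 0.5 the envelope A_c + A_m cos(ω_m t) swings between 0.5 and 1.5 (±50%), and each sideband is a quarter of the carrier's amplitude; at β = 1 the envelope just touches zero.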

C/M frequency ratio. Let us define the carrier-to-modulator frequency ratio c/m (= ω_c / ω_m) for a pitched signal m(t). If c/m is an integer n, then ω_c, and all present frequencies, are multiples of ω_m (which becomes the fundamental). If c/m = 1/n, then ω_c is the fundamental. When c/m deviates from n or 1/n (or, more generally, from a ratio of integers), the output frequencies become more inharmonic. (Audio example: C/M frequency variation.)
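The effect of c/m can be illustrated with integer frequencies in Hz. This sketch takes the "fundamental" to be the greatest common divisor of the components present, and reflects a negative f_c - f_m to its absolute value; both are simplifying assumptions of the example, not a model of pitch perception.

```python
from functools import reduce
from math import gcd

def am_partials(fc, fm):
    """Integer frequencies (Hz) present in AM output: fc, |fc - fm| and fc + fm."""
    return sorted({fc, abs(fc - fm), fc + fm})

def fundamental(freqs):
    """Largest frequency of which every component is an integer multiple."""
    return reduce(gcd, freqs)

print(am_partials(300, 100), fundamental(am_partials(300, 100)))   # c/m = 3: f0 = fm
print(am_partials(100, 200), fundamental(am_partials(100, 200)))   # c/m = 1/2: f0 = fc
print(am_partials(310, 100), fundamental(am_partials(310, 100)))   # c/m = 3.1
```

A small deviation (300 to 310 Hz) drops the common fundamental from 100 Hz to 10 Hz, i.e. the components are no longer low-order harmonics of a single pitch.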

Summary: AM synthesis. Easy to implement, with extremely low computational cost. A few parameters give direct control over the sonic output: A_m sets the depth of change of A_c (modulation depth), f_m sets the rate of change of A_c, and c/m determines the perceived frequency of the sound. It requires caution: f_c + f_m exceeding f_s/2 causes aliasing, while a small f_c - f_m may be inaudible or cause inharmonicity. It offers limited possibilities given the simplicity of the method (not enough spectral complexity to synthesise rich timbres).

Frequency Modulation. Frequency modulation (FM) is a form of modulation in which the frequency of a carrier wave is varied in direct proportion to the amplitude variation of a modulating signal. Diagram: a modulator (Amp_m(t), Freq_m(t)) controls the frequency of a carrier (Amp_c(t), Freq_c(t)). When the frequency is varied at a rate below about 20 Hz, the result is heard as vibrato.

Frequency Modulation. Let us define the carrier signal as $c(t) = A_c \cos(\omega_c t + \phi_c)$ and the modulator signal as $m(t) = A_m \sin(\omega_m t + \phi_m)$. The frequency-modulated signal can be expressed as
$$s(t) = A_c \cos\big(\omega_c t + \phi_c + A_m \sin(\omega_m t + \phi_m)\big)$$
From this equation it can be noted that it is NOT the frequency that is being modulated but the phase of the carrier. Nevertheless, any modulation of phase implies a frequency modulation, since the instantaneous frequency is the time derivative of the phase:
$$\omega(t) = \frac{d\phi(t)}{dt}$$
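Differentiating the phase numerically makes the point concrete. With φ_c = φ_m = 0 and A_m = β, the phase is ω_c t + β sin(ω_m t), whose derivative gives an instantaneous frequency of f_c + β f_m cos(ω_m t) in Hz, i.e. a peak deviation of β f_m. A minimal sketch using a central difference; the step `dt` and the example values are arbitrary choices.

```python
import math

def fm_phase(t, fc, fm, beta):
    """Total phase of s(t) = cos(w_c t + beta sin(w_m t))."""
    return 2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t)

def instantaneous_freq(t, fc, fm, beta, dt=1e-7):
    """Central-difference derivative of the phase, converted from rad/s to Hz."""
    dphi = fm_phase(t + dt, fc, fm, beta) - fm_phase(t - dt, fc, fm, beta)
    return dphi / (2 * dt) / (2 * math.pi)

# Carrier 440 Hz, modulator 100 Hz, beta = 2: frequency swings between 240 and 640 Hz
freqs = [instantaneous_freq(n / 1000, 440.0, 100.0, 2.0) for n in range(20)]
```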

Frequency Modulation. Let us rewrite the expression of the modulated signal as
$$s(t) = \cos\big(\omega_c t + \beta \sin(\omega_m t)\big)$$
where A_c = 1, φ_c = φ_m = 0 and A_m is renamed β for simplicity. Using phasor analysis and the two-sided Laurent expansion
$$e^{j\beta \sin(\omega_m t)} = \sum_{k=-\infty}^{\infty} J_k(\beta)\, e^{jk\omega_m t}$$
we can derive the expression
$$s(t) = \sum_{k=-\infty}^{\infty} J_k(\beta) \cos\big((\omega_c + k\omega_m)t\big)$$
where J_k(β) are the Bessel functions of the first kind, k is their integer order and β is the argument. J_k(β) is real, and J_{-k}(β) = (-1)^k J_k(β).
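The Bessel expansion can be verified numerically. Python's standard library has no Bessel functions, so this sketch evaluates J_k(β) from its power series (adequate for moderate β, not a production-quality implementation) and compares the truncated sideband sum with the direct formula. The carrier, modulator and β = 2 are example values; the sum is truncated at |k| ≤ 30.

```python
import math

def bessel_j(k, x, terms=60):
    """Bessel function of the first kind J_k(x), integer k, from the series
    J_k(x) = sum_m (-1)^m / (m! (m+k)!) * (x/2)^(2m+k); fine for moderate x."""
    if k < 0:
        return (-1) ** (-k) * bessel_j(-k, x, terms)   # J_{-k} = (-1)^k J_k
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + k)) * (x / 2) ** (2 * m + k)
               for m in range(terms))

def fm_direct(t, fc, fm, beta):
    """s(t) = cos(w_c t + beta sin(w_m t))."""
    return math.cos(2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t))

def fm_bessel(times, fc, fm, beta, kmax=30):
    """Truncated expansion sum_k J_k(beta) cos((w_c + k w_m) t)."""
    coeffs = {k: bessel_j(k, beta) for k in range(-kmax, kmax + 1)}
    return [sum(c * math.cos(2 * math.pi * (fc + k * fm) * t) for k, c in coeffs.items())
            for t in times]

times = [n / 8000 for n in range(200)]
approx = fm_bessel(times, 440.0, 110.0, 2.0)
```

In practice a library routine (e.g. SciPy's `scipy.special.jv`) would replace the hand-written series.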

Frequency Modulation. If β ≠ 0, the FM spectrum contains infinitely many sidebands, at positions ω_c ± kω_m. The amplitude of each pair of sidebands is given by the J_k coefficients, which are functions of β. (Figure: J_k(β) as a function of β and k.)

Modulation index. As in AM, we define an FM modulation index that controls the modulation depth. In FM synthesis this index is β, the amplitude of the modulator, and it is directly proportional to the peak frequency deviation Δf (β = Δf / f_m). As we have seen, the value of β determines the amplitude of the sidebands of the FM spectrum. Furthermore, for orders k well above β, the amplitude decreases rapidly with k. Thus, although theoretically the number of sidebands is infinite, in practice the higher-order sidebands are too weak to be audible. The number of audible sidebands is a function of β and is approximated by 2β + 1. Thus the bandwidth increases with the amplitude of m(t), as in some real instruments.
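As a worked example of these relations (illustrative numbers, not from the slides), take a modulator at f_m = 200 Hz producing a peak deviation Δf = 400 Hz, and apply the 2β + 1 approximation stated above:

$$\beta = \frac{\Delta f}{f_m} = \frac{400\ \text{Hz}}{200\ \text{Hz}} = 2, \qquad 2\beta + 1 = 5 \ \text{audible sidebands (approx.)}$$

Doubling the deviation to Δf = 800 Hz doubles β to 4 and raises the estimate to 9, so the spectrum widens as the modulator gets louder.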

C/M frequency ratio. The ratio between the carrier and modulator frequencies, c/m, determines the (in)harmonic character of s(t). The sound is pitched (harmonic) if c/m is a ratio of positive integers, ω_c / ω_m = N_c / N_m. E.g. for f_c = 800 Hz and f_m = 200 Hz, we have sidebands at 600 Hz and 1 kHz, 400 Hz and 1.2 kHz, 200 Hz and 1.4 kHz, etc. The fundamental frequency of the harmonic spectrum is then f_0 = f_c / N_c = f_m / N_m, with N_c / N_m in lowest terms (here 800/200 = 4/1, so f_0 = 200 Hz). If c/m is not rational, an inharmonic spectrum is produced. If f_0 is below the auditory range, the sound will not be perceived as having a definite pitch.
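The rule f_0 = f_c / N_c can be computed exactly with rational arithmetic. A minimal sketch, assuming f_c and f_m are given as integers (or `Fraction`s) so that c/m is rational; negative sideband frequencies are reflected to their absolute value. Both helper names are hypothetical.

```python
from fractions import Fraction

def fm_fundamental(fc, fm):
    """f0 = fc / Nc = fm / Nm, where fc / fm = Nc / Nm in lowest terms."""
    ratio = Fraction(fc) / Fraction(fm)      # Fraction reduces to lowest terms
    return Fraction(fc) / ratio.numerator

def fm_sidebands(fc, fm, kmax):
    """Distinct |fc + k*fm| for k = -kmax..kmax (negative values reflected)."""
    return sorted({abs(fc + k * fm) for k in range(-kmax, kmax + 1)})

print(fm_fundamental(800, 200), fm_sidebands(800, 200, 3))
print(fm_fundamental(300, 200), fm_sidebands(300, 200, 3))
```

For 800/200 this reproduces the example above (f_0 = 200 Hz, components 200 to 1400 Hz); for 300/200 = 3/2 the fundamental is 100 Hz and only odd multiples of it appear.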

Sideband reflection. For certain values of the c/m ratio and the FM index β, extreme sidebands will reflect into the audible spectrum (aliasing). The modulation may also generate negative frequencies. Depending on the phases of the carrier and the modulator, we might end up with an expansion containing only sines. As sin(-α) = -sin(α), the lower sidebands might reflect back into the spectrum in 180-degree phase-inverted form: a half-cycle (π) phase shift, implying a negative amplitude. These reflected sidebands can add richness to the spectrum, but they can also cancel out components if they overlap exactly with positive ones.
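The reflection rule can be written as a small folding step over a list of (frequency, amplitude) pairs. A sketch under the assumption that all components share the same phase type: for sine-phase components a reflected line changes sign, for cosine-phase it keeps its sign, and lines landing on the same frequency are summed. The component lists are made-up examples.

```python
def fold_negative(components, phase="sine"):
    """Reflect negative-frequency components onto the positive axis.
    components: list of (frequency, amplitude). sin(-a) = -sin(a), so sine-phase
    lines flip sign when reflected; cos(-a) = cos(a), so cosine-phase lines do not.
    Lines that land on the same frequency are summed (reinforce or cancel)."""
    folded = {}
    for f, a in components:
        if f < 0:
            f, a = -f, (-a if phase == "sine" else a)
        folded[f] = folded.get(f, 0.0) + a
    return sorted(folded.items())

partly_cancelled = fold_negative([(-100, 0.3), (100, 0.5), (300, 0.2)])  # 100 Hz: 0.5 - 0.3
reinforced = fold_negative([(-100, 0.3), (100, 0.5)], phase="cosine")    # 100 Hz: 0.5 + 0.3
cancelled = fold_negative([(-100, 0.5), (100, 0.5)])                     # 100 Hz: 0.0
```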

FM examples

Summary: FM synthesis. Cost-efficient and easy to implement. Due to its strong mathematical formulation, the effects of parameter changes are, in a sense, easy to predict: c/m determines the location of the frequency components and β determines their amplitude prominence. It is well suited to original synthesis. The synthesis procedure bears no resemblance to the formation of sound in nature; hence the method is poorly suited to the simulation of acoustic instruments. It has a distinctive sound that is difficult to escape (and possibly annoying in the long run), mostly due to its symmetric spectrum.

Waveshaping synthesis. Also known as non-linear distortion, this is a synthesis method in which the sound signal is passed through a function (a "distortion box"), such that the function W maps any input value x in [-1, 1] to an output value W(x) in the same range. W is the shaping function. Diagram: an oscillator (frequency ω, amplitude A) produces x, which is passed through W to give W(x). The value of A is of great importance, as scaling the input signal makes it address different regions of W.

Waveshaping synthesis. If W is a straight diagonal line from -1 to 1, the process is linear; otherwise, x is distorted by W.
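A table-based waveshaper, as a sketch: W is stored as samples on [-1, 1] and read with linear interpolation, much like a wavetable but indexed by signal value instead of phase. The diagonal table leaves the input unchanged (the linear case); the curved W(x) = sin(πx/2) is a hypothetical example shape. `waveshape` and `make_table` are illustrative names.

```python
import math

def make_table(w, size=1024):
    """Sample a shaping function w at size + 1 evenly spaced points on [-1, 1]."""
    return [w(-1.0 + 2.0 * i / size) for i in range(size + 1)]

def waveshape(x, table):
    """Map each sample of x through the tabulated W (linear interpolation)."""
    size = len(table) - 1
    out = []
    for v in x:
        v = max(-1.0, min(1.0, v))          # keep the input inside W's domain
        pos = (v + 1.0) / 2.0 * size        # fractional table position
        i = min(int(pos), size - 1)
        frac = pos - i
        out.append(table[i] + frac * (table[i + 1] - table[i]))
    return out

A = 0.7   # input amplitude: selects which region of W is used
x = [A * math.cos(2 * math.pi * n / 64) for n in range(64)]
linear = waveshape(x, make_table(lambda v: v))                         # diagonal W
shaped = waveshape(x, make_table(lambda v: math.sin(math.pi / 2 * v)))  # curved W
```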

Waveshaping synthesis. Waveshaping is amplitude sensitive. This is useful for simulating the behaviour of acoustic instruments: the harder an instrument is played, the richer its spectrum. An input signal with time-varying amplitude produces an output whose spectrum changes according to that variation; a variation in the time domain is translated into a variation in the frequency domain. Thus, waveshaping produces a variety of waveforms from simple amplitude variations at the input (very efficient).
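Amplitude sensitivity can be measured directly. This sketch assumes a hypothetical shaping function W(x) = tanh(3x) / tanh(3), which maps [-1, 1] onto [-1, 1], feeds it one period of A·cos, and reports the ratio of the 3rd harmonic to the fundamental from the DFT of that period. Small inputs stay in the nearly linear centre of W; large inputs reach its curved ends.

```python
import cmath
import math

def harmonic_amp(y, k):
    """Amplitude of harmonic k of one period y, from the DFT."""
    N = len(y)
    return 2 * abs(sum(v * cmath.exp(-2j * math.pi * k * n / N) for n, v in enumerate(y))) / N

def brightness(A, N=256):
    """3rd-to-1st harmonic ratio after shaping A*cos through W(x) = tanh(3x)/tanh(3)."""
    y = [math.tanh(3 * A * math.cos(2 * math.pi * n / N)) / math.tanh(3) for n in range(N)]
    return harmonic_amp(y, 3) / harmonic_amp(y, 1)

for A in (0.1, 0.5, 1.0):
    print(A, round(brightness(A), 4))
```

Only the input amplitude changes between the three runs, yet the relative strength of the 3rd harmonic grows with it, which is the "play harder, sound brighter" behaviour described above.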

Chebyshev functions. LeBrun (1979) and Arfib (1979) demonstrated that it is possible to predict the output spectrum of the waveshaped signal if x is a cosine wave and W belongs to the Chebyshev family of polynomials. The kth Chebyshev polynomial of the first kind, T_k, is defined through the identity
$$T_k(\cos\theta) = \cos(k\theta)$$
Thus, if we apply the kth Chebyshev polynomial to a cosine, we obtain a cosine wave at the kth harmonic: each Chebyshev polynomial, when used as W, produces a particular harmonic of x. A weighted combination of Chebyshev polynomials used as W will produce a corresponding harmonic mixture; e.g. terms such as 0.5T_2 and 0.25T_4 contribute the 2nd and 4th harmonics at amplitudes 0.5 and 0.25.
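A sketch of Chebyshev-based waveshaping: T_k is evaluated with the standard recurrence T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}, and a weighted sum of them is used as W. The harmonic recipe below is hypothetical. Because |T_k(x)| ≤ 1 on [-1, 1], dividing by the sum of absolute weights keeps W inside [-1, 1], as the waveshaping definition requires; that normalisation is a choice of this example.

```python
import math

def chebyshev(k, x):
    """T_k(x) via T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    if k == 0:
        return 1.0
    t_prev, t = 1.0, x
    for _ in range(k - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

def chebyshev_shaper(weights):
    """W(x) = sum_k w_k T_k(x) / sum_k |w_k|; weights is a dict {order: weight}."""
    total = sum(abs(w) for w in weights.values())
    return lambda x: sum(w * chebyshev(k, x) for k, w in weights.items()) / total

# Hypothetical recipe: fundamental, 2nd harmonic at 0.5, 4th harmonic at 0.25
W = chebyshev_shaper({1: 1.0, 2: 0.5, 4: 0.25})
thetas = [0.1 * i for i in range(32)]
out = [W(math.cos(th)) for th in thetas]   # equals (cos th + 0.5 cos 2th + 0.25 cos 4th) / 1.75
```

This only holds for a full-amplitude cosine input (A = 1); at smaller A the same W produces a different, usually duller, mixture, which is the amplitude sensitivity discussed earlier.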

Closing remarks. Diagram: an axis running from modulation methods (fewer control parameters), through subtractive synthesis, to additive synthesis (increasingly general).