Sinusoidal Modeling. summer 2006 lecture on analysis, modeling and transformation of audio signals


Sinusoidal Modeling
Summer 2006 lecture on analysis, modeling and transformation of audio signals
Axel Röbel
Institute of Communication Science, TU Berlin; IRCAM Analysis/Synthesis Team
25th August 2006

AMT Part VI: Sinusoidal Modeling

1 Sinusoids plus noise sound modeling
  1.1 Sinusoids
  1.2 Noise
2 Overview over the sinusoidal analysis/synthesis model
3 Peak detection
4 Parameter estimation
  4.1 stationary sinusoids
  4.2 DFT interpolation
5 Estimator performance evaluation
  5.1 Cramer Rao bound
6 non stationary sinusoids
  6.1 Bias in the QIFFT method
  6.2 slope estimation
  6.3 Alternative approach
  6.4 Experimental investigation of the bias correction effect
7 Sinusoidal continuation problem
8 Parameter interpolation

1 Sinusoids plus noise sound modeling

In the previous lectures we used a generic representation of sound in terms of the Fourier spectrum. Most of the algorithms so far did not rely on an explicit signal model. A signal model was used only implicitly, for example in the phase vocoder time stretching algorithm [Röb06c, section 3] and in fundamental frequency estimation [Röb06e]. Higher-level sound representations try to distinguish the perceptually different components: sinusoids and noise. In the following we will see how a sound signal can be represented by means of the sinusoids plus noise signal model.

An introduction can be found in [Ser97]; sound transformation applications are explained in [ABLS02]. Open source software libraries for sinusoidal modeling and transformation can be found at http://www.cerlsoundgroup.org/loris/ and http://clam.iua.upf.edu/

1.1 Sinusoids

Why sinusoids?

- Real world excitation signals (source filter model) are often periodic, so they can be represented by a superposition of harmonically related sinusoids.
- The free oscillation of physical systems can generally be characterized by a superposition of modes, where each mode contributes a sinusoid with a characteristic frequency to the output signal.
- If the modes are not too dense, the related sound will be perceived as rather clean.

Each sinusoidal component is identified by its index $k$ and has a time varying amplitude $a_k(n)$ and a time varying phase $\phi_k(n)$. A single sinusoidal component can be represented as

$$P_k(n) = a_k(n)\cos(\phi_k(n)) \tag{1}$$

or in complex notation

$$P_k(n) = a_k(n)\,e^{j\phi_k(n)}. \tag{2}$$

For a time continuous sinusoid the frequency is the time derivative of the phase. It is convenient to define the frequency of the discrete time sinusoid as the central phase difference of neighboring samples,

$$\omega_k(n) = \frac{\phi_k(n+1) - \phi_k(n-1)}{2}. \tag{3}$$

Without any further constraints each sound signal could be interpreted as a single sinusoid by setting $a_0(n) = s(n)$ and $\phi_0(n) = 0$. The idea, however, is that the sinusoidal components are perceived as individual entities. As a vague constraint for sinusoidal components it is required that the variation of the amplitude $a_k(n)$ and of the time derivative of the unwrapped continuous time phase, $\partial\phi_k(t)/\partial t$, is sufficiently small such that the perceived quality is close to that of a stationary sinusoid.

The complete set of sinusoidal components of a signal $s(n)$ is represented by means of the superposition

$$s(n) = \sum_k P_k(n) = \sum_k a_k(n)\cos(\phi_k(n)). \tag{4}$$
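As an illustration of eqs. (3) and (4), the following minimal sketch synthesizes a signal from per-sample amplitude and phase trajectories and recovers the frequency from the phase by the central difference. The function names and the two example partials are hypothetical choices for this sketch, not part of the lecture material.

```python
import math

def synthesize(amps, phases):
    """Eq. (4): s(n) = sum over k of a_k(n) * cos(phi_k(n))."""
    length = len(amps[0])
    return [sum(a[n] * math.cos(p[n]) for a, p in zip(amps, phases))
            for n in range(length)]

def central_frequency(phase):
    """Eq. (3): omega(n) = (phi(n+1) - phi(n-1)) / 2 for the interior samples."""
    return [(phase[n + 1] - phase[n - 1]) / 2.0
            for n in range(1, len(phase) - 1)]

N = 200
# Partial 0 is stationary; partial 1 decays and has a linear frequency chirp.
amps = [[1.0] * N,
        [0.5 * math.exp(-0.01 * n) for n in range(N)]]
phases = [[0.2 * n for n in range(N)],
          [0.4 * n + 0.0005 * n * n for n in range(N)]]

signal = synthesize(amps, phases)
# For the quadratic phase the central difference gives 0.4 + 0.001 * n,
# stored for n = 1 .. N-2 (so omega1[0] belongs to n = 1).
omega1 = central_frequency(phases[1])
```

For a quadratic phase the central difference is exact, which is one reason to prefer it over a one-sided difference in this setting.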

1.2 Noise

Having detected all sinusoidal components with parameters $a_k(n)$ and $\phi_k(n)$ we may subtract them from the signal. The remaining signal is called the residual. The residual combines signal noise and modeling error.

Noise/sinusoid classification

For a sinusoids plus noise model a classification procedure is required that distinguishes sinusoidal peaks from noise peaks in the signal spectrum. Common techniques are based on the amplitude level and on the smoothness of the amplitude and frequency trajectories. In that case sinusoids forming amplitude or frequency trajectories that are not sufficiently smooth are removed from the set of sinusoids. For harmonic sounds the sinusoidal selection is simplified because the frequency positions where sinusoids are expected are confined to the integer multiples of the fundamental frequency.

There exist few algorithms that allow one to distinguish between spectral peaks representing sinusoids and spectral peaks representing noise. Common techniques are based on features that are derived from the form of the phase and amplitude spectrum [RZR04, HMW01, Rod97, Tho82].

2 Overview over the sinusoidal analysis/synthesis model

- pre-processing: the sinusoidal analysis is performed on the STFT of the signal. The STFT parameters window size, DFT size and frame offset have to be chosen such that the sinusoids of interest are resolved [Röb06b].
- peak detection: each STFT frame is analyzed to find the spectral peaks (section 3).
- sinusoidal parameter estimation: for each peak that has been selected the sinusoidal parameters are estimated (section 4).
- sinusoidal peak continuation: for synthesis of the sinusoids a complete trajectory of amplitude, frequency, and phase is required. The STFT provides values only on a grid given by the hop size of the analysis. The values in between the frames have to be interpolated and, therefore, peaks in consecutive frames have to be matched (connected) to be able to create complete trajectories.
- residual creation: if a residual signal is desired, the sinusoidal parameters for all sinusoids have to be interpolated from frame rate to sample rate, and the sinusoids have to be synthesized and subtracted from the signal.
- noise model: a dedicated noise model can be fitted to the residual spectrum. A common choice is based on a source filter model [Röb06d], using a spectral envelope of the residual and white noise as excitation.

3 Peak detection

- It is a fundamental property of a sinusoid that it creates a prominent local peak in the spectrum.
- A spectral peak is a local maximum of the magnitude spectrum.
- For each spectral frame the spectral peaks are determined by searching for these local maxima.
- Amplitude thresholds or other classification schemes may be used to avoid processing a large number of peaks that are later classified as noise.
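The peak search described above can be sketched as a scan for strict local maxima with an optional amplitude threshold. The function name `find_peaks` and the toy spectrum are assumptions made for this example only.

```python
def find_peaks(magnitude, threshold=0.0):
    """Return the bin indices that are strict local maxima above `threshold`."""
    peaks = []
    for k in range(1, len(magnitude) - 1):
        m = magnitude[k]
        if m > threshold and m > magnitude[k - 1] and m > magnitude[k + 1]:
            peaks.append(k)
    return peaks

# Toy magnitude spectrum with two strong peaks and one weak peak.
toy_spectrum = [0.0, 1.0, 0.0, 0.2, 0.1, 3.0, 2.0, 0.0]
all_peaks = find_peaks(toy_spectrum)          # includes the weak peak at bin 3
strong_peaks = find_peaks(toy_spectrum, 0.5)  # the threshold discards it
```

A fixed threshold is only the simplest choice; the classification schemes mentioned in section 1.2 would replace the `m > threshold` test.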

4 Parameter estimation

Having selected the candidate peaks one needs to determine the parameters of the related sinusoids. The minimum set of parameters comprises amplitude and frequency. In many cases phase is estimated as well. Proper phase estimation is essential to be able to subtract the sinusoid from the sound.

4.1 stationary sinusoids

Remember: the DFT spectrum of the stationary sinusoid

$$s(n) = e^{j(\Omega n + \phi)} \tag{5}$$

using the analysis window $v(n)$ is given by the window spectrum $V(\omega)$ moved to the location of the sinusoid frequency [Röb06a, section 4.1],

$$X(\omega) = \left(e^{j((m + \frac{M-1}{2})\Omega + \phi)}\right)\left(e^{-j\frac{M-1}{2}\omega}\right) V(\omega - \Omega). \tag{6}$$

Due to the linearity of the DFT, a sinusoid with constant amplitude $a(n) = A$ simply scales this result by $A$.

Parameter estimate for a stationary sinusoid in noise:

- frequency: the frequency location $\omega_0$ of the maximum of the peak.
- amplitude: the amplitude value of the spectrum at location $\omega_0$ divided by the maximum of the spectrum of the analysis window. From the FT of the analysis window we find

  $$\max(|V(\omega)|) = \sum_{n=0}^{M-1} v(n). \tag{7}$$

- phase: estimated from the phase spectrum at position $\omega_0$. Attention: remove the phase trend first!

This parameter estimate is assigned to the center of the analysis window. It has been shown that the procedure above implements a maximum likelihood estimate (MLE) of the sinusoidal parameters. MLE: the parameter values that create the observed signal with maximum probability.
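The amplitude and phase rules above can be checked numerically with a small sketch. It assumes that the peak frequency is already known exactly and evaluates the spectrum at that frequency by a direct sum; the helper `dtft` and all parameter values are hypothetical. The last line removes the linear phase trend by re-referencing the phase from the first sample of the frame to the window center.

```python
import cmath
import math

def dtft(x, omega):
    """Spectrum of the frame x at the single frequency omega (direct sum)."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))

M = 128
A, Omega, phi = 0.8, 0.9, 0.3
# Symmetric Hann analysis window v(n).
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]
frame = [window[n] * A * cmath.exp(1j * (Omega * n + phi)) for n in range(M)]

X = dtft(frame, Omega)                 # spectrum value at the peak position
amp_est = abs(X) / sum(window)         # eq. (7): divide by max|V| = sum of v(n)
phase_start = cmath.phase(X)           # phase of the sinusoid at n = 0
phase_center = phase_start + Omega * (M - 1) / 2.0  # phase at the window center
```

In a real analysis the peak frequency is not known in advance, which is the problem addressed by the DFT interpolation methods.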

4.2 DFT interpolation

The MLE procedure above fails for the DFT spectrum because the maximizer of the spectrum is always confined to the bin positions, and the bin positions do not align with the sinusoidal frequencies.

Solutions

- Zero padding: increase the analysis frame by adding zeros after windowing. Zero padding decreases the frequency distance between bins. Processing time scales with the DFT size $N$ according to $N\log(N)$, so zero padding is rather costly.
- Quadratic interpolation of the DFT spectrum (QIFFT):
  - select the maximum bin and its two direct neighbors,
  - perform a second order (quadratic) interpolation of the log-amplitude spectrum and the unwrapped phase spectrum,
  - apply the parameter estimation procedure to the quadratically interpolated peak spectrum.

In real world applications both solutions are mixed. According to the Taylor series approximation, the error of the quadratic interpolation becomes smaller as the distance of the supporting points to the maximum decreases.
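The QIFFT steps can be sketched as follows for the log-amplitude spectrum only (the phase interpolation is omitted). This is a minimal illustration under assumed parameters: a Hann window of length 64, a zero padded DFT of size 256 computed by a direct sum, and a noise-free complex sinusoid. The names `dft_bins` and `qifft_peak` are hypothetical.

```python
import cmath
import math

def dft_bins(x, n_fft):
    """Direct zero padded DFT, O(len(x) * n_fft); adequate for a small demo."""
    return [sum(xn * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n, xn in enumerate(x))
            for k in range(n_fft)]

def qifft_peak(log_mag, k):
    """Fit a parabola through bins k-1, k, k+1 of the log-amplitude spectrum.

    Returns the fractional bin position of the vertex and its log amplitude."""
    alpha, beta, gamma = log_mag[k - 1], log_mag[k], log_mag[k + 1]
    p = 0.5 * (alpha - gamma) / (alpha - 2.0 * beta + gamma)
    return k + p, beta - 0.25 * (alpha - gamma) * p

M, N_FFT = 64, 256
A, Omega, phi = 0.8, 0.71, 0.3
window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)]
frame = [window[n] * A * cmath.exp(1j * (Omega * n + phi)) for n in range(M)]

spectrum = dft_bins(frame, N_FFT)
# The tiny offset only guards against log(0) in an exactly empty bin.
log_mag = [math.log(abs(X) + 1e-300) for X in spectrum]
k_max = max(range(1, N_FFT - 1), key=lambda k: log_mag[k])
k_frac, peak_log = qifft_peak(log_mag, k_max)

omega_coarse = 2 * math.pi * k_max / N_FFT   # estimate confined to the bin grid
omega_est = 2 * math.pi * k_frac / N_FFT     # quadratically interpolated estimate
amp_est = math.exp(peak_log) / sum(window)   # normalize by the window sum, eq. (7)
```

Here zero padding and quadratic interpolation are combined, matching the remark that both solutions are mixed in practice.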

5 Estimator performance evaluation

The quantitative evaluation is usually performed by means of parameter estimation from single sinusoids in noise. The estimation error is shown as a function of the SNR. Two error contributions, bias and variance, are distinguished. Denote by $P$ an unknown parameter to be estimated and by $\hat P$ the estimate that an estimator $F$ produces. Then we can define the bias as

$$B_F = E(\hat P) - P, \tag{8}$$

where $E()$ denotes the expected value, in practice generally the sample mean. The bias is the average or systematic error of the estimator. The variance is then defined as

$$\sigma_F^2 = E((\hat P - E(\hat P))^2). \tag{9}$$

It tells us about the variation of the estimate around its average value.

The mean squared error (MSE) can now be decomposed into bias and variance:

$$
\begin{aligned}
\mathrm{MSE}(F) = E((\hat P - P)^2)
&= \frac{1}{L}\sum_{n=0}^{L-1} \left(\hat P(n) - E(\hat P) + E(\hat P) - P\right)^2 && (10)\\
&= \frac{1}{L}\sum_{n=0}^{L-1} \left(\hat P(n) - E(\hat P) + B_F\right)^2 && (11)\\
&= \frac{1}{L}\sum_{n=0}^{L-1} \left((\hat P(n) - E(\hat P))^2 + 2(\hat P(n) - E(\hat P))B_F + B_F^2\right) && (12)\\
&= \sigma_F^2 + B_F^2 + 2B_F\left(\Big(\frac{1}{L}\sum_{n=0}^{L-1}\hat P(n)\Big) - E(\hat P)\right) && (13)\\
&= \sigma_F^2 + B_F^2 + 2B_F\left(E(\hat P) - E(\hat P)\right) && (14)\\
&= \sigma_F^2 + B_F^2 && (15)
\end{aligned}
$$

This tells us that the mean squared error can be decomposed into the squared bias and the variance. The squared bias, as the average error, is the indicator for systematic errors. The variance is the indicator for noise sensitivity.
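The decomposition in eq. (15) holds exactly for sample quantities when the variance is normalized by $L$. The sketch below verifies this for a hypothetical estimator that returns the true value plus a fixed offset and Gaussian noise; the offset, the noise level and the function name are assumptions of the example.

```python
import random

def error_decomposition(estimates, true_value):
    """Return sample bias, sample variance (1/L normalization) and MSE."""
    count = len(estimates)
    mean = sum(estimates) / count
    bias = mean - true_value
    variance = sum((e - mean) ** 2 for e in estimates) / count
    mse = sum((e - true_value) ** 2 for e in estimates) / count
    return bias, variance, mse

random.seed(0)
P = 2.0
# Hypothetical estimator output: systematic offset 0.1 plus Gaussian noise.
estimates = [P + 0.1 + random.gauss(0.0, 0.5) for _ in range(1000)]
bias, variance, mse = error_decomposition(estimates, P)
# mse equals variance + bias**2 up to rounding errors.
```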

5.1 Cramer Rao bound

The Cramer-Rao theorem provides a lower bound for the variance of an unbiased estimator. An unbiased estimator is an estimator for which $B_F = 0$. If we denote the Cramer-Rao bound for the estimation of the parameter $\lambda$ as $\mathrm{CRB}(\hat\lambda)$, and if $\sigma_F^2$ is the variance of an estimator $F$ that provides estimates of $\lambda$, then this variance is bounded by the Cramer-Rao bound

$$\sigma_F^2 \geq \mathrm{CRB}(\hat\lambda). \tag{16}$$

The Cramer-Rao bound is a function of the Fisher information of the probability distribution of the data $x$ given the parameter $\lambda$, $P(x|\lambda)$ [Kay88].

The Cramer-Rao bounds for sinusoidal parameter estimation for the case of a single stationary complex exponential of length $N$ and amplitude $A$ in stationary complex white Gaussian noise with variance $\sigma_z^2$,

$$s(n) = A e^{j(\omega n + \phi)} + z(n), \tag{17}$$

are [RB98]:

$$\text{Amplitude:}\quad \mathrm{CRB}(\hat A) = \frac{\sigma_z^2}{2N} \tag{18}$$

$$\text{Frequency:}\quad \mathrm{CRB}(\hat\omega) = \frac{6\sigma_z^2}{A^2 N^3} \tag{19}$$

$$\text{Phase:}\quad \mathrm{CRB}(\hat\phi) = \frac{\sigma_z^2}{2NA^2} \tag{20}$$

The bounds decrease with increasing observation length and with decreasing noise level.

[Plot: amplitude estimation (2D = 0.00 · 2π/M²). x-axis: SNR [dB]; y-axis: amp error [dB]. Curves: CRB, QIFFT rect FFT OV=2, QIFFT Hann FFT OV=1, QIFFT Hann FFT OV=0, QIFFT Hann FFT OV=2.]

Figure 1: Estimation error and Cramer Rao bound for estimation of sinusoidal amplitude using QIFFT with different zero padding and different analysis windows (window length M = 1000, FFT size N = 2^(nextpow2(M)+OV)).

As a first example consider an experiment that evaluates different zero padding factors and different analysis windows for the estimation of the amplitude. The CRB graphs display the SNR on the x-axis, such that moving to the right decreases the noise variance. The y-axis displays the MSE of the estimator. The error curves can be divided into three regions.

- middle section: the error follows the CRB (the error is dominated by the variance); the curves are close to the CRB (the estimator is rather efficient).
- left section: the estimator variance increases more strongly than the CRB due to threshold effects (noise peaks are selected).
- right section: with decreasing noise the variance part of the MSE falls below the bias, and the estimator error saturates at a fixed level given by the estimator bias.

Conclusion

The present curves show clearly that the bias decreases with the zero padding factor (interpolation errors become smaller). Moreover, the rectangular window has a larger bias than the Hanning window because the mainlobe of the rectangular window is narrower and less well approximated by a quadratic function. Note, however, that the rectangular window is closer to the CRB in the middle section. This shows that the down weighting that the other windows apply to the border regions of the data decreases estimator efficiency.

[Plot: phase estimation (2D = 0.00 · 2π/M²). x-axis: SNR [dB]; y-axis: phase error [dB]. Curves: CRB, QIFFT rect FFT OV=2, QIFFT Hann FFT OV=1, QIFFT Hann FFT OV=0, QIFFT Hann FFT OV=2.]

Figure 2: Estimation error and Cramer Rao bound for estimation of sinusoidal phase using QIFFT with different zero padding and different analysis windows (window length M = 1000, FFT size N = 2^(nextpow2(M)+OV)).

The phase estimation error does not show any bias. Because the phase is constant within the peak, a small error of the frequency estimator will not change the phase estimate. In the threshold region the error is limited to a maximum value. This is due to the use of the $2\pi$ phase range, which cannot create errors larger than $\pm\pi$.

[Plot: freq estimation (2D = 0.00 · 2π/M²). x-axis: SNR [dB]; y-axis: freq error [dB]. Curves: CRB, QIFFT rect FFT OV=2, QIFFT Hann FFT OV=1, QIFFT Hann FFT OV=0, QIFFT Hann FFT OV=2.]

Figure 3: Estimation error and Cramer Rao bound for estimation of sinusoidal frequency using QIFFT with different zero padding and different analysis windows (window length M = 1000, FFT size N = 2^(nextpow2(M)+OV)).

The frequency estimation error is similar to the amplitude estimation error, with bias for high SNR and threshold effects for low SNR. The main difference is that the frequency error shows the largest distance between the CRB and the estimator MSE. This is due to the fact that the frequency estimation is the central part of the algorithm: phase and amplitude use the frequency to determine their estimates. For amplitude and phase, however, the final estimate does not change strongly with the frequency position, such that they are less influenced by noise. Due to the flat top of the peak, the frequency estimate is influenced much more by the noise, such that it shows the largest sensitivity to noise. Note that the sensitivity is stronger for Hanning windows, which have a mainlobe with a larger plateau that is easily affected by noise.

6 non stationary sinusoids

Real world signals are never stationary. Non-stationary sinusoids have been studied either with linear AM/FM

$$s(n) = (A + a(n - n_0))\,e^{j(\phi + \Omega(n - n_0) + D(n - n_0)^2)}, \tag{22}$$

or with linear FM and exponential AM

$$s(n) = A e^{a(n - n_0)}\,e^{j(\phi + \Omega(n - n_0) + D(n - n_0)^2)}. \tag{23}$$

To understand the impact of the time varying parameters, a mathematical study of the spectral peak and its local maximum as a function of the parameters and the analysis window is required.

For the complete linear model there exist only approximate solutions if the analysis window is Gaussian [Pee01]. For the exponential amplitude model and a Gaussian window a complete mathematical solution is possible [AS05].

We reproduce the results for the exponential amplitude evolution and a Gaussian analysis window

$$w(n) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{n^2}{2\sigma^2}} = \sqrt{\frac{p}{\pi}}\,e^{-pn^2}, \tag{24}$$

with the shortcut notation $p = \frac{1}{2\sigma^2}$. Following [AS05] the FT spectrum is

$$X(\omega) = \sum_{n=-\infty}^{\infty} w(n)\,s(n)\,e^{-j\omega n} = e^{u(\omega) + jv(\omega)}. \tag{25}$$

The log amplitude spectrum $u(\omega)$ is given by

$$u(\omega) = \log(A) + \frac{a^2}{4p} - \frac{1}{4}\log\!\left(1 + \Big(\frac{D}{p}\Big)^2\right) - \frac{p}{4(p^2 + D^2)}\left[\omega - \Omega - \frac{aD}{p}\right]^2, \tag{26}$$

and the phase spectrum $v(\omega)$ is given by

$$v(\omega) = \phi + \frac{a^2}{4D} + \frac{1}{2}\,\mathrm{atan}\!\Big(\frac{D}{p}\Big) - \frac{D}{4(p^2 + D^2)}\left[\omega - \Omega + \frac{pa}{D}\right]^2. \tag{27}$$

Slightly different results are obtained by means of a second order Taylor approximation of the FT spectrum of eq. (22). Note that the log amplitude and the phase spectrum of the exponential AM, linear FM chirp are exactly quadratic functions, such that the QIFFT method can be used to estimate all parameters from the three central bins of the main lobe of the peak.

6.1 Bias in the QIFFT method

Apply the QIFFT method to the log amplitude and phase spectra to understand the bias. The frequency estimate, i.e. the position of the local maximum, is

$$\hat\Omega = \arg\max_\omega u(\omega) = \Omega + \frac{aD}{p}, \tag{28}$$

the amplitude at that position is

$$\hat A = e^{u(\hat\Omega)} = e^{\log(A) + \frac{a^2}{4p} - \frac{1}{4}\log(1 + (\frac{D}{p})^2)}, \tag{29}$$

and the phase estimate is

$$v(\hat\Omega) = \hat\phi = \phi - \frac{a^2 D}{4p^2} + \frac{1}{2}\,\mathrm{atan}\!\Big(\frac{D}{p}\Big). \tag{30}$$

In the general case these estimates do not match the correct values.

- The frequency estimate $\hat\Omega$ is biased if both the frequency slope $D$ and the log amplitude slope $a$ are present.
- The amplitude estimate $\hat A$ is biased if the frequency slope $D$ or the log amplitude slope $a$ is present.
- The phase estimate $\hat\phi$ is biased whenever the frequency slope is not zero.

Note that amplitude and phase bias may significantly increase the residual energy if no bias correction scheme is applied. This is especially true for vibrato signals.

6.2 slope estimation

The advantage of the analytic results is that the bias can simply be corrected as soon as the log amplitude slope and the frequency slope are estimated. The first and second order derivatives with respect to $\omega$ of the log amplitude spectrum $u(\omega)$ and the phase spectrum $v(\omega)$ at the position of the local maximum are

$$v'(\hat\Omega) = -\frac{a}{2p}, \tag{31}$$

$$u''(\hat\Omega) = -\frac{p}{2(p^2 + D^2)}, \tag{32}$$

$$v''(\hat\Omega) = -\frac{D}{2(p^2 + D^2)}. \tag{33}$$

From these equations [AS05] have derived estimates for $a$ and $D$ as follows:

$$\hat a = -2p\,v'(\hat\Omega), \tag{34}$$

$$\hat D = p\,\frac{v''(\hat\Omega)}{u''(\hat\Omega)}. \tag{35}$$

These estimates may be used to correct the estimates above.

To be able to apply the bias correction scheme to non-Gaussian windows, a linear scaling of the correction factors has been proposed in [AS05]. The scaling factors have been optimized using signals with slight or medium modulation.
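The slope estimators of eqs. (34) and (35) and the frequency bias correction following eq. (28) can be tried out on a synthetic exponential AM, linear FM chirp with a Gaussian window. The sketch below is an illustration under assumptions: the time origin is the window center, the spectrum is evaluated by a direct sum at three closely spaced frequencies near the peak, and the derivatives are taken from the parabola through these three points (exact here because $u$ and $v$ are quadratic). All parameter values and the helper `dtft` are hypothetical.

```python
import cmath
import math

def dtft(x, times, omega):
    """Spectrum of the frame at one frequency, time origin at the window center."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in zip(times, x))

sigma = 50.0
p = 1.0 / (2.0 * sigma ** 2)
A, a, phi, Omega, D = 1.0, 1e-3, 0.4, 0.5, 1e-4
times = range(-400, 401)          # +-8 sigma, so truncation is negligible
frame = [math.sqrt(p / math.pi) * math.exp(-p * n * n)            # window, eq. (24)
         * A * math.exp(a * n)
         * cmath.exp(1j * (phi + Omega * n + D * n * n))          # signal, eq. (23)
         for n in times]

w_c, h = 0.5, 0.002               # assumed grid point near the peak, grid spacing
X_m, X_0, X_p = (dtft(frame, times, w) for w in (w_c - h, w_c, w_c + h))

u_m, u_0, u_p = (math.log(abs(X)) for X in (X_m, X_0, X_p))
v_m = cmath.phase(X_m * X_0.conjugate())   # phase relative to the center point
v_p = cmath.phase(X_p * X_0.conjugate())

u1 = (u_p - u_m) / (2 * h)                 # u'(w_c)
u2 = (u_p - 2 * u_0 + u_m) / h ** 2        # u''
v1 = (v_p - v_m) / (2 * h)                 # v'(w_c)
v2 = (v_p + v_m) / h ** 2                  # v''

w_peak = w_c - u1 / u2                     # vertex of the log amplitude parabola
v1_peak = v1 + v2 * (w_peak - w_c)         # v' evaluated at the peak

a_hat = -2.0 * p * v1_peak                 # eq. (34)
D_hat = p * v2 / u2                        # eq. (35)
Omega_hat = w_peak - a_hat * D_hat / p     # remove the frequency bias of eq. (28)
```

With noise or a non-Gaussian window the three-point fit is only approximate, which is where the scaling factors mentioned above come in.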

6.3 Alternative approach

For eq. (22) the bias disappears completely whenever $D = 0$. The frequency slope estimator that has been derived in [Pee01] for eq. (22) is the same as the frequency slope estimator for the exponential AM signal. Experimentally one obtains good frequency slope estimates with this estimator even for non-Gaussian windows; in this case the effective $p$ of the non-Gaussian window is simply determined by the standard deviation of the window itself.

The alternative approach consists of the following steps:

- estimation of the frequency slope using the method shown above,
- demodulation of the signal related to the spectral peak by means of multiplication with a complex exponential chirp that cancels the frequency slope $D$,

  $$s_d(n) = e^{-jDn^2}, \tag{36}$$

- application of the QIFFT method.

Approximate demodulation can be obtained by means of convolution of the spectral peak to be analyzed with the main lobe of the spectrum of the demodulation signal $s_d(n)$.
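The time domain variant of the demodulation step can be illustrated as follows. The sketch assumes that the frequency slope is already known (in practice the estimate from section 6.2 would be used) and that the demodulation chirp has the form $e^{-jDn^2}$ with the time origin at the window center; the function name and the test signal are hypothetical.

```python
import cmath

def demodulate(frame, times, slope):
    """Multiply by exp(-j * slope * n^2) to cancel a linear frequency slope."""
    return [x * cmath.exp(-1j * slope * n * n) for n, x in zip(times, frame)]

Omega, D, phi = 0.5, 1e-4, 0.4
times = range(-100, 101)
chirp = [cmath.exp(1j * (phi + Omega * n + D * n * n)) for n in times]
flat = demodulate(chirp, times, D)

# After demodulation the phase advance per sample is constant and equals Omega,
# so the stationary estimation procedure (QIFFT) can be applied without bias.
steps = [cmath.phase(flat[k + 1] * flat[k].conjugate())
         for k in range(len(flat) - 1)]
```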

6.4 Experimental investigation of the bias correction effect

An experimental investigation of the estimation errors for different methods, using the signal model in eq. (22) and randomly selected signal parameters, can be used to compare estimator performance.

Range of the randomly selected signal parameters (uniform distribution):

- frequency $\Omega$ selected from $[0.1, 0.3]\pi$,
- phase $\phi$ selected from $[-\pi, \pi]$,
- amplitude slope $a$ selected from $[-1, 1]\,A/M$,
- frequency slope $D$ selected from $[-2, 2]\,2\pi/M^2$ (the frequency changes within a window by not more than $2\pi/M$).

Note that for real world signals with vibrato the frequency slope increases linearly with the partial number.

[Plot: freq slope estimation (2D = [-4.00, 4.00] · 2π/M²). x-axis: SNR [dB]; y-axis: freq slope error [dB]. Curves: CRB, PR Gauss, AS Gauss, AS Hann, DE Hann, DE Gauss, DE sltest.]

Figure 4: Estimation error and Cramer Rao bound for estimation of frequency slope using different analysis windows and different estimation procedures (window length M = 1001, FFT size N = 4096).

[Plot: amplitude estimation (2D = [-4.00, 4.00] · 2π/M²). x-axis: SNR [dB]; y-axis: amp error [dB]. Curves: CRB, PR Gauss, AS Gauss, AS Hann, DE Hann, DE Gauss, DE sltest, QIFFT Hann.]

Figure 5: Estimation error and Cramer Rao bound for estimation of amplitude using different analysis windows and different estimation procedures (window length M = 1001, FFT size N = 4096).

[Plot: freq estimation (2D = [-4.00, 4.00] · 2π/M²). x-axis: SNR [dB]; y-axis: freq error [dB]. Curves: CRB, PR Gauss, AS Gauss, AS Hann, DE Hann, DE Gauss, DE sltest, QIFFT Hann.]

Figure 6: Estimation error and Cramer Rao bound for estimation of frequency using different analysis windows and different estimation procedures (window length M = 1001, FFT size N = 4096).

[Plot: phase estimation (2D = [-4.00, 4.00] · 2π/M²). x-axis: SNR [dB]; y-axis: phase error [dB]. Curves: CRB, PR Gauss, AS Gauss, AS Hann, DE Hann, DE Gauss, DE sltest, QIFFT Hann.]

Figure 7: Estimation error and Cramer Rao bound for estimation of phase using different analysis windows and different estimation procedures (window length M = 1001, FFT size N = 4096).

7 Sinusoidal continuation problem

After the estimation of the sinusoidal parameters for the spectral peaks in the individual frames, these peaks have to be connected to form sinusoidal trajectories. Many algorithms have been proposed to find proper peak connections. Because different situations (vibrato, polyphony, noise level) require different approaches, no algorithm is best for all situations.

The original algorithm has been proposed in [MQ86]. It is based on the simple idea of connecting each peak in the previous frame to the peak in the next frame that is closest in frequency. This algorithm may create unreasonable jumps.

An improved strategy compares the amplitude and frequency differences of the candidates to connect and connects only peaks that do not exceed a maximum variation for both parameters. Peaks of the previous frame that remain unconnected belong to dying partials. Peaks of the next frame without any connection may represent a newborn sinusoid [ABLS02]. The variation thresholds can be adapted to favor smoothness of the amplitude and frequency trajectories.

Recent algorithms try to incorporate a trajectory model into the peak continuation algorithm [LMR04].
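A strongly simplified version of the frequency based matching can be sketched as below. It is not the algorithm of [MQ86] itself: candidate pairs are sorted globally by frequency distance and accepted greedily, only a frequency threshold is used, and the function name and the threshold value are assumptions of this example.

```python
def connect_peaks(prev_freqs, next_freqs, max_delta):
    """Greedily connect peaks of two frames that are close in frequency.

    Returns (connections, dying, born) with index pairs (i, j), the indices of
    unconnected peaks of the previous frame and those of the next frame."""
    candidates = sorted(
        (abs(fn - fp), i, j)
        for i, fp in enumerate(prev_freqs)
        for j, fn in enumerate(next_freqs)
        if abs(fn - fp) <= max_delta)
    used_prev, used_next, connections = set(), set(), []
    for _, i, j in candidates:             # closest pairs are accepted first
        if i not in used_prev and j not in used_next:
            connections.append((i, j))
            used_prev.add(i)
            used_next.add(j)
    dying = [i for i in range(len(prev_freqs)) if i not in used_prev]
    born = [j for j in range(len(next_freqs)) if j not in used_next]
    return sorted(connections), dying, born

prev_frame = [0.10, 0.20, 0.30]            # peak frequencies in frame i
next_frame = [0.105, 0.31, 0.50]           # peak frequencies in frame i+1
connections, dying, born = connect_peaks(prev_frame, next_frame, 0.02)
```

An amplitude criterion could be added by extending the sort key and the acceptance test in the same way.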

AMT Part VI: Sinusoidal Modeling 45/53 8 Parameter interpolation For synthesis of the sinusoid from the estimated parameters an interpolation from the analysis frame rate to the sample rate has to be obtained. The problem has been solved in [MQ86]. Given are frame parameters of frame at position n i, [A(n i ), φ(n i ), ω(n i )] and the following frame at position n i+1, [A(n i+1 ), φ(n i+1 ), ω(n i+1 )] Use lowest order that uniquely determines an interpolating polynomial. Amplitude interpolation: 2 points given linear interpolation A(n) = A(n i)(n i+1 n) + A(n i+1 )(n n i ) n i+1 n i (37) Phase and frequency are not independent, phase interpolation has to be consistent with the frequencies at the frame boundaries.

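Four values are given: the phase at the left and right frame boundaries and the frequency at both frame boundaries. The lowest polynomial order is therefore 3, which gives a third-order phase polynomial and a second-order frequency polynomial:

$$\phi(n) = q n^3 + r n^2 + s n + t \qquad (38)$$

$$\omega(n) = 3 q n^2 + 2 r n + s \qquad (39)$$

The coordinate system is located at frame position $n_i$, so the argument is the time offset from $n_i$ and the right frame boundary lies at the time difference $n_d = n_{i+1} - n_i$. The phase and frequency given at the left frame boundary yield

$$\phi(0) = t = \hat\phi(n_i) \qquad (40)$$

$$\omega(0) = s = \hat\omega(n_i) \qquad (41)$$

The constraints at the right frame boundary follow in the same way, but the phase at the right boundary is known only up to an integer multiple of $2\pi$:

$$\phi(n_d) = q n_d^3 + r n_d^2 + \hat\omega(n_i)\, n_d + \hat\phi(n_i) = \hat\phi(n_{i+1}) + 2\pi M \qquad (42)$$

$$\omega(n_d) = 3 q n_d^2 + 2 r n_d + \hat\omega(n_i) = \hat\omega(n_{i+1}) \qquad (43)$$

These are 2 equations in 3 unknowns. Solving for q and r, we get a solution that depends on M:

$$r = \frac{3}{n_d^2}\left(\hat\phi(n_{i+1}) - \left(\hat\omega(n_i)\, n_d + \hat\phi(n_i)\right) + 2\pi M\right) \qquad (44)$$

$$\quad - \frac{1}{n_d}\left(\hat\omega(n_{i+1}) - \hat\omega(n_i)\right) \qquad (45)$$

$$q = -\frac{2}{n_d^3}\left(\hat\phi(n_{i+1}) - \left(\hat\omega(n_i)\, n_d + \hat\phi(n_i)\right) + 2\pi M\right) \qquad (46)$$

$$\quad + \frac{1}{n_d^2}\left(\hat\omega(n_{i+1}) - \hat\omega(n_i)\right) \qquad (47)$$

Selecting M: we require minimum curvature of the frequency trajectory. The curvature is proportional to q, so we select the M that minimizes

$$\mathrm{MIN} = q(M)^2 = \left(\frac{-2\Delta\phi - 4\pi M + n_d\left(\hat\omega(n_{i+1}) - \hat\omega(n_i)\right)}{n_d^3}\right)^2 \qquad (48)$$

where $\Delta\phi = \hat\phi(n_{i+1}) - \left(\hat\omega(n_i)\, n_d + \hat\phi(n_i)\right)$. Setting the derivative with respect to M to zero, we get

$$0 = n_d\left(\hat\omega(n_{i+1}) - \hat\omega(n_i)\right) - 2\Delta\phi - 4\pi M \qquad (49)$$

and solving for M yields

$$\hat M = \frac{1}{2\pi}\left(\frac{n_d}{2}\left(\hat\omega(n_{i+1}) - \hat\omega(n_i)\right) - \hat\phi(n_{i+1}) + \hat\omega(n_i)\, n_d + \hat\phi(n_i)\right) \qquad (50)$$

The selected M has to be an integer, so we select $M = \mathrm{round}(\hat M)$.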
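The derivation of Eqs. (38) to (50) can be traced in a short sketch. The function names, the argument order, and the example values are assumptions for illustration. Frequencies are in radians per sample, phases in radians, and the time axis starts at the left frame boundary.

```python
import math


def cubic_phase_coefficients(phi_left, omega_left, phi_right, omega_right, n_d):
    """Return (q, r, s, t, M) of the cubic phase polynomial, Eqs. (38)-(50)."""
    # Phase mismatch and frequency change between the frame boundaries.
    delta_phi = phi_right - (omega_left * n_d + phi_left)
    delta_omega = omega_right - omega_left

    # Eq. (50): real-valued minimizer of q(M)^2, then rounded to an integer.
    m_hat = (0.5 * n_d * delta_omega - delta_phi) / (2.0 * math.pi)
    m = round(m_hat)

    # Eqs. (44)-(47): solve for r and q given the selected M.
    unwrapped = delta_phi + 2.0 * math.pi * m
    r = 3.0 / n_d**2 * unwrapped - delta_omega / n_d
    q = -2.0 / n_d**3 * unwrapped + delta_omega / n_d**2

    # Eqs. (40), (41): s and t come from the left frame boundary.
    return q, r, omega_left, phi_left, m


def phase(n, q, r, s, t):
    """Eq. (38), evaluated with Horner's scheme."""
    return ((q * n + r) * n + s) * n + t


def frequency(n, q, r, s):
    """Eq. (39), the derivative of the phase polynomial."""
    return (3.0 * q * n + 2.0 * r) * n + s
```

As a usage example, take the boundary frequencies 0.1 and 0.15 (normalized, so multiplied by 2π here) over n_d = 100 samples, with assumed boundary phases of 0.3 and 1.2 rad. Calling `cubic_phase_coefficients(0.3, 2 * math.pi * 0.1, 1.2, 2 * math.pi * 0.15, 100)` selects M = 12, and the resulting polynomial reproduces both boundary frequencies and the right boundary phase up to the added 2πM.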

Figure 8: Phase interpolation for varying M. Plot title: phase interpolation as a function of M. Axes: φ(n) [2π rad] versus time n. Curves: M = 10, 11, 12, 13, 14.

Figure 9: Frequency interpolation for varying M. The limiting values are $\hat\omega(n_i) = 0.1$ and $\hat\omega(n_{i+1}) = 0.15$ (normalized frequency). Plot title: frequency interpolation as a function of M. Axes: ω(n) [2π rad] versus time n. Curves: M = 10, 11, 12, 13, 14.

References

[ABLS02] X. Amatriain, J. Bonada, A. Loscos, and X. Serra. Spectral processing. In U. Zölzer, editor, Digital Audio Effects, chapter 10, pages 373-438. John Wiley & Sons, 2002.

[AS05] M. Abe and J. O. Smith. AM/FM rate estimation for time-varying sinusoidal modeling. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, pages 201-204 (Vol. III), 2005.

[HMW01] S. W. Hainsworth, M. D. Macleod, and P. J. Wolfe. Analysis of reassigned spectrograms for musical transcription. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 23-26, 2001.

[Kay88] S. Kay. Modern Spectral Estimation. Prentice Hall, 1988.

[LMR04] M. Lagrange, S. Marchand, and J-B. Rault. Using linear prediction to enhance the tracking of partials. In Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2004.

[MQ86] R. J. McAulay and T. F. Quatieri. Speech analysis-synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(4):744-754, 1986.

[Pee01] G. Peeters. Modèles et modification du signal sonore adapté à ses caractéristiques locales. PhD thesis, Université Paris 6, 2001. In French only. http://recherche.ircam.fr/equipes/analysesynthese/peeters/articles/peeters 2001 PhDThesisv1.1.pdf.

[RB98] B. Ristic and B. Boashash. Comments on "The Cramer-Rao lower bounds for signals with constant amplitude and polynomial phase". IEEE Transactions on Signal Processing, 46(6):1708-1709, 1998.

[Röb06a] A. Röbel. Analysis, modelling and transformation of audio signals - Part I: Fundamentals of discrete Fourier analysis. Lecture slides, 2006. AMT: Part I.

[Röb06b] A. Röbel. Analysis, modelling and transformation of audio signals - Part II: Analysis/resynthesis with the short time Fourier transform. Lecture slides, 2006. AMT: Part II.

[Röb06c] A. Röbel. Analysis, modelling and transformation of audio signals - Part III: Signal modifications using the STFT. Lecture slides, 2006. AMT: Part III.

[Röb06d] A. Röbel. Analysis, modelling and transformation of audio signals - Part IV: Source filter modeling and spectral envelope estimation. Lecture slides, 2006. AMT: Part IV.

[Röb06e] A. Röbel. Analysis, modelling and transformation of audio signals - Part V: Fundamental frequency estimation. Lecture slides, 2006. AMT: Part V.

[Rod97] X. Rodet. Musical sound signal analysis/synthesis: Sinusoidal+residual and elementary waveform models. In Proc. IEEE Time-Frequency and Time-Scale Workshop 97 (TFTS 97), page ??, 1997.

[RZR04] A. Röbel, M. Zivanovic, and X. Rodet. Signal decomposition by means of classification of spectral peaks. In Proc. Int. Computer Music Conference (ICMC), pages 446-449, 2004.

[Ser97] X. Serra. Musical signal processing, chapter Musical Sound Modeling with Sinusoids and Noise, pages 91-122. Studies on New Music Research. Swets & Zeitlinger B. V., 1997.

[Tho82] D. J. Thomson. Spectrum estimation and harmonic analysis. Proceedings of the IEEE, 70(9):1055-1096, 1982.