Spectral Estimation & Examples of Signal Analysis

Examples from the research of Kyoung Hoon Lee, Aaron Hastings, Don Gallant, Shashikant More and Weonchan Sung, Herrick graduate students

Estimation: Bias, Variance and Mean Square Error
Let $\phi$ denote the quantity that we are trying to estimate, and let $\hat{\phi}$ denote the result of an estimation based on one data set with N pieces of information. Each data set used for estimation gives a different estimate of $\phi$.
Bias: the true value minus the average of all possible estimates,
$$b(\hat{\phi}) = \phi - E[\hat{\phi}]$$
Variance: a measure of the spread of the estimates about the mean of all estimates,
$$\sigma^2 = E\big[(\hat{\phi} - E[\hat{\phi}])^2\big]$$
Mean square error:
$$\text{m.s.e.} = E\big[(\hat{\phi} - \phi)^2\big] = b^2 + \sigma^2$$

Estimation: Some definitions
An estimate is consistent if, when we use more data to form the estimate, the mean square error is reduced.
If we have two ways of estimating the same thing, the estimator that leads to the smaller mean square error is said to be more efficient than the other estimator.
[Figure: estimates (x) of $\phi = (a, b)$ scattered in the a-b plane, with the true value and the mean of all estimates marked; the bias is the offset between the two.]

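Examples
Bias and variance of an estimate of the mean:
$$\hat{\mu} = \frac{1}{N}\sum_{n=1}^{N} X_n$$
$$E[\hat{\mu}] = E\Big[\frac{1}{N}\sum_{n=1}^{N} X_n\Big] = \frac{1}{N}\sum_{n=1}^{N} E[X_n] = \frac{1}{N} N\mu = \mu \quad \text{(unbiased)}$$
$$\sigma_{\hat{\mu}}^2 = E\big[(\hat{\mu} - E[\hat{\mu}])^2\big] = E\Big[\Big(\frac{1}{N}\sum_{n=1}^{N} X_n - \mu\Big)^2\Big] = E\Big[\Big(\frac{1}{N}\sum_{n=1}^{N} (X_n - \mu)\Big)^2\Big]$$
$$= \frac{1}{N^2}\sum_{n=1}^{N}\sum_{m=1}^{N} E\big[(X_m - \mu)(X_n - \mu)\big]$$
$$= \frac{1}{N^2}\Big\{(N^2 - N)\, E\big[(X_n - \mu)(X_m - \mu)\big]_{n \neq m} + N\, E\big[(X_n - \mu)^2\big]\Big\}$$
$$= \frac{1}{N^2}\Big\{(N^2 - N)\, E[X_n - \mu]\, E[X_m - \mu] + N\, E\big[(X_n - \mu)^2\big]\Big\}$$
$$= \frac{1}{N^2}\, N\, E\big[(X_n - \mu)^2\big] = \frac{1}{N}\sigma_x^2$$
The double sum is separated into the terms where n does not equal m and the terms where n = m.
The derivation assumes that the samples $X_n$ are independent of one another, so the $n \neq m$ terms factor and vanish.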

Examples
Biased estimate of the variance of a set of N measurements:
$$\frac{1}{N}\sum_{n=1}^{N} (X_n - \hat{\mu})^2$$
Unbiased estimates of the variance of a set of N measurements:
$$\frac{1}{N-1}\sum_{n=1}^{N} (X_n - \hat{\mu})^2 \quad \text{and} \quad \frac{1}{N}\sum_{n=1}^{N} (X_n - \mu)^2$$
In the first, the mean is estimated first and that estimate is used in the calculation (1 degree of freedom has been lost).
The second is the special case where the mean is known and doesn't need to be estimated from the data.
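
The bias of the divide-by-N estimate can be checked numerically. The following is a minimal sketch, assuming independent Gaussian samples; the function names, sample size and trial count are illustrative choices, not part of the original slides. It averages each of the three variance estimates over many simulated data sets.

```python
import random


def variance_estimates(samples, true_mean):
    """Return (biased, unbiased, known-mean) variance estimates for one data set."""
    n = len(samples)
    mu_hat = sum(samples) / n
    ss_about_estimate = sum((x - mu_hat) ** 2 for x in samples)
    ss_about_truth = sum((x - true_mean) ** 2 for x in samples)
    return ss_about_estimate / n, ss_about_estimate / (n - 1), ss_about_truth / n


def average_estimates(n_samples=5, n_trials=20000, mu=2.0, sigma=3.0, seed=0):
    """Average each estimator over many independent data sets (hypothetical settings)."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_trials):
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        for i, value in enumerate(variance_estimates(data, mu)):
            totals[i] += value
    return [t / n_trials for t in totals]


if __name__ == "__main__":
    biased, unbiased, known_mean = average_estimates()
    print("biased:", biased, "unbiased:", unbiased, "known mean:", known_mean)
```

With N = 5 and a true variance of 9, the divide-by-N average should settle near 9(N-1)/N = 7.2, while the other two should settle near 9.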

Estimation of Autocovariance functions
Two methods of estimating $R_{xx}(\tau)$ from T seconds of data, each calculating the average value of $x(t)\,x(t+\tau)$:
1. Dividing by the integration time, $T - |\tau|$: the estimate is unbiased but has very high variance, particularly when $|\tau|$ is close to T.
2. Dividing by the total time, T: the estimate is biased (asymptotically unbiased). This is equivalent to multiplying the first estimate by a triangular window $(T - |\tau|)/T$, which attenuates the high-variance estimates.
[Figure: x(t) and x(t+τ) plotted against time over T seconds of data; the two overlap for only T − |τ| seconds.]
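
The two divisors can be compared in discrete time. The sketch below assumes N zero-mean samples and integer lags k, so that T − |τ| becomes N − k; the function name `autocovariance` is hypothetical.

```python
import random


def autocovariance(x, max_lag, unbiased):
    """Estimate R_xx at lags 0..max_lag from zero-mean samples x (max_lag < len(x))."""
    n = len(x)
    estimates = []
    for k in range(max_lag + 1):
        # Sum of x[i] * x[i + k] over the n - k sample pairs that overlap.
        lag_sum = sum(x[i] * x[i + k] for i in range(n - k))
        divisor = (n - k) if unbiased else n
        estimates.append(lag_sum / divisor)
    return estimates


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(200)]
    method1 = autocovariance(x, 150, unbiased=True)
    method2 = autocovariance(x, 150, unbiased=False)
    for k in (0, 50, 100, 150):
        print(k, method1[k], method2[k])
```

At every lag the second estimate equals the first multiplied by (N − k)/N, which is the triangular window in sampled form.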

Estimation of Cross Covariance
x(t) and y(t) are zero-mean, weakly stationary random processes; the estimate is the average value of $x(t)\,y(t+\tau)$ over T seconds of data.
Same issues as for the autocovariance: bigger $\tau$ means less averaging for finite T.
Additional problem: T must be made large enough to accommodate system delays.
[Figure: x(t) and y(t), and x(t) and y(t+τ), plotted against time over T seconds, showing the reduced overlap for lag τ.]

Estimation of Covariance
With fast computation of spectra, covariance functions are now more usually estimated by inverse Fourier transforming the power and cross spectral density estimates.
The inverse transform of a raw PSD or CSD estimate is equivalent to Method 2 for calculating covariance functions (the estimate with the triangular window) for data of length $T_r$.

Power Spectral Density Estimation
Definition:
$$S_{xx}(f) = \lim_{T \to \infty} E\Big[\frac{X_T^* X_T}{T}\Big] = \int_{-\infty}^{+\infty} R_{xx}(\tau)\, e^{-j2\pi f \tau}\, d\tau$$
Estimation:
1. Could Fourier transform the autocorrelation function estimate (not computationally efficient).
2. Could use the frequency domain definition directly:
$$\text{Raw estimate} = \hat{S}_{xx}(f) = \frac{X_T^* X_T}{T}$$
No averaging! Extremely poor variance characteristics: the variance is $S_{xx}(f)^2$ and is unaffected by T, the length of data used.

Power Spectral Density Estimation (Continued)
Smoothed estimate from segment averaging.
[Figure: x(t) against time, with a window w(t) of length Tr applied to each segment.]
1. Break the signal up into Nseg segments, each Tr seconds long.
2. For each segment:
   1. Apply a window to smooth the transition at the ends of the segment.
   2. Fourier transform the windowed segment → $X_{T_r}(f)$.
   3. Calculate a raw power spectral density estimate: $|X_{T_r}|^2 / T_r$.
3. Average the results from each segment to get the smoothed estimate, and do a power compensation for the window used:
$$\tilde{S}_{xx}(f) = \frac{1}{N_{seg}\, P_{comp}} \sum_{i=1}^{N_{seg}} \hat{S}_{xx,i}(f), \qquad P_{comp} = \frac{1}{T_r}\int_0^{T_r} w^2(t)\, dt$$
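
The three steps above can be sketched with a direct DFT. This is a minimal illustration under stated assumptions, not the implementation used for the slides: segments do not overlap, the DFT is the slow O(N²) form so that only the standard library is needed, the PSD is two-sided, and all function names are hypothetical.

```python
import cmath
import math
import random


def dft(frame):
    """Direct DFT, X_k = sum_m frame[m] * exp(-j 2 pi k m / N)."""
    n = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * m / n) for m in range(n)]


def smoothed_psd(x, fs, seg_len, window):
    """Two-sided smoothed PSD at f_k = k * fs / seg_len from non-overlapping segments."""
    delta = 1.0 / fs
    t_r = seg_len * delta
    p_comp = sum(w * w for w in window) / seg_len  # discrete form of (1/Tr) * integral of w^2
    n_seg = len(x) // seg_len
    psd = [0.0] * seg_len
    for i in range(n_seg):
        segment = x[i * seg_len:(i + 1) * seg_len]
        spectrum = dft([w * s for w, s in zip(window, segment)])
        for k, xk in enumerate(spectrum):
            psd[k] += abs(delta * xk) ** 2 / t_r  # raw estimate |X_Tr|^2 / Tr
    return [p / (n_seg * p_comp) for p in psd]


if __name__ == "__main__":
    rng = random.Random(1)
    fs, seg_len = 100.0, 32
    x = [rng.gauss(0.0, 1.0) for _ in range(16 * seg_len)]
    psd = smoothed_psd(x, fs, seg_len, hann(seg_len))
    mean_square = sum(v * v for v in x) / len(x)
    print("integral of PSD:", sum(psd) * fs / seg_len, "mean square:", mean_square)
```

Summing the PSD values and multiplying by the frequency resolution fs/N should approximately recover the mean square value of the signal; with a rectangular window (all ones) and segments that cover the whole record the match is exact.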

Power Spectral Density Estimation (Continued)
Smoothed estimate from segment averaging.
[Figure: x(t) against time, with overlapping windows w(t) of length Tr.]
Overlap: for some windows, segment overlap makes sense. With a Hann window and 50% overlap, data de-emphasized in one windowed segment is strongly emphasized in the next window (and vice versa).
Bias: note that the PSD estimate bias is controlled by the size of the window ($T_r$), which controls the frequency resolution ($1/T_r$). Larger window, smoother transitions → less power leakage → less bias.

Power Spectral Density (PSD) Estimation (Continued)
We argued that the distribution of the smoothed PSD is related to that of a chi-squared random variable ($\chi_\nu^2$) with $\nu = 2 N_{seg}$ degrees of freedom, if Tr is large enough that bias errors can be ignored. Therefore:
$$\mathrm{Variance}\Big[\frac{2 N_{seg}\, \tilde{S}_{xx}}{S_{xx}}\Big] = \frac{4 N_{seg}^2}{S_{xx}^2}\, \mathrm{Variance}[\tilde{S}_{xx}] = 2\,(2 N_{seg})$$
and rearranging we showed that:
$$\mathrm{Variance}[\tilde{S}_{xx}] = \frac{S_{xx}^2}{N_{seg}}$$
Therefore, we can control the variance by averaging more segments.
Note: shorter segments mean larger bias, so for a fixed T seconds of data there is a trade-off between segment length (Tr), which controls the bias, and number of segments (Nseg), which controls the variance: $T = T_r \cdot N_{seg}$.
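
The trade-off can be tabulated for a fixed record length. The sketch below assumes non-overlapping segments (so Nseg = T/Tr), uses the variance result above to express the random error as standard deviation divided by $S_{xx}$, which is $1/\sqrt{N_{seg}}$, and requires every segment length to be no longer than the record; the function name and example values are illustrative only.

```python
import math


def tradeoff_table(total_time, segment_lengths):
    """Return rows of (Tr, Nseg, frequency resolution 1/Tr, normalized std 1/sqrt(Nseg))."""
    rows = []
    for t_r in segment_lengths:
        n_seg = int(total_time // t_r)  # whole segments that fit in T seconds
        rows.append((t_r, n_seg, 1.0 / t_r, 1.0 / math.sqrt(n_seg)))
    return rows


if __name__ == "__main__":
    for t_r, n_seg, resolution, norm_std in tradeoff_table(10.0, [0.5, 1.0, 2.0]):
        print(f"Tr={t_r} s  Nseg={n_seg}  df={resolution} Hz  std/S={norm_std:.3f}")
```

Longer segments give finer frequency resolution (less bias) but fewer segments to average, so the normalized standard deviation grows.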

Cross Spectral Density (CSD)
Definition:
$$S_{xy}(f) = \lim_{T \to \infty} E\Big[\frac{X_T^* Y_T}{T}\Big] = \int_{-\infty}^{+\infty} R_{xy}(\tau)\, e^{-j2\pi f \tau}\, d\tau$$
Estimation:
Could Fourier transform the cross-correlation function estimate (not computationally efficient).
Could use the frequency domain definition directly:
$$\text{Raw estimate} = \hat{S}_{xy}(f) = \frac{X_T^* Y_T}{T}$$
As with the PSD, this has extremely poor variance characteristics, so divide the time histories into segments, generate a raw estimate from each segment, and average to reduce variance and produce a smoothed estimate.

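Cross Spectral Density Estimation: Segment Averaging
[Figure: x(t) and y(t) against time, each with a window w(t) of length Tr applied to the same segment.]
Fourier transform of windowed segments → $X_{T_r}(f)$ and $Y_{T_r}(f)$.
$$\text{Raw estimate from the } i\text{th segment} = \hat{S}_{xy,i}(f) = \frac{X_{T_r}^*(f)\, Y_{T_r}(f)}{T_r}$$
$$\text{Smoothed estimate} = \tilde{S}_{xy}(f) = \frac{1}{N_{seg}} \sum_{i=1}^{N_{seg}} \hat{S}_{xy,i}(f)$$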

Issues with Cross Spectral Density Estimates
1. Reduce bias by choosing the segment length (Tr) as large as possible. (Bias is greatest where the phase changes rapidly.)
2. Reduce variance by averaging many segments.
3. Might require a large amount of averaging to reduce noise effects. With $y_m(t) = y(t) + n(t) = h(t) * x(t) + n(t)$ (where $*$ is convolution) and x(t), n(t) zero-mean, weakly stationary, uncorrelated random processes:
$$\mathrm{SNR}_{y_m} = \frac{S_{yy}}{S_{n_y n_y}} = \frac{|H(f)|^2 S_{xx}}{S_{n_y n_y}}$$
$$\tilde{S}_{x y_m} = H(f)\, \tilde{S}_{xx} + \tilde{S}_{xn} \approx H(f)\, \tilde{S}_{xx}$$
4. Time delays between x and y cause problems if the time delay ($t_0$) is greater than a small fraction of the segment length ($T_r$). Can estimate $t_0$ and offset the y segments, but this needs $T + t_0$ seconds of data.

Cross Spectral Density Estimation: Segment Averaging with System Delays
[Figure: x(t) and y(t) against time, with the window w(t) of length Tr on y(t) offset by the estimated $t_0$.]
Fourier transform of windowed segments → $X_{T_r}(f)$ and $Y_{T_r}(f)$.
Offsetting the y segments essentially removes most of the delay from the estimated frequency response function.
Can put the delay effects back in by multiplying the estimate of H(f) by $e^{-j2\pi f \hat{t}_0}$.

Coherence Function Estimation: Substitute in Smoothed Estimates of Spectral Densities
Coherence takes values in the range 0 to 1.
$$\text{Definition: } \gamma_{xy}^2 = \frac{|S_{xy}|^2}{S_{xx} S_{yy}}; \qquad \text{Estimate: } \tilde{\gamma}_{xy}^2 = \frac{|\tilde{S}_{xy}|^2}{\tilde{S}_{xx}\, \tilde{S}_{yy}}$$
Substituting raw spectral density estimates into the formula results in 1. A result where the coherence = 1 at all frequencies from measured signals should be treated with a high degree of suspicion.
The estimate is highly sensitive to bias in the spectral density estimates, which is particularly bad where the phase of the cross spectral density changes rapidly (at maxima and minima in $|S_{xy}|$).
Coherence → 0 because of:
noise;
nonlinearity;
bias errors in estimation;
a very weak linear relationship between the signals.
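
The reason raw estimates give a coherence of 1 can be written out in one line. The step below simply substitutes the raw (single-segment, unaveraged) estimates $\hat{S}_{xx} = X_T^* X_T / T$, $\hat{S}_{yy} = Y_T^* Y_T / T$ and $\hat{S}_{xy} = X_T^* Y_T / T$ into the coherence formula:

```latex
\hat{\gamma}_{xy}^2
  = \frac{\left| X_T^* Y_T / T \right|^2}{\left( X_T^* X_T / T \right)\left( Y_T^* Y_T / T \right)}
  = \frac{|X_T|^2\, |Y_T|^2}{|X_T|^2\, |Y_T|^2}
  = 1 \quad \text{at every frequency.}
```

Only averaging over segments, which lets the cross terms partially cancel, can bring the estimate below 1.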

Example: System with Some Nonlinearities (cubic stiffness) and Noisy Measurements
[Figure: spectra and coherence function for the nonlinear system, with the annotations below.]
Nonlinear mode: nonlinearity causes a spread of energy here, and around 3x and 5x this frequency.
Nonlinearity causes broad dips in the coherence function. If you drive the system harder, these regions become wider.
Other dips are due to bias errors.
Poor SNR on the output (poor SNRy) causes the low coherence in the regions marked.

Example: Linear System with Noisy Output Measurements
[Figure panels: high SNR, Tr = 512/fs; high SNR, Tr = 2048/fs; low SNR on output, Tr = 512/fs.]
Annotations:
Dips are mainly due to bias, and thus get smaller as resolution increases. Bias is greatest where the phase change is fastest.
Less averaging compared to the N = 512 case: fewer segments → greater variance, but bias effects are less.
With low SNR on the output, the dip is filled in with noise, and SNRy is also affecting the coherence.

H1 and H2 Estimates of H: Effects of Noise
If the system is linear and there is no noise (ignoring all other estimation errors):
$$H(f) = \frac{S_{xy}(f)}{S_{xx}(f)} \ \text{(H1 approach)} \ = \frac{S_{yy}(f)}{S_{yx}(f)} \ \text{(H2 approach)}$$
Cases with noise: assume that estimation errors are small (Tr and Nseg both large).
$$H_1\ \text{estimate} = \frac{S_{x_m y_m}}{S_{x_m x_m}} = \frac{S_{xy}(f)/S_{xx}(f)}{1 + S_{n_x n_x}/S_{xx}} = \frac{H(f)}{1 + S_{n_x n_x}/S_{xx}}$$
Noise on the input adversely affects this estimate of H. Theory: $|H_1\ \text{estimate}| < |H|$.
$$H_2\ \text{estimate} = \frac{S_{y_m y_m}}{S_{x_m y_m}^*} = \frac{S_{yy}(f)}{S_{xy}^*(f)}\Big[1 + \frac{S_{n_y n_y}}{S_{yy}}\Big] = H(f)\Big[1 + \frac{S_{n_y n_y}}{S_{yy}}\Big]$$
Noise on the output adversely affects this estimate of H. Theory: $|H_2\ \text{estimate}| > |H|$.
Note that with bias errors due to windowing (Tr not as large as you would like) these inequalities may not hold, but the H1 estimate is still smaller in magnitude than the H2 estimate.
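
The direction of the two noise biases can be illustrated with a deliberately simplified simulation. The sketch assumes a static gain (H is the same real number at every frequency) and white signals, so that ratios of averaged products stand in for the ratios of spectral densities; noise is added to both the measured input and the measured output. All names and parameter values are hypothetical.

```python
import random


def h1_h2_static_gain(gain=2.0, input_noise_std=0.5, output_noise_std=0.5,
                      n_samples=20000, seed=0):
    """Return (H1, H2) estimates for y = gain * x with noisy measurements of x and y."""
    rng = random.Random(seed)
    s_xx = s_yy = s_xy = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        x_m = x + rng.gauss(0.0, input_noise_std)          # measured input
        y_m = gain * x + rng.gauss(0.0, output_noise_std)  # measured output
        s_xx += x_m * x_m
        s_yy += y_m * y_m
        s_xy += x_m * y_m
    h1 = s_xy / s_xx  # biased low by noise on the input
    h2 = s_yy / s_xy  # biased high by noise on the output
    return h1, h2


if __name__ == "__main__":
    h1, h2 = h1_h2_static_gain()
    print("H1:", h1, "true H: 2.0", "H2:", h2)
```

For these settings the formulas above predict H1 near 2/(1 + 0.25) = 1.6 and H2 near 2(1 + 0.25/4) = 2.125, bracketing the true gain.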

Estimation of H
Note that, e.g.,
$$E[\hat{H}] = E\Big[\frac{\tilde{S}_{xy}}{\tilde{S}_{xx}}\Big] \neq \frac{E[\tilde{S}_{xy}]}{E[\tilde{S}_{xx}]}$$
Frequency response function estimates are extremely sensitive to bias errors, which are worse at peaks and troughs. Large segment sizes are required to overcome bias, but this means fewer segments to average and thus higher variance.
Note: a low coherence function does not necessarily imply a poor frequency response function estimate. If the coherence function is low because of noise on the response (input), then the H1 (H2) frequency response estimate should be accurate, provided sufficient averaging was done to reduce the variance of the estimates.

Calibration of PSD and CSD in Matlab
psd: old program. pwelch: new program. cpsd: gives the complex conjugate of what you want.
The mean square value of the time signal (variance) should give the same result as integrating the PSD (Parseval's theorem).
Check whether you are getting a two-sided or a one-sided PSD.
One-sided: negative and positive frequency contributions are added (not for the components at f = 0 and fs/2, though, which should be zero anyway). This is what Matlab does.
Two-sided: when you integrate the spectrum from 0 to fs/2 you'll get about half of what you expect (no addition of positive and negative frequency contributions has occurred).
Matlab also doubles the CPSD from 0 to fs/2, which doesn't make sense physically, but it is convenient when you estimate the frequency response function because the doubling cancels.

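Calibration (continued)
Power Spectral Density Estimates Using DFTs:
Recall that for $-f_s/2 < f < f_s/2$,
$$X_T(f)\Big|_{f = k f_s / N} \approx \Delta \cdot \mathrm{DFT}\big(w(n\Delta)\, x(n\Delta),\ n = 0, 1, \ldots, N-1\big) = \Delta \cdot X_k$$
$$\hat{S}_{xx}(f_k) = \frac{X_T^*(f_k)\, X_T(f_k)}{T \cdot \mathrm{wcomp}} \approx \frac{\Delta^2\, X_k^* X_k}{N\Delta \cdot \mathrm{wcomp}} = \frac{\Delta\, |X_k|^2}{N \cdot \mathrm{wcomp}}$$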

Calibration Continued: Energy Spectral Density
We sometimes have segments that contain a single transient (tap testing of structures), and we average the raw spectra from each segment to remove noise effects.
[Be careful with applying this random process theory to different types of signals; each segment used in the estimation should contain similar information.]
If we choose a different Tr, i.e., allow a shorter or longer time between successive transients (the transient should have died away within the segment), the PSD will change because of the divide by Tr in the formula.
[Figure: a transient against time (s) within a segment of length Tr.]
To overcome this problem we estimate an Energy Spectral Density (ESD), removing the divide by Tr in the raw PSD estimate:
$$\text{Raw ESD estimate} = |X_{T_r}(f)|^2 \approx \Delta^2\, |X_k|^2 \quad (\mathrm{Volts/Hz})^2$$
[You also need to be careful with window choice here so as not to distort the transient.]

Calibration Continued: Power Spectrum
Segment averaging is often applied to signals that have periodic and random components.
Power Spectrum (works great for periodic signals): as resolution increases (frequency spacing gets smaller) the noise floor decreases. Total power = sum of the power at each spectral component. Recall: $C_k = X_k / N$, if you synchronize, don't alias and there is no noise.
Power Spectral Density (PSD) (ideal for random signals): the level is unaffected by changes in frequency resolution (window size). Total power = the integral of the PSD = sum of PSD values × frequency resolution.
$$\text{Power estimate} = \frac{|X_k|^2}{N^2} = (\text{raw PSD estimate}) \cdot (\text{frequency resolution}) = \Big(\frac{\Delta\, |X_k|^2}{N}\Big)\Big(\frac{f_s}{N}\Big) \quad \mathrm{V}^2$$
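
The difference between the two scalings shows up clearly for a synchronized sine. The sketch below assumes a rectangular window, a sine with a whole number of cycles in the record, and a two-sided spectrum; the function names are hypothetical. It evaluates the DFT only at the bin containing the sine.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT coefficient X_k = sum_m x[m] * exp(-j 2 pi k m / N)."""
    n = len(x)
    return sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))


def sine_bin_levels(amplitude, cycles, n, fs):
    """Two-sided power (V^2) and raw PSD (V^2/Hz) at the bin holding a synchronized sine."""
    x = [amplitude * math.sin(2.0 * math.pi * cycles * m / n) for m in range(n)]
    xk = dft_bin(x, cycles)
    power = abs(xk) ** 2 / n ** 2  # |X_k|^2 / N^2
    psd = power / (fs / n)         # divide by the frequency resolution fs/N
    return power, psd


if __name__ == "__main__":
    # Same 8 Hz sine (A = 2 V, fs = 64 Hz) analysed with N = 64 and N = 128.
    print(sine_bin_levels(2.0, 8, 64, 64.0))
    print(sine_bin_levels(2.0, 16, 128, 64.0))
```

Doubling N leaves the power at the bin unchanged at A²/4 but doubles the PSD value there, which is why the power spectrum is the convenient scaling for periodic components.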

PSDs for Sines + Noise
The power spectral density of a sinusoid of amplitude A and frequency $f_1$ is:
$$\frac{A^2}{4}\,\delta(f - f_1) + \frac{A^2}{4}\,\delta(f + f_1)$$
But by using windows Tr seconds long, the delta functions become sinc or sinc-like functions with a maximum height affected by the window size Tr.
[Figure: level (dB) against frequency (Hz) from 1000 to 1300 Hz; legend text: Left Original Simulated.]
If Tr is too small the sinc functions will be buried in the noise. But as Tr is increased the sinc functions begin to emerge from the noise.
So if you suspect that a peak in your spectrum is due to a sine wave, increase the window size (better frequency resolution) and see if the peak gets larger, as you would expect if it were truly a sine wave.

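Sines + Noise
$$\hat{S}_{x_m x_m} = \hat{S}_{xx} + \hat{S}_{nn} = \frac{1}{T_r}\Big| \frac{A T_r}{2}\,\mathrm{sinc}\big(\pi (f - f_1) T_r\big) + \frac{A T_r}{2}\,\mathrm{sinc}\big(\pi (f + f_1) T_r\big) \Big|^2 + \hat{S}_{nn}$$
$$\approx \frac{A^2 T_r}{4}\,\mathrm{sinc}^2\big(\pi (f - f_1) T_r\big) + \frac{A^2 T_r}{4}\,\mathrm{sinc}^2\big(\pi (f + f_1) T_r\big) + \hat{S}_{nn}$$
Note that here we have assumed averaging is sufficient to make the cross terms small compared to the terms retained.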

Sines + Broadband Random Noise
[Figure: PSD (V²/Hz) against frequency (Hz) for Tr = NΔ with N = 4,096, N = 8,192 and N = 16,384.]
The sinc function emerges from the noise as Tr increases.
The variation in the estimated PSD is due to lack of averaging: with larger Tr, Nseg is smaller and the variance is larger.

Frequency Variations in Sinusoidal Components
When there are frequency variations, sinusoidal power is spread over a group of frequencies and amplitudes are reduced. When the variations are small, this is sometimes more noticeable at higher harmonics, as in signal 17 below, versus signal 22, where there appears to be very little frequency variation (sinusoidal components are narrow, even at high frequencies).
[Figure: power spectral densities (dB) against frequency (Hz) for the two signals.]

Fast Modulations in Frequency Modulated (FM) Sounds
FM tones of the form (randomized variation of tones):
$$y(t) = A \sin\Big(2\pi f_0 t + 2\pi B \int_0^t r(t')\, dt'\Big)$$
r(t): random noise passed through a 4th-order Butterworth filter
$f_c$: cutoff frequency; B: the range of frequency modulation
$f_0$: center frequency (700 Hz); sampling frequency: 44.1 kHz
[Figure: diagram labelled w(t), $f_c$, r(t), f(t), and a plot of instantaneous frequency against time, varying between 700 − B and 700 + B about $f_0$ = 700.]

Power Spectra of a Frequency Modulated Tone
[Figure: power spectra for the values of $f_c$ and B listed below.]
$f_c$, the filter cut-off frequency, controls the frequency content of the frequency variation: $f_c$ = 10 Hz, 50 Hz, 100 Hz, 200 Hz.
B controls the range of frequency modulation: B = 25 Hz, 50 Hz, 75 Hz, 100 Hz.
Spectral estimation parameters: Hann window, $f_s$ = 44.1 kHz, Δf = 1 Hz, 100 segments, 50% overlap.

Power Spectra of a FM Tone with Trackable FMs Made Stationary
[Figure: power spectra for the values of $f_c$ and B listed below.]
$f_c$, the filter cut-off frequency, controls the frequency content of the frequency variation: $f_c$ = 10 Hz, 50 Hz, 100 Hz, 200 Hz.
B controls the range of frequency modulation: B = 25 Hz, 50 Hz, 75 Hz, 100 Hz.
Spectral estimation parameters: Hann window, $f_s$ = 44.1 kHz, Δf = 1 Hz, 100 segments, 50% overlap.

Another Example of Spectral Manipulation to Help in Estimation of Tonality Metrics
Recording (> 5 s) → finely resolved spectrum
Signal decomposition: (1) significant tones, (2) insignificant tones, (3) noise floor
Signal reconstruction: (2) + (3) → (4) new noise floor; (1) + (4) → inverse DFT to give sound.

Time-Varying (Non-Stationary) Signals
Spectrograms: apply stationary spectral methods over short periods of time with overlapping windows.
This limits averaging for the random parts of signals.
Short windows mean more bias, and tones are not so prominent.
[Figure: spectrogram of a humming/whining motor-driven device; frequency 0 → 2000 Hz, time 0 → 3 seconds.]

Spectrogram: Non-stationary Sounds Aircraft flyover: Tones with Doppler Shift & Ground Effects

Spectrograms: Sliding Spectral Estimates
You have to play with window sizes.
a. Listen to see if there are any obvious variations you can track, and try a window size about 1/10 of a variation period (Ta). In Matlab: nfft = nearest power of 2 to (0.1·fs·Ta). Typically we choose a Hann window with 50% overlap.
b. Identify the fundamental frequencies of tone complexes to identify the lowest desirable frequency resolution. Based on frequency analysis and an understanding of repetition rates in your machine, the minimum window size should be the inverse of (fundamental frequency/7) for a Hann window. (One harmonic series example.)
c. Make the window smaller (if harmonics remain well separated) to see if there are faster fluctuations. As you continue to make windows smaller, the frequency resolution in Hz (inverse of window size in seconds) gets bigger, i.e., coarser. Eventually the spectral resolution reaches the harmonic separation and the harmonics merge (not good).
d. There is always a trade-off between spectral and temporal resolution.
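
The window-size rule in item a can be written as a small helper. This sketch is in Python rather than Matlab and interprets "nearest power of 2" as rounding the base-2 logarithm, which is an assumption; the function name and the example variation period are illustrative.

```python
import math


def spectrogram_nfft(fs, variation_period, fraction=0.1):
    """Power of 2 nearest (in log2) to fraction * fs * Ta samples; inputs must be positive."""
    target = fraction * fs * variation_period
    return 2 ** round(math.log2(target))


if __name__ == "__main__":
    fs = 44100.0  # sampling frequency in Hz
    ta = 0.5      # hypothetical variation period in seconds
    nfft = spectrogram_nfft(fs, ta)
    print("nfft:", nfft, "frequency resolution (Hz):", fs / nfft)
```

The frequency resolution fs/nfft that results can then be compared with the harmonic spacing, as items b and c describe.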

References (for ME 579 at Purdue)