PART II Practical problems in the spectral analysis of speech signals

We have now seen how Fourier analysis recovers the amplitude and phase of an input signal consisting of a superposition of multiple components. In speech, we are not usually interested in phase as such, so the most useful display is usually amplitude as a function of frequency. This is what we will examine in most of the following examples.

For a practical example we will use a signal consisting of sines at 100, 500, 1500, 2500 and 3500 Hz. (A kind of very primitive approximation to schwa with a fundamental frequency of 100 Hz.) The amplitudes were chosen to be 1, 1, 0.5, 0.25 and 0.125 respectively. We will now also use a dB scale for amplitude, as it is more appropriate for most speech signals and will also make it easier to see an important issue in spectral analysis.
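The test signal and the dB conversion can be sketched in a few lines of Python. This is only an illustration: the sample rate of 10000 Hz is the value given later in the text, while the use of sines with zero starting phase and the helper names `pseudo_schwa` and `to_db` are assumptions made here.

```python
import math

FS = 10000                                # sample rate in Hz (value given later in the text)
FREQS = [100, 500, 1500, 2500, 3500]      # component frequencies in Hz
AMPS = [1.0, 1.0, 0.5, 0.25, 0.125]       # linear amplitudes of the components


def pseudo_schwa(n_samples, fs=FS, freqs=FREQS, amps=AMPS):
    """Sum of sines with zero starting phase (an assumption), one value per sample."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / fs) for f, a in zip(freqs, amps))
        for n in range(n_samples)
    ]


def to_db(amplitude, reference=1.0):
    """Convert a linear amplitude to decibels relative to `reference`."""
    return 20 * math.log10(amplitude / reference)


signal = pseudo_schwa(100)                # one pitch period at F0 = 100 Hz
levels_db = [to_db(a) for a in AMPS]      # each halving of amplitude lowers the level by about 6 dB
print([round(level, 1) for level in levels_db])
```

On the dB scale the five components sit at roughly 0, 0, -6, -12 and -18 dB, so the weak upper components remain clearly visible next to the strong lower ones.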

[Figure: waveform of the pseudo schwa. Legend: f: 100 500 1500 2500 3500; A: 1 1 0.5 0.25 0.125; Phi: 0. Horizontal axis: Time (s), 0 to 0.05.]

[Figure: Fourier analysis of one pitch period of pseudo schwa. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

We will use the spectrum in the previous figure as a reference. For it, we were able to select precisely one pitch period for analysis. However, in the majority of cases with speech signals we will not know in advance the pitch of the signal to be analyzed, and in any case the pitch will be changing over time. So we will not be able to analyze the data in segments corresponding exactly to one pitch period. In addition, it is often preferable to calculate the FFT with a signal length (in samples) that is a power of two, since that is what makes the FFT "fast". So what will the spectrum look like if we analyze the previous signal over 128 samples (instead of 100)?

[Figure: Pseudo schwa using 128 point FFT. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

This looks very messy! The relative amplitudes of the sine components have changed, and the valleys between the peaks are much shallower. In short, the structure of the spectrum has been considerably smeared. To understand why this happens, we need to look at the signal actually seen by the Fourier analysis (a segment of 128 samples of data).

[Figure: Signal seen by the 128 point FFT. Horizontal axis: Time in samples, 0 to 140.]

Fourier analysis treats the signal as if it is periodic. However, there is a big discontinuity between the last sample and the first sample if we imagine this signal being periodically repeated. Remember that an impulse has a flat spectrum. Since a discontinuity is a kind of impulse, we find a smearing of energy across the spectrum to frequencies not present in the original signal.
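The effect can be checked numerically. The sketch below is an illustration only: it uses a slow, naive DFT rather than a real FFT, assumes a sample rate of 10000 Hz and zero-phase sines, and the helper names are made up for this example. It measures how far the sample that would really follow the segment is from the sample that periodic repetition supplies, and counts how many frequency bins carry energy above -60 dB.

```python
import cmath
import math

FS = 10000                                # sample rate in Hz (assumed, as given later in the text)
FREQS = [100, 500, 1500, 2500, 3500]
AMPS = [1.0, 1.0, 0.5, 0.25, 0.125]


def sample(n):
    """Value of the pseudo schwa at sample index n (zero-phase sines assumed)."""
    return sum(a * math.sin(2 * math.pi * f * n / FS) for f, a in zip(FREQS, AMPS))


def amplitude_spectrum(x):
    """Naive DFT for bins 1 .. N/2 - 1, scaled so an on-bin sine of amplitude A reads A."""
    n_len = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len)))
        * 2 / n_len
        for k in range(1, n_len // 2)
    ]


def wrap_jump(n_len):
    """Mismatch between the true next sample and the one periodic repetition supplies."""
    return abs(sample(n_len) - sample(0))


def bins_above(n_len, threshold=0.001):
    """Number of bins whose amplitude exceeds `threshold` (0.001 is -60 dB re 1)."""
    segment = [sample(n) for n in range(n_len)]
    return sum(1 for a in amplitude_spectrum(segment) if a > threshold)


for n_len in (100, 128):
    print(n_len, round(wrap_jump(n_len), 3), bins_above(n_len))
```

With 100 samples the repetition is seamless and only the five component bins stand out. With 128 samples there is a large jump at the wrap-around point, and many more bins rise above -60 dB.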

Another way of thinking of this is that the FFT here analyzes the signal at frequencies that are multiples of (sample rate)/128. These frequencies do not necessarily correspond to the frequencies in the input signal. Let us now see what happens when we use a longer FFT (512 points).
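A short calculation makes this concrete. The following sketch (with hypothetical helper names, and assuming the sample rate of 10000 Hz stated later in the text) lists the spacing of the analysis frequencies for three segment lengths and checks which components of the test signal fall exactly on one of them.

```python
FS = 10000                                # sample rate in Hz (assumed, as given later in the text)
FREQS = [100, 500, 1500, 2500, 3500]      # components of the pseudo schwa in Hz


def bin_spacing(n_fft, fs=FS):
    """Distance in Hz between neighbouring analysis frequencies."""
    return fs / n_fft


def on_bin(freq, n_fft, fs=FS):
    """True if freq is an exact integer multiple of fs / n_fft."""
    return (freq * n_fft) % fs == 0


results = {}
for n_fft in (100, 128, 512):
    results[n_fft] = (bin_spacing(n_fft), [f for f in FREQS if on_bin(f, n_fft)])
    print(n_fft, *results[n_fft])

# Output:
# 100 100.0 [100, 500, 1500, 2500, 3500]
# 128 78.125 [2500]
# 512 19.53125 [2500]
```

With 100 samples every component sits exactly on an analysis frequency. With 128 or 512 samples only the 2500 Hz component happens to do so; the energy of the others is spread over neighbouring bins.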

[Figure: Pseudo schwa using 512 point FFT. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

This is a bit better, but corresponds to using a window length of about 50 ms, which is already quite long for analyzing speech (where the spectrum may change a lot even within 10 or 20 ms). So a further increase in the length of the window is not really feasible. Faced with the present problem, the standard procedure is to use a window function.
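The trade-off between window duration and frequency spacing is simple arithmetic. A minimal sketch, assuming the sample rate of 10000 Hz used in these examples (the function names are illustrative only):

```python
FS = 10000  # sample rate in Hz (assumed, as given later in the text)


def window_ms(n_samples, fs=FS):
    """Duration of an analysis window of n_samples, in milliseconds."""
    return 1000 * n_samples / fs


def bin_spacing_hz(n_samples, fs=FS):
    """Spacing of the analysis frequencies for that window, in Hz."""
    return fs / n_samples


for n in (128, 512, 2048):
    print(n, window_ms(n), bin_spacing_hz(n))
```

A 128 point segment lasts 12.8 ms, a 512 point segment 51.2 ms, and a 2048 point segment would last over 200 ms, far longer than the 10 to 20 ms over which speech can be treated as roughly constant.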

The next figure shows a typical window function (known as a Hamming window), and the effect of multiplying the input signal point by point with the corresponding point in the window function.

[Figure: Illustration of window function in time domain. Three panels, each with a horizontal axis from 0 to 140: Input signal, Window, Windowed signal.]

The key feature is that the signal is tapered smoothly towards zero at the start and end, so there will be much less of a discontinuity if this signal is regarded as repeating periodically.
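The point-by-point multiplication is easy to write down. The sketch below uses the common textbook form of the Hamming window, which is an assumption here since the exact formula behind the figure is not given, and a plain 100 Hz sine as a stand-in input.

```python
import math

FS = 10000   # sample rate in Hz (assumed, as given later in the text)
N = 128      # segment length in samples


def hamming(n_len):
    """Symmetric Hamming window in its common textbook form (an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def apply_window(segment, window):
    """Multiply the signal point by point with the corresponding window value."""
    return [s * w for s, w in zip(segment, window)]


segment = [math.sin(2 * math.pi * 100 * n / FS) for n in range(N)]  # stand-in input signal
window = hamming(N)
windowed = apply_window(segment, window)

print(round(window[0], 3), round(max(window), 3), round(window[-1], 3))
print(round(segment[-1], 3), round(windowed[-1], 3))
```

In this form the Hamming window falls to 0.08 at both ends rather than exactly to zero, so the last sample of the segment is strongly attenuated but not removed.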

Note, however, that a windowed version of a single sinusoidal signal will no longer be a pure sine wave. Thus the result of the Fourier analysis will inevitably contain further frequency components in addition to the frequency of the input signal.
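This can be seen with a sine that fits the segment exactly, so that no leakage occurs without a window. The sketch below is an illustration under stated assumptions: it uses a naive DFT, a segment of 64 samples chosen for the example, and the periodic form of the Hamming window (denominator N rather than N - 1), for which the result is exact.

```python
import cmath
import math

N = 64    # segment length in samples (illustrative value)
K0 = 8    # the sine completes exactly 8 cycles in N samples


def hamming_periodic(n_len):
    """Periodic form of the Hamming window (an assumption made for this example)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / n_len) for n in range(n_len)]


def amplitude_spectrum(x):
    """Naive DFT for bins 0 .. N/2 - 1, scaled so an on-bin sine of amplitude A reads A."""
    n_len = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len)))
        * 2 / n_len
        for k in range(n_len // 2)
    ]


sine = [math.sin(2 * math.pi * K0 * n / N) for n in range(N)]
windowed = [s * w for s, w in zip(sine, hamming_periodic(N))]

plain_spectrum = amplitude_spectrum(sine)
windowed_spectrum = amplitude_spectrum(windowed)
plain_bins = [k for k, a in enumerate(plain_spectrum) if a > 1e-6]
windowed_bins = [k for k, a in enumerate(windowed_spectrum) if a > 1e-6]
print(plain_bins, windowed_bins)
```

The unwindowed sine occupies bin 8 only. After windowing, energy also appears in bins 7 and 9: multiplying by the cosine term of the window creates components one bin above and one bin below the original frequency, with amplitudes 0.23 next to a central amplitude of 0.54.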

[Figure: Pseudo schwa using 512 point FFT and Hamming window. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

This certainly gives a tidier picture. There are many different window functions. The next figure shows the same analysis, but now performed with a Blackman window.

[Figure: Pseudo schwa using 512 point FFT and Blackman window. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

The Blackman window obviously gives much lower valleys between the peaks than the Hamming window. But this comes at a price: The peaks are wider. So while the Blackman window will show the peaks more clearly above the background noise, it may result in very closely spaced peaks becoming merged. Thus, the best choice of window depends to some extent on the kind of signal that is to be analyzed.
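The two properties being traded off, peak width and the level of the background between peaks, can be estimated from the windows themselves. The sketch below is a rough illustration, assuming the common textbook formulas for both windows and a short window of 64 samples; it evaluates each window's frequency response on a fine grid with a naive zero-padded DFT.

```python
import cmath
import math


def hamming(n_len):
    """Symmetric Hamming window, common textbook form (an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def blackman(n_len):
    """Symmetric Blackman window, common textbook form (an assumption)."""
    return [
        0.42
        - 0.5 * math.cos(2 * math.pi * n / (n_len - 1))
        + 0.08 * math.cos(4 * math.pi * n / (n_len - 1))
        for n in range(n_len)
    ]


def window_response_db(window, pad_factor=16):
    """Magnitude response in dB re its peak, on a grid pad_factor times finer than the bins."""
    n_fft = len(window) * pad_factor
    mags = [
        abs(sum(w * cmath.exp(-2j * math.pi * k * n / n_fft) for n, w in enumerate(window)))
        for k in range(n_fft // 2)
    ]
    return [20 * math.log10(max(m / mags[0], 1e-12)) for m in mags]


def main_lobe_and_side_lobe(window, pad_factor=16):
    """Return (main-lobe half-width in DFT bins, highest side-lobe level in dB)."""
    resp = window_response_db(window, pad_factor)
    k = 0
    while resp[k + 1] < resp[k]:      # walk down the main lobe to its first minimum
        k += 1
    return k / pad_factor, max(resp[k:])


hamming_width, hamming_side = main_lobe_and_side_lobe(hamming(64))
blackman_width, blackman_side = main_lobe_and_side_lobe(blackman(64))
print("Hamming ", hamming_width, round(hamming_side, 1))
print("Blackman", blackman_width, round(blackman_side, 1))
```

The Blackman window shows the wider main lobe (about three bins on each side against about two for the Hamming window) and the lower side lobes, which matches the wider peaks and deeper valleys seen in the spectra.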

In the previous example the pitch period of the signal was an integer number of samples: with F0 = 100 Hz and a sample rate of 10000 Hz, one pitch period corresponds to exactly 100 samples. What happens when one pitch period does not correspond to an integer number of samples?

We will examine this with another "pseudo schwa", but now based on a fundamental frequency of 107 Hz. (The other frequencies are the same multiples of F0 as in the previous example based on F0 = 100 Hz.) The precise length of one pitch period is now 10000/107 = 93.4579 samples.

For the Fourier analysis we have to round this to the nearest integer.

[Figure: Fourier analysis over 93 samples. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

Clearly, this also results in an unsatisfactory analysis: The height of the peaks relative to the valleys is very low. This should not come as a surprise: We have in effect once again introduced a discontinuity into the signal. Even the apparently slight difference between the true length of a pitch period, and the length used in the analysis is enough to cause problems.
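To see why such a small rounding error matters, we can count how many cycles of each component fit into the 93 sample segment. This is a sketch with hypothetical helper names; the sample rate of 10000 Hz and the multiples 1, 5, 15, 25 and 35 of F0 are taken from the examples above.

```python
FS = 10000                         # sample rate in Hz
F0 = 107                           # fundamental frequency in Hz
MULTIPLES = [1, 5, 15, 25, 35]     # component frequencies as multiples of F0

exact_period = FS / F0                     # 93.4579... samples
analysis_length = round(exact_period)      # 93 samples


def cycles_in_window(freq, n_samples, fs=FS):
    """Number of cycles of a component that fit into the segment."""
    return freq * n_samples / fs


def wrap_mismatch(freq, n_samples, fs=FS):
    """Distance, in cycles, from the nearest whole number of cycles."""
    cycles = cycles_in_window(freq, n_samples, fs)
    return abs(cycles - round(cycles))


mismatches = {m * F0: wrap_mismatch(m * F0, analysis_length) for m in MULTIPLES}
for freq, miss in mismatches.items():
    print(freq, round(cycles_in_window(freq, analysis_length), 4), round(miss, 4))
```

The fundamental misses a whole cycle by only about 0.005 cycles, but the error grows in proportion to frequency: the 3745 Hz component misses by about 0.17 cycles, which is a substantial jump at the point where the segment wraps around.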

The following slides show in turn: a 128 point FFT without window; a 512 point FFT without window; and a 512 point FFT with Blackman window. Once again, only in the last case does a reasonably tidy picture of the spectrum emerge.

[Figure: Pseudo schwa (107 Hz) using 128 point FFT. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

[Figure: Pseudo schwa (107 Hz) using 512 point FFT. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

[Figure: Pseudo schwa (107 Hz) using 512 point FFT and Blackman window. Amplitude (dB) against Frequency (Hz), 0 to 5000 Hz.]

These examples show that, for practical analysis of speech, use of a window function is essential.
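As a closing illustration, the whole procedure (take a segment, multiply by a window, transform, convert to dB) can be sketched as follows. Several things here are assumptions made for the example: zero-phase sines, the textbook Blackman formula, a naive DFT evaluated only at the bins of interest, and normalisation by the sum of the window. The sketch compares the highest level found in the valley between the 535 Hz and 1605 Hz components with and without a window.

```python
import cmath
import math

FS = 10000                         # sample rate in Hz
F0 = 107                           # fundamental frequency in Hz
MULTIPLES = [1, 5, 15, 25, 35]     # component frequencies as multiples of F0
AMPS = [1.0, 1.0, 0.5, 0.25, 0.125]
N = 512                            # segment length in samples


def blackman(n_len):
    """Symmetric Blackman window, common textbook form (an assumption)."""
    return [
        0.42
        - 0.5 * math.cos(2 * math.pi * n / (n_len - 1))
        + 0.08 * math.cos(4 * math.pi * n / (n_len - 1))
        for n in range(n_len)
    ]


def level_db(segment, window, k):
    """Level at DFT bin k in dB, normalised so an on-bin sine of amplitude 1 reads about 0 dB."""
    n_len = len(segment)
    x_k = sum(
        s * w * cmath.exp(-2j * math.pi * k * n / n_len)
        for n, (s, w) in enumerate(zip(segment, window))
    )
    amplitude = 2 * abs(x_k) / sum(window)
    return 20 * math.log10(max(amplitude, 1e-12))


segment = [
    sum(a * math.sin(2 * math.pi * m * F0 * n / FS) for m, a in zip(MULTIPLES, AMPS))
    for n in range(N)
]
rectangular = [1.0] * N            # "no window" is the same as a window of all ones
blackman_window = blackman(N)
valley_bins = range(45, 61)        # roughly 880 to 1170 Hz

valley_plain = max(level_db(segment, rectangular, k) for k in valley_bins)
valley_blackman = max(level_db(segment, blackman_window, k) for k in valley_bins)
print(round(valley_plain, 1), round(valley_blackman, 1))
```

The valley is far deeper with the Blackman window than without any window, which is the numerical counterpart of the tidier spectrum in the last figure.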