Pitch and Harmonic to Noise Ratio Estimation


Friedrich-Alexander-Universität Erlangen-Nürnberg
Lab Course
Pitch and Harmonic to Noise Ratio Estimation

International Audio Laboratories Erlangen
Prof. Dr.-Ing. Bernd Edler
Friedrich-Alexander-Universität Erlangen-Nürnberg
International Audio Laboratories Erlangen
Lehrstuhl Semantic Audio Processing
Am Wolfsmantel 33, 91058 Erlangen
bernd.edler@audiolabs-erlangen.de

International Audio Laboratories Erlangen
A joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer-Institut für Integrierte Schaltungen IIS

Authors: Stefan Bayer, Nils Werner, Goran Marković
Tutors: Konstantin Schmidt, Goran Marković
Contact: Nils Werner, Konstantin Schmidt, Goran Marković
Friedrich-Alexander-Universität Erlangen-Nürnberg
International Audio Laboratories Erlangen
Lehrstuhl Semantic Audio Processing
Am Wolfsmantel 33, 91058 Erlangen
nils.werner@audiolabs-erlangen.de
konstantin.schmidt@audiolabs-erlangen.de
goran.markovic@iis.fraunhofer.de

This handout is not supposed to be redistributed.

Pitch and Harmonic to Noise Ratio Estimation, © July 2017

Lab Course
Pitch and Harmonic to Noise Ratio Estimation

Abstract

Humans easily distinguish between harmonic and noise-like components when listening. Doing the same is of great use in many applications of audio signal processing. By separating the harmonic and noise-like components we can calculate the ratio of their energies, called the Harmonic to Noise Ratio (HNR). The HNR then describes how harmonic or noise-like a signal is.

The distinction between harmonic and noise-like components is that harmonic components exhibit a periodic structure. The frequency of the repeating period is called the fundamental frequency and is usually denoted as F_0. The fundamental frequency is closely related to the so-called pitch of the source. The pitch describes how low or high a harmonic or tone-like source is perceived. Strictly speaking it is a perceptual property and is not necessarily equal to the fundamental frequency. The term pitch is, however, often used as a synonym for the fundamental frequency, and we will use it in this way in the remaining text.

The estimates of the pitch and the HNR can be used, together with other information, to efficiently code the signal or to generate a synthetic signal. In this laboratory we restrict ourselves to speech signals consisting of a single speaker. We will develop simple estimators for both the pitch and the HNR and compare the results to state-of-the-art solutions.

1 Pitch Estimation

As stated above, we model an audio signal, or more specifically a speech signal, as a mixture of a harmonic signal and a noise signal:

    s(t) = h(t) + n(t)    (1)

where s(t) is the speech signal, h(t) is the harmonic component, and n(t) is the noise component. For a time-discrete signal the equation becomes:

    s[k] = h[k] + n[k]    (2)

with k being the sample index.
In this section we take a closer look at the harmonic component h(t), which can be expressed as the sum of its partial tones: sinusoids whose frequencies are integer multiples of the fundamental frequency F_0:

    h(t) = \sum_{n=1}^{N} a_n \sin(2\pi n F_0 t + \varphi_n)    (3)

where a_n are the amplitudes and \varphi_n the phases of the individual partial tones. This model assumes that F_0, a_n, and \varphi_n stay constant. In real-world signals, especially in speech, the amplitudes and the fundamental frequency change slowly over time. To take this into account, we partition the signal into time sections small enough that we may assume them to be quasi-stationary. So the first step towards a pitch estimate is to divide the signal into sufficiently small blocks. The length of the blocks is determined by the lowest pitch we want to detect. In addition, for most algorithms, at least two periods of the harmonic component should be contained within one block to give a reliable estimate. Table 1 gives a rough overview of the pitch ranges in human speech.

The simplest pitch estimation method can be implemented using the zero crossings of the signal. Although this method is very efficient, it is not well suited if higher partials have big amplitudes
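The harmonic model of equation (3) is easy to experiment with. The following sketch is in Python/NumPy rather than the Matlab used in the lab; the function name and parameter values are illustrative, not part of the lab material. It synthesizes a harmonic signal and confirms its periodicity:

```python
import numpy as np

def harmonic_signal(f0, amps, phases, fs, duration):
    """Sum of partial tones at integer multiples of f0 (cf. equation (3))."""
    t = np.arange(int(fs * duration)) / fs
    h = np.zeros_like(t)
    for n, (a, phi) in enumerate(zip(amps, phases), start=1):
        h += a * np.sin(2 * np.pi * n * f0 * t + phi)
    return h

# Hypothetical example: 100 Hz fundamental, three partials, 16 kHz sampling
fs, f0 = 16000, 100.0
h = harmonic_signal(f0, amps=[1.0, 0.5, 0.25], phases=[0.0, 0.0, 0.0],
                    fs=fs, duration=0.05)
T = int(fs / f0)  # period in samples: 160
print(np.allclose(h[:T], h[T:2 * T]))  # the signal repeats every T samples
```

Because the model parameters are constant here, the waveform repeats exactly every T = f_s/F_0 samples; real speech only approximates this within a quasi-stationary block.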

or if the noise component is very strong. Most pitch algorithms are based on other methods; for a simple overview see [1]. In this laboratory we will develop an estimation algorithm based on the autocorrelation [2].

         lower limit   upper limit
male     75 Hz         150 Hz
female   125 Hz        250 Hz
child    160 Hz

Table 1: Typical fundamental frequencies in human speech

For discrete-time, wide-sense stationary, ergodic signals the autocorrelation is defined as:

    R_{xx}[l] = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{k=-N}^{N} x[k] x[k-l]    (4)

where l is the so-called pitch lag. We only consider positive lags, since the resulting autocorrelation sequence is symmetric around l = 0. This definition assumes stationarity of the signal and is not practical, as we can only deal with signals of finite length. Thus we estimate the autocorrelation on a block of N samples:

    R_{xx}[l] = \frac{1}{N} \sum_{k=l}^{N-1} x[k] x[k-l]    (5)

and call it the biased autocorrelation estimate. Replacing 1/N with 1/(N-l) we obtain the unbiased autocorrelation estimate:

    R_{xx}[l] = \frac{1}{N-l} \sum_{k=l}^{N-1} x[k] x[k-l]    (6)

In contrast to the biased autocorrelation, the unbiased one takes the decreasing number of samples involved in the summation into account. The difference between the biased and the unbiased autocorrelation is demonstrated in Figure 1: the biased estimate tapers off towards high lags.

When we include in the autocorrelation equations our assumption that the signal is periodic with a period T = f_s / F_0:

    x[k] \approx x[k + mT], \quad m \in \mathbb{Z}    (7)

we see that for such a signal we can expect local maxima of the autocorrelation sequence at lags that are multiples of T. By finding the maximum of the autocorrelation we get an estimate of the fundamental frequency. Note that the autocorrelation function always has a maximum at l = 0, so to avoid erroneously detecting the zero lag as the maximum, it is wise to restrict the search to lags that correspond to the upper and lower limits of the fundamental frequency range under consideration.
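Both autocorrelation estimates, equations (5) and (6), can be written compactly. Below is a Python/NumPy sketch for illustration only (the lab itself uses Matlab, and the function names here are made up):

```python
import numpy as np

def autocorr_biased(x):
    """Equation (5): divide every lag by the full block length N."""
    N = len(x)
    return np.array([np.dot(x[l:], x[:N - l]) / N for l in range(N)])

def autocorr_unbiased(x):
    """Equation (6): divide by the number of summed products, N - l."""
    N = len(x)
    return np.array([np.dot(x[l:], x[:N - l]) / (N - l) for l in range(N)])

# A cosine with a period of 16 samples, 4 periods long
x = np.cos(2 * np.pi * np.arange(64) / 16)
rb, ru = autocorr_biased(x), autocorr_unbiased(x)
# The unbiased estimate stays at the signal power at every full-period lag,
# while the biased one tapers off towards high lags (compare Figure 1)
print(np.isclose(ru[16], ru[48]), rb[48] < rb[16])
```

For a strictly periodic input the unbiased estimate reproduces the signal power at every multiple of the period, whereas the biased estimate shrinks by the factor (N-l)/N; this is exactly the taper visible in Figure 1.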
The global maximum might not be at the lag corresponding to the true fundamental frequency; it can also be at an integer multiple of it. As a result, the maximum can jump in consecutive frames between lags corresponding to multiples of T, leading to jumps in the F_0 estimate. These effects are called octave jumps. For a more robust estimation this must be taken into account.
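A naive autocorrelation-based F_0 estimator that restricts the peak search to the lag range of the expected pitch range, as suggested above, might look as follows. This is a Python/NumPy sketch (the lab implementation is in Matlab, and the name and default range are illustrative); it deliberately omits any octave-jump handling, so on real signals it exhibits exactly the ambiguity just described:

```python
import numpy as np

def estimate_f0(x, fs, f0_min=75.0, f0_max=500.0):
    """Pick the autocorrelation maximum within the lag range implied by
    [f0_min, f0_max]; the trivial maximum at lag 0 is excluded by construction."""
    N = len(x)
    r = np.array([np.dot(x[l:], x[:N - l]) for l in range(N)])  # raw sums
    lag_min = int(fs / f0_max)            # highest F0 -> smallest lag
    lag_max = int(np.ceil(fs / f0_min))   # lowest F0 -> largest lag
    lag = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return fs / lag

# A 200 Hz tone sampled at 16 kHz has a period of 80 samples
fs = 16000
x = np.sin(2 * np.pi * 200.0 * np.arange(1024) / fs)
print(estimate_f0(x, fs))
```

One common refinement, left to the lab exercises, is to compare the peak at lag T with the peaks near T/2 and 2T and prefer the smaller lag when its peak is almost as high.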

Figure 1: Comparison of the biased and unbiased autocorrelation sequence for a periodic signal (part of a vowel of a male speaker).

Homework Exercise 1 — Pitch Estimation: Theory

1. Given is the time sequence x[k] = {4, 2, 3, , 5, }. Calculate both the biased and the unbiased autocorrelation sequences using pen and paper. Sketch the time sequence and the autocorrelation sequences.
2. Calculate the necessary block length (both in ms and in samples for a sampling frequency of f_s = 16000 Hz) for an autocorrelation-based pitch estimator that should detect typical pitches of human speech as given in Table 1.
3. Calculate the minimum and maximum lag in the autocorrelation domain for said estimator for the desired F_0 range.
4. What is R_xx[0] equal to?
5. What is the relationship between the autocorrelation and the power spectral density (PSD)?
6. Think about strategies to avoid octave jumps and errors in the autocorrelation-based pitch estimation.

Figure 2: Example of a signal consisting of a harmonic part and a noise part (time sequences and Fourier transforms of the harmonic component, the noise component, and the mixture H+N).

2 Harmonic to Noise Ratio Estimation

For a signal that can be represented using equation (2), we define the Harmonic to Noise Ratio (HNR) as the ratio of the component energies:

    HNR = \frac{\sum_{k=0}^{N-1} h[k]^2}{\sum_{k=0}^{N-1} n[k]^2}    (8)

As for the pitch estimation, we assume that the energies of the components change slowly and are almost constant over small enough blocks. However, for a real-world signal neither h[k] nor n[k] is known. For example, in Figure 2 there is no clear distinction between the harmonic and the noise components of the mixture, neither in the time sequence nor in the Fourier-transformed representation. Thus we have to estimate the HNR. To find an estimate we assume that:

- h[k] and n[k] are uncorrelated
- we already know F_0
- n[k] is white Gaussian noise

Inserting equation (2) into equation (6) we get:

    R_{xx}[l] = \frac{1}{N-l} \sum_{k=l}^{N-1} (h[k] + n[k])(h[k-l] + n[k-l])    (9)

For l = T, we expand equation (9):

    R_{xx}[T] = \frac{1}{N-T} \Big( \sum_{k=T}^{N-1} h[k]h[k-T] + \sum_{k=T}^{N-1} h[k]n[k-T] + \sum_{k=T}^{N-1} h[k-T]n[k] + \sum_{k=T}^{N-1} n[k]n[k-T] \Big)    (10)

Under the assumptions from above (no correlation, white noise), the last three sums will be approximately zero, that is:

    R_{xx}[T] \approx \frac{1}{N-T} \sum_{k=T}^{N-1} h[k]h[k-T]    (11)

We now insert the approximation of equation (7):

    R_{xx}[T] \approx \frac{1}{N-T} \sum_{k=T}^{N-1} h[k]h[k]    (12)

and see that the autocorrelation at lag l = T is approximately the energy of the harmonic component. As R_xx[0] is equal to the energy of the combined signal, we can now estimate the HNR:

    HNR = \frac{R_{xx}[T]}{R_{xx}[0] - R_{xx}[T]}    (13)

This estimate of the HNR can easily be implemented. There are many other approaches in the time, frequency, or cepstrum domain [3]. Feel free to search for them.

Homework Exercise 2 — Harmonic to Noise Ratio: Theory

1. Why can we assume that the last three sums in equation (10) are approximately zero?
2. Which autocorrelation should be used for the HNR estimation, the biased or the unbiased? Why?
3. Estimate the HNR for the sequence given in homework part 1 using the calculated autocorrelation and the estimate of equation (13). (Hint: take the position of the first maximum of the autocorrelation as T.) If the result does not seem to be in line with the theory, find an explanation for that.
4. Search for or think about other possibilities to estimate the HNR.

3 The Experiment

3.1 Matlab-based estimation

The Matlab directory contains stubs for the F_0 estimation function and the HNR estimation function, called f0_estimation.m and hnr_estimation.m. Furthermore, for the evaluation of the pitch estimation against a given reference, a GUI called APLab_pitch.m exists. A screenshot of the GUI can be seen in Figure 3. A similar GUI for the HNR estimation exists, called APLab_hnr.m. The subdirectory audiofiles contains several example audio files, and you can bring your own files. Additionally, the GUIs allow making recordings on the fly.
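The resulting estimator, HNR = R_xx[T] / (R_xx[0] - R_xx[T]), is only a few lines of code. Below is a Python/NumPy sketch under the stated assumptions (uncorrelated white noise, known F_0); the lab itself uses Matlab, and the names and test signal here are illustrative:

```python
import numpy as np

def estimate_hnr(x, T):
    """HNR ~ R[T] / (R[0] - R[T]), using the unbiased autocorrelation at
    lag T so that R[T] is comparable in scale to the signal power R[0]."""
    N = len(x)
    r0 = np.dot(x, x) / N
    rT = np.dot(x[T:], x[:N - T]) / (N - T)
    return rT / (r0 - rT)

# Hypothetical test signal: 200 Hz tone (power 0.5) plus white noise of
# power 0.01, so the true HNR is about 50
rng = np.random.default_rng(0)
fs, f0 = 16000, 200
k = np.arange(4096)
s = np.sin(2 * np.pi * f0 * k / fs) + 0.1 * rng.standard_normal(k.size)
hnr = estimate_hnr(s, T=fs // f0)
print(hnr)  # a value in the rough vicinity of the true HNR
```

Note how sensitive the denominator R[0] - R[T] is when the noise is weak: small errors in R[T] (e.g. from using the biased estimate, or from a slightly wrong T) change the HNR drastically, which is worth keeping in mind for homework question 2.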

Figure 3: Screenshot of the Matlab GUI for comparing the implemented pitch estimation against the given reference.

3.2 Exercises

Lab Experiment 1 — Pitch Estimation: Instructions

1. Create a new file and implement the autocorrelations of equations (5) and (6) as Matlab functions, and compare the results for different signals to the Matlab function xcorr(). If the results differ, find an explanation for the difference.
2. Implement a first version of the F_0 estimator in the existing f0_estimate.m. Let the comments in f0_estimate.m guide you.
3. Compare the results using the APLab_pitch GUI to the results of the reference F_0 estimator. Tip: the F_0 plot may be zoomed in.
4. Implement a refinement to reduce octave errors and jumps.
5. Compare the results using the APLab_pitch GUI to the results of the reference F_0 estimator again.
6. Explain your solution.

Side note: Be careful that Matlab indexing starts from 1.
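The comparison against Matlab's xcorr() in step 1 has a direct NumPy analogue for anyone prototyping outside Matlab: np.correlate(x, x, 'full') returns the raw (unnormalized) correlation sums for all lags, so it should match a hand-written implementation up to the 1/N or 1/(N-l) normalization of equations (5) and (6). A small sanity check of that kind, with an arbitrary test vector:

```python
import numpy as np

x = np.random.default_rng(1).standard_normal(64)
N = len(x)

# Hand-written raw (unnormalized) autocorrelation sums for positive lags
r_raw = np.array([np.dot(x[l:], x[:N - l]) for l in range(N)])

# np.correlate in 'full' mode plays the role of Matlab's xcorr(): it returns
# the correlation for lags -(N-1)..(N-1); index N-1 corresponds to lag 0
full = np.correlate(x, x, mode='full')
print(np.allclose(r_raw, full[N - 1:]))
```

If your own implementation disagrees with the library routine, the normalization convention (none, 1/N, or 1/(N-l)) is the first thing to check.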

Lab Experiment 2 — Harmonic to Noise Ratio Estimation: Instructions

1. Implement the HNR estimation derived in Section 2 within the existing HNR_estimate.m. For this, use the already implemented functions for the autocorrelation and follow the comments in HNR_estimate.m.
2. Load the files vowel.wav and fricative.wav into the Matlab workspace. Calculate the pitch and the HNR estimates for both signals using your implementations (f_s = 16000 Hz) on the complete items. Note that for this exercise you should not use the APLab_hnr tool.
3. Compare your implementation of the HNR estimate to the reference using the APLab_hnr tool. Compare using different input files.
4. If your HNR estimates differ a lot from the reference, investigate the cause. (Hint: plotting is helpful.)

Side note: Notice that HNR_estimate.m has F_0 as a parameter. HNR_estimate.m neither uses the f0_estimate.m implemented in the first part, nor is F_0 obtained using f0_estimate.m.
Side note: Think about the validity of the value of F_0.

References

[1] Wikipedia, "Pitch detection algorithm". [Online]. Available: https://en.wikipedia.org/wiki/Pitch_detection_algorithm
[2] Wikipedia, "Autocorrelation". [Online]. Available: https://en.wikipedia.org/wiki/Autocorrelation
[3] Wikipedia, "Cepstrum". [Online]. Available: https://en.wikipedia.org/wiki/Cepstrum