Topic. Spectrogram, Chromagram, Cepstrogram. Bryan Pardo, 2008, Northwestern University EECS 352: Machine Perception of Music and Audio


Topic: Spectrogram, Chromagram, Cepstrogram

Short-time Fourier Transform. Break the signal into windows. Calculate the DFT of each window.
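The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Hann window, the window size of 64, the hop of 32 and the 1 kHz test tone are choices made here, not taken from the slides, and the naive O(N^2) DFT is used only to keep the example free of external libraries (in practice an FFT routine would be used).

```python
import cmath
import math

def dft(frame):
    """Naive O(N^2) DFT of one frame (adequate for a small illustration)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def stft(signal, window_size, hop):
    """Break the signal into overlapping Hann-windowed frames; DFT each frame."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / window_size)
            for t in range(window_size)]
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = [signal[start + t] * hann[t] for t in range(window_size)]
        frames.append(dft(frame))
    return frames

# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(256)]
frames = stft(signal, window_size=64, hop=32)

# One spectrogram column: magnitudes of bins 0 .. N/2 of the first frame.
magnitudes = [abs(c) for c in frames[0][:33]]
peak_bin = magnitudes.index(max(magnitudes))
peak_hz = peak_bin * fs / 64   # each bin is fs / window_size = 125 Hz wide
```

Stacking the magnitude columns of all frames side by side gives the spectrogram image.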

The Spectrogram. A series of short-term DFTs. Typically displays just the magnitudes of X from 0 Hz to the Nyquist frequency. MATLAB example: spectrogram(y,1024,512,1024,fs,'yaxis');

Equal Temperament. An octave is a relationship by a power of 2. There are 12 half-steps in an octave.
$f = 2^{n/12} f_{ref}$
where $f$ is the frequency of the desired pitch, $n$ is the number of half-steps from the reference pitch, and $f_{ref}$ is the frequency of the reference pitch.
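As a quick check of the equal-temperament relationship, here is a minimal Python sketch. The function name and the choice of A4 = 440 Hz as the default reference pitch are assumptions made for this example; the slide does not fix a reference.

```python
def pitch_frequency(n, f_ref=440.0):
    """Frequency of the pitch n half-steps above f_ref (below, if n is negative).

    f_ref = 440 Hz (A4) is an assumed default, not something the slide specifies.
    """
    return f_ref * 2 ** (n / 12)

octave_up = pitch_frequency(12)   # 12 half-steps up: one octave, double the frequency
middle_c = pitch_frequency(-9)    # C4 lies 9 half-steps below A4
```

Twelve half-steps multiply the frequency by exactly 2, and nine half-steps below 440 Hz lands near 261.63 Hz.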

Spiral Pitch representation

Chroma: Many to one. Chroma = log2(freq) - floor(log2(freq)). Chroma is periodic, in the range 0 to (almost) 1. Chroma values map onto pitch classes.
[Figure: frequency (Hz) versus time, with 100 Hz, 200 Hz, 400 Hz and 800 Hz marked, next to a chroma circle labeled 0.0, 0.25, 0.5 and 0.75.]
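The many-to-one mapping can be seen with a short Python sketch: chroma is the fractional part of log2 of the frequency, so frequencies an octave apart share one value. The function name is illustrative, and the four frequencies are the ones marked in the slide's figure.

```python
import math

def chroma(freq_hz):
    """Fractional part of log2(frequency): position within the octave, in [0, 1)."""
    log_f = math.log2(freq_hz)
    return log_f - math.floor(log_f)

# 100, 200, 400 and 800 Hz are octaves of one another, so they share a chroma.
values = [chroma(f) for f in (100, 200, 400, 800)]
```

All four values agree up to floating-point rounding, at roughly 0.644.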

Making a Chromagram.
1. Decide how to quantize (bin) the chroma range. 12 pitch classes? 120 bins? Equal temperament?
2. Make a spectrogram.
3. For each time-step in the spectrogram, find the chroma for each frequency bin from 0 to N/2.
4. Sum the amplitude of all frequencies with the same chroma bin. (Some chromagrams also add in the energy from the odd harmonics.)
5. Place that value in the chroma bin.
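A minimal Python sketch of the folding step for one spectrogram column follows. Several details are assumptions made for this example rather than anything the slide prescribes: 12 bins, bin 0 centered on 261.63 Hz (C), rounding to the nearest bin, and skipping the 0 Hz bin because it has no defined chroma. The input column is a hypothetical one with energy only at 440 Hz and 880 Hz.

```python
import math

def chroma_bin(freq_hz, n_bins=12, f_ref=261.63):
    """Chroma bin index of a frequency; bin 0 is centered on f_ref (assumed to be C)."""
    octaves = math.log2(freq_hz / f_ref)
    chroma = octaves - math.floor(octaves)        # position in the octave, in [0, 1)
    return int(round(chroma * n_bins)) % n_bins   # nearest bin, wrapping 12 back to 0

def chroma_vector(magnitudes, fs, n_bins=12):
    """Fold one spectrogram column (DFT bins 0 .. N/2) into chroma bins."""
    n_fft = 2 * (len(magnitudes) - 1)
    out = [0.0] * n_bins
    for k in range(1, len(magnitudes)):           # skip bin 0: 0 Hz has no chroma
        out[chroma_bin(k * fs / n_fft, n_bins)] += magnitudes[k]
    return out

# Hypothetical column with energy at 440 Hz and 880 Hz (both pitch class A).
fs, n_fft = 8000, 800                             # 10 Hz per DFT bin
column = [0.0] * (n_fft // 2 + 1)
column[44] = 1.0                                  # 440 Hz
column[88] = 0.5                                  # 880 Hz
vec = chroma_vector(column, fs)
```

Both partials land in bin 9 (A, counting up from C), so their amplitudes add. Repeating this for every time-step gives the chromagram.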

Overtone Series. Approximate notated pitch for the harmonics (overtones) of a frequency:
Harmonic:  f   2f  3f  4f  5f  6f  7f  8f  9f  10f  11f  12f
Pitch:     C   C   G   C   E   G   Bb  C   D   E    F#   G

A fancier chromagram. For complex sounds (like the bassoon example from class) you might want to consider adding up energy from more harmonics than just the octaves (1f, 2f, 4f, etc.). Try taking the energy from the 3rd, 5th and 7th harmonics as well.

Chromagram of Clarinet
[Figure: chromagram with the pitch classes C, C#, D, D#, E, F, F#, G, G#, A, A#, B on one axis and ticks from 100 to 900 on the other.]

Chromagram of Clarinet

Mel Scale. Stevens, Volkmann and Newman (1937). A scale of pitches judged by listeners to be equidistant. The reference point: 1000 mels = 1000 Hz at 40 dB SPL. Below 500 Hz, mel ~= hertz. Above 1000 Hz, mel ~= log(hertz). From: Appleton and Perera, eds., The Development and Practice of Electronic Music, Prentice-Hall, 1975, p. 56; after Stevens and Davis, Hearing.

Mel Filter Bank. Filters spaced equally in the log of the frequency. Mels are (more or less) related to frequency by
$f_{mel} = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$
The edge of each filter = the center frequency of the adjacent filter. Typically, 40 filters are used.
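The formula and the filter layout can be sketched in Python as follows. The triangular filter shape, the 0 to 4000 Hz range and the function names are assumptions made for this example; the slide only states the hertz-to-mel formula, the edge rule and the typical count of 40 filters.

```python
import math

def hz_to_mel(f_hz):
    """Mel value for a frequency in hertz, per the formula above."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_points(f_low, f_high, n_filters):
    """n_filters + 2 frequencies in Hz, equally spaced in mels.

    Filter i uses points[i], points[i + 1], points[i + 2] as its lower edge,
    center and upper edge, so each edge is the center of the adjacent filter.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]

def triangle(f_hz, left, center, right):
    """Weight of an (assumed) triangular filter at frequency f_hz."""
    if left < f_hz <= center:
        return (f_hz - left) / (center - left)
    if center < f_hz < right:
        return (right - f_hz) / (right - center)
    return 0.0

points = mel_points(0.0, 4000.0, 40)   # assumed range: 0 Hz to 4 kHz
```

Because the points are equally spaced in mels, the gaps between them in hertz widen as frequency rises, so the higher filters are broader.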

Source-Filter Model. Source signal x(t) -> Filter h(t) -> Output signal y(t).
$x(t) * h(t) = y(t)$ (convolution)

The Cepstrum. Filtering is convolution in the time domain and a product in the frequency domain:
$Y[k] = X[k] H[k]$
What if we want to make it an addition operation? Take the logarithm:
$\log(Y[k]) = \log(X[k]) + \log(H[k])$
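A small numerical check of these two identities, as a Python sketch. The sequences x and h are made-up values, the DFT is a naive one to avoid external libraries, and the logarithm is applied to magnitudes only (an assumption of this example, which sidesteps the phase term of the complex logarithm).

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT (adequate for 8 samples)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_convolve(x, h):
    """Circular convolution, the time-domain counterpart of a DFT product."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]

# Made-up source and filter, zero-padded to 8 samples so that the circular
# convolution equals the ordinary (linear) one.
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 0.0]
h = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
y = circular_convolve(x, h)

X, H, Y = dft(x), dft(h), dft(y)
max_product_error = max(abs(Y[k] - X[k] * H[k]) for k in range(8))
max_log_error = max(
    abs(math.log(abs(Y[k])) - (math.log(abs(X[k])) + math.log(abs(H[k]))))
    for k in range(8)
)
```

Both errors come out at the level of floating-point rounding: convolution becomes a product under the DFT, and the product becomes a sum under the logarithm.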

The Cepstrum. Turning the frequency-domain product into an addition is done by defining the cepstrum:
$Cep_x(q) = Z^{-1}(\log X(z))$
where $X(z)$ is a frequency representation of the signal, $q$ is the quefrency, and $Z^{-1}$ is the inverse Z-transform (the general case of the inverse discrete Fourier transform).

What is the Cepstrum for? It was invented for finding echoes (aftershocks) in seismograph data. If something is useful for finding echoes, it is useful for finding impulse response functions, which makes it useful for finding filter coefficients. Let's look at an example.
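Echo finding can be illustrated with a small Python sketch. Note the substitutions: this uses the real cepstrum (inverse DFT of the log magnitude spectrum) rather than the Z-transform definition, and the decaying source, the echo delay of 10 samples, the echo gain of 0.5 and the search range are all made-up values for the example.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT, or inverse DFT when inverse=True."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def real_cepstrum(x):
    """Inverse DFT of the log magnitude spectrum (a simplification of Cep_x)."""
    log_mag = [math.log(abs(v)) for v in dft(x)]
    return [v.real for v in dft(log_mag, inverse=True)]

# Made-up signal: a decaying source plus one echo, 10 samples later, at half gain.
n, delay, gain = 64, 10, 0.5
source = [0.6 ** t for t in range(n)]
signal = [source[t] + (gain * source[t - delay] if t >= delay else 0.0)
          for t in range(n)]

cep = real_cepstrum(signal)

# The source occupies the lowest quefrencies; search above them for the echo.
echo_delay = max(range(5, n // 2), key=lambda q: cep[q])
```

The cepstrum shows a peak of height close to gain / 2 at the quefrency equal to the echo delay, well separated from the low-quefrency part contributed by the source.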

Some terms
Spectrum      Cepstrum
Spectrogram   Cepstrogram
Frequency     Quefrency
Filtering     Liftering

The Cepstrum. Gives information about the rate of change in the different quefrency bands. A popular representation for speech and music. Used for distinguishing the FILTER from the SIGNAL: some quefrencies represent the filter (what instrument), others represent the signal (what pitch). For these applications, the spectrum is usually first transformed to mel frequency bands. Result: Mel Frequency Cepstral Coefficients (MFCC).

Making a Mel Frequency Cepstrogram.
$x(n)$ (n = sample number) -> sliding window -> $s_j(n)$ (signal in the jth window) -> DFT -> $S_j(k)$ (k = frequency index) -> mel filter bank -> $\chi_j(m)$ (m = mel filter index) -> logarithm -> $\log(\chi_j(m))$ -> DCT -> $Cep_j(i)$ (i = quefrency index).
Here DCT = Discrete Cosine Transform.
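The last two stages of this pipeline (logarithm, then DCT) can be sketched in Python. This assumes the mel filter bank outputs for one window are already available as a list of positive numbers; the unnormalized DCT-II variant and the function names are choices made for this example, since the slide does not say which DCT form is used.

```python
import math

def dct_ii(values):
    """Unnormalized DCT-II (one common DCT variant; assumed here)."""
    n = len(values)
    return [sum(values[m] * math.cos(math.pi * i * (m + 0.5) / n) for m in range(n))
            for i in range(n)]

def cepstral_coefficients(mel_energies, n_coeffs):
    """log of each mel filter output, then DCT; keep the first n_coeffs values."""
    log_energies = [math.log(e) for e in mel_energies]
    return dct_ii(log_energies)[:n_coeffs]

# Hypothetical filter bank outputs for one window: a perfectly flat spectrum.
flat = cepstral_coefficients([2.0] * 8, 4)

# Hypothetical outputs with a tilt: energy falling steadily across the filters.
tilted = cepstral_coefficients([2.0 ** (8 - m) for m in range(8)], 4)
```

A flat set of filter outputs puts everything into coefficient 0 and leaves the others at zero, while a spectral tilt shows up in coefficient 1. Computing one such vector per window and stacking them over time gives the mel frequency cepstrogram.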

Let's have a look! (Go to bassoon/tuba demo)