Music Signal Processing

Tutorial: Music Signal Processing
Meinard Müller, Saarland University and MPI Informatik, meinard@mpi-inf.mpg.de
Anssi Klapuri, Queen Mary University of London, anssi.klapuri@elec.qmul.ac.uk

Overview
- Part I: Pitch and Harmony
- Part II: Tempo and Beat
- Coffee Break
- Part III: Timbre
- Part IV: Melody

Overview Part II: Tempo and Beat
- Introduction
- Onset detection
- Tempo estimation
- Beat/pulse estimation
- Applications

Introduction
Basic beat tracking task: Given an audio recording of a piece of music, determine the periodic sequence of beat positions.
Analogy: tapping the foot when listening to music.

Introduction Example: Queen Another One Bites The Dust

Introduction Example: Happy Birthday to you
Pulse levels:
- Measure
- Tactus (beat)
- Tatum (temporal atom)

Introduction Example: Chopin, Mazurka Op. 68-3
Pulse level: quarter note
Tempo: 50-200 BPM
[Figure: tempo curve, tempo (BPM, 50 to 200) over time (beats)]

Introduction Example: Borodin, String Quartet No. 2
Pulse level: quarter note
Tempo: 120-140 BPM (roughly)
[Figure: beat tracker without any prior knowledge vs. beat tracker with prior knowledge of the rough tempo range]

Introduction: Challenges in beat tracking
- Pulse level often unclear
- Local/sudden tempo changes (e.g. rubato)
- Vague information (e.g. soft onsets, corrupted extracted onsets)
- Sparse information (often only note onsets are used)

Introduction: Tasks
- Onset detection
- Beat tracking: determine the phase and period of the beat sequence
- Tempo estimation: Tempo := 60 / period, in beats per minute (BPM), with the period in seconds
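
The relation Tempo := 60 / period can be checked with a short worked example. The sketch below assumes the beat period is given in seconds; the function names are hypothetical and chosen only for illustration.

```python
def tempo_from_period(period_seconds: float) -> float:
    """Tempo in beats per minute (BPM) for a beat period given in seconds."""
    if period_seconds <= 0:
        raise ValueError("beat period must be positive")
    return 60.0 / period_seconds


def period_from_tempo(tempo_bpm: float) -> float:
    """Beat period in seconds for a tempo given in BPM (inverse relation)."""
    if tempo_bpm <= 0:
        raise ValueError("tempo must be positive")
    return 60.0 / tempo_bpm


# Beats spaced 0.5 s apart correspond to 120 BPM.
print(tempo_from_period(0.5))  # 120.0
```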

Onset Detection
- Goal: finding the start times of perceptually relevant acoustic events in a music signal
- An onset is the time position where a note is played
- An onset typically goes along with a change of the signal's properties: energy or loudness, pitch or harmony, timbre
[Bello et al., IEEE-TASLP 2005]

Onset Detection (Energy-Based)
Steps, starting from the waveform:
1. Amplitude squaring (result: squared waveform)
2. Windowing (result: energy envelope)
3. Differentiation, capturing energy changes (result: differentiated energy envelope)
4. Half-wave rectification, since only energy increases are relevant for note onsets (result: novelty curve)
5. Peak picking: peak positions indicate note onset candidates
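
Steps 1 to 4 of the energy-based procedure can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the tutorial's own implementation: the function name `energy_novelty`, the Hann window, and the window and hop sizes are choices made for this example. Peak picking (step 5) is left out here.

```python
import numpy as np


def energy_novelty(x, sr, win_len=1024, hop=512):
    """Energy-based novelty curve (steps 1-4); returns (novelty, frame_rate)."""
    x = np.asarray(x, dtype=float)
    # 1. Amplitude squaring
    squared = x ** 2
    # 2. Windowing: smooth with a Hann window (assumed), then subsample
    window = np.hanning(win_len)
    envelope = np.convolve(squared, window, mode="same")[::hop]
    # 3. Differentiation: first-order difference of the energy envelope
    diff = np.diff(envelope, prepend=envelope[0])
    # 4. Half-wave rectification: keep only energy increases
    novelty = np.maximum(diff, 0.0)
    return novelty, sr / hop


# Example: one second of silence followed by a 440 Hz tone.
sr = 8000
t = np.arange(2 * sr) / sr
x = np.where(t >= 1.0, np.sin(2 * np.pi * 440 * t), 0.0)
novelty, frame_rate = energy_novelty(x, sr)
print("strongest energy increase near", np.argmax(novelty) / frame_rate, "s")
```

The envelope also drops at the end of the signal, but that decrease is removed by the rectification, which is exactly why step 4 is needed before looking for peaks.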

[Figure: Onset Detection (Energy-Based), energy envelope with note onset positions]

Onset Detection
- Energy curves often work only for percussive music
- Many instruments, such as strings, have weak note onsets
- No energy increase may be observable in complex sound mixtures
- More refined methods are needed that capture changes of spectral content, changes of pitch, and changes of harmony
[Bello et al., IEEE-TASLP 2005]

Onset Detection (Spectral-Based)
Steps:
1. Spectrogram: magnitude spectrogram X. Aspects concerning pitch, harmony, or timbre are captured by the spectrogram, which allows for detecting local energy changes in certain frequency ranges.
2. Logarithmic compression: compressed spectrogram Y = log(1 + C · X). Accounts for the logarithmic sensation of sound intensity; acts as dynamic range compression; enhances low-intensity values, often leading to an enhancement of the high-frequency spectrum.
3. Differentiation: first-order temporal difference (spectral difference). Captures changes of the spectral content; only positive intensity changes are considered.
4. Accumulation: frame-wise accumulation of all positive intensity changes. Encodes changes of the spectral content in a novelty curve.
5. Normalization: subtraction of the local average (result: normalized novelty curve).
6. Peak picking on the normalized novelty curve.
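
Steps 1 to 5 of the spectral-based procedure can be sketched as follows. This is a hedged illustration, not the tutorial's reference code: the function name `spectral_novelty`, the Hann window, the STFT sizes, the default C = 1000, the length of the local average, and the clipping of negative values after normalization are all assumptions made for this example.

```python
import numpy as np


def spectral_novelty(x, sr, n_fft=1024, hop=256, C=1000.0, avg_frames=10):
    """Spectral novelty curve (steps 1-5); returns (novelty, frame_times)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop  # requires len(x) >= n_fft
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # 1. Magnitude spectrogram X (frames x frequency bins)
    X = np.abs(np.fft.rfft(frames, axis=1))
    # 2. Logarithmic compression Y = log(1 + C * X)
    Y = np.log(1.0 + C * X)
    # 3. First-order temporal difference, positive changes only
    diff = np.maximum(np.diff(Y, axis=0, prepend=Y[:1]), 0.0)
    # 4. Frame-wise accumulation over all frequency bins
    novelty = diff.sum(axis=1)
    # 5. Normalization: subtract a local average (moving mean, assumed),
    #    then clip negative values to zero (assumed)
    half = max(1, min(avg_frames, (len(novelty) - 1) // 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    local_avg = np.convolve(novelty, kernel, mode="same")
    novelty = np.maximum(novelty - local_avg, 0.0)
    frame_times = (np.arange(n_frames) * hop + n_fft / 2) / sr
    return novelty, frame_times


# Example: the pitch changes at 1.0 s while the amplitude stays constant,
# a case where an energy envelope alone shows no clear increase.
sr = 8000
t = np.arange(2 * sr) / sr
x = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))
novelty, times = spectral_novelty(x, sr)
print("strongest spectral change near", times[np.argmax(novelty)], "s")
```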

Onset Detection (Spectral-Based): Logarithmic compression is essential
[Figure: spectrogram (frequency in Hz) and novelty curve with ground-truth onsets, shown for the uncompressed X and for Y = log(1 + C · X) with C = 1, C = 10, and C = 1000]
[Klapuri et al., IEEE-TASLP 2006]

Onset Detection: Peak picking
- Peaks of the novelty curve indicate note onset candidates
- In general, there are many spurious peaks
- Local thresholding techniques are used
- Peak picking is a very fragile step, in particular for soft onsets
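
One possible form of local thresholding is sketched below: a peak is accepted only if it is a local maximum and exceeds a moving average of the novelty curve plus a small offset. The slides do not specify a particular rule, so the function name `pick_peaks`, the threshold form, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def pick_peaks(novelty, frame_rate, avg_seconds=0.5, offset=0.1,
               min_gap_seconds=0.05):
    """Return onset candidate times (seconds) using a local threshold."""
    novelty = np.asarray(novelty, dtype=float)
    # Local threshold: moving average plus a fraction of the global maximum
    half = int(round(avg_seconds * frame_rate / 2))
    half = max(1, min(half, (len(novelty) - 1) // 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    threshold = (np.convolve(novelty, kernel, mode="same")
                 + offset * novelty.max())
    min_gap = max(1, int(round(min_gap_seconds * frame_rate)))
    peaks = []
    for n in range(1, len(novelty) - 1):
        is_local_max = novelty[n] > novelty[n - 1] and novelty[n] >= novelty[n + 1]
        if not (is_local_max and novelty[n] > threshold[n]):
            continue
        if peaks and n - peaks[-1] < min_gap:
            # Two peaks too close together: keep the stronger one
            if novelty[n] > novelty[peaks[-1]]:
                peaks[-1] = n
        else:
            peaks.append(n)
    return np.array(peaks) / frame_rate


# Example: two clear peaks and one weak, spurious peak in between.
novelty = np.zeros(100)
novelty[20], novelty[40], novelty[60] = 1.0, 0.05, 0.8
print(pick_peaks(novelty, frame_rate=100.0))
```

Even with such a threshold, soft onsets can fall below it, which is the fragility noted in the list above.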

Onset Detection Examples: Shostakovich, 2nd Waltz; Borodin, String Quartet No. 2

Beat and Tempo
What is a beat?
- Steady pulse that drives music forward and provides the temporal framework of a piece of music
- Sequence of perceived pulses that are equally spaced in time
- The pulse a human taps along to when listening to the music
The term tempo then refers to the speed of the pulse.
[Parncutt 1994] [Sethares 2007] [Large/Palmer 2002] [Lerdahl/Jackendoff 1983] [Fitch/Rosenfeld 2007]

Beat and Tempo: Strategy
- Analyze the novelty curve with respect to reoccurring or quasiperiodic patterns
- Avoid the explicit determination of note onsets (no peak picking)
Methods:
- Comb-filter methods
- Autocorrelation
- Fourier transform
[Scheirer, JASA 1998] [Ellis, JNMR 2007] [Davies/Plumbley, IEEE-TASLP 2007] [Peeters, JASP 2007] [Grosche/Müller, ISMIR 2009]

Tempogram
Definition: A tempogram is a time-tempo representation that encodes the local tempo of a music signal over time.
[Figure: tempogram, tempo (BPM) axis with intensity values]

Tempogram (Fourier)
Fourier-based method:
- Compute a spectrogram (STFT) of the novelty curve
- Convert the frequency axis (given in Hertz) into a tempo axis (given in BPM)
- The magnitude spectrogram indicates the local tempo
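
The Fourier-based method can be sketched by correlating windowed sections of the novelty curve with complex sinusoids, one per candidate tempo. A tempo of T BPM corresponds to a frequency of T / 60 Hz, which is the axis conversion mentioned above. The function name `fourier_tempogram`, the Hann window, and the window and hop sizes are assumptions for this example; evaluating the transform directly at chosen tempi (instead of using FFT bins) is a design choice of the sketch.

```python
import numpy as np


def fourier_tempogram(novelty, frame_rate, tempi_bpm, win_seconds=6.0,
                      hop_frames=10):
    """Complex Fourier tempogram; returns (coefficients, center_times)."""
    novelty = np.asarray(novelty, dtype=float)
    win_len = int(round(win_seconds * frame_rate))
    half = win_len // 2
    window = np.hanning(win_len)
    # Zero-pad so that each window is centered on a novelty frame
    padded = np.pad(novelty, half)
    t = (np.arange(len(padded)) - half) / frame_rate  # original time axis
    centers = np.arange(0, len(novelty), hop_frames)
    coeffs = np.empty((len(tempi_bpm), len(centers)), dtype=complex)
    for i, bpm in enumerate(tempi_bpm):
        freq_hz = bpm / 60.0  # tempo (BPM) -> frequency (Hz)
        modulated = padded * np.exp(-2j * np.pi * freq_hz * t)
        for j, c in enumerate(centers):
            coeffs[i, j] = np.sum(modulated[c:c + win_len] * window)
    return coeffs, centers / frame_rate


# Example: a synthetic novelty curve with one impulse every 0.5 s (120 BPM).
frame_rate = 100.0
novelty = np.zeros(2000)
novelty[::50] = 1.0
tempi = np.arange(40, 201)
coeffs, times = fourier_tempogram(novelty, frame_rate, tempi)
print(tempi[np.argmax(np.abs(coeffs).mean(axis=1))])  # 120
```

The tempo range is capped at 200 BPM in the example because an impulse train resonates equally at the harmonic 240 BPM.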

[Figure: Tempogram (Fourier), tempo (BPM) axis; panels: novelty curve, local section of the novelty curve, windowed sinusoidals]

Tempogram (Autocorrelation)
Autocorrelation-based method:
- Compare the novelty curve with time-lagged local sections of itself
- Convert the lag axis (given in seconds) into a tempo axis (given in BPM)
- The autocorrelogram indicates the local tempo
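
The autocorrelation-based method can be sketched in the same style. A lag of L seconds corresponds to a tempo of 60 / L BPM, so for instance a lag of 0.52 s maps to roughly 115 BPM. The function names, the rectangular window, the window and hop sizes, and the unnormalized autocorrelation are assumptions made for this example.

```python
import numpy as np


def autocorrelation_tempogram(novelty, frame_rate, win_seconds=6.0,
                              hop_frames=10, max_lag_seconds=2.0):
    """Windowed autocorrelation; returns (acf, lags_seconds, center_times)."""
    novelty = np.asarray(novelty, dtype=float)
    win_len = int(round(win_seconds * frame_rate))
    max_lag = int(round(max_lag_seconds * frame_rate))  # must be < win_len
    padded = np.pad(novelty, win_len // 2)
    centers = np.arange(0, len(novelty), hop_frames)
    acf = np.zeros((max_lag + 1, len(centers)))
    for j, c in enumerate(centers):
        section = padded[c:c + win_len]  # local section of the novelty curve
        for lag in range(max_lag + 1):
            # Compare the section with a time-lagged copy of itself
            acf[lag, j] = np.dot(section[:win_len - lag], section[lag:])
    lags_seconds = np.arange(max_lag + 1) / frame_rate
    return acf, lags_seconds, centers / frame_rate


def lag_to_bpm(lag_seconds):
    """Convert a time lag in seconds into a tempo in BPM."""
    return 60.0 / lag_seconds


# Example: impulses every 0.5 s; lag 0 is skipped because it is always maximal.
frame_rate = 100.0
novelty = np.zeros(2000)
novelty[::50] = 1.0
acf, lags, times = autocorrelation_tempogram(novelty, frame_rate)
best = 1 + np.argmax(acf.mean(axis=1)[1:])
print(lags[best], "s ->", lag_to_bpm(lags[best]), "BPM")
```

In this example the autocorrelation also shows peaks at lags of 1.0 s, 1.5 s, and 2.0 s, that is, at 60, 40, and 30 BPM.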

[Figure: Tempogram (Autocorrelation), lag (seconds) axis; panels: local section of the novelty curve, windowed autocorrelation, and time-lagged sections at lags of 0, 0.26, 0.52, 0.78, and 1.56 seconds]
[Figure: Tempogram (Autocorrelation) with the lag axis converted into a tempo (BPM) axis, shown with ticks at 30, 40, 60, 80, 120, 300 BPM and with ticks from 100 to 600 BPM]

Tempogram: Fourier vs. autocorrelation
[Figure: Fourier and autocorrelation tempograms, tempo (BPM) axis]
- Example: Tempo@Tatum = 210 BPM, Tempo@Measure = 70 BPM
- Fourier: emphasis of tempo harmonics (integer multiples)
- Autocorrelation: emphasis of tempo subharmonics (integer fractions)
[Peeters, JASP 2007] [Grosche et al., ICASSP 2010]

Tempogram (Summary)
Fourier:
- Novelty curve is compared with sinusoidal kernels, each representing a specific tempo
- Convert frequency (Hertz) into tempo (BPM)
- Reveals novelty periodicities
- Emphasizes harmonics
- Suitable to analyze tempo on tatum and tactus level
Autocorrelation:
- Novelty curve is compared with time-lagged local (windowed) sections of itself
- Convert time lag (seconds) into tempo (BPM)
- Reveals novelty self-similarities
- Emphasizes subharmonics
- Suitable to analyze tempo on tatum and measure level

Beat Tracking
- Given the tempo, find the best sequence of beats
- The complex Fourier tempogram contains magnitude and phase information
- The magnitude encodes how well the novelty curve resonates with a sinusoidal kernel of a specific tempo
- The phase optimally aligns the sinusoidal kernel with the peaks of the novelty curve
[Peeters, JASP 2005]

[Figure: Beat Tracking, tempogram with tempo (BPM) axis and intensity values] [Peeters, JASP 2005] [Grosche/Müller, IEEE-TASLP 2011]

Beat Tracking
Novelty curve:
- Indicates note onset candidates
- Extraction errors, in particular for soft onsets
- Simple peak picking is problematic
Predominant Local Pulse (PLP):
- Periodicity enhancement of the novelty curve
- Accumulation introduces error robustness
- Locality of kernels handles tempo variations
[Grosche/Müller, IEEE-TASLP 2011]
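
The idea behind the PLP curve can be sketched as follows: for each local window, take the tempo with the largest Fourier magnitude, use the phase of that coefficient to align a windowed sinusoidal kernel with the novelty peaks, accumulate all kernels by overlap-add, and keep the positive part. This is a simplified sketch under stated assumptions, not the published implementation: the function name `plp_curve`, the Hann window, the window and hop sizes, and the final half-wave rectification are choices made for this example.

```python
import numpy as np


def plp_curve(novelty, frame_rate, tempi_bpm, win_seconds=4.0, hop_frames=10):
    """Accumulate locally optimal sinusoidal kernels into a pulse curve."""
    novelty = np.asarray(novelty, dtype=float)
    freqs_hz = np.asarray(tempi_bpm, dtype=float) / 60.0
    win_len = int(round(win_seconds * frame_rate))
    half = win_len // 2
    window = np.hanning(win_len)
    padded = np.pad(novelty, half)
    accumulated = np.zeros(len(padded))
    for center in range(0, len(novelty), hop_frames):
        idx = np.arange(center, center + win_len)  # indices into padded
        t = (idx - half) / frame_rate              # original time (seconds)
        section = padded[idx] * window
        # Complex Fourier coefficients for all candidate tempi
        coeffs = np.exp(-2j * np.pi * np.outer(freqs_hz, t)) @ section
        best = np.argmax(np.abs(coeffs))           # magnitude: best tempo
        phase = np.angle(coeffs[best])             # phase: kernel alignment
        kernel = window * np.cos(2 * np.pi * freqs_hz[best] * t + phase)
        accumulated[idx] += kernel                 # overlap-add
    # Keep only the positive part, cropped to the original length
    return np.maximum(accumulated[half:half + len(novelty)], 0.0)


# Example: impulses every 0.5 s, with the impulse at 5.0 s missing.
frame_rate = 100.0
novelty = np.zeros(1500)
novelty[::50] = 1.0
novelty[500] = 0.0
plp = plp_curve(novelty, frame_rate, np.arange(80, 161))
print("pulse value at the missing onset:", plp[500])
```

In the example, the pulse curve still peaks at the position of the missing impulse, because neighboring windows contribute kernels that cover it. Restricting `tempi_bpm` to a rough range is one way to exploit prior knowledge about the tempo.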

Beat Tracking: Borodin, String Quartet No. 2
Strategy: exploit additional knowledge (e.g. rough tempo range)
[Figure: tempogram, tempo (BPM) axis]
[Grosche/Müller, IEEE-TASLP 2011]

Applications
- Feature design (beat-synchronous features, adaptive windowing)
- Digital DJ / audio editing (mixing and blending of audio material)
- Music classification
- Music recommendation
- Performance analysis (extraction of tempo curves)

Application: Feature Design
- Fixed window size (100 ms)
- Adaptive window size (roughly 1200 ms): note onset positions define boundaries
- Denoising by excluding boundary neighborhoods
[Ellis et al., ICASSP 2008] [Bello/Pickens, ISMIR 2005]
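
Adaptive windowing can be sketched as averaging frame-wise features between consecutive boundaries, optionally skipping a small neighborhood around each boundary. The function name `adaptive_window_features`, the use of the mean as the aggregation, and the fallback for very short segments are assumptions made for this example; the boundaries are assumed to be strictly increasing times in seconds.

```python
import numpy as np


def adaptive_window_features(features, frame_rate, boundaries_seconds,
                             exclude_seconds=0.0):
    """Average features (n_frames x n_dims) between consecutive boundaries."""
    features = np.asarray(features, dtype=float)
    bounds = np.round(np.asarray(boundaries_seconds, dtype=float)
                      * frame_rate).astype(int)
    margin = int(round(exclude_seconds * frame_rate))
    segments = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        lo, hi = start + margin, end - margin
        if hi <= lo:
            # Segment too short for the margin: use the full segment instead
            lo, hi = start, end
        segments.append(features[lo:hi].mean(axis=0))
    return np.array(segments)


# Example: 10 frames per second, boundaries at 0 s, 1 s, and 3 s.
# The frames next to the boundary at 1 s contain transient outliers.
features = np.concatenate([np.full(10, 1.0), np.full(20, 3.0)])[:, None]
features[9:11] = 100.0
plain = adaptive_window_features(features, 10.0, [0.0, 1.0, 3.0])
clean = adaptive_window_features(features, 10.0, [0.0, 1.0, 3.0],
                                 exclude_seconds=0.1)
print(plain.ravel(), clean.ravel())
```

Without the exclusion margin the outliers at the boundary distort both segment averages; with it, each segment keeps only its stable interior.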

Application: Audio Editing (Digital DJ) http://www.mixxx.org/

Application: Beat-Synchronous Light Effects