Onset detection and Attack Phase Descriptors. IMV Signal Processing Meetup, 16 March 2017


Onset detection vs. attack phase description

- MIREX competition:
  - Detect the approximate temporal location of new onsets in an audio file.
  - Algorithms are compared against manual expert annotation (which is inherently imprecise).
  - False positives and false negatives are penalized.
- Attack phase description:
  - What are the salient time points in the beginning of this sound event?
  - What are the relations between these time points?
- Paper:
  - K. Nymoen, A. Danielsen and J. London: Attack Phase Descriptor Estimation in Matlab toolboxes. Submitted for SMC2017, Helsinki.
  - Comparing the MIRtoolbox (Lartillot) and the Timbre Toolbox (Peeters).
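To make the scoring idea concrete, here is a minimal sketch of how detected onsets could be compared against annotated ones. It is not the MIREX evaluation code: the function name `evaluate_onsets`, the greedy one-to-one matching, and the 50 ms tolerance are assumptions chosen for illustration.

```python
def evaluate_onsets(detected, annotated, tolerance=0.05):
    """Greedily match detected to annotated onset times (in seconds).

    A detection counts as a true positive if it lies within +/- tolerance
    of a not-yet-matched annotation. The tolerance value is an assumption.
    """
    detected = sorted(detected)
    annotated = sorted(annotated)
    i = j = 0
    true_pos = 0
    while i < len(detected) and j < len(annotated):
        diff = detected[i] - annotated[j]
        if abs(diff) <= tolerance:
            true_pos += 1          # matched pair
            i += 1
            j += 1
        elif diff < 0:
            i += 1                 # detection with no annotation: false positive
        else:
            j += 1                 # annotation with no detection: false negative
    false_pos = len(detected) - true_pos
    false_neg = len(annotated) - true_pos
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(annotated) if annotated else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total > 0 else 0.0
    return {"tp": true_pos, "fp": false_pos, "fn": false_neg,
            "precision": precision, "recall": recall, "f_measure": f_measure}


# Toy data: one spurious detection (1.70 s) and one missed annotation (2.00 s).
scores = evaluate_onsets([0.51, 1.02, 1.70, 2.50], [0.50, 1.00, 2.00, 2.52])
```

Because the annotations are themselves imprecise, the tolerance window has a direct effect on the score, which is one reason onset detection results say little about the finer structure of the attack phase.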

Onset detection

[Figure: audio waveform and onset curve (envelope); axes: amplitude vs. time (s).]

Are these really onsets?

[Figure: onset curve (envelope); axes: amplitude vs. time (s).]

- What would our research question typically be when using this function in the MIRtoolbox?
  - Segmentation
  - Melody
  - Rhythm(?)
  - Microrhythm
- Are we interested in onsets, or rather perceived moments of metrical alignment?

Salient time points in the initial phase of a sonic event

[Figure: a sonic event with four marked time points: physical onset, perceptual onset, perceptual attack, energy peak.]

Schaeffer (196x), Gordon (1987), Collins (2006), Wright (2008), Villing (2010)

Attack phase descriptors

[Figure: a sound event annotated with physical onset, perceptual onset, perceptual attack, energy peak, temporal centroid, rise time, attack time, attack slope and attack leap.]

Log-Attack Time = log(attack time)

- Time points
- Time spans
- Energy spans
- (Energy points)

Attack phase descriptors (our definitions)

Name               Type  Description
Physical onset     phtp  Time point where the sound energy first rises from 0.
Perceptual onset   petp  Time point when the sound event becomes audible.
Perceptual attack  petp  Time point perceived as the rhythmic emphasis of the sound.
Energy peak        phtp  Time point when the energy envelope reaches its maximum value.
Rise time          phts  Time span between physical onset and energy peak.
Attack time        pets  Time span between perceptual onset and perceptual attack.
Log-Attack Time    phts  The base 10 logarithm of attack time.
Attack slope       pees  Weighted average of the energy envelope slope in the attack phase.
Attack leap        pees  The difference between energy level at perceptual attack and perceptual onset.
Temporal centroid  phtp  The temporal barycentre of the sound event's energy envelope.

Log-Attack Time is a commonly used descriptor in the MIR community. There is no consensus on how to estimate it: some use physical descriptors, some use perceptual ones, and some use a combination.
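Once the salient time points are known, the span descriptors follow by simple arithmetic. The sketch below assumes the time points are given as indices into a sampled energy envelope; the function name `attack_descriptors` and the toy envelope are hypothetical. The attack slope is computed as a plain (unweighted) mean slope, which is a simplification of the weighted average in the definition above.

```python
import math


def attack_descriptors(times, envelope, physical_onset, perceptual_onset,
                       perceptual_attack):
    """Derive span descriptors from an envelope and three salient indices.

    times and envelope are equal-length sequences; the three onset/attack
    arguments are indices into them (an assumption made for this sketch).
    """
    peak = max(range(len(envelope)), key=lambda i: envelope[i])
    attack_time = times[perceptual_attack] - times[perceptual_onset]
    attack_leap = envelope[perceptual_attack] - envelope[perceptual_onset]
    weighted_time = sum(t * e for t, e in zip(times, envelope))
    return {
        "energy_peak": times[peak],
        "rise_time": times[peak] - times[physical_onset],
        "attack_time": attack_time,
        "log_attack_time": math.log10(attack_time),   # base 10 logarithm
        "attack_leap": attack_leap,
        # Simplification: unweighted mean slope between the two time points.
        "mean_attack_slope": attack_leap / attack_time,
        "temporal_centroid": weighted_time / sum(envelope),
    }


# Toy envelope sampled every 10 ms.
times = [i * 0.01 for i in range(6)]
envelope = [0.0, 0.1, 0.5, 1.0, 0.6, 0.2]
d = attack_descriptors(times, envelope, physical_onset=0,
                       perceptual_onset=1, perceptual_attack=2)
```

Swapping the perceptual indices for physical ones changes the Log-Attack Time considerably, which illustrates why the lack of consensus matters when comparing published values.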

Attack phase descriptors, step 1: Envelope extraction

Timbre Toolbox
- Apply Hilbert transform to the audio signal,
- followed by a 3rd-order Butterworth lowpass filter with cutoff frequency at 5 Hz.
- No compensation for filter group delay.

MIRtoolbox
- Spectrogram, Hanning window, 100 ms frame, 10% hop.
- Envelope = sum of columns in spectrogram.

[Figure: audio waveform and energy envelopes for MIRtoolbox (D), MIRtoolbox (A), Timbre Toolbox (D) and Timbre Toolbox (A); x-axis: time (seconds).]
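As an illustration of the spectrogram-based approach, the following sketch sums the magnitude spectrum of each Hann-windowed frame. It is not MIRtoolbox code: the function name `spectrogram_envelope`, the naive DFT (used to stay within the standard library), the frame-centre time stamps, and the default parameter values are assumptions.

```python
import cmath
import math


def spectrogram_envelope(signal, sample_rate, frame_s=0.1, hop_fraction=0.1):
    """Envelope as the sum of each spectrogram column (one value per frame)."""
    frame_len = max(1, int(round(frame_s * sample_rate)))
    hop = max(1, int(round(frame_len * hop_fraction)))
    if frame_len > 1:
        window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
                  for n in range(frame_len)]          # Hann window
    else:
        window = [1.0]
    times, envelope = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        column_sum = 0.0
        # Naive DFT over the non-negative frequency bins of this frame.
        for k in range(frame_len // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            column_sum += abs(acc)
        times.append((start + frame_len / 2) / sample_rate)  # frame centre
        envelope.append(column_sum)
    return times, envelope


# Toy signal: 0.2 s of silence followed by 0.3 s of a 100 Hz sine at 1 kHz.
sample_rate = 1000
signal = [0.0] * 200 + [math.sin(2 * math.pi * 100 * n / sample_rate)
                        for n in range(300)]
env_times, env = spectrogram_envelope(signal, sample_rate,
                                      frame_s=0.05, hop_fraction=0.1)
```

In this toy case the envelope starts rising as soon as a frame overlaps the tone, well before the frame centre reaches it, so the frame length bounds how sharply an attack can be resolved in time.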

Attack phase descriptors, step 2: Salient time steps

- Both the MIRtoolbox and the Timbre Toolbox provide equivalents to beginning of attack and end of attack.

[Figure, two panels, x-axis: time (seconds). Panel titles: "Timbre Toolbox attack phase estimation" and "MIRtoolbox attack phase estimation". Plotted items include the energy envelope, attack start and end, effort function, mean effort, time derivative, peak position, and thresholds θ1 and θ2.]

- But are these supposed to reflect physical or perceptual features?
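A derivative-and-threshold estimate of the attack phase can be sketched as follows. This is a generic illustration, not the algorithm of either toolbox: the function name `estimate_attack`, the backward search from the steepest rise, the choice of the envelope peak as attack end, and the threshold value used in the example are all assumptions.

```python
def estimate_attack(envelope, threshold):
    """Return (start_index, end_index) of the attack phase of one event.

    End of attack: the envelope maximum. Start of attack: found by walking
    backwards from the steepest rise until the first difference drops to
    threshold * (steepest rise) or below.
    """
    peak = max(range(len(envelope)), key=lambda i: envelope[i])
    # First differences of the envelope up to the peak.
    diff = [envelope[i + 1] - envelope[i] for i in range(peak)]
    if not diff:
        return 0, peak                     # peak is the first sample
    steepest = max(range(len(diff)), key=lambda i: diff[i])
    limit = threshold * diff[steepest]
    start = steepest
    while start > 0 and diff[start - 1] > limit:
        start -= 1
    return start, peak


# Toy envelope: slow drift, then a fast rise to the peak, then decay.
toy_envelope = [0.0, 0.01, 0.02, 0.3, 0.8, 1.0, 0.7, 0.4]
attack_start, attack_end = estimate_attack(toy_envelope, threshold=0.2)
```

With this kind of rule, the threshold decides whether the slow drift before the fast rise counts as part of the attack, which is exactly where the physical versus perceptual question becomes practical.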

Attack phase descriptors, step 3: ...

- Decide on the definitions of attack phase descriptors, and the methods for extracting salient time points.

Into the nitty-gritty: Timbre Toolbox

- NB! Make sure that you download the latest version from GitHub. (Don't trust the CIRMMT link, the 1st hit on Google, which gives you version 1.2 from 23.)
- My impression: best used if you need a large range of audio descriptors for a large audio set, and don't want to fiddle with choosing parameters for your functions.
- You need to dig deep into the code to change the parameters (hard-coded):
  - Lowpass filter cutoff frequency value
  - Fix group delay problem

Into the nitty-gritty: MIRtoolbox

- Quite user-friendly: well documented, easy to access most parameters.
- mironsets() function - attack option:
  - threshold value is hard-coded
- mirgetdata problem:
  - uncell(get(a, 'AttackPosUnit'))
  - uncell(get(a, 'PeakPosUnit'))

Perceptual experiment

- Task: align a repeated musical sound to a click track.
- 17 participants
- 9 sound stimuli (8 musical instruments + click)
- Inter-stimulus interval of 600 ms
- Click track and stimuli started with a random offset.
- Participants controlled the sync using a keyboard and/or a slider on the screen.

Parameter optimisation and perceptual results

[Figure: time relative to physical onset (in milliseconds) for each stimulus (Bright Piano, Snare Drum, Dark Piano, Kick Drum, Fiddle, Shaker, Synth Bass, Arco Bass, Click), showing MIRtoolbox (D), MIRtoolbox (O), Timbre Toolbox (D), Timbre Toolbox (O) and the perceptual results.]

[Figure: Jaccard index for MIRtoolbox (mean for all sounds) as a function of frame size (seconds) and threshold (fraction of e peak).]

Toolbox         Parameter                              Default  Optimised
Timbre Toolbox  Envelope: LP filter cutoff frequency   5 Hz     37 Hz
Timbre Toolbox  Threshold                              3        3.75
MIRtoolbox      Envelope: frame size                   0.1 s    0.03 s
MIRtoolbox      Threshold: fraction of e peak          2%       7.5%
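The slides do not spell out how the Jaccard index was applied. One plausible reading, sketched here as an assumption, is to treat the estimated attack phase and the range of perceptual responses as two time intervals and to measure their overlap, then pick the parameter setting with the highest mean overlap across sounds. The names `jaccard_interval` and `best_parameters` and all data values are hypothetical.

```python
def jaccard_interval(a, b):
    """Jaccard index of two time intervals given as (start, end) tuples."""
    (a0, a1), (b0, b1) = a, b
    intersection = max(0.0, min(a1, b1) - max(a0, b0))
    union = (a1 - a0) + (b1 - b0) - intersection
    return intersection / union if union > 0 else 0.0


def best_parameters(estimates_by_params, reference):
    """Pick the parameter label with the highest mean Jaccard index.

    estimates_by_params maps a parameter label to a list of estimated
    intervals (one per sound); reference holds the matching perceptual
    intervals in the same order.
    """
    def mean_score(label):
        pairs = zip(estimates_by_params[label], reference)
        scores = [jaccard_interval(est, ref) for est, ref in pairs]
        return sum(scores) / len(scores)
    return max(estimates_by_params, key=mean_score)


# Toy data for two sounds and two made-up parameter settings.
reference = [(0.000, 0.010), (0.005, 0.020)]
estimates = {
    "setting_a": [(0.000, 0.030), (0.000, 0.040)],
    "setting_b": [(0.000, 0.010), (0.005, 0.025)],
}
overlap = jaccard_interval((0.000, 0.010), (0.005, 0.015))
winner = best_parameters(estimates, reference)
```

Evaluating such a score over a grid of frame sizes and thresholds would produce a surface like the one in the second figure, with the optimised values taken at its maximum.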