Lecture 6: Nonspeech and Music


EE E682: Speech & Audio Processing & Recognition
Lecture 6: Nonspeech and Music

1 Music and nonspeech
2 Environmental sounds
3 Music synthesis techniques
4 Sinewave synthesis
5 Music analysis

Dan Ellis <dpwe@ee.columbia.edu> http://www.ee.columbia.edu/~dpwe/e682/
Columbia University Dept. of Electrical Engineering, Spring 2006
E682 SAPR - Dan Ellis L6 - Nonspeech & Music 2006-02-23

1 Music & nonspeech

What is nonspeech?
- according to research effort: a little music
- in the world: almost everything
[Figure: example sounds arranged by information content (high: speech, music; low: wind & water) and origin (natural: animal sounds; man-made: machines & engines), with contact/collision sounds in between]
attributes?

Sound attributes

Attributes suggest model parameters.
What do we notice about sound in general?
- psychophysics: pitch, loudness, timbre
- bright/dull; sharp/soft; grating/soothing
- sound is not "abstract": the tendency is to describe it by source-events
Ecological perspective:
- what matters about a sound is what happened
- our percepts express this more-or-less directly

Motivations for modeling

Describe/classify
- cast sound into a model because we want to use the resulting parameters
Store/transmit
- a model implicitly exploits the limited structure of the signal
Resynthesize/modify
- a model separates out the interesting parameters
[Diagram: sound mapped into a model parameter space]

Analysis and synthesis

Analysis is the converse of synthesis:
[Diagram: model/representation <-> sound, via synthesis (model to sound) and analysis (sound to model)]
They can exist apart:
- analysis for classification
- synthesis of artificial sounds
Often used together:
- encoding/decoding of compressed formats
- resynthesis based on analyses
- analysis-by-synthesis

Outline
1 Music and nonspeech
2 Environmental sounds
- Collision sounds
- Sound textures
3 Music synthesis techniques
4 Sinewave synthesis
5 Music analysis

2 Environmental Sounds

Where sound comes from: mechanical interactions
- contact / collisions
- rubbing / scraping
- ringing / vibrating
Interest in environmental sounds:
- they carry information about events around us, including indirect hints
- we need to create them in virtual environments, including soundtracks
Approaches to synthesis:
- recording / sampling
- synthesis algorithms

Collision sounds (from Gaver 1993)

Factors influencing the sound:
- colliding bodies: size, material, damping
- local properties at the contact point (hardness)
- energy of the collision
Source-filter model:
- source = excitation of the collision event (energy, local properties at contact)
- filter = resonance and radiation of energy (body properties)
Variety of strike/scraping sounds:
- resonant frequencies ~ size/shape
- damping ~ material
- HF content in the excitation/strike ~ mallet, force
[Figure: schematic time-frequency plot of a collision sound]

Sound textures

What do we hear in:
- a city street
- a symphony orchestra
How do we distinguish:
- waterfall
- rainfall
- applause
- static
[Figure: spectrograms (freq/Hz vs. time/s) of applause and rain]
Levels of ecological description...

Sound texture modeling (Athineos)

Model broad spectral structure with linear prediction in the time domain (TD-LP):
  y[n] = Σ_i a_i y[n−i]
- gives per-frame spectral parameters and a whitened residual e[n]
- could just resynthesize with noise
Model fine temporal structure in the residual with linear prediction across frequency: take the DCT of the residual to get its spectrum E[k], then apply FD-LP:
  E[k] = Σ_i b_i E[k−i]
- gives per-frame temporal envelope parameters
- the precise dual of LPC, in frequency
- poles model temporal events
[Figure: temporal envelopes fitted by FD-LP (4 poles, 256 ms)]
Allows modification / synthesis?

Outline
1 Music and nonspeech
2 Environmental sounds
3 Music synthesis techniques
- Framework
- Historical development
4 Sinewave synthesis
5 Music analysis
elements?

3 Music synthesis techniques

What is music?
- could be anything → flexible synthesis needed!
Key elements of conventional music:
- instruments → note-events (time, pitch, accent level)
- melody, harmony, rhythm: patterns of repetition & variation
Synthesis framework:
- instruments: a common framework for many notes
- score: a sequence of (time, pitch, level) note events

The nature of musical instrument notes

Characterized by instrument (register), note, loudness/emphasis, articulation...
[Figure: spectrograms (frequency vs. time) of piano, violin, clarinet, and trumpet notes]
distinguish how?

Development of music synthesis

Goals of music synthesis:
- generate realistic / pleasant new notes
- control / explore timbre (quality)
Earliest computer systems in the 1960s (voice synthesis, algorithmic)
Pure synthesis approaches:
- 1970s: analog synths
- 1980s: FM (Stanford/Yamaha)
- 1990s: physical modeling, hybrids
Analysis-synthesis methods:
- sampling / wavetables
- sinusoid modeling
- harmonics + noise (+ transients)
others?

Analog synthesis

The minimum to make an interesting sound:
[Diagram: trigger → envelope; pitch + vibrato → oscillator → time-varying filter (cutoff driven by the envelope) → gain → sound]
Elements:
- harmonics-rich oscillators
- time-varying filters
- time-varying envelope
- modulation: low-frequency + envelope-based
Result:
- a time-varying spectrum, with independent pitch
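As a sketch (not from the lecture), the oscillator → filter → envelope chain can be written in a few lines; the sawtooth source, exponential envelope, and cutoff sweep are illustrative choices, not a specific synth:

```python
import numpy as np

def subtractive_note(f0=220.0, dur=0.5, sr=16000):
    """Analog-style voice: rich oscillator -> swept lowpass -> envelope."""
    n = int(dur * sr)
    t = np.arange(n) / sr
    saw = 2.0 * ((f0 * t) % 1.0) - 1.0      # harmonics-rich sawtooth oscillator
    env = np.exp(-5.0 * t)                  # decaying amplitude envelope
    cutoff = 200.0 + 2000.0 * env          # the envelope also sweeps the filter
    # time-varying one-pole lowpass: y[n] = (1-a)*x[n] + a*y[n-1]
    a = np.exp(-2 * np.pi * cutoff / sr)
    y = np.zeros(n)
    for i in range(1, n):
        y[i] = (1 - a[i]) * saw[i] + a[i] * y[i - 1]
    return env * y                          # time-varying spectrum, fixed pitch
```

Because the filter output is a convex combination of bounded inputs, the note stays within [−1, 1] without further normalization.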

FM synthesis

Fast frequency modulation → sidebands:
  cos(ω_c t + β sin(ω_m t))   (phase modulation)
- a harmonic series if ω_c = r·ω_m
Expanding in Bessel functions J_n(β):
  cos(ω_c t + β sin(ω_m t)) = Σ_{n=−∞}^{∞} J_n(β) cos((ω_c + n·ω_m) t)
[Figure: J_0 ... J_4 versus modulation index β (0-9); J_n(β) ≈ 0 for β < n − 2]
Complex harmonic spectra by varying β
[Figure: spectrogram of an FM tone with ω_c = ω_m = 2π × 200 Hz as β varies]
what use?
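A minimal FM (strictly, phase-modulation) generator, direct from the formula above; the particular carrier/modulator values are just examples:

```python
import numpy as np

def fm_tone(fc, fm, beta, dur, sr=16000):
    """FM tone: cos(2*pi*fc*t + beta*sin(2*pi*fm*t)).
    With fc = r*fm the sidebands land on a harmonic series of fm."""
    t = np.arange(int(dur * sr)) / sr
    return np.cos(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))

# beta = 0 gives a pure carrier; raising beta spreads energy into sidebands
y = fm_tone(fc=200.0, fm=200.0, beta=2.0, dur=0.5)
```

Sweeping `beta` over the note's duration is the classic way to get a time-varying harmonic spectrum from a single pair of oscillators.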

Sampling synthesis

Resynthesis from real notes → vary pitch, duration, level
Pitch: stretch (resample) the waveform
[Figure: a waveform resampled from 596 Hz to 894 Hz]
Duration: loop a sustain section
[Figure: waveform with a looped sustain section]
Level: cross-fade different examples (soft and loud recordings mixed by velocity)
- need to line up the source samples
good & bad?
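The pitch-by-resampling idea can be sketched with linear interpolation (real samplers use better interpolators); note the characteristic side effect that raising the pitch also shortens the note:

```python
import numpy as np

def repitch(x, ratio):
    """Pitch-shift a sampled note by resampling (linear interpolation).
    ratio > 1 raises the pitch and shortens the note."""
    idx = np.arange(0, len(x) - 1, ratio)   # fractional read positions
    lo = idx.astype(int)
    frac = idx - lo
    return (1 - frac) * x[lo] + frac * x[lo + 1]
```

This is exactly why samplers also need sustain loops: resampling alone cannot change pitch and duration independently.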

Outline
1 Music and nonspeech
2 Environmental sounds
3 Music synthesis techniques
4 Sinewave synthesis (detail)
- Sinewave modeling
- Sines + residual...
5 Music analysis

4 Sinewave synthesis

If patterns of harmonics are what matter, why not generate them all explicitly:
  s[n] = Σ_k A_k[n] cos(k·ω_0[n]·n)
- a particularly powerful model for pitched signals
Analysis (as with speech):
- find peaks in the STFT S[ω, n] & track them
- or track the fundamental ω_0 (harmonics / autocorrelation) & sample the STFT at k·ω_0 → the set of A_k[n] to duplicate the tone
[Figure: spectrogram of a tone and the harmonic magnitudes sampled from it]
Synthesis via a bank of oscillators
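With fixed amplitudes the model collapses to plain additive synthesis; a small sketch (static A_k, though the model allows A_k[n] to vary):

```python
import numpy as np

def harmonic_tone(f0, amps, dur, sr=16000):
    """Additive synthesis: harmonics (k+1)*f0 with given amplitudes."""
    t = np.arange(int(dur * sr)) / sr
    return sum(a * np.cos(2 * np.pi * (k + 1) * f0 * t)
               for k, a in enumerate(amps))
```

For example, `harmonic_tone(100.0, [1.0, 0.5], 0.1)` generates 100 Hz and 200 Hz partials with a 2:1 amplitude ratio.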

Steps to sinewave modeling - 1

The underlying STFT:
  X[k, n_0] = Σ_{n=0}^{N−1} x[n_0 + n] w[n] exp(−j 2πkn / N)
What value for N (FFT length & window size)?
What value for H (hop size: n_0 = r·H, r = 0, 1, 2...)?
The STFT window length determines frequency resolution:
  X_w(e^{jω}) = X(e^{jω}) * W(e^{jω})
Choose N long enough to resolve harmonics: 2-3x the longest (lowest) fundamental period
- e.g. 30-60 ms = 480-960 samples @ 16 kHz
- choose H ≤ N/2
N too long → lost time resolution
- limits the sinusoid amplitude's rate of change
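The windowed, hopped transform above in a few lines (Hann window; N and H here are example values consistent with the guidelines, not prescribed by the lecture):

```python
import numpy as np

def stft(x, N=512, H=256):
    """STFT: Hann-windowed frames of length N, hop H, via the real FFT."""
    w = np.hanning(N)
    frames = [np.fft.rfft(w * x[i:i + N])
              for i in range(0, len(x) - N + 1, H)]
    return np.array(frames)   # shape: (num_frames, N//2 + 1)
```

A 1 kHz tone at sr = 16 kHz lands exactly on bin k = 1000·N/16000 = 32, which makes the peak-picking of the next step easy to check.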

Steps to sinewave modeling - 2

Choose candidate sinusoids at each time by picking peaks in each STFT frame
[Figure: spectrogram and one STFT frame (level/dB vs. freq/Hz) with peaks marked]
Quadratic fit for each peak: fit a parabola y = ax(x − b) through the peak bin and its neighbors; the refined peak lies at x = b/2 with height −ab²/4
+ linear interpolation of the unwrapped phase at the refined frequency
[Figure: magnitude and unwrapped phase in the neighborhood of a peak]
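The parabola fit reduces to the standard three-point formula; a sketch operating on dB magnitudes, as the slide suggests:

```python
import numpy as np

def parabolic_peak(mag_db, k):
    """Refine an STFT peak at integer bin k by fitting a parabola through
    bins k-1, k, k+1 (magnitudes in dB).
    Returns (fractional bin offset in (-0.5, 0.5), interpolated level)."""
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    p = 0.5 * (a - c) / (a - 2 * b + c)   # vertex offset from bin k
    return p, b - 0.25 * (a - c) * p      # vertex height
```

The refined frequency is then (k + p)·sr/N, typically far more accurate than the bin spacing.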

Steps to sinewave modeling - 3

Which peaks to pick? We want true sinusoids, not noise fluctuations
- prominence threshold above the smoothed spectrum
[Figure: one STFT frame (level/dB vs. freq/Hz) with a smoothed-spectrum threshold]
Sinusoids exhibit stability...
- of amplitude in time
- of phase derivative in time
compare with adjacent time frames to test?

Steps to sinewave modeling - 4

Grow tracks by appending newly-found peaks to existing tracks
[Diagram: existing tracks in time-frequency with new peaks appended, showing track births and deaths]
- ambiguous assignments are possible
Unclaimed new peak → birth of a new track
- backtrack to find its earliest trace?
No continuation peak for an existing track → death of the track
- or: reduce the peak threshold, for hysteresis
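One simple (greedy, nearest-frequency) way to realize the birth/continuation/death bookkeeping; the matching rule and the 50 Hz jump limit are illustrative assumptions, not the lecture's algorithm:

```python
def continue_tracks(track_freqs, peak_freqs, max_jump=50.0):
    """Match each existing track to the nearest unclaimed new peak (Hz).
    Unmatched peaks are births; tracks with no close peak are deaths.
    Returns (matches: {track index -> peak index}, births, deaths)."""
    unused = set(range(len(peak_freqs)))
    matches, deaths = {}, []
    for t, f in enumerate(track_freqs):
        best = min(unused, key=lambda j: abs(peak_freqs[j] - f), default=None)
        if best is not None and abs(peak_freqs[best] - f) <= max_jump:
            matches[t] = best
            unused.remove(best)
        else:
            deaths.append(t)            # no continuation peak found
    births = sorted(unused)             # unclaimed peaks start new tracks
    return matches, births, deaths
```

Greedy matching can make the "ambiguous assignment" mistakes the slide warns about; a globally optimal assignment (e.g. dynamic programming) is the usual refinement.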

Resynthesis of sinewave models

After analysis, each track defines contours in frequency and amplitude, f_k[n] and A_k[n] (+ phase?)
- use them to drive a bank of sinewave oscillators, A_k[n]·cos(2π f_k[n] t), and sum the outputs
[Figure: track frequency and amplitude contours over time]
Regularize to exactly harmonic: f_k[n] = k·f_0[n]
[Figure: raw vs. harmonically regularized track frequencies]
what to do?
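An oscillator bank driven by per-sample contours can be sketched by integrating each track's instantaneous frequency into a running phase (starting phases are assumed zero here; the tracked phases could be substituted):

```python
import numpy as np

def oscillator_bank(A, F, sr=16000):
    """Sum a bank of sinusoids from per-sample amplitude and frequency
    contours A[k][n] and F[k][n] (Hz), one row per track."""
    A, F = np.atleast_2d(A), np.atleast_2d(F)
    phase = 2 * np.pi * np.cumsum(F, axis=1) / sr   # integrate freq -> phase
    return np.sum(A * np.cos(phase), axis=0)
```

Because the phase is integrated rather than computed as 2π·f·t, the contours f_k[n] can glide smoothly without phase discontinuities between samples.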

Modification in sinewave resynthesis

Change duration by warping the timebase
- may want to keep the onset unwarped
[Figure: time-warped track frequencies]
Change pitch by scaling frequencies
- either stretching or resampling the spectral envelope
[Figure: spectral envelope under stretching vs. resampling (level/dB vs. freq/Hz)]
Change timbre by interpolating parameters

Sinusoids + residual

Only prominent peaks became tracks
- the remainder of the spectral energy was noisy?
→ model the residual energy with noise
How to obtain the non-harmonic spectrum?
- zero out the spectrum near the extracted peaks?
- or: resynthesize (exactly) & subtract waveforms:
  e_s[n] = s[n] − Σ_k A_k[n] cos(2πn·f_k[n])
  ... must preserve phase!
[Figure: original spectrum, sinusoid model, residual, and LPC fit (mag/dB vs. freq/Hz)]
Can model the residual signal with LPC
→ a flexible representation of the noisy residual
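The LPC fit to the residual amounts to solving a least-squares linear prediction problem; a minimal sketch using the covariance (least-squares) formulation rather than the usual Levinson-Durbin recursion:

```python
import numpy as np

def lpc(x, order):
    """Least-squares linear prediction: predict x[n] from the previous
    `order` samples. Returns a with x[n] ~ a[0]*x[n-1] + ... + a[-1]*x[n-order]."""
    X = np.column_stack([x[order - i - 1:len(x) - i - 1]
                         for i in range(order)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a
```

Driving the resulting all-pole filter with white noise then resynthesizes a residual with the right broad spectral shape, which is exactly the flexibility the slide points to.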

Sinusoids + noise + transients

Sound represented as sinusoids plus filtered noise:
  s[n] = Σ_k A_k[n] cos(2πn·f_k[n]) + h_n[n] * b[n]
- parameters are {A_k[n], f_k[n]} and h_n[n]
[Figure: spectrograms of the full sound, the sinusoid part {A_k[n], f_k[n]}, and the noise part h_n[n]]
Separate out abrupt transients in the residual?
  e_s[n] = Σ_k t_k[n] + h_n[n] * b'[n]
- more specific → more flexible

Outline
1 Music and nonspeech
2 Environmental sounds
3 Music synthesis techniques
4 Sinewave synthesis
5 Music analysis
- Instrument identification
- Pitch tracking

5 Music analysis

What might we want to get out of music?
Instrument identification
- different levels of specificity
- registers within instruments
Score recovery
- transcribe the note sequence
- extract the performance
Ensemble performance
- "gestalts": chords, tone colors
Broader timescales
- phrasing & musical structure
- artist / genre clustering and classification

Instrument identification

Research looks for a perceptual timbre space
[Figure: timbre space with axes dull-bright, low-high attack, low-high spectral flux]
procedure?
Cues to instrument identification:
- onset (rise time), sustain (brightness)
Hierarchy of instrument families:
- strings / reeds / brass
- optimize features at each level

Pitch tracking

Fundamental frequency (≈ pitch) is a key attribute of musical sounds
→ pitch tracking as a key technology
Pitch tracking for speech:
- voice pitch & spectrum are highly dynamic
- speech is voiced and unvoiced
ground truth?
Applications:
- voice coders (excitation description)
- harmonic modeling

Pitch tracking for music

Pitch in music:
- pitch is more stable (although vibrato)
- but: multiple simultaneous pitches
[Figure: spectrogram of music (frequency vs. time) with ambiguous pitches marked "??"]
Applications:
- harmonic modeling
- music transcription (→ storage, resynthesis)
- source separation
Approaches: place & time

Meddis & Hewitt pitch model

Autocorrelation (time) based pitch extraction
- the fundamental period gives peak(s) in the autocorrelation:
  r_xx(T) = Σ_t x(t)·x(t + T) → maximized when T is the period
[Figure: waveform x[n] and its autocorrelation r_xx[l] vs. lag in samples]
Compute separately in each frequency band & summarize across (perceptual) channels:
sound → bandpass filters → rectification & low-pass filtering → per-channel periodicity detection (autocorrelogram) → cross-channel sum → summary ACG
[Figure: autocorrelogram (channel CF/Hz vs. lag/ms) with its cross-channel summary]
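The core periodicity detector, without the auditory filterbank front end, is just an autocorrelation peak search over a plausible pitch range (the 50-500 Hz range here is an example choice):

```python
import numpy as np

def autocorr_pitch(x, sr, fmin=50.0, fmax=500.0):
    """Estimate f0 from the autocorrelation peak in the lag range
    corresponding to [fmin, fmax]."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0..len(x)-1
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(r[lo:hi + 1])                 # best period in range
    return sr / lag
```

The full model applies this per channel after rectification and low-pass filtering, then sums across channels, which is what makes it robust to missing fundamentals.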

Tolonen & Karjalainen simplification

Multiple frequency channels can have different dominant pitches...
But equalizing (flattening) the spectrum works:
sound → prewhitening → split at 1 kHz (highpass branch: rectify & low-pass; lowpass branch direct) → periodicity detection in each branch → summary ACF (SACF) → enhancement → ESACF
Summary AC as a function of time:
[Figure: periodogram for a male/female voice mix (freq/Hz vs. time/s), and the summary autocorrelation at t = 0.775 s with peaks at 200 Hz (0.005 s lag) and 125 Hz (0.008 s lag)]
lag vs. freq?
- Enhancement = cancel subharmonics

Post-processing of pitch tracks

Remove outliers with median filtering (e.g. 5-point median)
[Figure: pitch track before and after 5-point median filtering]
Octave errors are common:
- if x(t) ≈ x(t + T_0) then x(t) ≈ x(t + 2T_0) etc.
→ dynamic programming / HMM
Validity:
- is there a pitch at this time?
- voiced/unvoiced decision for speech
Event detection:
- when does a pitch slide indicate a new note?
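The median-filtering step can be sketched directly (edge padding is an implementation choice here):

```python
import numpy as np

def median_smooth(pitch, width=5):
    """Sliding median over a pitch track: removes isolated outliers
    (e.g. octave errors) without blurring steady regions."""
    half = width // 2
    padded = np.pad(pitch, half, mode='edge')   # repeat edge values
    return np.array([np.median(padded[i:i + width])
                     for i in range(len(pitch))])
```

A lone doubled frame in an otherwise steady track is wiped out entirely, whereas a genuine sustained note change survives, which is why the median (not the mean) is the standard choice.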

Summary

Nonspeech audio
- i.e. sound in general
- characteristics: ecological
Music synthesis
- control of pitch, duration, loudness, articulation
- evolution of techniques
- sinusoids + noise + transients
Music analysis
- different aspects: instruments, pitches, performance
and beyond?