Lecture 6: Nonspeech and Music. Music & nonspeech



EE E6820: Speech & Audio Processing & Recognition
Lecture 6: Nonspeech and Music

1. Music and nonspeech
2. Environmental sounds
3. Music synthesis techniques
4. Sinewave synthesis
5. Music analysis

Dan Ellis <dpwe@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering
Spring 2006

E6820 SAPR - Dan Ellis L6 - Nonspeech & Music

Music & nonspeech
- What is nonspeech?
  - according to research effort: a little music
  - in the world: most everything
[Figure: sound sources placed by information content (low to high) and origin (natural to man-made): speech and music (high information content), animal sounds, wind & water, contact/collision, machines & engines]
(attributes?)

Sound attributes
- Attributes suggest model parameters
- What do we notice about general sound?
  - psychophysics: pitch, loudness, timbre
  - bright/dull; sharp/soft; grating/soothing
  - sound is not "abstract": the tendency is to describe it by source events
- Ecological perspective
  - what matters about sound is "what happened"
  → our percepts express this more or less directly

Motivations for modeling
- Describe/classify
  - cast sound into a model because we want to use the resulting parameters
- Store/transmit
  - the model implicitly exploits the limited structure of the signal
- Resynthesize/modify
  - the model separates out the interesting parameters
[Figure: a sound mapped to a point in the model parameter space]

Analysis and synthesis
- Analysis is the converse of synthesis:
[Figure: synthesis maps a model/representation to sound; analysis maps sound back to the model/representation]
- Can exist apart:
  - analysis for classification
  - synthesis of artificial sounds
- Often used together:
  - encoding/decoding of compressed formats
  - resynthesis based on analyses
  - analysis-by-synthesis

Outline
1. Music and nonspeech
2. Environmental sounds
   - Collision sounds
   - Sound textures
3. Music synthesis techniques
4. Sinewave synthesis
5. Music analysis

2 Environmental Sounds
- Where sound comes from: mechanical interactions
  - contact / collisions
  - rubbing / scraping
  - ringing / vibrating
- Interest in environmental sounds
  - they carry information about events around us, including indirect hints
  - we need to create them in virtual environments, including soundtracks
- Approaches to synthesis
  - recording / sampling
  - synthesis algorithms
(from Gaver 1993)

Collision sounds
- Factors influencing:
  - colliding bodies: size, material, damping
  - local properties at the contact point (hardness)
  - energy of the collision
- Source-filter model
  - source = excitation of the collision event (energy, local properties at contact)
  - filter = resonance and radiation of energy (body properties)
- Variety of strike/scraping sounds
  - resonant freqs ~ size/shape
  - damping ~ material
  - HF content in excitation/strike ~ mallet, force
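The source-filter view of collision sounds can be sketched as simple modal synthesis: an impulsive excitation (the source) drives a few exponentially decaying two-pole resonators (the filter). This is an illustrative sketch under stated assumptions, not the method of any cited work: the helper name `strike`, the mode list, the T60 decay parameterization, and the one-pole low-pass used to mimic a softer mallet are all hypothetical choices.

```python
import math

def strike(modes, sr=16000, dur=0.5, hardness=1.0):
    """Source-filter sketch of a struck object (hypothetical helper).

    modes: list of (freq_hz, t60_s, gain), one two-pole resonator per mode.
    hardness in (0, 1]: 1.0 = ideal impulse; lower values smooth the impulse
    with a one-pole low-pass, removing high-frequency excitation energy.
    """
    n_samples = int(sr * dur)
    # Source: impulse, smoothed by a one-pole low-pass for a softer mallet.
    alpha = 1.0 - hardness
    excitation, y = [], 0.0
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0
        y = (1.0 - alpha) * x + alpha * y
        excitation.append(y)
    # Filter: sum of resonators; pole radius r gives a 60 dB decay in t60 seconds.
    out = [0.0] * n_samples
    for freq, t60, gain in modes:
        r = 10.0 ** (-3.0 / (t60 * sr))
        a1 = 2.0 * r * math.cos(2.0 * math.pi * freq / sr)
        a2 = -r * r
        y1 = y2 = 0.0
        for n in range(n_samples):
            y0 = gain * excitation[n] + a1 * y1 + a2 * y2
            out[n] += y0
            y2, y1 = y1, y0
    return out

# Same mode frequencies ("size/shape"), different damping ("material").
wood = strike([(400.0, 0.05, 1.0), (1100.0, 0.03, 0.5)])
metal = strike([(400.0, 1.0, 1.0), (1100.0, 0.8, 0.5)])
```

Long decay times play the role of low damping (metal-like) and short ones of high damping (wood-like); shifting the mode frequencies stands in for changing size or shape.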


Sound textures
- What do we hear in:
  - a city street
  - a symphony orchestra
- How do we distinguish:
  - waterfall
  - rainfall
  - applause
  - static
[Figure: spectrograms of applause and rain, vs. time / s]
- Levels of ecological description...

Sound texture modeling (Athineos)
- Model broad spectral structure with LPC
  - could just resynthesize with noise
- Model the fine temporal structure of the residual with linear prediction applied across frequency, giving an all-pole model of the temporal envelope
- Processing chain:
  - sound $y[n]$ → TD-LP: $y[n] = \sum_i a_i\, y[n-i] + e[n]$ → per-frame spectral parameters and whitened residual $e[n]$
  - $e[n]$ → DCT → residual spectrum $E[k]$ → FD-LP: $E[k] = \sum_i b_i\, E[k-i]$ → per-frame temporal envelope parameters
- FD-LP is the precise dual of LPC in frequency
  - poles model temporal events
[Figure: temporal envelopes from FD-LP (4 poles, 256 ms), amplitude vs. time / sec]
- Allows modification / synthesis?
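The FD-LP half of the chain can be sketched in a few lines: take the DCT of a frame, run ordinary autocorrelation-method LPC on the DCT coefficients, and read the resulting all-pole "spectrum" as a temporal envelope. This is a minimal sketch assuming an unnormalized DCT-II and the Levinson-Durbin recursion, with hypothetical function names; it is not Athineos's implementation, and it skips the TD-LP whitening step by applying FD-LP directly to the frame.

```python
import cmath
import math

def dct2(x):
    """Unnormalized DCT-II: X[k] = sum_n x[n] cos(pi k (2n+1) / 2N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * N)) for n in range(N))
            for k in range(N)]

def autocorr(seq, max_lag):
    """Biased autocorrelation r[l] = sum_i seq[i] seq[i+l], l = 0..max_lag."""
    return [sum(seq[i] * seq[i + l] for i in range(len(seq) - l)) for l in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin: A = [1, a1..ap] with A(z) = 1 + sum_j a_j z^-j, plus error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(x, order=4):
    """All-pole estimate of the temporal envelope of frame x via LP on its DCT.

    An impulse at sample n0 has DCT cos(k * pi * (n0 + 0.5) / N), so a pole at
    angle theta corresponds to time n = theta * N / pi - 0.5 within the frame.
    """
    N = len(x)
    a, err = levinson(autocorr(dct2(x), order), order)
    env = []
    for n in range(N):
        theta = math.pi * (n + 0.5) / N
        A = sum(a[j] * cmath.exp(-1j * theta * j) for j in range(order + 1))
        env.append(err / abs(A) ** 2)
    return env

x = [0.0] * 256
x[60], x[180] = 1.0, 0.5          # two "events" (clicks) in one 256-sample frame
env = fdlp_envelope(x, order=4)
```

The two clicks produce envelope peaks near samples 60 and 180, illustrating "poles model temporal events"; with `order=4` the model has exactly enough poles for two events.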

Outline
1. Music and nonspeech
2. Environmental sounds
3. Music synthesis techniques
   - Framework
   - Historical development
4. Sinewave synthesis
5. Music analysis

Music synthesis techniques
- What is music?
  - could be anything → flexible synthesis needed!
- Key elements of conventional music
  - instruments → note events (time, pitch, accent level) → melody, harmony, rhythm
  - patterns of repetition & variation
- Synthesis framework:
  - instruments: common framework for many notes
  - score: sequence of (time, pitch, level) note events
(elements?)
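The instrument/score split can be made concrete with a tiny renderer: one instrument function is reused for every note, and the score is just a list of note events. The MIDI-style pitch numbers, the `(onset, pitch, level, duration)` tuple layout, and the three-harmonic decaying `simple_instrument` are assumptions made for this sketch, not part of the lecture.

```python
import math

def midi_to_hz(pitch):
    """Equal-tempered pitch, assuming MIDI note numbers (69 = A4 = 440 Hz)."""
    return 440.0 * 2 ** ((pitch - 69) / 12)

def simple_instrument(freq, level, dur, sr):
    """Hypothetical instrument: three harmonics under an exponential decay.

    The same function renders every note; only (freq, level, dur) change.
    """
    n_samples = int(dur * sr)
    return [level * math.exp(-3.0 * n / n_samples) *
            sum(math.sin(2 * math.pi * h * freq * n / sr) / h for h in (1, 2, 3))
            for n in range(n_samples)]

def render(score, instrument, sr=8000):
    """Mix note events; score is a list of (onset_s, midi_pitch, level, dur_s)."""
    end = max(t + d for t, _, _, d in score)
    out = [0.0] * (int(end * sr) + 1)
    for onset, pitch, level, dur in score:
        start = int(onset * sr)
        for i, v in enumerate(instrument(midi_to_hz(pitch), level, dur, sr)):
            out[start + i] += v
    return out

# C4, E4, G4 with different accent levels.
score = [(0.0, 60, 0.8, 0.5), (0.5, 64, 0.6, 0.5), (1.0, 67, 0.7, 0.5)]
audio = render(score, simple_instrument)
```

Swapping `simple_instrument` for another function changes the timbre of every note without touching the score, which is the point of the framework.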


The nature of musical instrument notes
- Characterized by instrument (register), note, loudness/emphasis, articulation...
[Figure: spectrograms (frequency vs. time) of piano, violin, clarinet and trumpet notes]
(distinguish how?)

Development of music synthesis
- Goals of music synthesis:
  - generate realistic / pleasant new notes
  - control / explore timbre (quality)
- Earliest computer systems in the 1960s (voice synthesis, algorithmic)
- Pure synthesis approaches:
  - 1970s: analog synths
  - 1980s: FM (Stanford/Yamaha)
  - 1990s: physical modeling, hybrids
- Analysis-synthesis methods:
  - sampling / wavetables
  - sinusoid modeling
  - harmonics + noise (+ transients)
  - others?

Analog synthesis
- The minimum to make an interesting sound
[Figure: block diagram: a trigger starts an envelope; pitch + vibrato drive an oscillator; the oscillator feeds a filter (cutoff freq + envelope) and then a gain stage (envelope) to produce the sound]
- Elements:
  - harmonics-rich oscillators
  - time-varying filters
  - time-varying envelope
  - modulation: low frequency + envelope-based
- Result:
  - time-varying spectrum, independent pitch

FM synthesis
- Fast frequency modulation → sidebands:
  $\cos(\omega_c t + \beta \sin(\omega_m t)) = \sum_{n=-\infty}^{\infty} J_n(\beta) \cos((\omega_c + n\omega_m) t)$
  - phase modulation
  - a harmonic series if $\omega_c = r\,\omega_m$
- $J_n(\beta)$ is a Bessel function:
[Figure: $J_0(\beta)$ through $J_4(\beta)$ vs. modulation index $\beta$]
  - $J_n(\beta)$ is small for $\beta < n - 1$
- Complex harmonic spectra by varying $\beta$
[Figure: spectrogram of an FM tone as $\beta$ varies over time / s]
(what use?)
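The sideband expansion on the FM slide can be checked numerically. The sketch below, assuming a truncated power series for $J_n(\beta)$ and keeping roughly $\beta + 15$ sidebands on each side (both choices of this example, not of the lecture), compares the phase-modulated carrier with its Bessel-weighted sum of components at $\omega_c + n\omega_m$. The function names are hypothetical.

```python
import math

def bessel_j(n, beta, terms=40):
    """J_n(beta) from its power series; uses J_{-n} = (-1)^n J_n for n < 0."""
    if n < 0:
        return (-1) ** (-n) * bessel_j(-n, beta, terms)
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n)) * (beta / 2) ** (2 * m + n)
               for m in range(terms))

def fm_direct(t, wc, wm, beta):
    """Phase-modulated carrier: cos(wc t + beta sin(wm t))."""
    return math.cos(wc * t + beta * math.sin(wm * t))

def fm_sidebands(t, wc, wm, beta, max_n=None):
    """The same signal as sidebands at wc + n wm, weighted by J_n(beta)."""
    if max_n is None:
        max_n = int(beta) + 15      # J_n(beta) is negligible well beyond n ~ beta
    return sum(bessel_j(n, beta) * math.cos((wc + n * wm) * t)
               for n in range(-max_n, max_n + 1))

wc = wm = 2 * math.pi * 200.0   # wc = r * wm with r = 1: a harmonic series on 200 Hz
worst = max(abs(fm_direct(t, wc, wm, b) - fm_sidebands(t, wc, wm, b))
            for b in (0.5, 2.0, 5.0) for t in (0.0, 0.0013, 0.0071))
```

With $\omega_c = \omega_m$ every component lands on an integer multiple of $\omega_m$ (negative frequencies fold back), and raising $\beta$ spreads energy into more sidebands.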

Sampling synthesis
- Resynthesis from real notes → vary pitch, duration, level
- Pitch: stretch (resample) the waveform
[Figure: waveforms of a note before and after resampling, vs. time / s]
- Duration: loop a sustain section
[Figure: waveform with a looped sustain section, vs. time / s]
- Level: cross-fade different examples
[Figure: soft and loud examples mixed according to velocity, vs. time / s]
  - need to line up the source samples
(good & bad?)

Outline
1. Music and nonspeech
2. Environmental sounds
3. Music synthesis techniques
4. Sinewave synthesis (detail)
   - Sinewave modeling
   - Sines + residual...
5. Music analysis
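Returning to the sampling-synthesis slide, each of its three manipulations can be sketched in a line or two. This is a minimal sketch with hypothetical function names; linear interpolation for resampling and a hard loop splice are simplifying assumptions (practical samplers use better interpolators and carefully chosen loop points).

```python
import math

def resample(x, ratio):
    """Pitch: read x at `ratio` input samples per output sample (linear interpolation).

    ratio 2.0 raises pitch an octave and halves the duration.
    """
    out, pos = [], 0.0
    while pos <= len(x) - 1:
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1.0 - frac) * x[i] + frac * nxt)
        pos += ratio
    return out

def loop_sustain(x, loop_start, loop_end, n_loops):
    """Duration: attack, then the sustain segment n_loops times, then the release."""
    return x[:loop_start] + x[loop_start:loop_end] * n_loops + x[loop_end:]

def velocity_crossfade(soft, loud, velocity):
    """Level: mix soft and loud recordings, velocity in [0, 1].

    Assumes the two samples are already time-aligned ("line up source samples").
    """
    return [(1.0 - velocity) * s + velocity * l for s, l in zip(soft, loud)]

sr = 8000
note = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
up_two_semitones = resample(note, 2 ** (2 / 12))   # higher pitch, ~11% shorter
longer = loop_sustain(note, 200, 600, 4)           # loop = exactly 10 periods at 200 Hz
```

Here the loop spans a whole number of periods and starts and ends at zero crossings, so the splice is seamless; arbitrary loop points would click.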

4 Sinewave synthesis
- If patterns of harmonics are what matter, why not generate them all explicitly:
  $s[n] = \sum_k A_k[n] \cos(k\,\omega_0[n]\,n)$
  - a particularly powerful model for pitched signals
- Analysis (as with speech):
  - find peaks in the STFT $S[\omega, n]$ & track them
  - or track the fundamental $\omega_0$ (harmonics / autocorrelation) & sample the STFT at $k\,\omega_0$
  → set of $A_k[n]$ to duplicate the tone
[Figure: harmonic magnitudes $A_k[n]$ vs. time / s, and the resulting tone vs. time / s]
- Synthesis via a bank of oscillators

Steps to sinewave modeling
- The underlying STFT:
  $X[k, n_0] = \sum_{n=0}^{N-1} x[n + n_0]\, w[n] \exp\left(-j\frac{2\pi k n}{N}\right)$
  - what value for N (FFT length & window size)?
  - what value for H (hop size: $n_0 = rH$, $r = 0, 1, 2, \ldots$)?
- STFT window length determines frequency resolution:
  $X_w(e^{j\omega}) = X(e^{j\omega}) \ast W(e^{j\omega})$ (multiplying by the window in time convolves the spectrum with the window's transform)
- Choose N long enough to resolve harmonics → 2-3x the longest (lowest) fundamental period
  - e.g. 30-60 ms
  - choose $H \le N/2$
- N too long → lost time resolution
  - limits the sinusoid amplitude rate of change
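An oscillator bank is the direct implementation of the sinewave sum. The sketch below, with hypothetical names, integrates each partial's frequency into a running phase rather than evaluating $\cos(k\,\omega_0[n]\,n)$ literally; that is an implementation choice which keeps the waveform continuous when $\omega_0$ changes. It also encodes the window-length rule of thumb from the slide (2-3 periods of the lowest fundamental), assuming frequencies in Hz.

```python
import math

def window_length(lowest_f0_hz, sr, periods=3):
    """STFT window N (samples) covering `periods` cycles of the lowest fundamental."""
    return math.ceil(periods * sr / lowest_f0_hz)

def oscillator_bank(amps, freqs, sr):
    """Sum of sinusoids with per-sample contours amps[k][n] and freqs[k][n] (Hz).

    Phase is accumulated sample by sample, so a changing frequency never
    causes phase jumps.
    """
    n_samples = len(amps[0])
    out = [0.0] * n_samples
    for a_k, f_k in zip(amps, freqs):
        phase = 0.0
        for n in range(n_samples):
            out[n] += a_k[n] * math.cos(phase)
            phase += 2 * math.pi * f_k[n] / sr
    return out

sr = 8000
n_samples = sr // 2
f0 = [220.0 + 20.0 * n / n_samples for n in range(n_samples)]   # slow upward glide
freqs = [[k * f for f in f0] for k in range(1, 6)]               # f_k[n] = k f0[n]
amps = [[1.0 / k] * n_samples for k in range(1, 6)]
tone = oscillator_bank(amps, freqs, sr)
```

For a 50 Hz lowest fundamental at 16 kHz, three periods give a 960-sample (60 ms) window.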

Steps to sinewave modeling - 2
- Choose candidate sinusoids at each time by picking peaks in each STFT frame:
[Figure: spectrogram (vs. time / s) and a single frame (level / dB vs. frequency) with the picked peaks]
- Quadratic fit for peak:
  $y = a x (x - b)$, with its vertex at $x = b/2$, $y = -a b^2/4$
- Linear interpolation of the unwrapped phase (phase / rad) to get the phase at the peak

Steps to sinewave modeling - 3
- Which peaks to pick?
  - want true sinusoids, not noise fluctuations
  - prominence threshold above the smoothed spectrum
[Figure: frame spectrum (level / dB) with the smoothed threshold]
- Sinusoids exhibit stability...
  - of amplitude in time
  - of phase derivative in time
  → compare with adjacent time frames to test?
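Peak picking and the quadratic refinement can be sketched as follows, assuming magnitudes in dB on a bin grid. The per-bin `floor_db` argument stands in for the "threshold above smoothed spectrum"; how that floor is computed is left open. The offset formula is the vertex of the parabola through the three bins around a local maximum, the same construction as $y = ax(x-b)$ with its vertex at $x = b/2$. Function names are hypothetical.

```python
import math

def find_peaks(mag_db, floor_db):
    """Bins that are local maxima of one STFT frame and exceed a per-bin floor.

    floor_db would typically be a smoothed spectrum plus a margin (assumed here).
    """
    return [k for k in range(1, len(mag_db) - 1)
            if mag_db[k - 1] < mag_db[k] >= mag_db[k + 1] and mag_db[k] > floor_db[k]]

def quadratic_peak(mag_db, k):
    """Parabola through bins k-1, k, k+1: return (fractional bin, peak level in dB)."""
    alpha, beta, gamma = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    p = 0.5 * (alpha - gamma) / (alpha - 2.0 * beta + gamma)   # vertex offset, |p| <= 0.5
    return k + p, beta - 0.25 * (alpha - gamma) * p

def interp_phase(phase, k, p):
    """Linear interpolation of phase at bin k + p, unwrapping the bin-to-bin step."""
    j = k + 1 if p >= 0 else k - 1
    step = (phase[j] - phase[k] + math.pi) % (2.0 * math.pi) - math.pi
    return phase[k] + abs(p) * step

# A sampled parabola whose true peak is at bin 10.3, level 40 dB.
frame = [40.0 - 2.0 * (x - 10.3) ** 2 for x in range(20)]
peaks = find_peaks(frame, [0.0] * 20)
location, level = quadratic_peak(frame, peaks[0])
```

On an exact parabola the fit recovers the true peak; on real STFT frames it is an approximation whose accuracy depends on the analysis window.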

Steps to sinewave modeling - 4
- Grow tracks by appending newly found peaks to existing tracks:
[Figure: frequency vs. time showing existing tracks, new peaks, track birth and track death]
  - ambiguous assignments possible
- Unclaimed new peak
  → birth of a new track
  - backtrack to find the earliest trace?
- No continuation peak for an existing track
  → death of the track
  - or: reduce the peak threshold for hysteresis

Resynthesis of sinewave models
- After analysis, each track defines contours in frequency and amplitude, $f_k[n]$, $A_k[n]$ (+ phase?)
  - use them to drive sinewave oscillators & sum up: $\sum_k A_k[n] \cos(2\pi f_k[n]\, t)$
[Figure: level contours $A_k[n]$ and frequency contours $f_k[n]$ with the resynthesized sum, vs. time / s]
- Regularize to exactly harmonic: $f_k[n] = k\, f_0[n]$
[Figure: regularized harmonic tracks, vs. time / s]
(what to do?)
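The track-growing step (Steps to sinewave modeling - 4) can be sketched as a greedy frame-by-frame matcher. Assumptions in this sketch: peaks are given as frequencies only, a fixed `max_jump` (Hz) limits continuation, and ambiguous assignments are resolved by taking the closest track/peak pairs first. The slide's backtracking and hysteresis refinements are not implemented, and the function name is hypothetical.

```python
def grow_tracks(frames, max_jump):
    """Link per-frame peak frequencies into tracks (hypothetical helper).

    frames: list of lists of peak frequencies (Hz), one list per STFT frame.
    Returns (start_frame, [freq, ...]) for every track ever born.
    """
    tracks = []                          # dicts: start frame, freqs, alive flag
    for t, peaks in enumerate(frames):
        live = [i for i, tr in enumerate(tracks) if tr["alive"]]
        # Candidate continuations, closest first, to resolve ambiguous assignments.
        pairs = sorted((abs(tracks[i]["freqs"][-1] - f), i, j)
                       for i in live for j, f in enumerate(peaks)
                       if abs(tracks[i]["freqs"][-1] - f) <= max_jump)
        used_tracks, used_peaks = set(), set()
        for _, i, j in pairs:
            if i not in used_tracks and j not in used_peaks:
                tracks[i]["freqs"].append(peaks[j])
                used_tracks.add(i)
                used_peaks.add(j)
        for i in live:                   # no continuation peak: death of track
            if i not in used_tracks:
                tracks[i]["alive"] = False
        for j, f in enumerate(peaks):    # unclaimed new peak: birth of track
            if j not in used_peaks:
                tracks.append({"start": t, "freqs": [f], "alive": True})
    return [(tr["start"], tr["freqs"]) for tr in tracks]

frames = [[100, 200], [102, 199], [104, 300], [105, 301]]
tracks = grow_tracks(frames, max_jump=10)
```

In this example the 200 Hz track dies at frame 2 and a new track is born at 300 Hz, while the 100 Hz track continues throughout.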

Modification in sinewave resynthesis
- Change duration by warping the timebase
  - may want to keep the onset unwarped
[Figure: original and time-stretched resynthesis, vs. time / s]
- Change pitch by scaling frequencies
  - either stretching or resampling the spectral envelope
[Figure: spectral envelopes (level / dB) before and after the pitch change]
- Change timbre by interpolating parameters

Sinusoids + residual
- Only prominent peaks became tracks
  - the remainder of the spectral energy was noisy?
  → model the residual energy with noise
- How to obtain the non-harmonic spectrum?
  - zero-out the spectrum near extracted peaks?
  - or: resynthesize (exactly) & subtract waveforms:
    $e_s[n] = s[n] - \sum_k A_k[n] \cos(2\pi n f_k[n])$
    ... must preserve phase!
[Figure: spectra (mag / dB) of the original, the sinusoids, the residual, and an LPC fit to the residual]
- Can model the residual signal with LPC
  → flexible representation of the noisy residual
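The "must preserve phase!" point can be demonstrated directly: subtracting a resynthesis with the right amplitude and frequency but the wrong phase leaves most of the sinusoid in the residual. This sketch assumes stationary partials (frequency in cycles per sample) for simplicity, whereas real tracks vary frame by frame; the names are hypothetical.

```python
import math
import random

def resynth(n_samples, partials):
    """Sum of stationary partials given as (amplitude, freq in cycles/sample, phase in rad)."""
    return [sum(a * math.cos(2 * math.pi * f * n + ph) for a, f, ph in partials)
            for n in range(n_samples)]

def residual(s, partials):
    """e_s[n] = s[n] - sum_k A_k cos(2 pi n f_k + phi_k): waveform subtraction."""
    return [x - y for x, y in zip(s, resynth(len(s), partials))]

def energy(x):
    return sum(v * v for v in x)

rng = random.Random(0)
noise = [0.01 * rng.gauss(0.0, 1.0) for _ in range(1000)]
true_partials = [(1.0, 0.05, 1.0)]
s = [x + e for x, e in zip(resynth(1000, true_partials), noise)]

e_good = residual(s, true_partials)        # amplitude, frequency and phase all kept
e_bad = residual(s, [(1.0, 0.05, 0.0)])    # same amplitude/frequency, phase discarded
```

With the phase kept, the residual is just the added noise; with a 1 rad phase error, it is dominated by leftover sinusoid and would be badly modeled as noise.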

Sinusoids + noise + transients
- Sound represented as sinusoids and noise:
  $s[n] = \sum_k A_k[n] \cos(2\pi n f_k[n]) + h_n[n] \ast b[n]$
  - parameters are $\{A_k[n], f_k[n]\}$, $h_n[n]$
[Figure: original decomposed into sinusoids $\{A_k[n], f_k[n]\}$ and residual $e_s[n]$ modeled by $h_n[n]$, vs. time / s]
- Separate out abrupt transients in the residual?
  $e_s[n] = \sum_k t_k[n] + h_n[n] \ast b'[n]$
  - more specific → more flexible

Outline
1. Music and nonspeech
2. Environmental sounds
3. Music synthesis techniques
4. Sinewave synthesis
5. Music analysis
   - Instrument identification
   - Pitch tracking

5 Music analysis
- What might we want to get out of music?
- Instrument identification
  - different levels of specificity
  - registers within instruments
- Score recovery
  - transcribe the note sequence
  - extract the performance
- Ensemble performance
  - gestalts: chords, tone colors
- Broader timescales
  - phrasing & musical structure
  - artist / genre clustering and classification

Instrument identification
- Research looks for a perceptual timbre space
[Figure: timbre space with dull/bright, low/high attack and low/high flux dimensions]
(procedure?)
- Cues to instrument identification
  - onset (rise time), sustain (brightness)
- Hierarchy of instrument families
  - strings / reeds / brass
  - optimize features at each level
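Two of the cues named above, brightness during the sustain and rise time at the onset, are easy to compute. This sketch uses the spectral centroid as the brightness measure and the 10%-90% rise time of an amplitude envelope as the onset measure; both are common choices but assumptions here, as are the function names. A plain O(N²) DFT keeps it dependency-free.

```python
import cmath
import math

def spectral_centroid(frame, sr):
    """Brightness cue: magnitude-weighted mean frequency (Hz) of one frame (plain DFT)."""
    N = len(frame)
    mags = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]
    return sum(k * sr / N * m for k, m in enumerate(mags)) / sum(mags)

def rise_time(env, sr, lo=0.1, hi=0.9):
    """Onset cue: seconds for an amplitude envelope to go from lo to hi of its peak."""
    peak = max(env)
    i_lo = next(i for i, v in enumerate(env) if v >= lo * peak)
    i_hi = next(i for i, v in enumerate(env) if v >= hi * peak)
    return (i_hi - i_lo) / sr

sr = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(256)]
brighter = [t + 0.5 * math.sin(2 * math.pi * 3000 * n / sr) for n, t in enumerate(tone)]
ramp = [min(n / 100, 1.0) for n in range(200)]    # 100-sample linear attack
```

Adding a strong upper component raises the centroid, matching the intuitive "bright" end of the timbre space; a slower attack lengthens the rise time.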


Pitch tracking
- Fundamental frequency (≈ pitch) is a key attribute of musical sounds
  → pitch tracking as a key technology
- Pitch tracking for speech
  - voice pitch & spectrum highly dynamic
  - speech is voiced and unvoiced
  (ground truth?)
- Applications
  - voice coders (excitation description)
  - harmonic modeling

Pitch tracking for music
- Pitch in music
  - pitch is more stable (although vibrato)
  - but: multiple pitches
[Figure: spectrogram (frequency vs. time) of music with several simultaneous pitches]
- Applications
  - harmonic modeling
  - music transcription (→ storage, resynthesis)
  - source separation
- Approaches: place & time

Meddis & Hewitt pitch model
- Autocorrelation (time) based pitch extraction
  - fundamental period T: $x(t) \approx x(t + T)$ → peak(s) in the autocorrelation
    $r_{xx}(T) = \sum_t x(t)\, x(t + T)$
[Figure: waveform x[n] vs. time / samples, and its autocorrelation $r_{xx}[l]$ vs. lag / samples with the maximum at the period]
- Compute separately in each frequency band & summarize across (perceptual) channels
[Figure: sound → bandpass filters → rectification & low-pass filter → periodicity detection → autocorrelogram (CF / Hz vs. lag / ms) → cross-channel sum → summary ACG]

Tolonen & Karjalainen simplification
- Multiple frequency channels can have different dominant pitches...
- But equalizing (flattening) the spectrum works:
[Figure: sound → prewhitening → high channel (> 1 kHz, rectify & low-pass) and low channel (< 1 kHz) → periodicity detection in each → SACF → enhance → ESACF]
- Summary AC as a function of time: "periodogram"
[Figure: periodogram for an M/F voice mix (f / Hz vs. time / s), and one summary autocorrelation frame (vs. lag / s) with peaks at 200 Hz (0.005 s) and 125 Hz (0.008 s)]
(lag vs. freq?)
- Enhancement = cancel subharmonics
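The core of both models is autocorrelation-based period picking. The sketch below is single-channel only: it omits the filterbank, rectification, and cross-channel summation, and its `enhance` step (clip, subtract a copy stretched by 2, clip) is a simplified reading of the "cancel subharmonics" idea rather than the published ESACF. Names and the search range are assumptions.

```python
import math

def autocorr(x, max_lag):
    """Biased autocorrelation r[l] = sum_n x[n] x[n+l], l = 0..max_lag."""
    return [sum(x[n] * x[n + l] for n in range(len(x) - l)) for l in range(max_lag + 1)]

def acf_pitch(x, sr, fmin, fmax):
    """Pick the lag with maximum autocorrelation within [sr/fmax, sr/fmin]."""
    lag_min, lag_max = math.ceil(sr / fmax), math.floor(sr / fmin)
    r = autocorr(x, lag_max)
    best = max(range(lag_min, lag_max + 1), key=lambda l: r[l])
    return sr / best, r

def enhance(r):
    """Subharmonic cancellation: clip to positive, subtract a copy time-stretched
    by 2 (so a peak at lag T removes the peak at lag 2T), clip again."""
    c = [max(v, 0.0) for v in r]
    stretched = [c[l // 2] if l % 2 == 0 else 0.5 * (c[l // 2] + c[l // 2 + 1])
                 for l in range(len(c))]
    return [max(a - b, 0.0) for a, b in zip(c, stretched)]

sr = 8000
x = [sum(math.cos(2 * math.pi * 200 * h * n / sr) for h in (1, 2, 3)) for n in range(800)]
f0, r = acf_pitch(x, sr, fmin=50.0, fmax=400.0)
e = enhance(r)
```

The raw autocorrelation also peaks at 80, 120 and 160 samples (the subharmonic lags); after enhancement the peak at lag 80 is removed, leaving the 40-sample period.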

Post-processing of pitch tracks
- Remove outliers with median filtering
[Figure: raw pitch track and its 5-point median-filtered version, vs. time]
- Octave errors are common:
  - if $x(t) \approx x(t + T)$ then $x(t) \approx x(t + 2T)$ etc.
  → dynamic programming / HMM
- Validity
  - is there a pitch at this time?
  - voiced/unvoiced decision for speech
- Event detection
  - when does a pitch slide indicate a new note?

Summary
- Nonspeech audio
  - i.e. sound in general
  - characteristics: ecological
- Music synthesis
  - control of pitch, duration, loudness, articulation
  - evolution of techniques
  - sinusoids + noise + transients
- Music analysis
  - different aspects: instruments, pitches, performance
  - and beyond?
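Returning to the post-processing slide, outlier removal with a 5-point median filter takes only a few lines. In this sketch (hypothetical helper name), the window shrinks symmetrically at the track ends, which is one of several reasonable edge conventions.

```python
import statistics

def median_filter(track, width=5):
    """Median-smooth a pitch track; the window shrinks symmetrically at the ends."""
    half = width // 2
    out = []
    for n in range(len(track)):
        h = min(half, n, len(track) - 1 - n)
        out.append(statistics.median(track[n - h:n + h + 1]))
    return out

raw = [220, 221, 219, 440, 220, 222, 110, 221, 220]   # two isolated octave errors (Hz)
smoothed = median_filter(raw)
```

Isolated octave jumps disappear, but a median filter cannot fix an octave error that persists for more than about half the window; that is where the dynamic programming / HMM approach on the slide comes in.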
