Lecture 6: Nonspeech and Music


EE E682: Speech & Audio Processing & Recognition
Lecture 6: Nonspeech and Music

1. Music and nonspeech
2. Environmental sounds
3. Music synthesis techniques
4. Sinewave synthesis
5. Music analysis

Dan Ellis <dpwe@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering
Spring 26

E682 SAPR - Dan Ellis L6 - Nonspeech & Music

1. Music & nonspeech

What is nonspeech?
- according to research effort: a little music
- in the world: most everything

[Figure: sound classes placed by information content (low to high) and origin (natural to man-made): speech, music, wind & water, animal sounds, contact/collision, machines & engines]

attributes?

Sound attributes

Attributes suggest model parameters

What do we notice about general sound?
- psychophysics: pitch, loudness, "timbre"
- bright/dull; sharp/soft; grating/soothing
- sound is not "abstract": tendency is to describe by source-events

Ecological perspective
- what matters about sound is "what happened"
- our percepts express this more-or-less directly

Motivations for modeling

Describe/classify
- cast sound into a model because we want to use the resulting parameters

Store/transmit
- model implicitly exploits the limited structure of the signal

Resynthesize/modify
- model separates out interesting parameters

[Figure: Sound, Model parameter space]

Analysis and synthesis

Analysis is the converse of synthesis:

[Figure: Sound and Model / representation, linked by Analysis and Synthesis]

Can exist apart:
- analysis for classification
- synthesis of artificial sounds

Often used together:
- encoding/decoding of compressed formats
- resynthesis based on analyses
- analysis-by-synthesis

Outline

1. Music and nonspeech
2. Environmental sounds
   - Collision sounds
   - Sound textures
3. Music synthesis techniques
4. Sinewave synthesis
5. Music analysis

2. Environmental sounds

Where sound comes from: mechanical interactions
- contact / collisions
- rubbing / scraping
- ringing / vibrating

Interest in environmental sounds
- they carry information about events around us, including indirect hints
- we need to create them in virtual environments, including soundtracks

Approaches to synthesis
- recording / sampling
- synthesis algorithms

Collision sounds (from Gaver 1993)

Factors influencing:
- colliding bodies: size, material, damping
- local properties at contact point (hardness)
- energy of collision

Source-filter model
- "source" = excitation of collision event (energy, local properties at contact)
- "filter" = resonance and radiation of energy (body properties)

Variety of strike/scraping sounds
- resonant freqs ~ size/shape
- damping ~ material
- HF content in excitation/strike ~ mallet, force

[Figure: sketches against time (t) and frequency (f)]
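The source-filter view of a collision can be sketched as a sum of exponentially damped sinusoids. This is a minimal illustration, not Gaver's algorithm: the function name `impact`, the mode frequencies, and the decay rates are all assumed values chosen only to contrast a heavily damped ("wood-like") and a lightly damped ("metal-like") object. The per-mode amplitudes stand in for the excitation, the frequencies and decays for the body.

```python
import math

def impact(freqs_hz, decays_per_s, amps, sr=8000, dur=0.5):
    """Sum of exponentially damped sinusoids: a crude model of a struck object.

    freqs_hz     ~ resonances of the body (size/shape)
    decays_per_s ~ damping of each resonance (material)
    amps         ~ how strongly the strike excites each resonance (mallet, force)
    """
    n_samples = int(sr * dur)
    out = []
    for n in range(n_samples):
        t = n / sr
        out.append(sum(a * math.exp(-d * t) * math.sin(2 * math.pi * f * t)
                       for f, d, a in zip(freqs_hz, decays_per_s, amps)))
    return out

def energy(x):
    return sum(v * v for v in x)

# Hypothetical parameter sets: same resonances, different damping
wood = impact([400, 1100, 2150], [40, 60, 80], [1.0, 0.5, 0.25])
metal = impact([400, 1100, 2150], [4, 6, 8], [1.0, 0.5, 0.25])

# Fraction of each sound's energy that falls in its second half
wood_tail = energy(wood[len(wood) // 2:]) / energy(wood)
metal_tail = energy(metal[len(metal) // 2:]) / energy(metal)
```

With these assumed values the lightly damped sound keeps far more of its energy in the tail, which is the "damping ~ material" cue listed on the slide.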

Sound textures

What do we hear in:
- a city street
- a symphony orchestra

How do we distinguish:
- waterfall
- rainfall
- applause
- static

[Figure: spectrograms of Applause and Rain; freq / Hz vs. time / s]

Levels of ecological description...

Sound texture modeling (Athineos)

Model broad spectral structure with LPC
- could just resynthesize with noise

Model fine temporal structure in residual with linear prediction in time domain
- precise dual of LPC in frequency
- poles model temporal events

[Figure: processing chain. Sound y[n] -> TD-LP, $y[n] = \sum_i a_i\, y[n-i]$ -> per-frame spectral parameters; whitened residual e[n] -> DCT -> residual spectrum E[k] -> FD-LP, $E[k] = \sum_i b_i\, E[k-i]$ -> per-frame temporal envelope parameters]

[Figure: temporal envelopes (4 poles, 256 ms); amplitude vs. time / sec]

Allows modification / synthesis?
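The time-domain linear prediction step can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a generic LPC sketch, not the code behind the slide; the names `autocorr` and `lpc` and the test signal are assumptions. The frequency-domain variant on the slide would apply the same kind of predictor to the DCT coefficients E[k] instead of to time samples.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of the finite sequence x."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]

def lpc(x, order):
    """Levinson-Durbin solution of the autocorrelation normal equations.

    Returns (a, err): predictor coefficients a[0..order-1] such that
    x[n] is approximated by sum_i a[i-1] * x[n-i], and the residual energy.
    """
    r = autocorr(x, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# Hypothetical test signal: impulse response of y[n] = 1.5 y[n-1] - 0.8 y[n-2] + delta[n]
y = []
for n in range(400):
    prev1 = y[n - 1] if n >= 1 else 0.0
    prev2 = y[n - 2] if n >= 2 else 0.0
    y.append((1.0 if n == 0 else 0.0) + 1.5 * prev1 - 0.8 * prev2)

coeffs, residual_energy = lpc(y, 2)
```

For this all-pole test signal the recovered coefficients should be close to the 1.5 and -0.8 used to generate it, and the residual energy close to that of the single driving impulse.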

Outline

1. Music and nonspeech
2. Environmental sounds
3. Music synthesis techniques
   - Framework
   - Historical development
4. Sinewave synthesis
5. Music analysis

elements?

3. Music synthesis techniques

What is music?
- could be anything -> flexible synthesis needed!

Key elements of conventional music
- instruments -> note-events (time, pitch, accent level) -> melody, harmony, rhythm
- patterns of repetition & variation

Synthesis framework:
- instruments: common framework for many notes
- score: sequence of (time, pitch, level) note events

The nature of musical instrument notes

Characterized by instrument (register), note, loudness/emphasis, articulation...

[Figure: spectrograms of Piano, Violin, Clarinet, and Trumpet notes; Frequency vs. Time]

distinguish how?

Development of music synthesis

Goals of music synthesis:
- generate realistic / pleasant new notes
- control / explore timbre (quality)

Earliest computer systems in 1960s (voice synthesis, algorithmic)

Pure synthesis approaches:
- 1970s: Analog synths
- 1980s: FM (Stanford/Yamaha)
- 1990s: Physical modeling, hybrids

Analysis-synthesis methods:
- sampling / wavetables
- sinusoid modeling
- harmonics + noise (+ transients)

others?

Analog synthesis

The minimum to make an "interesting" sound

[Figure: block diagram with Trigger, Pitch, Vibrato, and Cutoff freq inputs; Envelope, Oscillator, Filter, and Gain blocks; output Sound]

Elements:
- harmonics-rich oscillators
- time-varying filters
- time-varying envelope
- modulation: low frequency + envelope-based

Result:
- time-varying spectrum, independent pitch
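The elements listed for analog synthesis can be put together in a few lines. This is only a sketch under assumed settings: the function name `analog_note`, the sawtooth oscillator, the one-pole low-pass filter, and every default value are illustrative choices, not a model of any particular synthesizer.

```python
import math

def analog_note(f0=110.0, sr=8000, dur=1.0, vibrato_hz=5.0, vibrato_depth=0.01,
                attack=0.05, release=0.3, cutoff_lo=200.0, cutoff_hi=3000.0):
    """Sawtooth oscillator -> envelope-swept low-pass filter -> gain envelope."""
    n_samples = int(sr * dur)
    phase = 0.0   # oscillator phase in cycles
    lp = 0.0      # state of the one-pole low-pass filter
    out = []
    for n in range(n_samples):
        t = n / sr
        # Time-varying envelope: linear attack, flat sustain, linear release
        env = min(1.0, t / attack, (dur - t) / release)
        # Low-frequency modulation of pitch (vibrato)
        f = f0 * (1.0 + vibrato_depth * math.sin(2 * math.pi * vibrato_hz * t))
        phase = (phase + f / sr) % 1.0
        saw = 2.0 * phase - 1.0          # harmonics-rich oscillator
        # Envelope-based modulation of the filter cutoff
        cutoff = cutoff_lo + (cutoff_hi - cutoff_lo) * env
        alpha = 1.0 - math.exp(-2 * math.pi * cutoff / sr)
        lp += alpha * (saw - lp)         # time-varying filter
        out.append(env * lp)             # gain stage
    return out

note = analog_note()
```

Because the cutoff follows the envelope while the oscillator frequency is set separately, the spectrum changes over the note without changing its pitch.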

FM synthesis

Fast frequency modulation -> sidebands ("phase modulation"):

$\cos(\omega_c t + \beta \sin(\omega_m t)) = \sum_{n=-\infty}^{\infty} J_n(\beta) \cos((\omega_c + n\,\omega_m)\,t)$

- a harmonic series if $\omega_c = r \cdot \omega_m$

$J_n(\beta)$ is a Bessel function:
- $J_n(\beta) \approx 0$ for $\beta < n$

[Figure: Bessel functions $J_0$ to $J_4$ plotted against modulation index β]

Complex harmonic spectra by varying β

[Figure: spectrogram of an FM tone; freq / Hz vs. time / s]

what use?
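The sideband expansion can be checked numerically. The sketch below evaluates $J_n(\beta)$ from its power series and compares the phase-modulated cosine with the sum of sidebands at one instant. The helper names, the truncation limits, and the carrier, modulator, and index values are all assumptions made for the illustration.

```python
import math

def bessel_j(n, beta, terms=30):
    """Bessel function of the first kind J_n(beta) for integer n, via its power series."""
    if n < 0:
        return (-1) ** (-n) * bessel_j(-n, beta, terms)
    return sum((-1) ** m / (math.factorial(m) * math.factorial(m + n))
               * (beta / 2.0) ** (2 * m + n) for m in range(terms))

def fm_sample(t, fc, fm, beta):
    """Left-hand side: carrier with sinusoidal phase modulation."""
    return math.cos(2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t))

def sideband_sum(t, fc, fm, beta, n_max=20):
    """Right-hand side: sidebands at fc + n*fm weighted by J_n(beta)."""
    return sum(bessel_j(n, beta) * math.cos(2 * math.pi * (fc + n * fm) * t)
               for n in range(-n_max, n_max + 1))

# Hypothetical settings: carrier 800 Hz, modulator 200 Hz, modulation index 2
fc, fm, beta = 800.0, 200.0, 2.0
t = 0.00123
direct = fm_sample(t, fc, fm, beta)
expanded = sideband_sum(t, fc, fm, beta)
```

Since the assumed carrier is an integer multiple of the modulator, every sideband falls on a multiple of 200 Hz, which is the harmonic-series case noted on the slide.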

Sampling synthesis

Resynthesis from real notes -> vary pitch, duration, level

Pitch: stretch (resample) waveform

[Figure: waveform before and after resampling (one panel labeled 894 Hz); time / s]

Duration: loop a "sustain" section

[Figure: waveforms with a looped section; time / s]

Level: cross-fade different examples
- need to line up source samples

[Figure: Soft and Loud examples mixed according to velocity; time / s]

good & bad?
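Pitch change by resampling can be sketched with linear interpolation. The function names and the sine-wave "recording" are assumptions for illustration; a real sampler would use a recorded note and a better interpolator.

```python
import math

def resample(x, ratio):
    """Read through x `ratio` times faster, using linear interpolation."""
    out = []
    pos = 0.0
    while pos < len(x) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1.0 - frac) * x[i] + frac * x[i + 1])
        pos += ratio
    return out

def upward_zero_crossings(x):
    """Count negative-to-nonnegative transitions (about one per cycle of a sine)."""
    return sum(1 for a, b in zip(x, x[1:]) if a < 0.0 <= b)

sr = 8000
# Stand-in for a recorded note: 1 s of a 220 Hz sine
note = [math.sin(2 * math.pi * 220.0 * n / sr + 0.1) for n in range(sr)]
up_fifth = resample(note, 1.5)

# Rough pitch estimate of the resampled note, in Hz
rate = upward_zero_crossings(up_fifth) / (len(up_fifth) / sr)
```

Reading 1.5 times faster raises the pitch to about 330 Hz but also shortens the note to two thirds of its length, which is why duration has to be handled separately by looping.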

Outline

1. Music and nonspeech
2. Environmental sounds
3. Music synthesis techniques
4. Sinewave synthesis (detail)
   - Sinewave modeling
   - Sines + residual...
5. Music analysis

4. Sinewave synthesis

If patterns of harmonics are what matter, why not generate them all explicitly:

$s[n] = \sum_k A_k[n] \cos(k \cdot \omega_0[n] \cdot n)$

- particularly powerful model for pitched signals

Analysis (as with speech):
- find peaks in STFT $S[\omega, n]$ & track
- or track fundamental $\omega_0$ (harmonics / autoco) & sample STFT at $k \cdot \omega_0$ -> set of $A_k[n]$ to duplicate tone

[Figure: spectrogram (freq / Hz vs. time / s) and harmonic magnitudes (mag vs. freq / Hz and time / s)]

Synthesis via bank of oscillators

Steps to sinewave modeling - 1

The underlying STFT:

$X[k, n_0] = \sum_{n=0}^{N-1} x[n + n_0]\, w[n]\, \exp\left(-j \frac{2\pi k n}{N}\right)$

What value for N (FFT length & window size)?
What value for H (hop size: $n_0 = r \cdot H$, r = 0, 1, 2...)?

STFT window length determines freq. resolution:

$X_w(e^{j\omega}) = X(e^{j\omega}) \ast W(e^{j\omega})$

Choose N long enough to resolve harmonics -> 2-3x longest (lowest) fundamental period
- e.g. 3-6 ms = khz
- choose H ≤ N/2

N too long -> lost time resolution
- limits sinusoid amplitude rate of change
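The window-length rule can be tried on a small example. The sketch below picks N as three periods of an assumed lowest fundamental, computes one STFT frame with a naive DFT, and checks that two adjacent harmonics show up as separate peaks. The sample rate, the fundamental, the Hann window, and the name `stft_frame` are assumptions; a practical implementation would use an FFT.

```python
import cmath
import math

def stft_frame(x, n0, N):
    """One STFT column X[k, n0] for k = 0..N/2, using a Hann window w[n]."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    return [sum(x[n0 + n] * w[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N))
            for k in range(N // 2 + 1)]

sr = 8000                    # assumed sample rate in Hz
f0_min = 100.0               # assumed lowest fundamental in Hz
N = int(3 * sr / f0_min)     # three fundamental periods
H = N // 2                   # hop size of half a window

# Two equal-amplitude harmonics at 100 Hz and 200 Hz
x = [math.cos(2 * math.pi * 100.0 * n / sr) + math.cos(2 * math.pi * 200.0 * n / sr)
     for n in range(2 * N)]

mags = [abs(v) for v in stft_frame(x, H, N)]
peaks = [k for k in range(1, len(mags) - 1) if mags[k - 1] < mags[k] > mags[k + 1]]
# Keep only peaks within 20 dB of the largest one, converted to Hz
peak_freqs = [k * sr / N for k in peaks if mags[k] > 0.1 * max(mags)]
```

With a window of only one fundamental period the two main lobes would merge; three periods leaves a clear dip between them in this example.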

Steps to sinewave modeling - 2

Choose candidate sinusoids at each time by picking peaks in each STFT frame:

[Figure: spectrogram (freq / Hz vs. time / s) and a single frame's spectrum (level / dB vs. freq / Hz)]

Quadratic fit for peak: $y = a\,x\,(x - b)$, with maximum $y = -ab^2/4$ at $x = b/2$
+ linear interpolation of unwrapped phase

[Figure: level / dB and phase / rad vs. freq / Hz around a peak]
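A quadratic fit through the peak bin and its two neighbours gives a peak location between bins. The sketch below uses the common three-point form with the samples at x = -1, 0, +1, which is a reparameterization of the parabola on the slide; the function name and the example values are assumptions.

```python
def parabolic_peak(y_prev, y_peak, y_next):
    """Vertex of the parabola through three equally spaced samples.

    The samples sit at x = -1, 0, +1. Returns (offset, height), where offset
    is the vertex position in bins relative to the middle sample.
    """
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0.0:
        return 0.0, y_peak            # the three points are collinear
    offset = 0.5 * (y_prev - y_next) / denom
    height = y_peak - 0.25 * (y_prev - y_next) * offset
    return offset, height

# Samples of y = 5 - (x - 0.3)^2 at x = -1, 0, +1
offset, height = parabolic_peak(3.31, 4.91, 4.51)

# Converting a refined bin index to Hz, for an assumed peak bin, FFT length, and rate
k, N, sr = 12, 240, 8000
peak_hz = (k + offset) * sr / N
```

The fit is usually applied to log-magnitude (dB) values, where the main lobe of a typical window is closer to a parabola.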

Steps to sinewave modeling - 3

Which peaks to pick?

Want "true" sinusoids, not noise fluctuations
- "prominence" threshold above smoothed spectrum

[Figure: level / dB vs. freq / Hz]

Sinusoids exhibit stability...
- of amplitude in time
- of phase derivative in time
-> compare with adjacent time frames to test?

Steps to sinewave modeling - 4

"Grow" tracks by appending newly-found peaks to existing tracks:
- ambiguous assignments possible

[Figure: freq vs. time, showing existing tracks, new peaks, a birth, and a death]

Unclaimed new peak
- "birth" of new track
- backtrack to find earliest trace?

No continuation peak for existing track
- "death" of track
- or: reduce peak threshold for hysteresis
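Track growing can be sketched with a greedy nearest-peak rule. This is one simple way to resolve assignments, not necessarily the one used in the lecture; the function name, the data layout, and the 50 Hz jump limit are assumptions, and backtracking and hysteresis are left out.

```python
def grow_tracks(frames, max_jump_hz=50.0):
    """Link per-frame peak frequencies into tracks.

    frames: list of lists of peak frequencies (Hz), one list per STFT frame.
    Returns a list of dicts with the birth frame and the frequencies followed.
    """
    finished, active = [], []
    for t, peaks in enumerate(frames):
        unclaimed = list(peaks)
        survivors = []
        for track in active:
            last = track["freqs"][-1]
            # Greedy choice: nearest unclaimed peak within the allowed jump
            best = min(unclaimed, key=lambda f: abs(f - last), default=None)
            if best is not None and abs(best - last) <= max_jump_hz:
                track["freqs"].append(best)
                unclaimed.remove(best)
                survivors.append(track)
            else:
                finished.append(track)                     # death of track
        for f in unclaimed:
            survivors.append({"birth": t, "freqs": [f]})   # birth of new track
        active = survivors
    return finished + active

# Hypothetical peak lists for four frames
frames = [[440.0, 880.0], [442.0, 878.0], [445.0], [447.0, 1320.0]]
tracks = grow_tracks(frames)
```

In this example the 880 Hz track dies at the third frame, where it has no continuation peak, and the unclaimed 1320 Hz peak starts a new track in the fourth frame.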

Resynthesis of sinewave models

After analysis, each track defines contours in frequency, amplitude $f_k[n]$, $A_k[n]$ (+ phase?)
- use to drive sinewave oscillators & sum up

[Figure: contours $A_k[n]$ and $f_k[n]$ (level and freq / Hz vs. time / s) driving an oscillator $A_k[n] \cos(2\pi f_k[n]\, t)$]

"Regularize" to exactly harmonic: $f_k[n] = k \cdot f_0[n]$

[Figure: spectrograms, freq / Hz vs. time / s]

what to do?
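An oscillator bank driven by frame-rate contours can be sketched as follows. Two choices here are assumptions rather than something the slide specifies: frame values are linearly interpolated up to the sample rate, and each oscillator's phase is accumulated sample by sample so that the phase stays continuous while the frequency changes. The contour values are made up.

```python
import math

def resynthesize(tracks, hop, sr):
    """Sum of sinewave oscillators driven by frame-rate contours.

    tracks: list of (freqs_hz, amps) pairs, one value per analysis frame.
    """
    n_frames = len(tracks[0][0])
    out = [0.0] * ((n_frames - 1) * hop)
    for freqs, amps in tracks:
        phase = 0.0
        for n in range(len(out)):
            i, rem = divmod(n, hop)
            frac = rem / hop
            f = (1 - frac) * freqs[i] + frac * freqs[i + 1]
            a = (1 - frac) * amps[i] + frac * amps[i + 1]
            out[n] += a * math.cos(phase)
            phase += 2 * math.pi * f / sr   # running sum of instantaneous frequency
    return out

f0 = [200.0, 200.0, 220.0, 220.0]   # hypothetical fundamental contour, one value per frame
amp = [0.0, 1.0, 1.0, 0.0]          # hypothetical amplitude contour

# Regularize to exactly harmonic tracks: f_k[n] = k * f0[n], with 1/k amplitudes
harmonic_tracks = [([k * f for f in f0], [a / k for a in amp]) for k in (1, 2, 3)]
y = resynthesize(harmonic_tracks, hop=400, sr=8000)
```

Dropping the measured phase, as done here, keeps the magnitude contours but not the original waveform shape, which matters if a residual is later computed by subtraction.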

Modification in sinewave resynthesis

Change duration by warping timebase
- may want to keep onset unwarped

[Figure: freq / Hz vs. time / s]

Change pitch by scaling frequencies
- either stretching or resampling envelope

[Figure: level / dB vs. freq / Hz, two panels]

Change timbre by interpolating params

Sinusoids + residual

Only "prominent peaks" became tracks
- remainder of spectral energy was noisy? -> model residual energy with noise

How to obtain "non-harmonic" spectrum?
- zero-out spectrum near extracted peaks?
- or: resynthesize (exactly) & subtract waveforms:

$e_s[n] = s[n] - \sum_k A_k[n] \cos(2\pi n\, f_k[n])$

  must preserve phase!

[Figure: mag / dB vs. freq / Hz, showing original, sinusoids, residual, and LPC fit]

Can model residual signal with LPC -> flexible representation of noisy residual

Sinusoids + noise + transients

Sound represented as sinusoids and noise:

$s[n] = \sum_k A_k[n] \cos(2\pi n\, f_k[n]) + h_n[n] \ast b[n]$

where the sum is the sinusoids term and $h_n[n] \ast b[n]$ models the residual $e_s[n]$.

Parameters are $\{A_k[n], f_k[n]\}$, $h_n[n]$

[Figure: freq / Hz vs. time / s, with the $\{A_k[n], f_k[n]\}$ and $h_n[n]$ components]

Separate out abrupt transients in residual?

$e_s[n] = \sum_k t_k[n] + h_n[n] \ast b'[n]$

- more specific -> more flexible

Outline

1. Music and nonspeech
2. Environmental sounds
3. Music synthesis techniques
4. Sinewave synthesis
5. Music analysis
   - Instrument identification
   - Pitch tracking

5. Music analysis

What might we want to get out of music?

Instrument identification
- different levels of specificity
- "registers" within instruments

Score recovery
- transcribe the note sequence
- extract the "performance"

Ensemble performance
- "gestalts": chords, tone colors

Broader timescales
- phrasing & musical structure
- artist / genre clustering and classification

Instrument identification

Research looks for perceptual "timbre space"

[Figure: timbre space with axes dull to bright, low attack to hi attack, low flux to hi flux]

procedure?

Cues to instrument identification
- onset (rise time), sustain (brightness)

Hierarchy of instrument families
- strings / reeds / brass
- optimize features at each level

Pitch tracking

Fundamental frequency (~ pitch) is a key attribute of musical sounds -> pitch tracking as a key technology

Pitch tracking for speech
- voice pitch & spectrum highly dynamic
- speech is voiced and unvoiced

ground truth?

Applications
- voice coders (excitation description)
- harmonic modeling

Pitch tracking for music

Pitch in music
- pitch is more stable (although vibrato)
- but: multiple pitches

[Figure: spectrogram with uncertain pitches marked "??"; Frequency vs. Time]

Applications
- harmonic modeling
- music transcription (-> storage, resynthesis)
- source separation

Approaches: "place" & "time"

Meddis & Hewitt pitch model

Autocorrelation (time) based pitch extraction
- fundamental period -> peak(s) in autocorrelation:

$x(t) \approx x(t + T) \Rightarrow r_{xx}(T) = \int x(t)\, x(t + T)\, dt \approx \max$

[Figure: waveform x[n] vs. time / samples, and autocorrelation $r_{xx}[l]$ vs. lag / samples]

Compute separately in each frequency band & summarize across (perceptual) channels

[Figure: processing chain. sound -> bandpass filters -> rectification & low-pass filter -> periodicity detection -> autocorrelogram (CF / Hz vs. lag / ms) -> cross-channel sum -> summary ACG]
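The autocorrelation idea can be sketched on a single full-band signal. This is not the Meddis & Hewitt model itself, which works per frequency channel after rectification; the function name, the search range, and the test tone are assumptions. The search range is kept narrow enough to exclude twice the true period, where the autocorrelation is just as strong.

```python
import math

def autocorr_pitch(x, sr, fmin=70.0, fmax=500.0):
    """Estimate the fundamental (Hz) as the lag of the largest autocorrelation value."""
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)

    def r(lag):
        # Same number of products for every lag, so values are comparable
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag_max))

    best_lag = max(range(lag_min, lag_max + 1), key=r)
    return sr / best_lag

sr = 8000
# Test tone with a missing fundamental: harmonics 2, 3, 4 of 125 Hz
x = [sum(math.cos(2 * math.pi * 125.0 * h * n / sr) for h in (2, 3, 4))
     for n in range(1024)]
f0_est = autocorr_pitch(x, sr)
```

The estimate lands on 125 Hz even though the signal has no energy there, because the waveform still repeats every 64 samples.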

Tolonen & Karjalainen simplification

Multiple frequency channels can have different dominant pitches...
But equalizing (flattening) the spectrum works:

[Figure: block diagram. sound -> Prewhitening -> two branches at 1 kHz, one with Rectify & low-pass -> Periodicity detection in each branch -> sum -> SACF -> enhance -> ESACF]

Summary AC as a function of time:

[Figure: "periodogram" for M/F voice mix (f/Hz vs. time/s), and summary autocorrelation at t=.775 s (vs. lag/s) with peaks at 200 Hz (0.005 s) and 125 Hz (0.008 s)]

- Enhancement = cancel subharmonics

lag vs. freq?

Post-processing of pitch tracks

Remove outliers with median filtering

[Figure: pitch track vs. time, before and after a 5-pt median]

Octave errors are common:
- if $x(t) \approx x(t + T)$ then $x(t) \approx x(t + 2T)$ etc.
-> dynamic programming/HMM

Validity
- is there a pitch at this time?
- voiced/unvoiced decision for speech

Event detection
- when does a pitch slide indicate a new note?
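The 5-point median filter is short enough to write out. The function name, the shortened windows at the edges, and the example track are assumptions made for the sketch.

```python
def median_filter(track, width=5):
    """Sliding median of odd `width`; windows are shortened at the edges."""
    half = width // 2
    out = []
    for i in range(len(track)):
        window = sorted(track[max(0, i - half):i + half + 1])
        out.append(window[len(window) // 2])
    return out

# Hypothetical pitch track in Hz with one octave error (440) and one dropout (0)
raw = [220.0, 221.0, 440.0, 222.0, 223.0, 0.0, 224.0, 225.0]
smooth = median_filter(raw)
```

Isolated outliers are removed while the slow upward drift survives. A run of three or more consecutive octave errors would pass through a 5-point median, which is one reason the slide also points to dynamic programming or an HMM.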

Summary

Nonspeech audio
- i.e. sound in general
- characteristics: ecological

Music synthesis
- control of pitch, duration, loudness, articulation
- evolution of techniques
- sinusoids + noise + transients

Music analysis
- different aspects: instruments, pitches, performance

and beyond?
