Lecture 6: Nonspeech and Music


1 EE E682: Speech & Audio Processing & Recognition
Lecture 6: Nonspeech and Music
Dan Ellis <dpwe@ee.columbia.edu>, Michael Mandel <mim@ee.columbia.edu>
Columbia University Dept. of Electrical Engineering
dpwe/e682
February 26, 2009
  1 Music & nonspeech
  2 Environmental Sounds
  3 Music Synthesis Techniques
  4 Sinewave Synthesis
E682 (Ellis & Mandel) L6: Nonspeech and Music February 26, 2009 1 / 30

2 Outline
  1 Music & nonspeech
  2 Environmental Sounds
  3 Music Synthesis Techniques
  4 Sinewave Synthesis

3 Music & nonspeech
What is nonspeech?
  according to research effort: a little music
  in the world: most everything
[Figure: sounds placed by information content (low to high) against origin (natural to man-made); labels: speech, music, wind & water, animal sounds, contact/collision, machines & engines]
attributes?

4 Sound attributes
Attributes suggest model parameters
What do we notice about general sound?
  psychophysics: pitch, loudness, timbre
  bright/dull; sharp/soft; grating/soothing
  sound is not abstract: tendency is to describe by source-events
Ecological perspective
  what matters about sound is what happened
  our percepts express this more-or-less directly

5 Motivations for modeling
Describe/classify
  cast sound into model because want to use the resulting parameters
Store/transmit
  model implicitly exploits limited structure of signal
Resynthesize/modify
  model separates out interesting parameters
[Figure: a sound mapped into model parameter space]

6 Analysis and synthesis
Analysis is the converse of synthesis:
[Figure: analysis maps sound to a model / representation; synthesis maps the model / representation back to sound]
Can exist apart:
  analysis for classification
  synthesis of artificial sounds
Often used together:
  encoding/decoding of compressed formats
  resynthesis based on analyses
  analysis-by-synthesis

8 Environmental Sounds
Where sound comes from: mechanical interactions
  contact / collisions
  rubbing / scraping
  ringing / vibrating
Interest in environmental sounds
  carry information about events around us, including indirect hints
  need to create them in virtual environments, including soundtracks
Approaches to synthesis
  recording / sampling
  synthesis algorithms

9 Collision sounds
Factors influencing:
  colliding bodies: size, material, damping
  local properties at contact point (hardness)
  energy of collision
Source-filter model
  source = excitation of collision event (energy, local properties at contact)
  filter = resonance and radiation of energy (body properties)
Variety of strike/scraping sounds (from Gaver, 1993)
  resonant freqs: size/shape
  damping: material
  HF content in excitation/strike: mallet, force
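The source-filter view of a collision can be sketched as a set of exponentially damped sinusoids, one per resonant mode. This is only an illustration: the function name, the mode frequencies, the decay times and the rule that higher modes get a gain of `strike_gain**k` are all assumptions, not values from the lecture.

```python
import math

def impact_sound(mode_freqs_hz, decay_s, strike_gain, sr=8000, dur_s=0.5):
    """Sum of exponentially damped sinusoids, one per resonant mode."""
    n_samples = int(sr * dur_s)
    out = [0.0] * n_samples
    for k, f in enumerate(mode_freqs_hz):
        # Assumption: a harder strike puts more energy into higher modes,
        # modelled here as a per-mode gain strike_gain**k.
        gain = strike_gain ** k
        for n in range(n_samples):
            t = n / sr
            out[n] += gain * math.exp(-t / decay_s) * math.sin(2 * math.pi * f * t)
    return out

# Hypothetical bodies with the same modes (size/shape) but different damping (material).
modes = [440.0, 1210.0, 2370.0]
wood = impact_sound(modes, decay_s=0.03, strike_gain=0.5)
metal = impact_sound(modes, decay_s=0.30, strike_gain=0.5)
```

Changing `modes` plays the role of size/shape, `decay_s` of material, and `strike_gain` of mallet hardness or force.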

10 Sound textures
What do we hear in:
  a city street
  a symphony orchestra
How do we distinguish:
  waterfall
  rainfall
  applause
  static
[Figure: spectrograms (freq / Hz against time / s) of "Applause4" and "Rain1"]
Levels of ecological description

11 Sound texture modeling (Athineos)
Model broad spectral structure with LPC
  could just resynthesize with noise
Model fine temporal structure in residual with linear prediction in time domain
[Figure: processing chain]
  Sound $y[n]$
  TD-LP: $y[n] = \sum_i a_i\, y[n-i]$, giving per-frame spectral parameters and the whitened residual $e[n]$
  DCT of $e[n]$ gives the residual spectrum $E[k]$
  FD-LP: $E[k] = \sum_i b_i\, E[k-i]$, giving per-frame temporal envelope parameters
precise dual of LPC in frequency
  poles model temporal events
[Figure: temporal envelopes (4 poles, 256 ms), amplitude against time / sec]
Allows modification / synthesis?
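The frequency-domain half of this scheme can be sketched as follows: take the DCT of a frame, run ordinary autocorrelation-method linear prediction across the DCT bins, and read the all-pole "spectrum" of that predictor as a temporal envelope. This is a simplified sketch, not the authors' implementation: it skips the time-domain whitening stage, and the function names, the model order and the test frame are assumptions.

```python
import math
import random

def dct2(x):
    """Unnormalized DCT-II: C[k] = sum_n x[n] cos(pi (n + 0.5) k / N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def lpc(seq, order):
    """Levinson-Durbin recursion; returns (a, err) with a[0] == 1."""
    r = [sum(seq[i] * seq[i + lag] for i in range(len(seq) - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        k = -sum(a[j] * r[m - j] for j in range(m)) / err
        a = [a[j] + k * a[m - j] if 1 <= j <= m else a[j] for j in range(order + 1)]
        err *= 1.0 - k * k
    return a, err

def fdlp_envelope(frame, order):
    """Temporal envelope from linear prediction across the DCT bins."""
    N = len(frame)
    a, err = lpc(dct2(frame), order)
    env = []
    for n in range(N):
        theta = math.pi * (n + 0.5) / N      # DCT "frequency" for time index n
        re = sum(a[j] * math.cos(j * theta) for j in range(order + 1))
        im = sum(a[j] * math.sin(j * theta) for j in range(order + 1))
        env.append(err / (re * re + im * im))
    return env

# Hypothetical frame: low-level noise with one short burst at samples 88..95.
rng = random.Random(0)
frame = [0.01 * rng.uniform(-1, 1) + (1.0 if 88 <= n < 96 else 0.0) for n in range(128)]
env = fdlp_envelope(frame, order=8)
peak = max(range(len(env)), key=env.__getitem__)
```

Because a burst at time `n` becomes a cosine across DCT bins, the predictor's poles sit at angles that correspond to burst times, which is why `peak` lands near the burst.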

13 Music synthesis techniques
What is music?
  could be anything: flexible synthesis needed!
Key elements of conventional music
  instruments
  note-events (time, pitch, accent level)
  melody, harmony, rhythm
  patterns of repetition & variation
Synthesis framework:
  instruments: common framework for many notes
  score: sequence of (time, pitch, level) note events
[Figure: score excerpt from "Messiah", 44. chorus "Hallelujah!" (Allegro), with Soprano, Alto, Tenor and Bass (S A T B) parts]

14 The nature of musical instrument notes
Characterized by instrument (register), note, loudness/emphasis, articulation...
[Figure: spectrograms (Frequency against Time) of Piano, Violin, Clarinet and Trumpet notes]
Distinguish how?

15 Development of music synthesis
Goals of music synthesis:
  generate realistic / pleasant new notes
  control / explore timbre (quality)
Earliest computer systems in 1960s (voice synthesis, algorithmic)
Pure synthesis approaches:
  1970s: Analog synths
  1980s: FM (Stanford/Yamaha)
  1990s: Physical modeling, hybrids
Analysis-synthesis methods:
  sampling / wavetables
  sinusoid modeling
  harmonics + noise (+ transients)
  others?

16 Analog synthesis
The minimum to make an interesting sound
[Figure: signal-flow diagram with Trigger, Pitch, Vibrato, Oscillator, Filter (cutoff freq), Envelope, Gain and Sound output]
Elements:
  harmonics-rich oscillators
  time-varying filters
  time-varying envelope
  modulation: low frequency + envelope-based
Result: time-varying spectrum, independent pitch
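A digital imitation of this signal flow might look like the sketch below: a sawtooth oscillator with vibrato, a one-pole low-pass filter whose cutoff follows the envelope, and the same envelope applied as gain. All names and parameter values are assumptions chosen for illustration, and the naive sawtooth aliases, which a real implementation would have to address.

```python
import math

def envelope(n, sr, attack=0.01, decay=0.1, sustain=0.6, release=0.2, hold=0.5):
    """Piecewise-linear attack/decay/sustain/release value at sample n."""
    t = n / sr
    if t < attack:
        return t / attack
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    if t < hold:
        return sustain
    if t < hold + release:
        return sustain * (1.0 - (t - hold) / release)
    return 0.0

def analog_note(pitch_hz, sr=8000, dur_s=0.8, vib_hz=5.0, vib_depth=0.01,
                base_cutoff_hz=300.0, env_cutoff_hz=2000.0):
    phase, lp, out = 0.0, 0.0, []
    for n in range(int(sr * dur_s)):
        env = envelope(n, sr)
        # Low-frequency modulation of the pitch (vibrato).
        f = pitch_hz * (1.0 + vib_depth * math.sin(2 * math.pi * vib_hz * n / sr))
        phase = (phase + f / sr) % 1.0
        saw = 2.0 * phase - 1.0                    # harmonics-rich oscillator
        cutoff = base_cutoff_hz + env_cutoff_hz * env
        alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff / sr)
        lp += alpha * (saw - lp)                   # time-varying one-pole low-pass
        out.append(env * lp)                       # time-varying gain
    return out

note = analog_note(220.0)
```

Pitch is set only by the oscillator, while the filter and envelope shape the spectrum over time, which matches the "time-varying spectrum, independent pitch" result on the slide.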

17 FM synthesis
Fast frequency modulation: sidebands
  $\cos(\omega_c t + \beta \sin(\omega_m t)) = \sum_{n=-\infty}^{\infty} J_n(\beta) \cos((\omega_c + n\omega_m) t)$
  a harmonic series if $\omega_c = r\,\omega_m$
$J_n(\beta)$ is a Bessel function:
[Figure: $J_0$ to $J_4$ plotted against modulation index $\beta$]
  $J_n(\beta)$ is negligible when $\beta$ is well below $n$
Complex harmonic spectra by varying $\beta$
[Figure: spectrogram (freq / Hz against time / s) of an FM tone]
what use?
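The sideband expansion can be checked numerically. The sketch below evaluates $J_n(\beta)$ from Bessel's integral with a midpoint rule and compares the FM waveform with its truncated sideband sum. The carrier and modulator frequencies, the modulation index and the truncation at $|n| \le 20$ are illustrative assumptions.

```python
import math

def bessel_j(n, beta, steps=2000):
    """J_n(beta) = (1/pi) * integral_0^pi cos(n*tau - beta*sin(tau)) d tau,
    evaluated with the midpoint rule (valid for integer n)."""
    h = math.pi / steps
    total = sum(math.cos(n * (i + 0.5) * h - beta * math.sin((i + 0.5) * h))
                for i in range(steps))
    return total * h / math.pi

beta = 3.0
wc = 2 * math.pi * 400.0      # carrier (rad/s), illustrative
wm = 2 * math.pi * 100.0      # modulator (rad/s): wc = 4 * wm gives a harmonic series
J = {n: bessel_j(n, beta) for n in range(-20, 21)}

def fm_direct(t):
    return math.cos(wc * t + beta * math.sin(wm * t))

def fm_from_sidebands(t):
    return sum(J[n] * math.cos((wc + n * wm) * t) for n in J)

times = [i / 8000.0 for i in range(80)]
max_err = max(abs(fm_direct(t) - fm_from_sidebands(t)) for t in times)
```

Raising `beta` spreads energy into more sidebands, which is how a single modulator yields a rich, time-varying spectrum.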

18 Sampling synthesis
Resynthesis from real notes: vary pitch, duration, level
Pitch: stretch (resample) waveform
[Figure: waveform against time / s before and after resampling; surviving label: 894 Hz]
Duration: loop a sustain section
[Figure: waveforms against time / s with a looped sustain section]
Level: cross-fade different examples
[Figure: "Soft" and "Loud" recordings and their mix as a function of velocity]
  need to line up source samples
good & bad?
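The three manipulations can be sketched on a list of samples as below. The function names and the test tone are assumptions, linear interpolation stands in for a proper resampling filter, and the loop points are chosen by hand to span whole periods.

```python
import math

def resample(wave, ratio):
    """Read `wave` at `ratio` times normal speed using linear interpolation.
    ratio > 1 raises the pitch and shortens the note."""
    out, pos = [], 0.0
    while pos < len(wave) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1.0 - frac) * wave[i] + frac * wave[i + 1])
        pos += ratio
    return out

def loop_sustain(wave, loop_start, loop_end, total_len):
    """Lengthen a note by repeating wave[loop_start:loop_end]."""
    out = list(wave[:loop_end])
    while len(out) < total_len:
        out.extend(wave[loop_start:loop_end])
    return out[:total_len]

def velocity_mix(soft, loud, velocity):
    """Cross-fade time-aligned soft and loud recordings; velocity in [0, 1]."""
    return [(1.0 - velocity) * s + velocity * l for s, l in zip(soft, loud)]

# Hypothetical "recording": 800 samples of a sine with a 20-sample period.
note = [math.sin(2 * math.pi * n / 20) for n in range(800)]
octave_up = resample(note, 2.0)
longer = loop_sustain(note, 200, 400, 2000)   # loop spans exactly 10 periods
half = velocity_mix(note, [2.0 * v for v in note], 0.5)
```

The cross-fade only works if the two recordings are lined up in phase, which is the "need to line up source samples" caveat on the slide.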

20 Sinewave synthesis
If patterns of harmonics are what matter, why not generate them all explicitly:
  $s[n] = \sum_k A_k[n] \cos(k\,\omega_0[n]\,n)$
  particularly powerful model for pitched signals
Analysis (as with speech):
  find peaks in STFT $S[\omega, n]$ & track
  or track fundamental $\omega_0$ (harmonics / autocorrelation) & sample STFT at $k\,\omega_0$
  gives set of $A_k[n]$ to duplicate tone
[Figure: spectrogram (freq / Hz against time / s) and harmonic magnitudes against time]
Synthesis via bank of oscillators
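A bank of harmonic oscillators can be sketched as below. One deliberate difference from the slide formula is labelled in the code: the phase is accumulated sample by sample rather than computed as $k\,\omega_0[n]\,n$, so that a time-varying fundamental stays phase-continuous. For a constant fundamental the two are identical. The function name and the test tone are assumptions.

```python
import math

def harmonic_bank(f0_hz, amps, sr=8000):
    """s[n] = sum_k A_k[n] cos(k * phi[n]), where phi[n] is the running sum of
    2*pi*f0[n]/sr (an accumulated phase, swapped in for w0[n]*n).
    f0_hz: per-sample fundamental; amps[k-1][n]: amplitude of harmonic k."""
    out, phi = [], 0.0
    for n, f0 in enumerate(f0_hz):
        out.append(sum(a[n] * math.cos(k * phi)
                       for k, a in enumerate(amps, start=1)))
        phi += 2.0 * math.pi * f0 / sr
    return out

# Hypothetical steady tone: 100 Hz fundamental with two harmonics.
sr, n_samp = 8000, 400
f0 = [100.0] * n_samp
amps = [[1.0] * n_samp, [0.5] * n_samp]
tone = harmonic_bank(f0, amps, sr)
```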

21 Steps to sinewave modeling - 1
The underlying STFT:
  $X[k, n_0] = \sum_{n=0}^{N-1} x[n + n_0]\, w[n] \exp\left(-j \frac{2\pi k n}{N}\right)$
  what value for $N$ (FFT length & window size)?
  what value for $H$ (hop size: $n_0 = r H$, $r = 0, 1, 2, \ldots$)?
STFT window length determines freq. resolution:
  $X_w(e^{j\omega}) = X(e^{j\omega}) \ast W(e^{j\omega})$ (convolution in frequency)
Choose $N$ long enough to resolve harmonics
  2-3x longest (lowest) fundamental period
  e.g. 3-6 ms = khz
  choose $H \le N/2$
$N$ too long: lost time resolution
  limits sinusoid amplitude rate of change
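The window-length rule and one STFT frame can be sketched directly from the definition. The Hann window, the choice of three periods, the sample rate and the test tone are assumptions made for the example; a practical implementation would use an FFT rather than this direct sum.

```python
import cmath
import math

def choose_window(f0_min_hz, sr, periods=3):
    """Window length N spanning `periods` of the lowest fundamental; hop H = N/2."""
    N = int(math.ceil(periods * sr / f0_min_hz))
    return N, N // 2

def stft_frame(x, n0, N):
    """X[k, n0] = sum_{n=0}^{N-1} x[n + n0] w[n] exp(-j 2 pi k n / N), Hann w[n],
    for k = 0 .. N/2."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    seg = [x[n0 + n] * w[n] for n in range(N)]
    return [sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N // 2 + 1)]

sr = 8000
N, H = choose_window(100.0, sr)          # lowest expected fundamental: 100 Hz
x = [math.sin(2 * math.pi * 500 * n / sr) for n in range(2 * N)]
X = stft_frame(x, H, N)
peak_bin = max(range(len(X)), key=lambda k: abs(X[k]))
peak_hz = peak_bin * sr / N
```

With these values the bin spacing is `sr / N`, so harmonics of a 100 Hz fundamental fall three bins apart and remain separable under the window's main lobe.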

22 Steps to sinewave modeling - 2
Choose candidate sinusoids at each time by picking peaks in each STFT frame:
[Figure: spectrogram (freq / Hz against time / s) and one frame's spectrum (level / dB against freq / Hz) with picked peaks]
Quadratic fit for peak:
  $y = a x (x - b)$, with its extremum at $x = b/2$ where $y = -a b^2 / 4$
[Figure: level / dB and phase / rad against freq / Hz around a peak]
  + linear interpolation of unwrapped phase
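A common way to carry out the quadratic fit is to pass a parabola through the peak bin and its two neighbours, with the centre bin taken as $x = 0$. The sketch below uses that three-point form; it is a different parameterization from the slide's $y = a x (x - b)$ but fits the same kind of parabola. The function name is an assumption.

```python
def parabolic_peak(y_prev, y_peak, y_next):
    """Fit a parabola through points at x = -1, 0, +1 and return
    (vertex offset from the centre bin, vertex height)."""
    denom = y_prev - 2.0 * y_peak + y_next
    if denom == 0.0:                 # no curvature: nothing to refine
        return 0.0, y_peak
    offset = 0.5 * (y_prev - y_next) / denom
    height = y_peak - 0.25 * (y_prev - y_next) * offset
    return offset, height

# Samples of y = 5 - (x - 0.3)^2 at x = -1, 0, 1.
offset, height = parabolic_peak(3.31, 4.91, 4.51)
```

The refined frequency is then `(peak_bin + offset)` times the bin spacing, and the phase can be read off by interpolating linearly at the same offset.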

23 Steps to sinewave modeling - 3
Which peaks to pick?
Want true sinusoids, not noise fluctuations
  prominence threshold above smoothed spectrum
[Figure: level / dB against freq / Hz with threshold]
Sinusoids exhibit stability...
  of amplitude in time
  of phase derivative in time
  compare with adjacent time frames to test?
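One simple reading of the prominence test is sketched below: a bin is kept if it is a local maximum and stands a fixed number of dB above a moving average of the spectrum. The moving-average smoother, its width and the 10 dB threshold are assumptions, not values from the lecture.

```python
def pick_peaks(level_db, smooth_bins=5, prominence_db=10.0):
    """Indices of local maxima at least prominence_db above a moving-average
    smoothed copy of the spectrum (all levels in dB)."""
    n = len(level_db)
    peaks = []
    for k in range(1, n - 1):
        lo, hi = max(0, k - smooth_bins), min(n, k + smooth_bins + 1)
        smooth = sum(level_db[lo:hi]) / (hi - lo)
        is_local_max = level_db[k - 1] < level_db[k] >= level_db[k + 1]
        if is_local_max and level_db[k] - smooth >= prominence_db:
            peaks.append(k)
    return peaks

# Hypothetical frame: 1 dB ripple around -60 dB with one strong peak at bin 7.
spectrum_db = [-60.0 + (k % 2) for k in range(20)]
spectrum_db[7] = 0.0
peaks = pick_peaks(spectrum_db)
```

The ripple produces many local maxima, but only the strong peak clears the threshold; the stability tests across frames would be applied after this step.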

24 Steps to sinewave modeling - 4
Grow tracks by appending newly-found peaks to existing tracks:
[Figure: freq against time, showing existing tracks, new peaks, a birth and a death]
  ambiguous assignments possible
Unclaimed new peak
  birth of new track
  backtrack to find earliest trace?
No continuation peak for existing track
  death of track
  or: reduce peak threshold for hysteresis
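Track growing can be sketched with a greedy nearest-frequency match. This is one simple policy among several: it resolves ambiguous assignments in track order rather than optimally, and it omits backtracking and hysteresis. The function name, the data layout and the 30 Hz jump limit are assumptions.

```python
def grow_tracks(frames, max_jump_hz=30.0):
    """frames: list of per-frame lists of peak frequencies (Hz).
    Returns tracks as dicts {'start': frame index, 'freqs': [...]}."""
    active, finished = [], []
    for t, peaks in enumerate(frames):
        unclaimed = list(peaks)
        still_active = []
        for track in active:
            last = track['freqs'][-1]
            best = min(unclaimed, key=lambda f: abs(f - last), default=None)
            if best is not None and abs(best - last) <= max_jump_hz:
                track['freqs'].append(best)        # continuation
                unclaimed.remove(best)
                still_active.append(track)
            else:
                finished.append(track)             # death of track
        for f in unclaimed:                        # birth of new track
            still_active.append({'start': t, 'freqs': [f]})
        active = still_active
    return finished + active

frames = [[440, 880], [442, 878], [445], [447, 1200]]
tracks = grow_tracks(frames)
```

In this example the 880 Hz track dies at the third frame and a new track is born at 1200 Hz in the fourth.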

25 Resynthesis of sinewave models
After analysis, each track defines contours in frequency, amplitude $f_k[n]$, $A_k[n]$ (+ phase?)
  use to drive sinewave oscillators & sum up
  each oscillator produces $A_k[n] \cos(2\pi f_k[n]\, t)$
[Figure: track contours (freq / Hz and level against time / s) driving an oscillator]
Regularize to exactly harmonic: $f_k[n] = k\, f_0[n]$
[Figure: spectrograms (freq / Hz against time / s) before and after regularization]
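Resynthesis from frame-rate contours can be sketched as below: each track's frequency and amplitude are linearly interpolated up to the sample rate, and the oscillator phase is the running sum of the instantaneous frequency. Measured phases are ignored here, and all tracks are assumed to start together and have equal length; both are simplifications, and the function names are assumptions.

```python
import math

def synth_track(freqs_hz, amps, hop, sr):
    """One oscillator driven by frame-rate contours, interpolated per sample."""
    out, phase = [], 0.0
    for i in range(len(freqs_hz) - 1):
        for m in range(hop):
            frac = m / hop
            f = (1 - frac) * freqs_hz[i] + frac * freqs_hz[i + 1]
            a = (1 - frac) * amps[i] + frac * amps[i + 1]
            out.append(a * math.cos(phase))
            phase += 2 * math.pi * f / sr
    return out

def resynthesize(tracks, hop, sr):
    """Sum the oscillator outputs of all (freqs, amps) tracks."""
    voices = [synth_track(f, a, hop, sr) for f, a in tracks]
    return [sum(v) for v in zip(*voices)]

# Two hypothetical tracks, exactly harmonic: f_2 = 2 * f_1.
tracks = [([1000.0] * 3, [1.0] * 3), ([2000.0] * 3, [0.5] * 3)]
mix = resynthesize(tracks, hop=4, sr=8000)
```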

26 Modification in sinewave resynthesis
Change duration by warping timebase
  may want to keep onset unwarped
[Figure: spectrogram (freq / Hz against time / s)]
Change pitch by scaling frequencies
  either stretching or resampling envelope
[Figure: level / dB against freq / Hz for the two cases]
Change timbre by interpolating parameters
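Because the model is just a set of contours, duration and pitch changes reduce to simple operations on lists. The sketch below warps the whole timebase uniformly, so it does not keep the onset unwarped, and it scales frequencies without resampling the spectral envelope; the function names are assumptions.

```python
def stretch_contour(values, factor):
    """Resample a frame-rate contour (at least two frames) by linear
    interpolation; factor > 1 lengthens the note without changing its pitch."""
    n_out = max(2, int(round(factor * (len(values) - 1))) + 1)
    out = []
    for j in range(n_out):
        pos = j * (len(values) - 1) / (n_out - 1)
        i = min(int(pos), len(values) - 2)
        frac = pos - i
        out.append((1 - frac) * values[i] + frac * values[i + 1])
    return out

def transpose(freqs_hz, semitones):
    """Scale every frequency by the equal-tempered ratio 2**(semitones/12)."""
    ratio = 2.0 ** (semitones / 12.0)
    return [f * ratio for f in freqs_hz]

slow = stretch_contour([0.0, 1.0, 2.0], 2.0)
up_octave = transpose([440.0], 12)
```

Applying `stretch_contour` to both the frequency and amplitude contours of every track, then resynthesizing, changes duration while leaving pitch alone.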

27 Sinusoids + residual
Only prominent peaks became tracks
  remainder of spectral energy was noisy?
  model residual energy with noise
How to obtain non-harmonic spectrum?
  zero-out spectrum near extracted peaks?
  or: resynthesize (exactly) & subtract waveforms
  $e_s[n] = s[n] - \sum_k A_k[n] \cos(2\pi n f_k[n])$
  ... must preserve phase!
[Figure: mag / dB against freq / Hz showing original, sinusoids, residual and LPC fit]
Can model residual signal with LPC
  flexible representation of noisy residual
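The need to preserve phase when subtracting waveforms can be seen in a small experiment. The test signal, its parameters and the helper names below are assumptions made for the demonstration.

```python
import math
import random

sr, n_samp = 8000, 800
rng = random.Random(0)
noise = [0.05 * rng.uniform(-1, 1) for _ in range(n_samp)]
# Hypothetical signal: one 500 Hz sinusoid (phase 0.7 rad) plus low-level noise.
s = [math.cos(2 * math.pi * 500 * n / sr + 0.7) + noise[n] for n in range(n_samp)]

def residual(signal, amp, freq_hz, phase):
    """e_s[n] = s[n] - A cos(2 pi f n / sr + phase) for a single sinusoid."""
    return [x - amp * math.cos(2 * math.pi * freq_hz * n / sr + phase)
            for n, x in enumerate(signal)]

def power(x):
    return sum(v * v for v in x) / len(x)

good = residual(s, 1.0, 500.0, 0.7)   # phase preserved: only the noise remains
bad = residual(s, 1.0, 500.0, 0.0)    # phase ignored: the sinusoid is not cancelled
```

With the correct phase the residual power drops to that of the noise, while the phase-free version leaves a sinusoid far stronger than the noise it was meant to expose.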

28 Sinusoids + noise + transients
Sound represented as sinusoids and noise:
  $s[n] = \sum_k A_k[n] \cos(2\pi n f_k[n]) + h_n[n] \ast b[n]$
  Parameters are $A_k[n]$, $f_k[n]$, $h_n[n]$
[Figure: spectrograms (freq / Hz against time / s) of the sinusoids and of the residual $e_s[n]$]
Separate out abrupt transients in residual?
  $e_s[n] = \sum_k t_k[n] + h_n[n] \ast b[n]$
  more specific, more flexible

29 Summary
Nonspeech audio
  i.e. sound in general
  characteristics: ecological
Music synthesis
  control of pitch, duration, loudness, articulation
  evolution of techniques
  sinusoids + noise + transients
Music analysis... and beyond?

30 References
W. W. Gaver. Synthesizing auditory icons. In Proc. Conference on Human Factors in Computing Systems INTERCHI-93. Addison-Wesley, 1993.
M. Athineos and D. P. W. Ellis. Autoregressive modeling of temporal envelopes. IEEE Tr. Signal Proc., 15(11), 2007.
X. Serra and J. Smith III. Spectral Modeling Synthesis: A Sound Analysis/Synthesis System Based on a Deterministic Plus Stochastic Decomposition. Computer Music Journal, 14(4):12-24, 1990.
T. S. Verma and T. H. Y. Meng. An analysis/synthesis tool for transient signals that allows a flexible sines+transients+noise model for audio. In Proc. ICASSP, pages VI, Seattle.
