A Continuous Time-Frequency Approach To Representing Rhythmic Strata

Leigh M. Smith and Peter Kovesi
Department of Computer Science, University of Western Australia

Motivation

Modelling the cognition of musical rhythm offers insights into the theory of time perception, the quantification of musical theories of performance and expression [6], and non-verbal knowledge representation in artificial intelligence. Existing models of rhythm have used various approaches including grammars, expectancy [1], statistics, Minskyian agents, oscillator entrainment [5] and other self-organising connectionist systems. A common problem confronted, and addressed in diverse ways by these approaches, is the representation of temporal context, order and hierarchy, and the role of expressive timing within the existing rhythmic structure. This paper describes an approach to exhaustively represent the periodicities which are created by temporal relationships between beats over multiple timescales, including both metrical and agogic times. This multiple resolution analysis is performed using Morlet wavelets [3] over frequencies from 0.1 to 100 Hz. The result is a decomposition of a musical rhythm into short-term, low-frequency components which reveal transient details, from which an executable theory [6] of rhythmic structure may be constructed.

Rhythmic strata, tactus and expectancy

Yeston has argued for the conception and representation of rhythm as a hierarchy of strata [11], with a meter arising from accents created by the interaction between hierarchical levels. Lerdahl and Jackendoff [7] have argued for two hierarchies, a metrical structure and a grouping structure. Their metrical structure is a decomposition of the listener's sense of repetitive beat, again by interactions between levels of their metrical grid. The tactus (roughly the foot-tapping rate) is considered the most salient hierarchical level. The grouping structure is responsible for establishing boundaries of hierarchical grouping over phrases, motives and other units of musical time, interacting with the metrical structure. Desain's decomposable rhythm perspective has similarities to a wavelet approach [1]. He projects expectancy curves forward in time, composed from Gaussian sections with parameters determined from the ratio of previous time intervals. The curves are also weighted by an absolute time component, creating tempo dependency. The expectancy curve is calculated by summing the expectancies determined from all of the possible intervals between all onsets. Each time point of highest expectancy positions a window where beats are identified.
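Since Desain's construction is procedural, a loose numerical sketch may help fix ideas. The following Python/NumPy fragment is an illustration added for this rewrite, not code from the paper: the Gaussian widths, the single-interval forward projection and the absence of the absolute-time (tempo) weighting are placeholders rather than Desain's published formulation. It simply projects a Gaussian expectancy section forward from every pair of past onsets and sums the sections.

    import numpy as np

    def expectancy_curve(onsets, t, rel_width=0.1):
        # Loose sketch: each interval between a pair of past onsets projects an
        # expectancy forward from the later onset by that same interval, as a
        # Gaussian section whose width scales with the interval. Summing over
        # all pairs approximates the "sum over all possible intervals" above.
        curve = np.zeros_like(t)
        for i in range(len(onsets)):
            for j in range(i + 1, len(onsets)):
                interval = onsets[j] - onsets[i]
                expected = onsets[j] + interval
                sigma = rel_width * interval
                curve += np.exp(-((t - expected) ** 2) / (2.0 * sigma ** 2))
        return curve

    # Usage: expectancies projected beyond onsets heard at 0, 0.5 and 1.0 seconds.
    t = np.linspace(1.0, 3.0, 400)
    curve = expectancy_curve([0.0, 0.5, 1.0], t)
    next_beat = t[np.argmax(curve)]   # a peak positions a window for a future beat

A peak of the summed curve then positions a window in which a beat is identified, as described above.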

Frequency approaches

An isochronous rhythm may be considered as a single periodicity of a measurable frequency, the reciprocal of the inter-onset interval (IOI) between beats. Objective (for example dynamic) accenting of regularly spaced beats creates a meter and thereby two frequencies: that implied by the period of the beat, and that implied by the period of the measure, a lower frequency. Expressive timing deviations from a strict metrical grid may be viewed as relatively short-term frequency deviations and modulations from a metrical carrier frequency. A rhythm composed of syncopations and varied IOIs may be conceived of as a linear combination of short-term frequency components, in the sense of Fourier theory. A sufficiently fine-grained time-frequency analysis will then reveal these components over their time period. Recovering the rhythm from an acoustic signal can be achieved by rectification [8]; alternatively, capturing the monophonic timing from a MIDI controller avoids this processing. The rhythm is represented as impulses at the times of note onsets, each weighted by its MIDI velocity value. This assumes a linear relationship between intensity and cognitive salience and ignores timbre, pitch and duration effects. It does, however, succinctly represent the cognitive structure of percussive rhythms, which can be memorised and recalled independently of timbre, pitch and duration. Neural oscillator entrainment models use a hierarchy of oscillators to respond to periodicities in the rhythm, within frequency bands defined by the dynamics of the phase-locking behaviour. Such models are not actually interlocked between hierarchies [5, pp. 198], suggesting that a time-frequency analysis producing independent stratified layers of rhythmic times will reveal the signal on which the oscillators are self-organising.

Wavelets

The discrete Fourier transform decomposes an arbitrary signal onto infinite-time, complex exponential bases in harmonic relationship to the analysis period. This localises frequency while losing the ability to represent frequency change over time. The short-term Fourier transform (STFT) addresses this with a time window over the signal, fixed across the frequency plane. By contrast, the continuous wavelet transform (CWT) [3] decomposes a time-varying signal s(t) onto scaled and translated versions of a mother-wavelet g(t),

    W_s(b, a) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s(\tau) \, \bar{g}\!\left(\frac{\tau - b}{a}\right) d\tau, \quad a > 0,    (1)

where \bar{g}(t) is the complex conjugate and a is the scale parameter, controlling the dilation of the window function, effectively stretching the window geometrically over time. The translation parameter b centres the window in the time domain. The geometric scale gives the wavelet transform a zooming capability over a logarithmic frequency range, such that high frequencies are localised by the window over short time scales, and low frequencies are localised over longer time scales. The CWT indicated in Equation 1 is a scaled and translated instance from a bank of an infinite number of constant relative bandwidth (Q) filters. For a discrete implementation, a sufficient density of scales (a), or voices per octave, is required. Morlet and Grossmann's mother-wavelet [2] for g(t) is a scaled complex Gabor function,

    g(t) = e^{-t^2/2} \, e^{i 2\pi \omega_0 t},    (2)

where ω_0 is the frequency of the mother-wavelet before it is scaled. Choices of ω_0 ≥ π√(2/ln 2) will be close to a zero mean [3]. The Gaussian envelope over the complex exponential provides the best possible simultaneous time/frequency localisation [2], respecting the Heisenberg uncertainty relation. This ensures that all short-term periodicities contained in the rhythm will be captured in the analysis. The region of the time domain of s(t) which can influence the wavelet output W_s(b_0, a_0) at the point (b_0, a_0) is an inverted logarithmic cone with its vertex at (b_0, a_0), extending equally in both directions in time. Where impulses fall within the time extent of a point, W_s(b_0, a_0) will return a high energy value. The phase of Equation 2 is constant. Real-valued wavelets such as Marr's [8] do not provide an anti-symmetric component to produce a phase which oscillates independently of the signal. This produces magnitudes which will not be congruent across scales, deviating forward and backward in time. By the progressive nature of Equation 2 [2, 3], the real and imaginary components of W_s(b, a) are the Hilbert transform of each other. These can be computed as magnitude and phase components and then plotted in grey scales on a scalogram and phasogram (Figure 2) respectively. Phase values are mapped from the domain 0 to 2π onto black through to white; the transition from white to black indicates a return to 0. Vertical lines of constant shade indicate a congruence of phase over a range of frequencies. In order to preserve phase, Morlet wavelets are non-causal, convolving a signal with the wavelet family in both directions in time. Phase has been shown to be important in discrimination at audio frequencies [10]. Given the effects of backwards masking and auditory persistence [8], there may be an integrating period, such that the auditory system may not be totally causal. However, this does not account for the non-causality of the convolution operator over longer time scales in a rhythm. Convolution places the events within a global time construct and is therefore in some sense a theory about the cognition of a complete rhythmic structure at the time of performance, rather than directly the listener's perception.

Local Energy

Phase indicates the progression of a periodic wave through its cycle. Phase values of a voice oscillating regularly and linearly between 0 and 2π indicate that the frequency represented by that voice is present in the signal. Morrone and Owens have reported compelling evidence for the local energy model of feature detection in image processing, proposing that features of an image are perceived at points where the Fourier components are most in phase [9]. Peaks in the local energy function can be used to indicate points of maximum phase congruency. The local energy function E(t) of the signal s(t) can be defined as

    E(t) = \sqrt{\left[\sum_{n=1}^{N} \mathcal{R}[W_s(t, n)]\right]^2 + \left[\sum_{n=1}^{N} \mathcal{I}[W_s(t, n)]\right]^2},

where N is the number of voices, and \mathcal{R}[\cdot], \mathcal{I}[\cdot] are the real and imaginary outputs from the CWT of Equation 1 for each voice respectively. With a progressive mother-wavelet, a singularity such as an impulse will be marked by a constant phase [2].
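For readers who wish to experiment, a minimal numerical sketch of Equations 1 and 2 and of E(t) follows (Python with NumPy and SciPy is assumed here; this is not the implementation used to produce the figures, and the truncation support and default ω_0 are illustrative choices). It computes one non-causal convolution per voice and then sums the real and imaginary parts across voices.

    import numpy as np
    from scipy.signal import fftconvolve

    def morlet(t, omega0=1.0):
        # Equation 2: a Gaussian envelope times a complex exponential of
        # frequency omega0 (cycles per unit time of the mother-wavelet).
        return np.exp(-t ** 2 / 2.0) * np.exp(2j * np.pi * omega0 * t)

    def cwt_morlet(signal, scales, omega0=1.0, support=4.0):
        # Discretised Equation 1: one non-causal convolution per voice (scale a).
        # Because the conjugate wavelet reversed in time equals g(t) itself,
        # correlating with the conjugate reduces to convolving with g(t/a)/sqrt(a).
        W = np.empty((len(scales), len(signal)), dtype=complex)
        for i, a in enumerate(scales):
            half = int(np.ceil(support * a))     # truncate the Gaussian tails
            t = np.arange(-half, half + 1) / a   # dilated time axis (tau - b) / a
            kernel = morlet(t, omega0) / np.sqrt(a)
            W[i] = fftconvolve(signal, kernel, mode="same")
        return W

    def local_energy(W):
        # E(t): magnitude of the across-voice sums of the real and imaginary parts.
        return np.sqrt(W.real.sum(axis=0) ** 2 + W.imag.sum(axis=0) ** 2)

    # The scalogram and phasogram of Figure 2 correspond to np.abs(W) and np.angle(W).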

The local energy function will typically produce high values at the impulse times. Applied in the one-dimensional case, phase congruency produces a dimensionless measure of the alignment of beat frequencies within a global temporal context. The similarity between phase-congruent temporal feature detection and Yeston's theory of inter-hierarchical accents is striking.

Example Analysis

The snare drum rhythm of Ravel's Bolero was chosen to demonstrate the behaviour of the CWT on a musical example (see Figure 1). The rhythm's metrical durations were used with a tempo of 60 bpm to generate an unaccented impulse train at a 200 Hz sampling rate. Morlet wavelets were discretised over 16 equally tempered voices per octave, for 10 octaves, with the maximum analysing period of the wavelets ranging from 2 to 2048 samples. The scalogram and phasogram of two repeats of the rhythm are shown in Figure 2, with scales plotted in rhythmic notation according to the tempo. At the highest scales, the impulses are discernible due to the localisation over a short time window. At lower scales, the frequencies created by the IOIs of the impulses are indicated by regular phase oscillations and grey-scaled magnitude values. Figure 3 is a 3-D rhythmic contour plot combining scale and phase to illustrate the variation of rhythm frequency energy over time for a region centred at the 750th sample (3.75 seconds). Figure 4 is a cross-sectional plot illustrating the frequencies present at that time point. The repeated rhythm induces a high (dark) ridge at the dotted semibreve voice and a clear hump for the triplet semiquaver IOI voice, a train of which surrounds the 750th sample. The crotchet voice, the tactus for this rhythm, undulates in energy value, notably being the highest scale which remains distinct across the window. While impulses will create high phase congruency (Figure 5), the quaver impulse points, which contrast with the triplet semiquaver sequences over longer timescales, actually create lower points of local energy than their surrounding intervals. The repeat of the phrase produces higher local energy values than the first sequence due to the congruent dotted semibreve voice.

Future Work

This paper has demonstrated a phase-congruent wavelet analysis for revealing rhythmic strata. This finds common ground with Lerdahl and Jackendoff's metrical and grouping structure process; however, it does not represent a generative grammatical approach to interpreting those structures. Additionally, it is not suggested that listeners utilise a direct Gabor filter multiresolution process in their rhythm perception, while noting that Kohonen has recently presented evidence for the self-organisation of Gabor wavelet transforms [4]. Rather, the aim of this paper has been to examine the information contained within a musical rhythm before any perceptual processing is performed. The intention is to make explicit that information which is inherent in the rhythm, viewed as a formalisation of the decomposition conceptual model. In the current model, the energy density for each wavelet is proportional to its frequency. Further work is to investigate rescaling the scalogram to better represent a cognitive model. While it is tempting to draw hypotheses for methods of deriving the tactus by ridge-tracing or by the well-formedness [7] of the global continuation of a voice, further research is required to build a model of tactus with respect to perceptual issues.
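As a usage illustration of the preceding analysis, the following sketch (again Python/NumPy, reusing the cwt_morlet and local_energy functions from the earlier sketch) builds an unaccented impulse train at the 200 Hz sampling rate and 60 bpm tempo described above, and sets up 16 voices per octave over 10 octaves. The inter-onset intervals encode the familiar two-bar Bolero snare ostinato; treat them as an illustrative transcription to be checked against Figure 1 rather than as the paper's exact input.

    import numpy as np

    SAMPLE_RATE = 200            # Hz, as used for the impulse train above
    CROTCHET_SEC = 1.0           # at 60 bpm one crotchet lasts one second

    # Inter-onset intervals (in crotchets) for the familiar two-bar Bolero snare
    # ostinato: quavers and triplet semiquavers summing to a dotted semibreve.
    BAR_1 = [0.5] + [1/6] * 3 + [0.5] + [1/6] * 3 + [0.5, 0.5]
    BAR_2 = [0.5] + [1/6] * 3 + [0.5] + [1/6] * 9
    BOLERO_IOIS = BAR_1 + BAR_2

    def impulse_train(iois, repeats=2, amplitude=1.0):
        # Unaccented impulses at the note onset times, sampled at SAMPLE_RATE.
        iois = list(iois) * repeats
        onsets_sec = np.cumsum([0.0] + iois[:-1]) * CROTCHET_SEC
        signal = np.zeros(int(round(sum(iois) * CROTCHET_SEC * SAMPLE_RATE)) + 1)
        signal[np.round(onsets_sec * SAMPLE_RATE).astype(int)] = amplitude
        return signal

    # 16 voices per octave over 10 octaves: analysing periods of 2 to 2048 samples.
    periods = 2.0 * 2.0 ** (np.arange(10 * 16 + 1) / 16.0)
    scales = periods   # with omega0 = 1 the analysing period (in samples) equals the scale

    signal = impulse_train(BOLERO_IOIS)   # two repetitions, as shown in Figure 2
    W = cwt_morlet(signal, scales)        # scalogram: np.abs(W), phasogram: np.angle(W)
    E = local_energy(W)                   # cf. the local energy plot of Figure 5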

Figure 1: The snare drum rhythm of Bolero.

Figure 2: Time-scale scalogram (top) and phasogram (bottom) displays of a CWT of the rhythmic impulse function of two repetitions of the rhythm in Figure 1.

Figure 3: Combined magnitude and phase display of a portion of the CWT of Figure 2.

Figure 4: Cross-section at the 750th sample through the scalogram in Figure 2. The logarithmic scale produces dotted rhythmic units which do not evenly subdivide surrounding times.

Figure 5: Local energy display of a CWT of the rhythm in Figure 1.

References

[1] P. Desain. A (de)composable theory of rhythm perception. Music Perception, 9(4):439-454, 1992.

[2] A. Grossmann, R. Kronland-Martinet, and J. Morlet. Reading and understanding continuous wavelet transforms. In J. Combes, A. Grossmann, and P. Tchamitchian, editors, Wavelets, pages 2-20. Springer-Verlag, Berlin, 1989.

[3] M. Holschneider. Wavelets: An Analysis Tool. Clarendon Press, Oxford, 1995.

[4] T. Kohonen. Emergence of invariant feature detectors in self-organization. In M. Palaniswami, Y. Attikiouzel, R. J. Marks II, D. Fogel, and T. Fukuda, editors, Computational Intelligence: A Dynamic System Perspective, chapter 2. IEEE Press, New York, 1995.

[5] E. W. Large and J. F. Kolen. Resonance and the perception of musical meter. Connection Science, 6(2+3):177-208, 1994.

[6] O. E. Laske. Artificial intelligence and music: A cornerstone of cognitive musicology. In M. Balaban, K. Ebcioglu, and O. E. Laske, editors, Understanding Music with AI, pages 3-28. Massachusetts Institute of Technology, Cambridge, Mass., 1992.

[7] F. Lerdahl and R. Jackendoff. A Generative Theory of Tonal Music. Massachusetts Institute of Technology, Cambridge, Mass., 1983.

[8] N. P. McAngus Todd. The auditory primal sketch: A multiscale model of rhythmic grouping. Journal of New Music Research, 23(1):25-70, 1994.

[9] M. C. Morrone and R. A. Owens. Feature detection from local energy. Pattern Recognition Letters, 6:303-313, December 1987.

[10] A. V. Oppenheim and J. S. Lim. The importance of phase in signals. Proceedings of the IEEE, 69(5):529-541, 1981.

[11] M. Yeston. The Stratification of Musical Rhythm. Yale University Press, New Haven, 1976.