
UNIVERSITY OF MIAMI

BEAT DETECTION IN MUSIC USING AVERAGE MUTUAL INFORMATION

By Margarita Maria Escobar Perez

A RESEARCH PROJECT

Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree of Master of Science in Music Engineering Technology

Coral Gables, Florida
December 2001

UNIVERSITY OF MIAMI
SCHOOL OF MUSIC

BEAT DETECTION IN MUSIC USING AVERAGE MUTUAL INFORMATION

BY MARGARITA MARIA ESCOBAR PEREZ

A Research Project Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree of Master of Science in Music Engineering Technology.

Approved by:
Prof. Ken Pohlmann, Project Advisor, Music Engineering Technology
Prof. Will Pirkle, Music Engineering Technology
Dr. Kenon D. Renfrow, Keyboard Pedagogy and Performance

Coral Gables, Florida
November 16, 2001

ESCOBAR, MARGARITA M. (M.S., Music Engineering Technology) (December 2001)

Beat Detection in Music Using Average Mutual Information

Abstract of a Master's Research Project at the University of Miami. Thesis supervised by Professor Ken Pohlmann. Number of pages in text: 90.

In this research an alternative automatic beat detection method using average mutual information (AMI) is proposed and tested on music audio files. The method performs envelope detection and decimation, and then applies AMI, a form of non-linear correlation. AMI is a measure of how much information one can expect to derive from future measurements of a system based on current observations. In this study, AMI is applied to music files to obtain the beat rate, because the dominant dependence in the signal is assumed to be the beat. Experimental results show that the method is effective and could be a good option in applications where real-time operation is needed, even though the AMI calculation can be computationally demanding. The test material included trains of pulses, popular music sampled from compact discs, and MIDI files; all files had constant beat rates. The system performed best with frames of 5 s, fs = 8 kHz, and music files whose beats are clearly marked by percussion. The system also shows some capacity to distinguish strong and weak beats (at the quarter-note level), which opens opportunities for the development of algorithms capable of detecting several levels of rhythmic structure in music. In addition, this study is relevant for those interested in beat detection and beat tracking.

Keywords: Beat tracking; Beat detection; Rhythm detection; Average mutual information in acoustics.

TABLE OF CONTENTS

1 Introduction
2 Psychoacoustics
  2.1 Human Ear Physiology
  2.2 Human Ear Sensitivity
  2.3 Binaural Hearing
  2.4 Pitch
  2.5 Loudness
  2.6 Timbre
  2.7 Masking
  2.8 Critical Bands
3 Rhythm
  3.1 Rhythm Structure in Music
    Beat
    Meter
    Tempo
    Accent
  3.2 Rhythm as Perception
    Cognitive Bases of Rhythm Behavior
    Gestalt Principles of Perception
    Rhythmic Tempo Perception
4 Computer Models to Emulate Rhythm Perception
  4.1 Tempo and Beat Analysis of Acoustic Musical Signals
  4.2 Real-time Beat Tracking for Drumless Audio Signals
5 Deterministic Chaos Fundamentals
  5.1 Phase Space
  5.2 Sensitive Dependence and Butterfly Effect
  5.3 Average Mutual Information (AMI)
6 Proposing Average Mutual Information (AMI) for Beat Detection
  6.1 Envelope Detection
  6.2 Decimation
  6.3 Average Mutual Information Calculation
7 Tests and Results
  7.1 Train of Pulses
  7.2 Train of Pulses with White Noise
    Train of Pulses (1 Hz) with White Noise
    Train of Pulses (2 Hz) with White Noise
    Train of Pulses (1 Hz) with Much More White Noise
  7.3 MIDI Files
    MIDI File (80 bpm)
    MIDI File (110 bpm)
    MIDI File (124 bpm)
  7.4 Polyphonic Music
    Ballad
    Reggae
    Merengue
    Latin Pop
  7.5 Train of Pulses with Two Different Intensities
  7.6 Autocorrelation Comparison
8 Conclusions
Appendix
  Matlab Codes
    Train of Pulses
    Train of Pulses with White Noise
    Polyphonic Music
  TSTOOL for Matlab
    What is TSTOOL?
    What Software is Required to Run TSTOOL?
    On Which Systems Does TSTOOL Run?
References

1 INTRODUCTION

Automatic recognition of rhythm is a tool for future developments in computer music, interactive composition, responsive instruments, accompaniment systems, musicology, tempo tracking in live performance, estimation of meter, and new methods of automatic music transcription. This research explores the fundamentals of deterministic chaotic systems (non-linear dynamic systems) as an alternative method for automatic beat recognition in music audio files. Average Mutual Information (AMI) is used in non-linear analysis as a kind of non-linear correlation that measures how much information one can expect to derive from future measurements of a system based on current observations. It connects two sets of measurements with each other and establishes a criterion for their mutual dependence based on the notion of an information connection between them. In this study, Average Mutual Information is applied to audio files to obtain the beat rate, and the dominant dependence in the signal is assumed to be the beat. Thus, the maximum of the Average Mutual Information computed over the music excerpt is taken to indicate the beat rate. A music file with a constant beat rate can be understood as a chaotic system in which each instrument or voice performs independently, but all are connected by the beat, because all of them follow the same rhythmic rate. A chaotic system is defined [36] as a deterministic system in which small changes in initial conditions may lead to completely different behavior in the future. A signal from a chaotic system is often at first sight indistinguishable from a random process, despite the fact that it is driven by deterministic dynamics.
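To make the lag-based AMI idea concrete, here is a minimal MATLAB sketch using a simple joint-histogram estimator with 16 bins (and functions from recent MATLAB releases). The function names and every parameter choice are illustrative assumptions for this sketch, not the implementation developed in this thesis; the actual Matlab codes appear in the Appendix.

    function lag_best = find_beat_lag(env, maxlag)
    % Return the lag (in samples) at which the decimated envelope env
    % shows maximum average mutual information: the assumed beat period.
    ami = zeros(1, maxlag);
    for lag = 1:maxlag
        ami(lag) = ami_at_lag(env, lag);
    end
    [~, lag_best] = max(ami);
    end

    function I = ami_at_lag(x, lag)
    % Histogram-based AMI between x(n) and x(n+lag), in bits.
    a = x(1:end-lag);
    b = x(1+lag:end);
    edges = linspace(min(x), max(x), 17);      % 16 bins (assumed choice)
    Pab = histcounts2(a, b, edges, edges);     % joint histogram
    Pab = Pab / sum(Pab(:));                   % joint probabilities
    PaPb = sum(Pab, 2) * sum(Pab, 1);          % product of the marginals
    nz = Pab > 0;                              % avoid log(0)
    I = sum(Pab(nz) .* log2(Pab(nz) ./ PaPb(nz)));
    end

With an envelope decimated to, say, 100 samples per second, a maximum at lag 50 would indicate a beat period of 0.5 s, that is, 120 beats per minute.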

Psychoacoustics research over the years has provided good clues about the perception of rhythm, even though the way human beings perceive and process rhythm is not yet well understood. The most relevant studies in automatic beat tracking that obtained good results are those of Scheirer [31] and of Goto and Muraoka [12]. Scheirer's paper [31] presented a method based on a psychoacoustic simplification of how human beings perceive rhythm. The algorithm uses a small number of bandpass filters and banks of parallel comb filters to analyze the tempo and extract the beat from polyphonic music. On the other hand, Goto and Muraoka [12] presented a real-time beat tracking system that detects a hierarchical beat structure in music audio signals without drum sounds. This system not only tracks beats at the quarter-note level, but also detects beat structure at the half-note and measure levels. These studies are reviewed in Chapter 4. The method presented in this thesis provides a simple way to detect the beat from audio files, opening opportunities for applications that require few computational resources and must operate in real time.

Chapter Two explores the human auditory system's physiology and anatomy. Concepts such as pitch, loudness, timbre, auditory thresholds, masking, and critical bands are described and associated with the physical attributes of sound. These concepts are useful and fundamental for the audio engineer when it is necessary to simplify models of audio systems. Chapter Three examines the basic components of rhythm's structure, such as beat, meter, tempo, and accent. Studies over the years about the perception of rhythm are reviewed in an effort to understand how our brains detect, learn, and predict rhythm.

Chapter Four overviews two successful studies of beat tracking systems as points of reference for this research. Chapter Five describes the basic concepts of non-linear systems, which at first glance look random and cannot be modeled by traditional linear methods; through chaotic analysis these systems become coherent and, in some ways, predictable. The butterfly effect, phase space, and average mutual information are reviewed and explained. Chapter Six proposes an alternative beat detection system. The main parts and concepts of the detection algorithm are explained and supported, and the methodology, implementation, and tools used for the algorithm are also presented. Chapter Seven presents the results obtained in detecting the beat rate in several kinds of audio files, such as trains of pulses, different genres of music, and MIDI files. The testing parameters, programs, concepts, and results are discussed in that chapter. Chapter Eight concludes this study of AMI applied to music audio signals for automatic beat detection. Further studies, implementations, and improvements are also suggested.

2 PSYCHOACOUSTICS

Psychoacoustics explains the subjective response to everything we hear. It is the ultimate arbitrator in acoustical concerns, because it is only our response to sound that fundamentally matters. Psychoacoustics seeks to reconcile acoustical stimuli, and all the scientific, objective, and physical properties that surround them, with the physiological and psychological responses evoked by them [25]. Psychoacoustics studies the relationship between acoustic sound signals, the physiology of the auditory system, and the psychological perception of sound, in order to explain the auditory behavioral responses of human listeners, the abilities and limitations of the human ear, and the complex auditory processes that occur inside the brain. Hearing involves a behavioral response to the physical attributes of sound, including intensity, frequency, and time-based characteristics, which permit the auditory system to find clues that determine distance, direction, loudness, pitch, and tone of many individual sounds simultaneously.

2.1 HUMAN EAR PHYSIOLOGY

The human ear has three main subdivisions: the outer ear, which amplifies incoming air vibrations; the middle ear, which transduces these vibrations into mechanical vibrations; and the inner ear, which filters and transduces these mechanical vibrations into hydrodynamic and electrochemical vibrations, with the result that electrochemical signals are transmitted through nerves to the brain. These three subdivisions are collectively classified as the peripheral auditory system. Figure 2.1 shows a simplified view of the human ear.

Figure 2.1 Human Ear [40]

The outer and middle ear structures enhance the sensitivity of hearing, acting as a preamplifier of the sound energy spread out from its sources. In the outer ear, the pinna captures more of the incoming wave, and hence more sound energy, than the ear canal would receive without it. The auditory canal acts as a half-closed tube resonator, enhancing sounds in the range of 2-5 kHz. In the middle ear, the tympanic membrane, or eardrum, receives vibrations traveling up the auditory canal and transfers them through the ossicles to the oval window, the port into the inner ear. The ossicles (hammer, anvil, and stapes) achieve a multiplication of force by lever action, providing amplification when listening to soft sounds, but they can also be adjusted by muscle action to attenuate the signal for protection against loud sounds. The inner ear consists of the semicircular canals, which serve as the balance organ of the body, and the cochlea, which contains the basilar membrane and the organ of Corti; together these form the complicated mechanisms that transduce vibrations into neural signal codes. The organ of Corti is the sensitive element in the inner ear; it is located on the basilar membrane, in one of the three compartments of the cochlea, and contains four rows of hair cells. Above them is the tectorial membrane, which can move in response to pressure variations in the fluid-filled tympanic and vestibular canals. There are some 16,000 to 20,000 hair cells distributed along the basilar membrane, which follows the spiral of the cochlea, and they can resolve about 1,500 separate pitches. According to the place theory, pitch is determined by the place along this collection of hair cells at which the maximum excitation occurs on the basilar membrane. On the other hand, the timing (frequency) theory states that the basilar membrane moves up and down in synchrony with the pressure variations of the sound wave, driven by the movement of the stapes at the oval window. Each up-and-down movement results in one neural firing, so that frequency is coded directly by the rate of firing; for example, a 400 Hz tone results in hair cells firing 400 times per second. When the rate is above 1,000 Hz, the frequency cannot be represented by individual cells, and the firings of many cells are integrated to produce the correct firing rate. Further complex auditory processing occurs in the brain, using information contained in the neural signals passed on via the auditory nerve. The auditory nerve, carrying electrical impulses from the cochlea and the semicircular canals, makes connections with both auditory areas of the brain. In addition, while the left and right ears do not differ physiologically in their capacity for detecting sound, the left and right brain halves do. The left cerebral hemisphere processes most speech (verbal) information; thus, the right ear, which is wired to this brain side, may be perceptually superior for spoken words. On the other hand, the left ear may be better at perceiving melodies because it is connected to the right brain half, which processes melodic (non-verbal) information.

2.2 HUMAN EAR SENSITIVITY

Psychoacoustics demonstrates how remarkable the human auditory system is, both in absolute sensitivity and in the range of intensities to which it can respond. The ratio between the powers of the faintest sound we can detect and the loudest sound we can hear without damaging our ears is 1,000,000,000,000:1, a range of 120 dB. This shows that the ear can accommodate a very wide dynamic range of sound intensity, and it responds to increasing sound intensity in a logarithmic relationship. For very soft sounds, near the threshold of hearing, the ear strongly discriminates against low frequencies. For mid-range sounds around 60 phons the discrimination is not so pronounced, and for very loud sounds in the neighborhood of 120 phons the hearing response is nearly flat. This aspect of human hearing implies that the ear will perceive a progressive loss of bass frequencies as a given sound becomes softer and softer. Figure 2.2 shows the frequency-intensity regions for auditory experience.

Figure 2.2 Frequency-intensity regions for auditory experience [13]

2.3 BINAURAL HEARING

Binaural hearing is related to the fact that the ears are some distance apart, allowing the localization of sound by registering the slight differences in time, phase, and intensity of the sound striking each ear. The ear can detect a time difference as slight as 30 µs. Both the comparison of left and right ear receptions and the evaluation of the sound's intensity are done automatically, without any conscious thought, allowing us to identify the approximate location of the origin of a sound.

2.4 PITCH

In psychoacoustics, the term pitch is considered to be the psychological perception of frequency. Much research has been done to find correlations between pitch and frequency, in which pitch is understood as a response pattern to the frequency of a sound. In music, pitch is defined as the position of a single sound in the complete range of sound. It is the feature of a sound by which listeners can arrange sounds on a scale from "lowest" to "highest." Sounds are higher or lower in pitch according to the frequency of vibration of the sound waves producing them. Musical notation uses a logarithmic measuring scale because of the ear's logarithmic response to frequency. For example, two different octaves are heard as the same interval even though the frequency range of one octave is between 100 and 200 Hz and that of another is between 1,000 and 2,000 Hz. The audible frequency range is roughly between 20 and 20,000 Hz, the most sensitive region being from 1,000 to 5,000 Hz.

2.5 LOUDNESS

Loudness is the subjective perception of the intensity of a sound, in terms of which sounds may be ordered on a scale extending from quiet to loud. Intensity is defined as the sound power per unit area. Figure 2.3 shows the equal loudness curves for the human ear. Each curve describes a range of frequencies that are perceived to be equally loud. The curves are rated in phons, measured as the SPL of the curve at 1,000 Hz. These curves show that the ear is less sensitive to low frequencies, and also that the maximum sensitivity region for human hearing is around 1,000 to 5,000 Hz. The dotted curve represents the threshold of hearing.

Figure 2.3 Equal loudness curves for the human ear [25]

The standard threshold of hearing at 1,000 Hz is nominally taken to be 0 dB, but the actual curves show the measured threshold at 1,000 Hz to be about 4 dB.

2.6 TIMBRE

Sounds may be generally characterized by pitch, loudness, and sound quality, or timbre. Timbre is that attribute of auditory sensation in terms of which a listener can distinguish two similar sounds that have the same pitch and loudness. Timbre is mainly determined by the harmonic content and the dynamic characteristics of a sound, such as vibrato and the attack-decay envelope. Timbre is the characteristic that allows us to discriminate sounds produced by different instruments playing at the same time.

2.7 MASKING

Simultaneous masking is a property of the human auditory system whereby some sounds vanish in the presence of louder sounds. For example, in the presence of very strong white noise many weaker sounds are masked, or a tone of 500 Hz can mask a softer tone of 600 Hz. The strong sound is called the masker and the softer sound the maskee. This aspect of human hearing has important implications for the design of perceptual audio coders, whose goal is to minimize the amount of data to be coded without degradation or loss of information for the listener. Data reduction can be achieved with psychoacoustic algorithms based on the concepts of critical bands, the minimum threshold of hearing, and the masking phenomena. The sound signals to be coded are compared to the minimum hearing threshold and the masking curve: signals near these thresholds are coded using a minimal quantity of bits, and signals that fall below the thresholds are discarded, because the ear cannot hear them.
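As a toy illustration of the threshold comparison described above, the following MATLAB fragment discards bands whose level falls below an approximation of the threshold of hearing in quiet (the Terhardt approximation from the perceptual-coding literature). The band levels here are hypothetical stand-ins, and a real coder would also derive masking curves from the signal itself:

    f = linspace(100, 10000, 32);            % band center frequencies, Hz
    fk = f / 1000;                           % in kHz
    Tq = 3.64*fk.^(-0.8) - 6.5*exp(-0.6*(fk - 3.3).^2) ...
         + 1e-3*fk.^4;                       % threshold in quiet, dB SPL
    levels = 30 + 20*randn(size(f));         % hypothetical band levels, dB SPL
    audible = levels > Tq;                   % bands below threshold get no bits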

2.8 CRITICAL BANDS

A critical band is the smallest band of frequencies that activate the same part of the basilar membrane. The ear can distinguish tones a few hertz apart at low frequencies, but at high frequencies tones must differ by hundreds of hertz to be differentiated. In any case, hair cells respond to the strongest stimulation in their local region, which is termed a critical band. The concept of the critical band was introduced by Fletcher in 1940 and has been widely tested. Experiments show that critical bands are much narrower at low frequencies than at high frequencies; three-fourths of the critical bands are below 5,000 Hz. Critical bands are analogous to a spectrum analyzer with variable center frequencies, and any tone will create a critical band centered on it. Critical bands can also be explained in another way: when two sounds of equal loudness sounded separately are close together in pitch, their combined loudness when sounded together will be only slightly louder than one of them alone. They may be said to lie in the same critical band, where they are competing for the same nerve endings on the basilar membrane of the inner ear. If the two sounds are widely separated in pitch, the perceived loudness of the combined tones will be considerably greater, because they do not overlap on the basilar membrane and do not compete for the same hair cells. If the tones are far apart in frequency (not within a critical band), the combined sound may be perceived as twice as loud as one alone. The theory of critical bands is an important auditory concept because it shows that the ear discriminates between energy inside and outside the band; only the former promotes masking.
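The critical-band rate is often approximated with Zwicker's formula, in which one Bark corresponds to one critical band. This is a standard approximation from the psychoacoustics literature, not a result of this thesis; in MATLAB:

    f = [100 500 1000 5000 10000];                   % frequencies, Hz
    z = 13*atan(0.00076*f) + 3.5*atan((f/7500).^2);  % critical-band rate, Bark
    % e.g., 1,000 Hz maps to about 8.5 Bark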

3 RHYTHM

3.1 RHYTHM STRUCTURE IN MUSIC

Although the attributes of musical rhythm are many and varied, the most widely agreed-upon components of rhythm structure are beat, meter, tempo, and accent. Phrase rhythm, melodic rhythm, rhythm pattern, and rhythm group are some names given to the rhythm of the melody and harmony; these parts overlie and/or are entwined with beat, meter, tempo, and accent, making it difficult to separate discussion of physical structure from rhythm as a psychological phenomenon.

Beat

The beat is the unit division of musical time; it underlies rhythm's structural components and generally divides the duration into equal segments. Beat is often referred to as pulse, but beats are fundamental to music's metric structure, while pulses are significant in relation to its rhythmic context. From the acoustic point of view, beats are loudness fluctuations. Beat frequencies occur when two nearly equal frequencies are sounded together. If two tones are about 15 Hz or less apart, interference will result from their similar, though not exactly identical, frequencies. Gradually they will move out of phase until, at 180°, destructive interference results, producing diminished loudness. When they move back into phase, constructive interference produces increased loudness. Thus, beats are a form of amplitude modulation. As two frequencies are brought closer together, the beats gradually slow down, disappearing when the frequencies become identical. Figure 3.1 shows the superposition of two sine waves of 100 Hz and 110 Hz.

Figure 3.1 Superposition of two sine waves producing beats [40]

Beats recur at a rate equal to the difference between the two frequencies, called the beat frequency. Thus the beat frequency produced by the 100 Hz and 110 Hz sine waves is 10 Hz.
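This is easy to verify numerically; a short MATLAB fragment synthesizes the two tones of Figure 3.1 and shows the 10 Hz loudness fluctuation:

    fs = 8000; t = 0:1/fs:1;                 % 1 s at 8 kHz
    x = sin(2*pi*100*t) + sin(2*pi*110*t);   % 100 Hz + 110 Hz tones
    plot(t, x)                               % envelope fluctuates 10 times per second
    % By the sum-to-product identity, x = 2*cos(2*pi*5*t).*sin(2*pi*105*t):
    % a 105 Hz carrier whose envelope yields 10 loudness maxima per second.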

Meter

Meter involves a grouping of beats, usually metric beats. In practice, the unit designated by a meter signature as receiving the beat is not always the same as the beat that is felt in response to the music. Thus, the metric beat is the one that a meter signature indicates, and the true beat is the one felt in response to the music. In much music the metric beat coincides with the true beat and is simply referred to as the beat. Meter is usually considered in terms of notation and is commonly indicated by bar lines. In many types of music, the first beat of each measure receives an accent, thus delineating the meter. It is important to note that music does not always conform mechanically to this pattern, because music is an expressive medium and is not merely mechanical or arithmetic. The usual approach to meter is indicated at the opening of a piece of music by a time signature, defined in algebraic terms in standard musical notation. Beats are represented most often by either quarter notes (e.g., in 2/4 or 3/4 meter) or eighth notes (e.g., in 4/8 or 6/8 meter).

Tempo

Tempo refers to the speed at which beats recur. In music notation, tempo is indicated by the traditional Italian terms grave, largo, adagio, lento, andante, moderato, allegro, and presto (from slowest to quickest). More precise tempo indications are given as metronome markings, which indicate the number of times a given note value or unit of time recurs in one minute. The note value indicated may coincide with either the metric beat or the true beat.

Accent

Accent is the aspect of rhythm that makes a beat prominent or emphasized. Creston [15] views accent as the "very life of rhythm," without which meter becomes monotonous, and classifies it into eight types: dynamic, agogic, metric, harmonic, weight, pitch, pattern, and embellished. Kramer [15] maintains that there are just three types of accents: stress, rhythmic, and metric. Metric accents help to define the regular grouping of beats. Rhythmic accents help define rhythmic groups and may serve to define groups at several levels, e.g., a motive, phrase, period, section, or movement. Whereas beat and meter provide reference points in musical time, tempo refers to the speed at which beats recur, and accent provides a means for emphasizing a beat. A listener, however, may group phrase or melodic rhythm patterns at various levels. The mind apparently seeks some organizing principle in the perception of music, and when a grouping of sounds is not objectively present, it imposes one of its own. Experiments show that the mind instinctively groups regular and identical sounds into twos and threes, stressing every second or third beat, and thus creates from an otherwise monotonous series a succession of strong and weak beats. Regardless of how one labels or describes rhythm patterns, there is agreement that melodic rhythms overlay and entwine themselves in relation to the beat. Consequently, a psychoacoustic simplification of the model of rhythm perception can be built on the perception of beat.

3.2 RHYTHM AS PERCEPTION

Rhythm can be defined in terms of perceptual response, as the emotion felt in hearing a "dancing," "exciting," or "calm" rhythm. The response might also be behavioral, as in clapping or tapping, or physiological, as in changes in heart rate or muscular movements. There is the familiar idea of rhythm as patterns of accentuated beats. These patterns may vary from moment to moment, and they can be modified to make them more interesting. Musicologists refer to this incessant beating of drums as meter. There is another conception of rhythm, the rhythm of organic movement, which is generated all day long, e.g., the rhythm of cascading water and howling wind, or the rhythm of speech. In contrast, this kind of rhythm lacks the repetitive, evenly paced accentuations of measured rhythm. In music, it is built up by a succession of irregular sonic shapes that combine in various ways and is called phrasing. These two conceptions of rhythm are sometimes referred to as vocal (phrasing) and instrumental (meter).

Music could hardly exist without both kinds of rhythm. Meter gives order to time; without it, music takes on the static quality of Gregorian chant. Without phrasing, music becomes repetitious and banal; phrasing, on the other hand, imparts a kind of narrative to music. Consequently, to analyze music, the human brain requires some way to segment the long sonic objects that music provides. It cannot wait until the end of a ten-minute composition to figure out what happened. The brain is always looking for clues about where musical objects begin and end. Rhythm exists in music to help the brain in this task, drawing lines around musical figures. A sequence of rhythmic markers tells the brain where the beginning or the end of a musical object is. Without rhythmic markers, the brain would quickly be overwhelmed by a multitude of observations. Rhythm is often associated with the beating of a clock, suggesting that it is concerned with measuring temporal durations. The brain measures the lengths of individual sounds and the silences that fall between them. It seeks patterns among these durations, and then patterns among these patterns. In observing pitch space, our brains naturally perceive octaves that can be subdivided to form scales. Once a brain has become accustomed to a culture's scale structure, it can use the scale's pitches as a framework for perceiving any composition. But time presents no natural unit of measure akin to an octave from which to infer a temporal scale. Without meter, we have nothing to tell us how long any of the notes actually last, so the brain cannot approach a composition with fixed notions of temporal durations the way it can for pitch distances.

The core of meter is the pulse, an unceasing clock-beat that rhythmic patterns overlay. Idealized pulses exist as the steady recurrence of contraction and relaxation, tension and release. Psychologically, a pulse constitutes a renewal of perception, a re-establishment of attention. It is a basic property of our nervous systems that they soon cease to perceive phenomena that do not change; pulses keep an unchanging phenomenon alive. This process of renewing attention comes so naturally to us that our nervous systems add a pulse where none is found. When the brain begins to sense a train of pulses, it continues to anticipate them even when individual pulses disappear into silence or into long-held notes. Certain pulses are made more prominent by accenting them. Typically, every second, third, or fourth note is played louder, causing our brains to automatically form groups of two, three, or four beats, each group starting from an accent. When the meter contains more than four beats, a brain perceiving five beats as two followed by three, or three followed by two, must strain to constantly readjust its scope. The brain tries to grasp the five beats as a whole, but a five-beat group runs much longer than the two- and three-beat periods to which we are accustomed, and many listeners cannot manage this. They complain that music written in 5/4 time "has no rhythm." Another perceptual challenge for our brains is polyrhythm, which should really be called "polymeter," since it is made by playing more than one meter at a time. It is difficult for the brain to generate two rhythms simultaneously, even when they are related. In polyrhythm any combination is possible, and any number of meters can be combined. Tempo, for its part, is very important because the mechanics of music perception are very sensitive to the rate at which musical structures are presented to the brain. Every aspect of the perception of music, such as individual tones, their timbre, their groupings, and their harmonic relatedness, depends on the speed of presentation. When music is played quickly we may miss detail, but when it is played very slowly the reach of the perceptual present is diminished and we may fail to observe groupings of melody, harmony, and meter. Tempo and rhythm are strongly related: tempo is the number of renewals of attention that establishes the underlying beat. In addition, the human skills of rhythm recognition are innate, but they differ greatly between a novice and a music conservatory student; the latter, after a long period of practice, is able to play rhythms written in common music notation and to recognize played rhythms, transcribing them into notation.

Cognitive Bases of Rhythm Behavior

Due to its nature, the perceptual basis of rhythmic behavior has been more a matter of speculation and theory than of research. The traditional psychology-of-music literature recognized instinctive, physiological, and motor theories as possible explanations for human interaction with musical rhythm. Lundin [28] proposed a learning theory for the development of rhythmic behaviors. Lundin's account of rhythmic response recognizes the importance of learning, which involves both perception and motor response. Perception of rhythm requires observation of rhythmic stimuli and may or may not involve overt behaviors; it involves both perceptual organization of rhythmic stimuli and discrimination among stimuli. Lundin contends that the ability to organize and discriminate among rhythmic stimuli is dependent on learning. He also viewed rhythm behavior as both a perceptual and a behavioral response.

Seashore [28], as a major proponent of the instinctive theory, held that there are two fundamental factors in the perception of rhythm: an instinctive tendency to group impressions in hearing, and a capacity for doing this with precision and stress. This theory reflects the position that rhythmic potential is an inherited trait, not a learned one. However, a number of studies provide data suggesting that training can improve rhythmic ability, potentially disproving the theory. Jaques-Dalcroze [28] proposed that the human heart rate is a basis for musical rhythm and tempo. However, evidence to support the heart rate theory is entirely lacking. Mursell [28] criticized the heart rate notion on the basis that there is no psychological mechanism by which the heartbeat gives us our sense of time. Lund [28] reported no significant relationships between college students' preferred tempi for selected popular songs and the rate of any of their objectively measured physiological processes. Recent research on tempo perception offers little or no support for physiological theories. While the natural rhythms of human physiology, including the menstrual cycle and cyclic changes in body temperature, wakefulness, and biochemistry, may influence a person's receptivity to musical stimuli, they are too lengthy, complex, and variable to explain rhythmic responses to relatively short-term musical stimuli. The motor theory holds that rhythm depends on the action of the voluntary muscles. Schoen [28] noted that nearly all investigations concerning the nature of rhythmic experiences find a motor or muscular factor, thus lending support to motor theory advocates. Mursell [28] and Lundin [28] both recognize motor theory as the most plausible of the traditional theories, but neither accepted it without reservation. Mursell argued that neuromuscular movement does not function in isolation from the human brain; rather, it functions in conjunction with the brain and the central nervous system that controls voluntary movements. Today, much of the research related to rhythmic behavior has focused on the perception of various aspects of rhythm: the role of movement in the perception of rhythm, tempo perception, meter perception, perception of rhythm groups, and expressive rhythm in music.

Gestalt Principles of Perception

The general principles that govern the perceptual organization of the auditory world correspond well to those described by the Gestalt psychologists. When we listen to rapid sequences of sounds, they may be perceived as a single perceptual stream or they may split into a number of perceptual streams, a process known as primary auditory stream segregation, or fission. Fission is more likely to occur if the elements making up the sequence differ markedly in frequency, amplitude, location, or spectrum; such elements would normally emanate from more than one sound source. When two elements of a sound are grouped into different streams, it is more difficult to judge their temporal order than when they form part of the same stream. The principle of similarity is that sounds will be grouped into a single perceptual stream if they are similar in pitch, timbre, loudness, or subjective location. In visual perception, similar objects tend to be grouped together, as shown in Figure 3.2: rows and columns are equally spaced, but columns of X or 0 are perceived.

Figure 3.2 Gestalt Principle of Similarity [17]

The principle of good continuation is that smooth changes in frequency, intensity, location, or spectrum will be perceived as changes in a single source, whereas abrupt changes indicate a change in source. The principle of common fate is that if two components in a sound undergo the same kind of changes at the same time, they will be grouped and perceived as part of a single source. The principle of belongingness is that a given element in a sound can form part of only one stream at a time. The principle of closure is that when parts of a sound are masked or occluded, that sound will be perceived as continuous, provided there is no direct sensory evidence to indicate that it has been interrupted. We tend to complete incomplete experiences, as shown in Figure 3.3: although the lines are not complete, the letter A is perceived.

Figure 3.3 Principle of Closure [17]

Usually we attend primarily to one perceptual stream at a time; that stream stands out from the background formed by the other streams. Stream formation places constraints upon attention, but attention may also influence the formation of streams. Stream formation may also depend upon information not directly available in the acoustic waveform.

Rhythmic Tempo Perception

Beat is the unit division of musical time, and the pace of the fundamental beat is called tempo (Italian for "time"). The expressions slow tempo and quick tempo suggest the existence of a tempo that is neither slow nor fast; a "moderate" tempo is often assumed to be that of a natural walking pace (76 to 80 paces per minute) or of a heartbeat (72 per minute). The tempo of a piece of music indicated by a composer is, however, neither absolute nor final. In performance it is likely to vary according to the performer's interpretative ideas or to such considerations as the size and reverberation of the hall, the size of the ensemble, and, to a lesser extent, the sonority of the instruments. A change within such limits does not affect the rhythmic structure of a work. Time provides a framework for auditory events, where the onset and offset of sounds define those events. One temporal quality is whether the sound is roughly continuous (e.g., duct noise), oscillates in intensity (e.g., hand sawing), or is a series of discrete units (e.g., hammering, clapping, walking). Another temporal quality is the rhythm, or timing, between discrete sounds. Some physical systems are characterized by damped rhythms in which successive sounds are progressively closer together in time (e.g., bouncing balls) [13]. Music involves the temporal patterning of stimulus features in addition to the well-known spectral aspects of stimuli. Langner [7] emphasized that music contains periodic fluctuations in amplitude, that is, envelopes of AM (amplitude modulation). Such AM information can be used to bind sounds in various frequency channels, as separated by the cochlea, into a common sound source.

Langner further points out that to make use of this type of information the central auditory system must perform a periodicity analysis. Neurons at different levels of the auditory system have the ability to respond reliably to different rates of AM sounds. The common measure of the response to AM stimuli is the modulation transfer function, which provides an index of the ability of a neuron to respond synchronously to the AM envelopes of pure tones. The rate of AM to which a cell responds maximally is called its best modulation frequency, or BMF. Perception in music involves the perceptual organization of patterns in time. Behavioral studies have revealed that listeners organize, or group, streams of sounds and silences. These studies suggested that grouping follows the run and gap principle, namely, that patterns are organized to begin with the longest run of like elements and end with the longest gap (Garner, 1974) [7]. Perceptual grouping of temporal sequences is based on the stimulus element that elicits the largest response in the auditory system. The longest silent period, which perceptually completes a sequence, allows the longest time for recovery, which in turn produces the largest response to the first element of the next pattern presented. Fraisse [7] drew a primary distinction between the perception of time and the estimation of time. The former is confined to temporal phenomena extending to no more than about 5 seconds or so, whereas the latter relies primarily on the reconstruction of temporal estimates from information stored in memory. The boundary between the two corresponds to the length of the perceptual present, which he defined as the temporal extent of stimulations that can be perceived at a given time, without the intervention of rehearsal during or after the stimulation (Fraisse, 1978) [7]. Rhythm perception, therefore, is essentially concerned with phenomena that can be apprehended in this immediate fashion and is also closely tied up with motor functioning. In studies of spontaneous tapping, Fraisse observed that by far the most ubiquitous relationship between successive tapped intervals was a ratio of 1:1. Fraisse regarded this as intimately connected with anatomical and motor properties, most notably the bilateral symmetry of the body, the pendular movements of the limbs in walking and running, and the regular alternation of exhalation and inhalation in breathing. He viewed both arrhythmic and rhythmic tapping as a break with the underlying tendency for pendular movement, but whereas there is no structure in the former case, the latter exploits a principle of identity or clear differentiation between time intervals. This principle of equality or differentiation creates two distinct categories of duration, long and short, which are not only quantitatively but also qualitatively different.

4 COMPUTER MODELS TO EMULATE RHYTHM PERCEPTION

Two papers were selected for review in this chapter as relevant material for this study. Both describe real-time beat detectors; they attempt to emulate human rhythm perception and were designed for detection from high-level material such as music sampled from compact discs.

4.1 TEMPO AND BEAT ANALYSIS OF ACOUSTIC MUSICAL SIGNALS

Scheirer [31] presented a computational algorithm capable of producing behavior similar to the performance of human listeners in detecting the beat, or pulse, in a variety of musical situations. The model has certain similarities to existing theories of sound perception that make it attractive as a psychoacoustic model of tempo perception. In this study, beat is considered the fundamental perceptual attribute of rhythm: the sequence of equally spaced phenomenal impulses that defines a tempo for the music. The grouping and strong/weak relationships that define rhythm and meter were not considered. The method shows that certain kinds of signal manipulations and simplifications can be made without affecting the perceived tempo and beat of a musical signal. Consider the signal flow network of Figure 4.1, in which an amplitude-modulated noise is constructed by vocoding a white-noise signal with the sub-band envelopes of a musical signal. This is accomplished by performing a sub-band analysis of the music and modulating a white-noise signal with the amplitude envelope of the corresponding band of the musical filterbank output. The resulting noise signals are summed together to form an output signal.

Figure 4.1 Psychoacoustic simplification of rhythm perception [31]

The psychoacoustic simplification lies in the fact that the only information preserved is the set of amplitude envelopes of the filterbank, because only this information is necessary to extract pulse and meter from a musical signal. This suggests that musical notes are not necessary components of rhythm perception, and it represents a vast reduction of the input data size relative to the original signal. Certain other kinds of simplifications are not possible; thus, it seems that separating the signal into sub-bands and maintaining the sub-band envelopes separately is necessary for accurate rhythmic processing. No psychoacoustic experiments were conducted to examine the exact properties of the filterbank or envelope manipulations under which rhythm perception remains undisturbed. The results suggest that a rhythmic processing algorithm should treat frequency bands separately, combining results at the end, rather than attempting to perform beat tracking on the sum of the filterbank outputs.
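The vocoding demonstration is straightforward to reproduce in outline. The following MATLAB sketch uses a toy two-band split at 1 kHz rather than Scheirer's six bands, and the input file name is a placeholder:

    [x, fs] = audioread('music.wav');            % placeholder input file
    x = mean(x, 2);                              % mix to mono
    [bl, al] = butter(4, 1000/(fs/2));           % low band (< 1 kHz)
    [bh, ah] = butter(4, 1000/(fs/2), 'high');   % high band (> 1 kHz)
    B = {bl, bh}; A = {al, ah};
    y = zeros(size(x));
    for k = 1:2
        band = filter(B{k}, A{k}, x);
        env = filter(ones(256,1)/256, 1, abs(band));        % rectify and smooth
        y = y + env .* filter(B{k}, A{k}, randn(size(x)));  % modulated noise band
    end
    soundsc(y, fs)   % the rhythm of the original remains clearly audible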

Figure 4.2 shows an overall view of Scheirer's tempo analysis algorithm as a signal flow network.

Figure 4.2 Schematic view of the processing algorithm [31]

As the signal comes in, a filterbank is used to divide it into six bands. For each of these sub-bands the amplitude envelope is calculated and its derivative taken. Each of the envelope derivatives is passed on to another filterbank, one of tuned resonators. In each resonator filterbank, one of the resonators will phase-lock: the one whose resonant frequency matches the rate of periodic modulation of the envelope derivative. The outputs of the resonators are examined to see which ones exhibit phase-locked behavior, and this information is tabulated for each of the bandpass channels. These tabulations are summed across the frequency filterbank to arrive at the frequency (tempo) estimate for the signal, with reference back to the peak phase points in the phase-locked resonators to determine the phase of the signal. The filterbank implementation in the algorithm has six bands; each band has sharp cutoffs and covers roughly a one-octave range. The lowest band is a low-pass filter with cutoff at 200 Hz; the next four bands are band-pass, with cutoffs at 200 and 400 Hz, 400 and 800 Hz, 800 and 1,600 Hz, and 1,600 and 3,200 Hz. The highest band is high-pass, with cutoff frequency at 3,200 Hz. Each filter was implemented as a sixth-order elliptic filter, with 3 dB of ripple in the passband and 40 dB of rejection in the stopband. Figure 4.3 shows the magnitude responses of these filters.

Figure 4.3 Magnitude response of the frequency filterbank used in the system [31]

The envelope is extracted from each band of the filtered signal through a rectify-and-smooth method. After this, the first-order difference function is calculated and half-wave rectified; this rectified difference signal is then examined for periodic modulation. Figure 4.4 shows the envelope extraction process for one frequency band of each of two signals.

Figure 4.4 Envelope extraction process [31]

The top panels show the audio waveforms, a 2 Hz click track (left) and a polyphonic music example (right). The middle panels show the envelopes, and the bottom panels the half-wave rectified difference of the envelopes. The lowest filterbank band is shown for the click track, the second highest for the music. Comb filters are often used in reverberators and other sorts of audio signal processing; they have properties that make them suitable to act as resonators in the phase-locking pulse extraction process. The beat tracking algorithm uses a network of resonators to phase-lock with the beat of the signal and determine the frequency of the pulse. A comb filter with delay T will respond more strongly to a signal with period T than to any other, since the response peaks of the filter line up with the frequency distribution of energy in the signal. Thus, after the envelope has been extracted and processed for each channel, a filterbank of comb filter resonators is implemented in which the delays vary by channel and cover the range of possible pulse frequencies to track. The output of these resonator filterbanks is summed across frequency sub-bands. By examining the energy output from each resonance channel of the summed resonator filterbanks, the strongest periodic component of the signal may be determined; the frequency of the resonator with the maximum energy output is selected as the tempo of the signal. Figure 4.5 shows the summed filterbank output for a 2 Hz pulse train (top) and for a polyphonic music example (bottom). The horizontal axes are labeled with metronome markings in beats per minute, that is, 120 MM = 2 Hz; this is a direct mapping of the delay of the corresponding comb filter. The polyphonic music shows more overall energy, but the tempo is still clearly seen as peaks in the curve.
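A bank of such comb-filter resonators can be sketched in a few lines of MATLAB. Here d is the rectified envelope difference from the previous sketch, the envelope rate and the half-energy time of the filters are assumed values, and the tempo is chosen as the candidate period with maximum output energy:

    fs_env = 200;                              % assumed envelope sample rate, Hz
    bpms = 60:240;                             % candidate tempi
    E = zeros(size(bpms));
    for i = 1:numel(bpms)
        T = round(fs_env * 60 / bpms(i));      % delay in samples for this tempo
        alpha = 0.5^(T / (1.5 * fs_env));      % ~1.5 s half-energy time (assumed)
        y = filter(1 - alpha, [1 zeros(1, T-1) -alpha], d);  % comb resonator
        E(i) = sum(y.^2);                      % output energy at this period
    end
    [~, ibest] = max(E);
    tempo_bpm = bpms(ibest)                    % strongest periodic component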

Figure 4.5 Tempo estimation [31]

The phase is determined once the tempo is known by examining the output of the resonators directly, or, even better, by examining the internal state of the delays of these filters. The vector w of delays can be interpreted at a particular point in time as the "predicted output" of that resonator; that is, the w vector contains the next n samples of envelope output that the filter would generate in response to zero input, where n is the period of the filter. The sum of the delay vectors over all frequency channels, for those resonators corresponding to the tempo determined in the previous step, is examined, and the peak of this prediction vector is taken as the estimate of when the next beat will arrive in the input. The ratio ω = 2π(t_n − t)/T, where t_n is the time of the next predicted beat, t the current time, and T the period of the resonator, is the phase ω of the tempo being tracked. For example, with T = 0.5 s (120 MM) and a next beat predicted 0.125 s away, ω = 2π(0.125)/0.5 = π/2. The phase and period may be used to predict beat times as far into the future as desired. Figure 4.6 shows the phase estimates after tracking 5 s of a 2 Hz click track (top) and of a polyphonic music example (bottom).

Figure 4.6 Phase estimation [31]

The x-axis in each case covers the next full period of the resonator tracking the tempo, and the peak of the curve shows where the next beat is predicted to occur. The implementation of the model performs the phase analysis every 25 ms and integrates evidence between frames in order to predict beats. The performance of the algorithm was evaluated in both qualitative and quantitative manners. For the qualitative evaluation, 60 ecological music excerpts were tested with the implemented algorithm using a short application that reads a sound sample from disk, causally beat-tracks it, and writes a new sound file with clicks (short noise bursts) added to the signal where beats are predicted to occur. A selection of these sound files was made available online by the author.

Forty-one of the 60 samples (68%) were qualitatively classified as tracked accurately, and another 11 (18%) as tracked somewhat accurately. Based on these results, the algorithm seems quite successful at tracking musical beats. In addition, a short quantitative validation experiment was conducted to test whether the beat-tracking algorithm performed generally like a human listener. Five adult listeners, experienced musicians with normal hearing, all graduate students and staff members at the MIT Media Laboratory, participated in the experiment. Subjects listened to seven musical examples, drawn from different musical genres, through headphones. They indicated their understanding of the beat in the music by tapping along with the music on a computer keyboard. All seven trials were run in the same sequence for each listener, in a single block; the experiment was not counterbalanced, based on the assumption that there is little training effect in this task. The entire experiment took approximately 5 min per subject. The results indicate that the algorithm was as regular as a human listener for five of the seven trials.

4.2 REAL-TIME BEAT TRACKING FOR DRUMLESS AUDIO SIGNALS

Goto and Muraoka [12] presented a real-time beat tracking system that recognizes a hierarchical beat structure in musical audio signals without drum sounds. The system detects a beat structure at three rhythmic levels, the quarter-note level, the half-note level, and the measure level, in music sampled from popular compact discs (see Figure 4.7). They proposed a method of detecting chord changes in order to make musical decisions about the audio signals using heuristic musical knowledge. The purpose of the study was to build a beat-tracking system useful in applications such as music-synchronized CG animation, video/audio editing, and human-computer improvisation in live ensembles.

Figure 4.7 Beat-Tracking Problem [12]

They defined beat times as the temporal positions of the almost regularly spaced beats corresponding to quarter notes, and the sequence of beat times is called the quarter-note level. The system then finds the beginnings of half notes and measures; the sequence of half-note times is obtained by determining whether each beat is strong or weak. The beat-tracking system for musical audio signals without drum sounds provides a real-time output called beat information (BI), which consists of the beat time, its beat type, and the current tempo. Figure 4.8 shows the system.

Figure 4.8 Overview of Goto and Muraoka's beat-tracking system [12]

The system first digitizes an input audio signal in the A/D conversion stage. Then, in the frequency analysis stage, multiple onset-time finders detect onset times in different ranges of the frequency spectrum, and those results are transformed into vectorial representations (called onset-time vectors) by onset-time vectorizers. In the beat prediction stage, the system manages multiple agents that, according to different strategies, make parallel hypotheses based on those onset-time vectors. Each agent first calculates the inter-beat interval and predicts the next beat time. By communicating with a chord change checker, it then determines the beat types and evaluates the reliability of its own hypothesis. A hypothesis manager gathers all hypotheses and determines the final output on the basis of the most reliable one. Finally, in the beat information (BI) transmission stage, the system transmits the BI to application programs via a computer network. The method detects chord changes by analyzing the frequency spectrum sliced at provisional beat times. The results show that the beat detection rates obtained with real-world audio signals were more than 87.5%, and that the method of detecting chord changes, together with basic musical decisions based on them, was effective enough to contribute to determining the hierarchical beat structure comprising the three rhythmic levels. They also developed an application that displays real-time computer graphics dancers whose motions change in time to the musical beats (Figure 4.9).

Figure 4.9 Goto and Muraoka's virtual dancers synchronized with musical beats [12]

This application shows that the system is useful in multimedia applications for which a human-like hearing ability is desirable. The authors plan to upgrade the system by generalizing it to other musical genres and enabling it to follow tempo changes. They also look forward to using other, higher-level musical structures and foresee applications in various multimedia systems for which beat tracking is useful, such as systems for video/audio editing, controlling stage lighting, and synchronizing various computer graphics with music.
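Although the agents' actual strategies are more elaborate, the basic inter-beat-interval step they perform can be illustrated with a toy MATLAB fragment that takes detected onset times (hypothetical values here), picks the modal inter-onset interval, and predicts the next beat. This is an illustration of the idea only, not the authors' code:

    onsets = [0.02 0.51 1.00 1.26 1.51 2.03 2.51];  % hypothetical onset times, s
    iois = diff(onsets);                            % inter-onset intervals
    edges = 0:0.05:2;                               % 50 ms histogram bins
    [~, k] = max(histcounts(iois, edges));
    ibi = (edges(k) + edges(k+1)) / 2;              % modal interval, ~0.5 s here
    next_beat = onsets(end) + ibi                   % predicted next beat time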

5 DETERMINISTIC CHAOS FUNDAMENTALS

The ancient word chaos originally denoted a complete lack of form or systematic arrangement, but today it is often used to imply the absence of some kind of order that ought to be present. Moreover, chaos is regarded as a universal phenomenon observed in many fields. Terms such as non-linearity, complexity, and randomness are often used more or less synonymously with chaos in one or several of its senses. Chaos can be compared to the manner in which many disorganized systems spontaneously acquire organization, just as a shapeless liquid mass can, upon cooling, solidify into an exquisite crystal. Mathematicians have defined chaos as stochastic behavior occurring in a deterministic system. Chaos theory, therefore, is the popular label for a body of theory about certain mathematical models, and their applications, that studies deterministic systems so sensitive to measurement that their output appears random. Classic systems that vary deterministically as time progresses, such as mathematical models, are known as dynamic systems. At least in the case of the models, the state of the system may be specified by the numerical values of one or more variables. A deterministic sequence is one in which only one thing can happen next, because it is governed by precise laws. A random sequence of events, on the other hand, is one in which anything that can ever happen can happen next; usually it is also understood that the probability of a given event happening next is the same as the probability that a like event will happen at any later time. Hence, a random system is one in which the progression from earlier to later states is not completely determined by any law; it can also be described as a system that is not deterministic.

In general, suppose a real-world phenomenon whose state at a particular time can be characterized by the values of n variables x_1, x_2, ..., x_n (so the x_i might represent the angular position and velocity of a swinging pendulum, the relative concentrations of certain chemicals in a mixture, or the velocity and temperature gradients in a convecting fluid). If we choose the right quantities to represent these state variables, then we may be able to specify the dynamics of the phenomenon, the way it evolves over time, by giving the rate of change of each variable as some function of the x_i. In other words, we may be able to describe the system dynamics by means of a set of n linked differential equations in the canonical form

dx_i/dt = F_i(x_1, x_2, x_3, ..., x_n), i = 1, ..., n.

Equation 5-1 Canonical form of a set of differential equations

If the F_i in the set of Equations 5-1 satisfy some relatively mild constraints, then there will be a unique solution to the equations for a given set of initial conditions. In other words, a particular setting of the parameters and of the initial values of the x_i at time t = t_0 will fix a unique set of values for the x_i, at least for some interval of time around t_0 (and perhaps for all times). In the general case we will not be able to write down an explicit solution; we will not be able to specify the value of each x_i in terms of polynomial or trigonometric functions of time. So, to use the equations, we will have to resort to numerical integration by computer. But the point of principle remains: the set of equations determines a unique evolution of the state variables over some period of time. Hence, the equations describe a mathematical model that is deterministic in a straightforward sense.

5.1 PHASE SPACE

Generally, it is helpful to look at things geometrically. So imagine the values of the n state variables $x_i$ as giving the coordinates of a point in an abstract n-dimensional space, a so-called state space or phase space. A point $\mathbf{x}$ in this phase space with coordinates $x_1, x_2, \ldots, x_n$ will then represent a particular instantaneous state of our dynamic system. And, given a point $\mathbf{x}(0)$ representing the state at some initial time $t = t_0$, the dynamic equations (with fixed parameters) will, in the deterministic case, fix a unique trajectory, or path, traced out in phase space by the point $\mathbf{x}(t)$ representing the state at later times t (see Figure 5.1).

If we are to apply a mathematical model to predict the evolution of some real-world dynamic phenomenon, then we must start by fixing the initial conditions to feed into the model. But we can only know the actual initial conditions within some margin of error. If we input a small error in representing the initial real-world state, then the dynamic equations will output a correspondingly erroneous prediction about where the system ends up at a later time t (and the predictive error may very well grow over time).

Figure 5.1 A phase space trajectory [35]

To put it geometrically, we can only pin down the point representing the initial state of the dynamic system to within some small fuzzy-boundaried ball of phase space, and our dynamic equations will then map that fuzzy initial region of phase space onto a possibly much more spread-out region that will only contain the point representing the later state at time t, as shown in Figure 5.2.

Figure 5.2 A small ball of initial states spread out by the dynamics [35]

In order to make use of a model predictively, we need to know something about just how spread out that later region is. That is, we need to know how quickly the dynamic model propagates initial errors. In other words, phase space is a hypothetical space having as many dimensions as the number of variables needed to specify a state of a given dynamic system; a chaotic system is described within it by means of various indices. Each point represents a particular state of the system, and the coordinates of a point in phase space (distances in mutually perpendicular directions from some reference point, called the origin) are numerically equal to the values that the variables assume when that state occurs. Even though the concept of these diagrams can be useful, the diagrams often cannot be drawn, because the phase space would have to include as many dimensions as there are variables in the system.

In the phase space of a chaotic dynamic system, two orbits slightly separated from each other will diverge exponentially with time. The degree to which two infinitesimally separated orbits move away from or approach each other is measured by the Lyapunov exponent, calculated as the long-time average of the logarithm of the amplification (or reduction) rate of the difference between the two orbits. Since the number of directions in which the two orbits can deviate equals the number of degrees of freedom of the phase space, the number of Lyapunov exponents is the same as the number of degrees of freedom. Chaos is often characterized by a system having at least one positive Lyapunov exponent: when an initial value is changed only slightly, a later state becomes very different. Such a system is said to have a sensitive dependence on initial conditions, and this instability of orbits is what the Lyapunov exponent measures.

5.2 SENSITIVE DEPENDENCE AND BUTTERFLY EFFECT

One mark of chaos is sensitive dependence on initial conditions, because a chaotic system starting from two very similar initial states can develop in radically divergent ways. An immediate consequence of sensitive dependence in any system is the impossibility of making perfect predictions, or even mediocre predictions, sufficiently far into the future. This assertion presupposes that we cannot make measurements that are completely free of uncertainty. Since Edward Lorenz published his paper "Predictability: Does the Flap of a Butterfly's Wings in Brazil Set Off a Tornado in Texas?", such sensitive dependence on initial conditions has often been referred to as the Butterfly Effect, because very small changes in initial conditions can become greatly amplified by later events in ways that prevent useful prediction [35].

A small blue butterfly, let's suppose, sits on a cherry tree in a remote province of China. As is the way of butterflies, while it sits it occasionally opens and closes its wings. It could have opened its wings twice just now; in fact it moved them only once. And, because the weather system exhibits sensitive dependence, the minuscule difference in the resulting eddies of air around the butterfly eventually makes the difference between whether, two months later, a hurricane sweeps across southern England or harmlessly dies out over the Atlantic. Or so the story goes [35].

Chaos is a type of unpredictable motion generated by deterministic equations (differential equations or difference equations). For purposes of experimentation, Lorenz created a system of three nonlinear differential equations (Equations 5-2, 5-3 and 5-4) to simulate an extremely simple model of convection in the atmosphere:

$$\frac{dx}{dt} = -10x + 10y$$

Equation 5-2

$$\frac{dy}{dt} = 28x - y - xz$$

Equation 5-3

$$\frac{dz}{dt} = -\frac{8}{3}z + xy$$

Equation 5-4

Even though these equations do not have an explicit analytic solution, a simple numerical computer program can solve Lorenz's system of equations. When Lorenz performed the numerical integration, he found that, for almost any initial state, the model soon settles down, with the values of x, y and z confined between definite limits. Within those limits, though, the values vary in highly complex ways. Figures 5.3, 5.4 and 5.5 show sample runs of the system for an arbitrary set of initial conditions with x, y, z and t all set to the same value. The combination of all three variables locates a point in three-dimensional space, and tracing that point over time yields the phase space diagram. The thinking behind the phase space plot is to give an idea of what the system is like by containing its output over a long period of time in a single graph.

Figure 5.3 Lorenz Attractor X variable vs. Time [42]

Figure 5.4 Lorenz Attractor Y variable vs. Time [42]
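For illustration, a numerical run of this kind can be sketched in a few lines; the code below (not from the original study) integrates Equations 5-2 through 5-4 with SciPy's solve_ivp. The initial condition (1, 1, 1) and the sampling density are assumptions; the span matches the thirty-two-second simulation described with Figure 5.6.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state):
    """Right-hand side of Equations 5-2, 5-3 and 5-4."""
    x, y, z = state
    return [-10.0 * x + 10.0 * y,       # dx/dt, Equation 5-2
            28.0 * x - y - x * z,       # dy/dt, Equation 5-3
            -(8.0 / 3.0) * z + x * y]   # dz/dt, Equation 5-4

t = np.linspace(0.0, 32.0, 8000)
sol = solve_ivp(lorenz, (0.0, 32.0), [1.0, 1.0, 1.0], t_eval=t)

# sol.y[0], sol.y[1], sol.y[2] are x(t), y(t), z(t); plotting x against z
# reproduces the butterfly-shaped attractor of Figure 5.6.
print(sol.y[:, -1])   # the state stays confined between definite limits
```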

Figure 5.5 Lorenz Attractor Z variable vs. Time [42]

Figure 5.6 shows the phase space plot of the system, with the variables x, y, and z acting together over a period of thirty-two seconds in this simulation. The image is known as the Lorenz Attractor and is one of the earliest examples of chaos ever recorded; it has also been referred to as Lorenz's Butterfly. The Lorenz attractor always has the familiar butterfly shape: no matter how random each variable may appear to be on its own, the combination of the three always produces the same picture.

Figure 5.6 The Lorenz Attractor from the X-Z plane [42]

Figure 5.7 shows another run of the Lorenz attractor program for slightly different initial conditions.

Figure 5.7 Lorenz Attractor X vs. Time for slightly different initial conditions [42]

Note in Figure 5.8, where this run is plotted against the earlier x variable results, how the two outputs stay nearly the same for a good portion of time at the beginning but then diverge into completely different patterns.

Figure 5.8 Lorenz Attractor X vs. Time comparison [42]

Figure 5.9 shows the Lorenz attractor again, this time with the same slight variation in initial conditions as in Figures 5.7 and 5.8. It maintains the same butterfly shape, despite the utter lack of correlation between the two runs seen in Figure 5.8.
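This divergence is easy to reproduce: integrate two copies of the system whose initial conditions differ minutely and track their separation, whose early exponential growth rate is precisely what the largest Lyapunov exponent measures. The sketch below is illustrative only; the perturbation size of 1e-8 and the fitting window are assumptions, not parameters from the study.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, s):
    x, y, z = s
    return [-10 * x + 10 * y, 28 * x - y - x * z, -(8 / 3) * z + x * y]

t = np.linspace(0.0, 32.0, 8000)
a = solve_ivp(lorenz, (0, 32), [1.0, 1.0, 1.0], t_eval=t)
b = solve_ivp(lorenz, (0, 32), [1.0 + 1e-8, 1.0, 1.0], t_eval=t)  # tiny nudge

# Euclidean separation between the two trajectories at each instant
sep = np.linalg.norm(a.y - b.y, axis=0)

# While the growth is still exponential, log(sep) rises linearly with time;
# its slope gives a rough estimate of the largest Lyapunov exponent.
grow = (sep > 1e-7) & (sep < 1.0)
slope = np.polyfit(t[grow], np.log(sep[grow]), 1)[0]
print(f"approximate largest Lyapunov exponent: {slope:.2f} per unit time")
```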

Figure 5.9 The Lorenz Attractor with slightly different initial conditions [42]

5.3 AVERAGE MUTUAL INFORMATION (AMI)

Mutual information is a general, information-theoretic measure of the extent to which the values in a time series can be predicted from earlier values. Unlike the autocorrelation function, however, it is not limited to linear dependence. The correlation function estimates how closely related two random processes are to each other. The true cross-correlation sequence is defined by

$$\gamma_{xy}(m) = E\{X_n^{*} Y_{n+m}\}$$

Equation 5-5

where $X_n$ and $Y_n$ are stationary random processes, $-\infty < n < \infty$, and $E\{\cdot\}$ is the expected value operator. Autocorrelation is handled as a special case of correlation, and it is useful in obtaining a partial description of a time series for forecasting. The autocorrelation function of a sequence $x[n]$ is defined as

$$A[m] = \frac{1}{N}\sum_{n=0}^{N-1} x[n]\, x[n+m]$$

Equation 5-6

where the average is taken over N samples and m is the autocorrelation time in samples.
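As a hedged illustration of Equation 5-6 (not code from the study), the sketch below estimates the autocorrelation of a noisy periodic test signal; the largest non-trivial peak recovers the signal's period. The test signal, its sampling rate, and the lag range are assumptions.

```python
import numpy as np

def autocorr(x, max_lag):
    """Biased autocorrelation estimate of Equation 5-6 for lags 0..max_lag."""
    N = len(x)
    x = x - np.mean(x)  # remove DC so peaks reflect structure, not offset
    return np.array([np.sum(x[:N - m] * x[m:]) / N for m in range(max_lag + 1)])

# Illustrative test signal: a 5 Hz periodicity sampled at 100 Hz, plus noise
fs = 100
t = np.arange(0, 4.0, 1.0 / fs)
sig = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(len(t))

A = autocorr(sig, max_lag=60)
peak = np.argmax(A[5:]) + 5          # skip the trivial peak around lag 0
print(f"strongest repetition at lag {peak} samples = {peak / fs:.2f} s")
```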

In order to explain AMI, let's consider an experiment A with possible outcomes $A_1, A_2, A_3, \ldots, A_n$. If the respective probabilities are $p(A_1), p(A_2), p(A_3), \ldots, p(A_n)$, the uncertainty of the outcome can be assessed. If a system is deterministic there is no point in performing an experiment, because all $p(A_i)$ are zero except one. On the other hand, if all the $p(A_i)$ are equiprobable, the uncertainty of the outcome is at its maximum and the information gained by carrying out the experiment is also maximal. Consequently, the information obtained by a measurement of the outcome of a finite scheme A can be expressed through the corresponding entropy H(A):

$$H(A) = -\sum_{i} p(A_i)\log_2 p(A_i)$$

Equation 5-7

The entropy H(A) is thus a measure of randomness: the more random a variable is, the more entropy it has, as illustrated in Figures 5.10 and 5.11.

Figure 5.10 High entropy density

Figure 5.11 Low entropy density
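A few lines suffice to evaluate Equation 5-7 and reproduce the contrast of Figures 5.10 and 5.11; the example distributions below are illustrative assumptions, not data from the study.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of Equation 5-7, in bits; zero-probability terms drop out."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([0.25, 0.25, 0.25, 0.25]))  # equiprobable: maximal, 2.0 bits
print(entropy([0.97, 0.01, 0.01, 0.01]))  # nearly deterministic: ~0.24 bits
print(entropy([1.0, 0.0, 0.0, 0.0]))      # deterministic: 0 bits, no uncertainty
```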

In order to determine higher-order relationships, it is necessary to introduce higher-order measures. For example, if measurements are collected from two schemes $A = \{[A_1, p(A_1)], [A_2, p(A_2)], \ldots, [A_n, p(A_n)]\}$ and $B = \{[B_1, p(B_1)], [B_2, p(B_2)], \ldots, [B_m, p(B_m)]\}$, the mutual information I(A,B) is the measure of how much can be said about the one given the other:

$$I(A,B) = \sum_{j=1}^{m}\sum_{i=1}^{n} p(A_i, B_j)\log_2 \frac{p(A_i, B_j)}{p(A_i)\,p(B_j)}$$

Equation 5-8

$$I(A,B) = H(A) + H(B) - H(A,B)$$

Equation 5-9

Here H(A,B) refers to the information obtained by considering A and B together,

$$H(A,B) = H_B(A) + H(B)$$

Equation 5-10

in which $H_B(A)$ denotes conditional entropy, the entropy of A given B. If A and B are independent, the terms $H_B(A)$ and $H(A)$ become equal, reducing H(A,B) to H(A) + H(B) and implying that the mutual information between A and B amounts to zero: I(A,B) = 0. It should also be noted that $I(A,B) \geq 0$ for all A and B, so that there are no negative values, as can occur with the autocorrelation function. Put another way, average mutual information is the reduction in uncertainty about one variable due to knowing another.
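Equations 5-8 and 5-9 translate directly into a histogram-based estimator: estimate the joint probabilities $p(A_i, B_j)$ from a 2-D histogram, form the marginals, and combine the three entropies. The sketch below is one such estimator under those assumptions; the bin count and the test signals are illustrative.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(a, b, bins=16):
    """I(A,B) = H(A) + H(B) - H(A,B), Equation 5-9, from a joint histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()            # joint probabilities p(Ai, Bj)
    p_a = p_ab.sum(axis=1)                # marginal p(Ai)
    p_b = p_ab.sum(axis=0)                # marginal p(Bj)
    return entropy(p_a) + entropy(p_b) - entropy(p_ab.ravel())

x = np.random.randn(10000)
print(mutual_information(x, x))                       # fully dependent: large
print(mutual_information(x, np.random.randn(10000)))  # independent: near zero
```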

Therefore, the mutual information can also be written as

$$I(A,B) = H(A) - H_B(A) = H(B) - H_A(B)$$

Equation 5-11

Figure 5.12 presents graphically the average mutual information I(A,B) between the random variables A and B.

Figure 5.12 Average Mutual Information, I(A,B)

Within the context of nonlinear deterministic systems and chaos theory, AMI is used to determine the time delay for the phase space reconstruction. For the analysis of a signal $s(n)$, $A_i$ is considered to be the measurement of the signal at time n, and $B_i$ is the measurement of the signal a time T later, $s(n+T)$. The first minimum of $I(T)$ (the AMI as a function of T) is then selected as the time delay to use in making vectors out of the observed one-dimensional data $s(n)$. So we take as the set of measurements A the values of the observable $s(n)$, and for the B measurements the values of $s(n+T)$. Then the AMI between these two measurements, that is, the amount in bits learned by measurements of $s(n)$ through measurements of $s(n+T)$, is

$$I(T) = \sum_{n} P\big(s(n), s(n+T)\big)\log_2 \frac{P\big(s(n), s(n+T)\big)}{P\big(s(n)\big)\,P\big(s(n+T)\big)}$$

Equation 5-12
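Putting the pieces together, the sketch below scans the lag T and reports the first local minimum of I(T), the quantity just described; this is the kind of lag structure the method proposed in Chapter 6 inspects on music envelopes. The test signal, bin count, and lag range are illustrative assumptions, not the study's parameters.

```python
import numpy as np

def ami(s, T, bins=16):
    """Average mutual information between s(n) and s(n+T), Equation 5-12."""
    a, b = s[:-T], s[T:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)
    nz = p_ab > 0
    return np.sum(p_ab[nz] * np.log2(p_ab[nz] / np.outer(p_a, p_b)[nz]))

# Illustrative signal: a periodic, envelope-like waveform with noise
fs = 100
t = np.arange(0, 10, 1 / fs)
s = np.abs(np.sin(2 * np.pi * 2 * t)) + 0.1 * np.random.randn(len(t))

I = np.array([ami(s, T) for T in range(1, 60)])   # I[k] corresponds to lag k+1

# First upturn of the curve: I at lag k+1 exceeds I at lag k,
# so lag k is the first local minimum of I(T).
first_min = next((k for k in range(1, len(I)) if I[k] > I[k - 1]), None)
print(f"first minimum of I(T) at lag {first_min} samples")
```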
