by Ken Lindsay, for CS505, Spring 2006, Southern Oregon University, Ashland, Oregon, USA


NEW SIGNAL PROCESSING TECHNIQUES FOR IMPROVED INFORMATION EXTRACTION FROM MUSIC AND AUDIO DATA
Ken Lindsay, Information Scientist
ken@tlafx.com, (650) 520-4536, (541) 552-1509 (h)
2007

Introduction

The purpose of this report is to convey new (and newly rediscovered) signal processing techniques which I believe are eminently practical in speech recognition, music content analysis and search, and other feature recognition tasks in one-dimensional data sets. I also include some forward-looking ideas which, as far as I know, have not yet been implemented; where they are discussed in the literature at all, the discussion is minimal. Given the large gap between human audio processing capabilities and those of computers, new software techniques should produce substantial advances in automated audio cognition.

In my recent thesis research I analyzed the nature of swing rhythm in music. The image on the title page is a spectrogram of Natalie Cole singing Fever in the 2004 recording by Ray Charles. Useful details of both rhythm and vocal are clearly visible. In many other cases, the complexity of the musical clip was beyond the practical analysis limits of the standard STFT spectrogram approach, and I wished for, conceptualized, and discussed at some length my ideas for improving Fourier analysis. My advisor and others were skeptical, at least until I started to uncover prior work similar to my ideas.

Improvements to Standard FFT Usage

Fulop and Fitz (2006) published a new(ly rediscovered) approach to the spectrogram which uses the phase information in the complex form of the standard FFT to reassign time and frequency results to new locations, and so extract better detail. Their results are an impressive improvement on the traditional approach, and computationally cheap, being based on finite difference methods applied to the FFT results. They give details of three forms of the algorithm, all of which are basically equivalent in compute cost and quality of results. Oddly, they do not go beyond first-order finite differences.

I believe their approach is just a beginning, and that a variety of useful heuristics can be found which enhance the results for particular applications, e.g. choosing the FFT length, windowing function, and overlap (for the STFT) to tease out features in particular frequency ranges. These heuristics exploit artifacts caused by the inherent limits of the parameter choices: artifacts which might be theoretically offensive, but may be very useful in the real world. The choice of step size(s), or using higher orders in the finite difference scheme, may also reveal useful tricks.

Fulop & Fitz give a good summary of the history of the reassigned spectrogram, going back some fifty years, although the main breakthrough came in the 1970s. They also touch on other practical considerations, such as the time/frequency resolution tradeoff limits that are analogous to the Heisenberg uncertainty principle. I am curious what aspects of some of my other ideas can be uncovered in prior work. One of my main complaints in my research was the waste of throwing away all phase information in the FFT, so I find the Fulop & Fitz technique very appealing.

New DSP Techniques 2007 Ken Lindsay (ken@tlafx.com) (samba4ken@gmail.com)
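The phase-based frequency correction at the heart of such reassignment can be sketched in a few lines. The code below is my own illustration, not Fulop & Fitz's implementation: it is a first-order finite-difference estimate (the "channelized instantaneous frequency") that compares the phase of two FFT frames taken one sample apart and corrects each bin's nominal frequency by the deviation of the measured phase advance from the expected one. Function names and parameter choices are assumptions for illustration.

```python
import numpy as np

def instantaneous_frequency(x, fs, n_fft=1024):
    """First-order finite-difference estimate of per-bin instantaneous
    frequency, from the phase of two FFT frames one sample apart.
    Illustrative sketch only, not the published implementation."""
    w = np.hanning(n_fft)
    X0 = np.fft.rfft(w * x[0:n_fft])        # frame starting at sample 0
    X1 = np.fft.rfft(w * x[1:n_fft + 1])    # frame starting at sample 1
    bin_freq = np.arange(n_fft // 2 + 1) * fs / n_fft
    # measured phase advance over one sample, vs. each bin's expected advance
    measured = np.angle(X1 * np.conj(X0))
    expected = 2 * np.pi * bin_freq / fs
    deviation = np.angle(np.exp(1j * (measured - expected)))
    return bin_freq + deviation * fs / (2 * np.pi), np.abs(X0)
```

For a pure tone, the bins near the spectral peak are reassigned from their nominal center frequencies to (very nearly) the true tone frequency, even when it falls between bins, which is the essence of the improved detail.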

Alternatives to Harmonic Fourier Series

Traditional Fourier analysis uses sines and cosines whose frequencies are related by the set of integers (the so-called harmonics) to generate a set of basis functions for estimating objects in the function space of interest. Of course, most spaces have infinitely many sets of basis functions, and Fourier series are only one of many possible choices of basis sets for audio work. From the common talk, however, one would think not only that harmonic analysis is the only choice, but also that it is somehow true and accurate despite obvious limitations.

The audio space that sources the stream we listen to, or analyze in the computer, is more complicated than the model used in standard Fourier analysis. At its most basic, sound is a four-dimensional system: 3D space plus time. There are also important effects in human audio perception, such as echoes and phase shifts from the shape of the pinna (outer ear), binaural hearing, and movements of the head, all of which add dimensionality to the analysis. This complexity is an important aspect of hearing, clear to anyone who uses their ears in a critical, self-aware manner. While theoretical constructs from math, DSP, information theory, and experimental audiology are useful tools for analyzing this information, I think it is important to go with the primary experience rather than let abstractions unduly color the understanding of sound.

The FFT has one very attractive aspect: the efficiency of the Cooley-Tukey algorithm, also known as the Danielson-Lanczos or Runge-König algorithm. This algorithm has been rediscovered several times; Press et al. (2002) trace the first description of the approach to Gauss in 1805. The efficiency gain is well known. What is less well known is the mechanics of the approach, which involves factoring the commonalities of the complex exponential functions that are the duals of the sines and cosines of the real-valued form.

I've discussed my idea with my math advisors and got the "Sure, sounds like that should work if you can figure it out" response. Given a sparse, non-linearly spaced set of integers to determine the basis frequencies, the trick would be to factor the common exponential coefficients that combine to determine the various frequencies, as is done in Cooley-Tukey. Such a technique may already exist, buried in deep theoretical math articles. A quadratic frequency distribution would be the first obvious approach. This could speed up an already quick algorithm by omitting calculation of redundant frequencies (basis vectors).

The primary benefits of a good non-harmonic Fourier scheme are efficiency and precision of detail. The standard decomposition of an audio signal by linearly spaced harmonics puts roughly three quarters of the frequencies extracted in the range from 5000 Hz up (for CD-rate audio), while most of the important information for tasks like pitch detection and speech recognition lies below 5 kHz. Omitting many of the high-frequency (redundant) harmonics would speed up the algorithm many times, if the same trick of factoring complex exponentials could be used.
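The target output of such a scheme is easy to demonstrate in its slow, direct form. The sketch below is my own illustration, not a known fast algorithm: it evaluates Fourier coefficients at an arbitrary frequency list by plain inner products. The hoped-for factored version would have to reproduce this output at something like Cooley-Tukey cost.

```python
import numpy as np

def sparse_dft(x, fs, freqs):
    """Fourier coefficients of x at an arbitrary (e.g. non-harmonic)
    list of frequencies, by direct inner products.  O(N * len(freqs)):
    correct but slow -- this shows the target output, not the
    hypothetical fast factorization discussed in the text."""
    n = np.arange(len(x))
    basis = np.exp(-2j * np.pi * np.outer(np.asarray(freqs), n) / fs)
    return basis @ x / len(x)
```

With, say, 100 logarithmically spaced frequencies between 50 Hz and 4 kHz, a one-second signal is summarized by 100 coefficients concentrated where the perceptually important information lives, instead of the thousands of linearly spaced bins a standard FFT would spend mostly on the high end.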

High resolution in the low frequencies of a Fourier decomposition requires longer FFTs than low resolution. Adequate resolution for tracking a melody would require lengthy FFT windows that dilute the signal strength as it is spread over too much time, blurring frequency resolution, though in a much different way from the commonly known FFT window resolution problems. A non-harmonic scheme would map the high frequencies to far fewer basis functions, while still maintaining sufficient resolution to recognize high-frequency features. Whether non-harmonic analysis would also help overcome the time/frequency limits in the lower frequencies is less clear, since these are primarily set by the window length of the FFT. But no doubt other useful heuristics can and will be developed, e.g. low-pass filtering combined with spline-based approximation such as Chebyshev polynomials, or estimation of low frequencies by counting zero crossings.

The use of non-harmonically related Fourier series in control theory goes back to the 1930s and 1940s, to Bellman and others. There, the technique is used for generating needed waveforms rather than analyzing an unknown signal, which is our current interest. Wavelet analysis is a related construct that can be useful if you can find the appropriate wavelet family, which I found difficult in practice. Moreover, it is likely that many different wavelets would be needed to solve the general audio parsing problem.

Last year I explored empirically in MATLAB both harmonic and non-harmonic Fourier analysis, with interesting, though fairly obvious, results. Today Google favored me better than last year, and I found references to work being done on the non-harmonic Fourier analysis side of things. These seem to be entirely in the theoretical math domain. Whether there is a non-harmonic version of the Cooley-Tukey algorithm remains to be seen.
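The zero-crossing idea mentioned above is trivial to sketch. This is a deliberately naive illustration of my own, sensible only for a clean, dominant low-frequency component (e.g. after low-pass filtering): count sign changes and divide by twice the duration.

```python
import numpy as np

def zero_crossing_freq(x, fs):
    """Crude frequency estimate for a clean (e.g. low-pass filtered)
    tone: a sinusoid crosses zero twice per cycle, so the crossing
    count over the buffer duration gives the frequency directly."""
    crossings = int(np.sum(np.signbit(x[1:]) != np.signbit(x[:-1])))
    return crossings * fs / (2.0 * len(x))
```

No FFT window is involved, so a one-second buffer resolves a 100 Hz tone to roughly 1 Hz, precision a windowed FFT would need a long transform to match at low frequencies.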
Practical Software Applications

My work to date has shown simple and practical techniques for characterizing rhythm in musical recordings without relying on meta-information like sheet music, MIDI files, or explicit analysis by humans. This sort of approach is critical for fully automating music search. My thesis research involved a lot of hand work, but extending it to automated analysis is mostly a question of developing a catalog of information features for search, rather than any great fundamental technical breakthrough. One major reason for this is the relative simplicity of searching for percussion sounds in musical recordings: these musical events are typified by strong, sudden onsets, and such onset events are easily mapped to time locations, which makes extraction of rhythm quite straightforward. While I chose simple audio mixes for quick and easy analysis, the same techniques can be applied to more complex musical samples, making general rhythm extraction and comparison practical. I am sure there will be some situations where the simple approach proves inadequate, but more sophisticated techniques such as Blind Signal Separation and Independent Components Analysis should extend the practical limits in many cases.
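As a concrete, simplified illustration of the onset-to-time mapping described above (my own hypothetical sketch, not the method used in the thesis work): frame the signal, compute per-frame energy, and report the times where energy jumps. Real percussion in a dense mix needs more care (spectral flux, adaptive thresholds); the names and threshold scheme here are assumptions.

```python
import numpy as np

def onset_times(x, fs, frame=512, rel_thresh=0.1):
    """Map sudden energy rises to time locations: energy per frame,
    positive first difference, threshold at a fraction of the largest
    rise, and keep only the first frame of each consecutive run."""
    n_frames = len(x) // frame
    energy = np.array([np.sum(x[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    rise = np.maximum(np.diff(energy), 0.0)
    hits = np.where(rise > rel_thresh * rise.max())[0] + 1
    if hits.size == 0:
        return np.array([])
    first_of_run = hits[np.insert(np.diff(hits) > 1, 0, True)]
    return first_of_run * frame / fs
```

With 512-sample frames at 8 kHz the time resolution is 64 ms, plenty for tempo and swing analysis, and the detected onset list is exactly the kind of time-location data that rhythm comparison works from.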

There are numerous other cases in music where more advanced techniques would be needed. Such work can be found in the literature, but it typically has a disconnected academic quality and does not appear ready for professional real-world use. To take an example, an original recording of a Beatles song and an elevator-music version of the same tune would match in many ways: same key, same tempo, same chord progression, same harmonic structure, etc. A human would not be fooled, and returning the 101 Strings version of Yesterday would probably just annoy a user and cause them to use a different search engine. Clearly better techniques are needed for the general task of music search.

Here is an example of a more difficult analysis task. Figure 1 is a spectrogram of Bob Marley singing "Stir it up, quench me darling, when I'm thirsty." Contrasted with Natalie Cole's strong, punchy vocal image in Fever, Marley sings in a plaintive, sensitive style. The interleaving of the vocal frequencies is visually clear, and undoubtedly the presence of emotional content in vocal music can be traced to features like these. A practical feature extraction and recognition scheme would allow searching for emotional similarity in music without relying on meta-information explicitly entered by human experts. Similarly, vocal harmonies and complex instrumentation could be analyzed for search without relying on human-supplied meta-information. In this example, the percussive rhythmic note events stand out clearly, although they are sometimes masked by the vocal signal.

Figure 1. Bob Marley sings Stir it up (1974)

New Paradigms Based on Human Auditory Perception

The higher frequency range of the human ear uses hardware (or wetware) which is fairly similar to Fourier analysis: the frequency-sensing cells in the cochlea respond to input signals and discriminate frequency content something like a Fourier series. However, the low

frequency response of the hearing system is quite different from Fourier analysis. Instead, the basilar membrane flexes into frequency-specific shapes in response to incoming signals, and the nerve pathways pick up displacement dynamics and delay information in order to extract very fine resolution in the 100 to 2000 Hz range. The 2000 Hz cutoff is due to limits of neuron recovery time, which would not be a problem in a computer algorithm. Below 200 Hz there is an additional mechanism based on detecting beats between component frequencies, although some researchers discount this third mechanism. Nonetheless, it is useful to learn from Nature in devising methods which can help bypass the inherent time/frequency limits of standard windowed FFTs.

Conclusions

Music analysis for searching will soon develop to use a Google-like interface where we copy and paste a few sample audio clips into a search field and quickly see a list of songs with characteristics like those present in the samples. There are numerous players in music search technology, but they typically rely heavily on human expertise and manual labor in cataloging and categorizing musical pieces. If it can be made practical, an automated approach would be better. I believe it will be practical within a few years, and am keen on being part of that development process.

My interests are not academic. I am very much a pragmatic developer and researcher. While I am glad that there are highly skilled theoreticians in the world, my interests lie in creating practical applications which will be used extensively by real people to help resolve their real-world needs. Similarly, speech recognition is quite impressive as it stands now, but clearly falls far short of even the most basic human skills. Developing and using better DSP and pattern recognition techniques is crucial for bringing these tasks into the 21st century.

References

Fulop, Sean A. & Fitz, Kelly (2006).
Algorithms for computing the time-corrected instantaneous frequency (reassigned) spectrogram, with applications. Journal of the Acoustical Society of America, 119(1), pp. 360-370, January 2006.

Press, William H., Teukolsky, Saul A., Vetterling, William T., & Flannery, Brian P. (2002). Numerical Recipes in C++: The Art of Scientific Computing, 2nd edition. Cambridge University Press, Cambridge, UK.

Young, Robert M. (2001). An Introduction to Nonharmonic Fourier Series, revised 1st edition. Academic Press, San Diego.