Measuring the complexity of sound

PRAMANA, journal of physics, © Indian Academy of Sciences, Vol. 77, No. 5, November 2011, pp. 811-816

NANDINI CHATTERJEE SINGH
National Brain Research Centre, NH-8, Nainwal Mode, Manesar 122 050, India
E-mail: nandini@nbrc.ac.in

Abstract. Sounds in the natural environment form an important class of biologically relevant non-stationary signals. We propose a dynamic spectral measure to characterize the spectral dynamics of such non-stationary sound signals and classify them based on rate of change of spectral dynamics. We categorize sounds with slowly varying spectral dynamics as simple and those with rapidly changing spectral dynamics as complex. We propose rate of spectral dynamics as a possible scheme to categorize sounds in the environment.

Keywords. Auditory; spectral dynamics; non-stationary; time-frequency.

PACS No. 05.45

1. Introduction

The human auditory system is capable of discriminating a large variety of complex sounds in the natural environment. Interestingly, anatomical studies of the adult human brain indicate that specialized regions of the brain analyse different types of sounds [1]. Music, speech and environment noise are processed in areas that are anatomically distinct [2]. However, the reasons for this kind of functional organization are not clearly identified. We study the spectral dynamics of different environmental sounds and develop indices to quantify rate of change of spectral dynamics. We propose rate of change of spectral dynamics to explain sound categorization.

The left panel of figure 1 shows examples of sound pressure waveforms from the natural environment. A striking feature of these different waveforms is that the successive disturbances are not equally spaced in time and are not of constant shape. In fact, a characteristic feature of these waveforms is the variation of spectral content as a function of time.
Such non-stationarity in spectral content, which is a common feature of biological signals (electroencephalography, for example), makes it difficult to study such signals using standard analysis techniques. New methods of analysis, which use joint time-frequency representations (TFRs), have emerged as convenient methods to describe such non-stationary dynamics. A TFR is obtained by mapping a one-dimensional signal (continuous or discrete) in the time domain into a two-dimensional time-frequency representation. It allows a simultaneous analysis in the time and frequency domains. TFRs provide localization both in time and frequency, within limits of resolution allowed by the uncertainty principle [3]. We study one such class of TFRs called spectrograms.

[Figure 1: panels for tool (saw), page turn, aeroplane and laughter. Left: waveform, amplitude (in arb. units). Right: spectrogram, frequency (kHz).]
Figure 1. Left panel shows time-amplitude waveforms for some environmental sounds. Tool (saw), page turn, aeroplane and laughter show time-varying spectral structure, which is shown in the right panels in the spectrographic representation using a 45 Hz Hamming window. Frequency (in Hz) is plotted on the y-axis while time (in s) is plotted on the x-axis, with intensities (in dB) represented in colour. Red indicates maximum power while blue indicates minimum power. The colour index is relative to the highest and lowest intensities for each signal.

DOI: 10.1007/s12043-011-0188-y; epublication: 31 October 2011

In the following sections, we identify a data set of sounds in the environment and describe them using the spectrographic representation. We find that the spectral distribution of environmental sounds can be described in terms of two kinds of spectral structures, one that has a periodic or harmonic spectral distribution and the other that has a noisy spectral distribution. We identify a measure to characterize such spectral structures and propose that the spectral dynamics of any sound in the environment can be described in terms of these spectral structures. We define an index to characterize sound complexity in terms of the number of distinct spectral structures and estimate the complexity of different environmental sounds. We suggest that spectral features of sounds in the natural environment could be a basis for the evolution of specialized auditory processing areas in the human brain.

2. Data

Sounds were collected from online databases and were drawn from several different classes: animal cries (e.g. cow moo), environmental sounds (telephone ring, airplane noise), and human non-verbal vocalizations (e.g. laughter). The sampling frequency of all sounds was 22,050 Hz. The sounds were pre-processed using Goldwave (version 5.10) software for noise reduction. Noise reduction is the elimination of unwanted noise, such as a background hiss or a power hum within a sound. Goldwave was also used to ensure that all sounds were matched for 2-s length.

3. Methods

As described earlier, new analysis techniques, which use joint time-frequency representations (TFRs) within the limits of resolution allowed by the uncertainty principle [3], have emerged as convenient methods to describe non-stationary dynamics. For signals where the dynamics can be considered to be stationary in short time windows, the short-time Fourier transform (STFT) [3] has been found to be extremely useful.
A display of the sound signal using the STFT in the time-frequency representation is called the spectrogram. A spectrogram is obtained by first partitioning the signal into small, overlapping, equal segments of time t and then carrying out an STFT for each segment [3]. The STFT of a function is defined as

$$S(t, f) = \int e^{-i 2\pi f \tau}\, s(\tau)\, h(\tau - t)\, d\tau,$$

where s(t) is the signal, f is the frequency and h(t) is the window function. For signals where temporal resolution is required, h(t) is narrow and spectral resolution is poor. On the other hand, for good frequency resolution, h(t) is broad and provides poor temporal resolution [4]. The energy density spectrum of the STFT is defined as a spectrogram (right panel of figure 1).

The spectro-temporal structure of complex sounds viewed in the spectrographic representation exhibits essentially two kinds of spectral structures: (1) harmonic and (2) noisy. The spectral structure in some regions is highly patterned (see the vertical stripes in the top right panel), suggesting periodic or harmonic structure, whereas in other regions the underlying spectral distribution is noisy (see the right panel, third from top).
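The spectrogram computation described above can be sketched in a few lines of NumPy. This is only an illustration, not the author's code: the function name `spectrogram`, the 50% overlap, and the reading of a "45 Hz Hamming window" as a window lasting 1/45 s are all assumptions, and the test signal is a synthetic 1 kHz tone rather than one of the sounds in the data set.

```python
import numpy as np

def spectrogram(signal, fs, bandwidth_hz=45.0, overlap=0.5):
    """Power spectrogram |S(t, f)|^2 from a Hamming-windowed STFT.

    Assumption: a "45 Hz window" is taken to mean a window of 1/45 s,
    i.e. fs / 45 samples. The overlap fraction is also an assumption.
    """
    signal = np.asarray(signal, dtype=float)
    win_len = int(round(fs / bandwidth_hz))
    hop = max(1, int(round(win_len * (1.0 - overlap))))
    window = np.hamming(win_len)

    # Partition the signal into overlapping, equal-length, windowed segments.
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + win_len] * window
        for i in range(n_frames)
    ])

    # One FFT per segment; the squared magnitude is the energy density.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2.0) / fs
    return times, freqs, power

# Synthetic check: a 2-s, 1 kHz tone sampled at 22,050 Hz.
fs = 22050
t = np.arange(2 * fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
times, freqs, power = spectrogram(tone, fs)
peak_freq = freqs[np.argmax(power.mean(axis=0))]
print(power.shape, peak_freq)
```

With these settings each row of `power` is one time frame and each column one frequency bin, so the narrow/broad window trade-off can be explored by changing `bandwidth_hz`.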

A standard method to measure the amount of spectral structure in a stationary signal is the spectral flatness measure (SFM) [5]. The SFM quantifies how peaked the power spectrum is, as opposed to flat, and is defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. It is expressed as

$$\mathrm{SFM} = \log \frac{\left[\prod_{f=1}^{N} S(f)\right]^{1/N}}{(1/N)\sum_{f=1}^{N} S(f)},$$

where S(f) is the magnitude of the power spectrum at each frequency component f and N is the number of FFT points used to estimate the power spectral density of s(t). For a pure tone, which has a single peak in the power spectrum and has the simplest spectral structure, the ratio is 0, whereas for white noise, which has infinite peaks, the ratio is 1. To expand the dynamic range the ratio is expressed on a logarithmic scale and thus, for a pure tone, SFM is minus infinity whereas for a white noise signal, SFM is 0. Low SFM sounds are, therefore, tonal while high SFM sounds are noisy.

For non-stationary sounds, we define a time-dependent SFM(t), which estimates the spectral structure in each temporal segment. SFM(t), defined in terms of S(t, f), is obtained from the spectrographic representation as

$$\mathrm{SFM}(t) = \log \frac{\left[\prod_{f=1}^{N} S(t, f)\right]^{1/N}}{(1/N)\sum_{f=1}^{N} S(t, f)},$$

where S(t, f) is the power associated with each frequency component in that particular temporal segment.

To describe environmental sounds which have varying spectral dynamics, we propose an index of spectral variability, namely the spectral structure index (SSI), in terms of the variance of SFM(t) as

$$\mathrm{SSI} = \frac{1}{N}\sum_{t=1}^{N}\left[\mathrm{SFM}(t) - \overline{\mathrm{SFM}(t)}\right]^{2},$$

where N is here the number of time frames, the overbar denotes the mean over time frames, and SSI is the average spectral variance for a given signal. We calculate SSI for different environmental sounds and propose a categorization of environmental sounds in terms of SSI. For sounds with spectral distributions fluctuating rapidly across time frames, SSI is large and we classify them as complex sounds.
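As a rough illustration of the two definitions above, the following sketch computes SFM(t) for each row of a power spectrogram and then SSI as the variance of SFM(t) over time frames. The function names `sfm_per_frame` and `ssi`, the base-10 logarithm, and the small `eps` added to avoid taking the log of zero are assumptions made for this example; the source does not specify them. The input spectrograms are synthetic, not the paper's data, and the silence handling described in the Results is not included.

```python
import numpy as np

def sfm_per_frame(power, eps=1e-12):
    """SFM(t): log10(geometric mean / arithmetic mean) for each time frame.

    `power` has shape (n_frames, n_bins). The base-10 log and `eps`
    are assumptions of this sketch.
    """
    power = np.asarray(power, dtype=float) + eps
    log_geometric_mean = np.mean(np.log10(power), axis=1)
    log_arithmetic_mean = np.log10(np.mean(power, axis=1))
    return log_geometric_mean - log_arithmetic_mean

def ssi(sfm_t):
    """SSI: variance of SFM(t) across time frames (divides by N)."""
    sfm_t = np.asarray(sfm_t, dtype=float)
    return float(np.mean((sfm_t - np.mean(sfm_t)) ** 2))

n_frames, n_bins = 100, 246

# Noise-like frames: flat spectrum, so SFM(t) is about 0 in every frame.
flat = np.ones((n_frames, n_bins))

# Tone-like frames: one dominant bin, so SFM(t) is strongly negative.
peaked = np.full((n_frames, n_bins), 1e-6)
peaked[:, 22] = 1.0

# Alternating tonal and noisy frames: SFM(t) fluctuates from frame to frame.
mixed = flat.copy()
mixed[::2] = peaked[::2]

ssi_flat = ssi(sfm_per_frame(flat))
ssi_peaked = ssi(sfm_per_frame(peaked))
ssi_mixed = ssi(sfm_per_frame(mixed))
print(ssi_flat, ssi_peaked, ssi_mixed)
```

Note that the steady noisy signal and the steady tonal signal both give an SSI near zero: SSI reflects how much the spectral structure changes over time, not whether the sound is tonal or noisy.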
Sounds whose spectral distribution varies little across time frames are, in contrast, classified as simple sounds. We suggest that the SSI defines the degree of spectral complexity and can be used to categorize sounds into varying levels of complexity.

4. Results

A total of 15 sounds were analysed. To deal with silences in sounds, we extracted epochs in the sound signal where power is <1 dB and assigned them an SFM value of 0. Narrowband spectrograms were obtained using a 45 Hz Hamming window for all the sounds. Figure 2 shows computed values of SFM(t) plotted on a logarithmic scale for some of the sounds. As seen in figure 2, SFM(t) does not change much across time windows for airplane noise (for example), a feature which is also reflected in the spectrographic representation (figure 1). On the other hand, for laughter, SFM(t) shows fluctuations across time windows. Thus SFM(t) follows the spectral dynamics in successive time frames.

Figure 2. Plot of SFM(t) vs. time for different environmental sounds.

The variation in spectral structure across time windows for different environmental sounds, as estimated by SSI, is shown in table 1. For signals with similar spectral dynamics across time windows SSI < 1 (airplane noise, for example), while for signals with varying spectral dynamics across time windows SSI > 1 (laughter). We therefore suggest that, based on spectral dynamics, sounds in the natural environment may at least be classified into two categories, namely simple and complex. Signals with SSI < 1 can be classified as simple sounds, whereas sound signals with SSI > 1 can be classified as complex sounds.

Table 1. SSI for various environmental sounds.

Complex sounds   SSI       Simple sounds    SSI
Cow              1.0532    Tool (saw)       0.2119
Doorbell         1.1103    Breaking glass   0.3525
Coin drop        1.2509    Phone ring       0.423
Crow             1.4835    Ox               0.5219
Laughter         1.8827    Bagpipes         0.5747
Chickens         2.0167    Aeroplane        0.7471
Crying           2.3601    Horn             0.8361
Squirrel         6.9204    Page turn        0.899
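The two-way categorization can be reproduced directly from the SSI values in table 1. The sketch below applies the threshold SSI = 1 stated in the text; the function name `classify_by_ssi` is hypothetical, and assigning a value of exactly 1 to the simple class is an assumption, since the source only specifies SSI < 1 and SSI > 1.

```python
# SSI values copied from table 1.
TABLE_1_SSI = {
    "Cow": 1.0532, "Doorbell": 1.1103, "Coin drop": 1.2509, "Crow": 1.4835,
    "Laughter": 1.8827, "Chickens": 2.0167, "Crying": 2.3601, "Squirrel": 6.9204,
    "Tool (saw)": 0.2119, "Breaking glass": 0.3525, "Phone ring": 0.423,
    "Ox": 0.5219, "Bagpipes": 0.5747, "Aeroplane": 0.7471, "Horn": 0.8361,
    "Page turn": 0.899,
}

def classify_by_ssi(ssi_value, threshold=1.0):
    """Label a sound 'complex' if SSI > threshold, otherwise 'simple'.

    Assumption: SSI exactly equal to the threshold is labelled 'simple'.
    """
    return "complex" if ssi_value > threshold else "simple"

labels = {name: classify_by_ssi(value) for name, value in TABLE_1_SSI.items()}
complex_sounds = sorted(name for name, label in labels.items() if label == "complex")
simple_sounds = sorted(name for name, label in labels.items() if label == "simple")

print("complex:", complex_sounds)
print("simple:", simple_sounds)
```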

5. Conclusions

We propose a classification of sounds in the environment in terms of spectral dynamics. Sounds for which the spectral structure varies slowly across time windows are categorized as simple and sounds with rapidly changing spectral dynamics are categorized as complex. Based on our results we suggest that the auditory system may adopt processing strategies that might be similar for sounds with similar spectral dynamics, which could be a crude explanation for their anatomical organization in different regions of the human brain [1]. Functional neuroimaging experiments are required to validate our proposal and are currently in progress. Our analysis shows that the spectrographic representation presents a convenient representation to describe the rich spectral dynamics of non-stationary signals. The spectral structure index (SSI) could emerge as a novel measure to study spectral complexity in physical and biological systems.

Acknowledgements

The author would like to acknowledge T A Sumathi and Megha Sharda for their help in making figures, Rithwik Reddy for earlier work and the National Brain Research Centre for research support.

References

[1] O Chiry, E Tardif, P J Magistretti and S Clarke, Eur. J. Neurosci. 17, 397 (2003)
[2] R J Zatorre, P Belin and V B Penhune, Trends in Cog. Sci. 6, 37 (2002)
[3] L Cohen, Time frequency analysis (Prentice-Hall, New Jersey, 1995)
[4] R Reddy, V Ramachandra, N Kumar and Nandini C Singh, Biol. Cybern. 100(4), 299 (2009)
[5] N S Jayant and P Noll, Digital coding of waveforms (Prentice-Hall, 1984)