Sound Recognition ~ CSE 352 Team 3 ~ Jason Park, Evan Glover, Kevin Lui, Aman Rawat. Prof. Anita Wasilewska

What is Sound?
Sound is a vibration that propagates through a medium as a mechanical wave of pressure and displacement, typically an audible one.
Elements of sound perception:
- Pitch: the perceived frequency of the sound
- Duration
- Loudness
- Timbre: the character of a sound, judged largely by how it changes over time
- Sonic Texture: interactions between multiple sound sources
- Spatial Location: where the sound comes from
[Figures: sound traveling in a gas or liquid medium; sound traveling through a solid medium]

What is Sound Recognition?
- A subset of pattern recognition
- Depending on the purpose, different AI disciplines can be used:
  - Neural networks for speech
  - Adaptive algorithms for noise cancellation and virtual surround sound
  - General pattern recognition for music recognition, classification, etc.

Common Uses
- Event Detection
- Song Recognition
- Noise Cancellation
- Voice/Speech Recognition
- Environmental Condition Detection
- Mapping
- Music Composition

Audio Signal
- An electrical representation of sound
- Sends information along a signal flow, from source to speaker or recording device
- Frequency range: 20 to 20,000 Hz (the limits of human hearing)
- Can be synthesized or can originate from a transducer
- Parameters:
  - Bandwidth: the difference between the upper and lower frequencies in a set of frequencies

Acoustic Fingerprint
A condensed digital summary (a fingerprint) used to identify audio samples or items in an audio database.
Key characteristics:
- Estimated tempo
- Average zero crossing rate
- Average spectrum
- Spectral flatness
- Tones
- Bandwidth
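Two of the characteristics listed above, zero crossing rate and spectral flatness, can be sketched in a few lines. This is a minimal illustration under assumptions, not the method of any particular fingerprinting product: the function names, the naive DFT, and the two test signals are all made up for the example.

```python
import cmath
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def power_spectrum(samples):
    """Naive O(n^2) DFT power for bins 1 .. n/2 - 1 (fine for short clips)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_flatness(samples, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal signal, values near 1 a flat spectrum.
    """
    power = [p + eps for p in power_spectrum(samples)]
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / (sum(power) / len(power))


# Hypothetical test signals (assumptions for illustration only).
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]  # pure tone
impulse = [1.0] + [0.0] * 63                                    # flat spectrum

print("tone ZCR:", zero_crossing_rate(tone))
print("tone flatness:", spectral_flatness(tone))
print("impulse flatness:", spectral_flatness(impulse))
```

The pure tone concentrates its energy in one bin, so its flatness is close to 0, while the impulse spreads energy evenly and scores close to 1.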

Automatic Content Recognition
- Identifies a content element without user input
- Commonly uses acoustic fingerprinting and watermarking
- Associates content with related information in a database, and returns that metadata to a client

Feature Extraction
- When the input data to an algorithm is too large, it can be transformed into a reduced set of features, referred to as a feature vector.
- This reduces the amount of resources required to represent a large set of data.
- The process of reducing the number of variables involved is called dimensionality reduction.
- Plays a major role in Digital Signal Processing (DSP).
- A few of the models used for DSP are:
  - LPC: Linear Predictive Coding
  - MFCC: Mel-frequency cepstral coefficients
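MFCC features are built on the mel scale, which spaces frequencies roughly the way listeners perceive pitch. The sketch below shows only that frequency mapping, not a full MFCC pipeline. The constants 2595 and 700 are one widely used convention and are an assumption here, since the slides do not give a formula; the function names are also made up for the example.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mels (common 2595 / 700 convention, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(low_hz, high_hz, num_filters):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(num_filters)]


# Ten hypothetical filters between 0 Hz and 8 kHz.
centers = mel_center_frequencies(0.0, 8000.0, 10)
for hz in centers:
    print(round(hz, 1))
```

The printed centers sit close together at low frequencies and spread out at high frequencies, which is how a mel filterbank turns a large spectrum into a small feature vector.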

Feature Extraction
Speech is highly variable:
- Different speakers
- Speaking rates
- Content
- Acoustic conditions (ambient sounds)
Theoretically, it is possible to recognize speech directly from the digitized waveform, but the large variability of the speech signal makes this impractical. This is where feature extraction plays a role, by reducing that variability.
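As a small illustration of how features can remove one source of variability, the sketch below computes one log-energy value per frame and subtracts the clip-wide mean, so the same clip recorded at a different volume yields the same features. The frame sizes, the synthetic clip, and the function names are assumptions chosen for the example, and real speech front ends use richer features than this.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a waveform into overlapping fixed-length frames."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def log_energy_features(samples, frame_len=100, hop=50, eps=1e-12):
    """One log-energy value per frame, with the clip-wide mean subtracted."""
    energies = [
        math.log(sum(x * x for x in frame) + eps)
        for frame in frame_signal(samples, frame_len, hop)
    ]
    mean = sum(energies) / len(energies)
    return [e - mean for e in energies]


# Hypothetical clip: a tone whose amplitude ramps up over 400 samples.
clip = [(t / 400) * math.sin(2 * math.pi * 0.05 * t) for t in range(400)]
louder = [3.0 * x for x in clip]  # same content, three times the amplitude

features = log_energy_features(clip)
louder_features = log_energy_features(louder)

print(len(clip), "samples ->", len(features), "features")
print(max(abs(a - b) for a, b in zip(features, louder_features)))
```

The 400-sample waveform shrinks to a handful of numbers, and the difference between the two feature vectors is negligible even though the raw waveforms differ by a factor of three.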

Noise Cancellation
- Emit a sound wave with the same amplitude as the noise, but with inverted phase.
  - Phase: the position of a point in time on a waveform cycle
  - Waveform: the shape and form of a signal
- The crest of one wave meets the trough of the other, which leads to destructive interference.
[Figures: constructive interference; destructive interference]
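The cancellation idea can be checked numerically: adding a tone to a copy shifted by half a cycle sums to nearly zero, while adding it to an in-phase copy doubles the amplitude. The sample rate, the 440 Hz tone, and the helper name below are assumptions for illustration only.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)
FREQ_HZ = 440.0     # frequency of the hypothetical noise tone


def sine_wave(freq_hz, num_samples, phase=0.0, amplitude=1.0):
    """Sampled sine wave with the given phase offset in radians."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE + phase)
        for n in range(num_samples)
    ]


noise = sine_wave(FREQ_HZ, 200)
anti_noise = sine_wave(FREQ_HZ, 200, phase=math.pi)  # inverted phase
in_phase = sine_wave(FREQ_HZ, 200)

destructive = [a + b for a, b in zip(noise, anti_noise)]
constructive = [a + b for a, b in zip(noise, in_phase)]

print("destructive peak:", max(abs(x) for x in destructive))    # close to zero
print("constructive peak:", max(abs(x) for x in constructive))  # close to 2
```

In practice the anti-noise must track a changing noise source, which is why the amplitude and phase have to be estimated continuously rather than fixed as they are here.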

Adaptive Noise Cancellation

Shazam
- Shazam is an app for PCs, Macs, and smartphones that identifies music.
- It mainly uses fingerprints to recognize songs.
Fingerprinting the song:
- Analyze a chunk of the song and get the frequency makeup of the audio.
- Determine which frequency is the signature of that audio.
- To allow for easy access, the signature frequency is placed into a hash table.
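The fingerprinting steps can be sketched as follows. This is a toy version under assumptions, not Shazam's actual implementation: it takes the single strongest DFT bin of each chunk as the "signature frequency", uses a naive DFT, and invents a four-chunk "song" of pure tones so the result is easy to follow.

```python
import cmath
import math


def dominant_bin(chunk):
    """Index of the strongest DFT bin in one chunk (naive O(n^2) DFT)."""
    n = len(chunk)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(chunk)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin


def fingerprint(samples, chunk_len=32):
    """Signature frequency bin of each consecutive chunk."""
    return [
        dominant_bin(samples[i:i + chunk_len])
        for i in range(0, len(samples) - chunk_len + 1, chunk_len)
    ]


def tone(bin_index, chunk_len=32):
    """One chunk of a pure tone that falls exactly on a DFT bin."""
    return [
        math.sin(2 * math.pi * bin_index * t / chunk_len)
        for t in range(chunk_len)
    ]


# Hypothetical "song": four chunks, each a pure tone at a different bin.
song = tone(3) + tone(7) + tone(5) + tone(11)

# Hash table keyed by signature bin, storing (song name, chunk index) pairs.
database = {}
for position, freq_bin in enumerate(fingerprint(song)):
    database.setdefault(freq_bin, []).append(("demo-song", position))

print(fingerprint(song))  # [3, 7, 5, 11]
print(database[7])        # [('demo-song', 1)]
```

Keying the table by frequency makes lookup a single dictionary access per chunk of captured audio, which is the "easy access" the slide refers to.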

Shazam
Matching the song:
- Capture the audio and fingerprint it.
- Compare the fingerprint pattern to those stored in the database (hash table).
- Commonly, the pattern will match multiple songs.
- Relative timings are usually used to narrow the match, which also allows for greater flexibility in the captured sound.
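One way to picture matching with relative timings is offset voting: every hash-table hit votes for a (song, time offset) pair, and the true song collects many votes at one consistent offset. The sketch below assumes fingerprints are already reduced to one frequency bin per chunk. The song data and function names are invented for illustration, and this is not presented as Shazam's actual algorithm.

```python
from collections import Counter

# Hypothetical fingerprints: one signature frequency bin per time chunk.
SONGS = {
    "song-a": [3, 7, 5, 11, 2, 9, 4, 7],
    "song-b": [7, 5, 3, 8, 6, 11, 2, 10],
}


def build_index(songs):
    """Hash table: frequency bin -> list of (song, chunk position)."""
    index = {}
    for name, bins in songs.items():
        for position, freq_bin in enumerate(bins):
            index.setdefault(freq_bin, []).append((name, position))
    return index


def match(sample_bins, index):
    """Vote for (song, offset); a true match keeps one offset throughout."""
    votes = Counter()
    for sample_pos, freq_bin in enumerate(sample_bins):
        for name, song_pos in index.get(freq_bin, []):
            votes[(name, song_pos - sample_pos)] += 1
    if not votes:
        return None  # nothing in the database shares a bin with the clip
    (name, offset), count = votes.most_common(1)[0]
    return name, offset, count


index = build_index(SONGS)

# A clip captured from the middle of song-a (chunks 2 to 5).
print(match([5, 11, 2, 9], index))  # ('song-a', 2, 4)
```

Three of the four bins in the clip also occur in song-b, but at inconsistent offsets, so song-b never gathers more than two votes for any single alignment.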

Music Composition
Algorithmic composition:
- Provides notational information (sheet music)
- Provides composition (music synthesis)
Many types of models:
- Grammars: creates distinct musical grammars, composed of harmonies and rhythms instead of single notes
- Knowledge-based systems: isolates the aesthetic code of a certain musical genre
- Evo-devo approach: transforms a very simple composition (of a few notes) into a fully fledged piece
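The grammar model can be illustrated with a tiny rewrite system whose terminals are (chord, beats) pairs rather than single notes. Everything in the sketch below is a made-up example: the rule names, the chords, and the fixed random seed are assumptions, and real grammar-based composers use far larger rule sets.

```python
import random

# Hypothetical grammar: each symbol expands to a sequence of symbols or to
# terminal (chord, beats) pairs, so the units are harmonies and rhythms.
GRAMMAR = {
    "Piece": [["Phrase", "Phrase", "Cadence"]],
    "Phrase": [["Tonic", "Subdominant"], ["Tonic", "Dominant"]],
    "Cadence": [["Dominant", "Tonic"]],
    "Tonic": [[("C", 2)], [("Am", 2)]],
    "Subdominant": [[("F", 2)], [("Dm", 1), ("F", 1)]],
    "Dominant": [[("G", 2)], [("G7", 2)]],
}


def expand(symbol, rng):
    """Recursively rewrite a symbol until only (chord, beats) pairs remain."""
    if isinstance(symbol, tuple):
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    result = []
    for part in production:
        result.extend(expand(part, rng))
    return result


rng = random.Random(0)  # fixed seed so the sketch is repeatable
progression = expand("Piece", rng)

for chord, beats in progression:
    print(chord, beats)
```

Every expansion of this grammar lasts 12 beats and ends on a tonic chord, which shows how rules can enforce structure while the random choices vary the surface.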

Algorithmic Music Composition
- The AI system, called FlowMachines, works by first analyzing a database of songs and then following a particular musical style to create similar compositions.
- The example song (in the style of The Beatles) was composed by the AI, but the arrangement was done by a French composer.

Emotion Recognition
- A subset of speech recognition
- Uses neural networks to determine the emotion in a sound clip
- Obtain the waveform of a certain speech pattern and examine different factors to determine the emotion:
  - Pitch
  - Decibels
  - Formants
  - Mel-frequency cepstral coefficients (MFCC)

[Figure: waveform samples of different emotions]

Emotion Recognition - How does it work?
1. Feature extraction
2. Feature selection: select the features that best identify a class
3. Classification
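The three steps can be sketched end to end on toy data. Several things here are assumptions and not taken from the slides: the feature vectors are made-up numbers standing in for already extracted features (step 1), selection uses a Fisher-style score, and classification uses a nearest-centroid rule swapped in for the neural network the slides mention, purely to keep the example short.

```python
import math

# Step 1 (assumed done): made-up feature vectors per labeled clip,
# [mean pitch (Hz), level (dB), an uninformative feature].
TRAINING = {
    "angry": [[260.0, 75.0, 0.4], [280.0, 78.0, 0.6], [270.0, 74.0, 0.5]],
    "calm": [[180.0, 60.0, 0.5], [170.0, 58.0, 0.4], [190.0, 62.0, 0.6]],
}
NUM_FEATURES = 3


def mean(values):
    return sum(values) / len(values)


def fisher_score(feature_index):
    """Between-class spread divided by within-class spread for one feature."""
    columns = [
        [row[feature_index] for row in rows] for rows in TRAINING.values()
    ]
    overall = mean([v for col in columns for v in col])
    between = sum(len(c) * (mean(c) - overall) ** 2 for c in columns)
    within = sum(sum((v - mean(c)) ** 2 for v in c) for c in columns)
    return between / (within + 1e-12)


def select_features(keep):
    """Step 2: indices of the `keep` features that best separate the classes."""
    ranked = sorted(range(NUM_FEATURES), key=fisher_score, reverse=True)
    return sorted(ranked[:keep])


def classify(vector, selected):
    """Step 3: label of the nearest class centroid over the selected features."""
    best_label, best_dist = None, math.inf
    for label, rows in TRAINING.items():
        centroid = [mean([row[i] for row in rows]) for i in selected]
        dist = math.sqrt(
            sum((vector[i] - c) ** 2 for i, c in zip(selected, centroid))
        )
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


selected = select_features(keep=2)
print(selected)                               # [0, 1]
print(classify([265.0, 76.0, 0.5], selected))  # angry
```

The third feature has the same mean in both classes, so its score is near zero and selection drops it before classification.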

Sources:
https://arxiv.org/ftp/arxiv/papers/1305/1305.1145.pdf
https://www.toptal.com/algorithms/shazam-it-music-processing-fingerprinting-and-recognition
http://www-personal.umich.edu/~gowtham/bellala_eecs452report.pdf
http://willdrevo.com/fingerprinting-and-audio-recognition-with-python/
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.437.2775&rep=rep1&type=pdf
http://www.docsity.com/en/news/physics/physics-sound-visual-representation-gifs/