SGN Audio and Speech Processing

SGN 14006 Audio and Speech Processing: Introduction
Lectures, Fall 2015
Pasi Pertilä, Tampere University of Technology (slides by Anssi Klapuri)

Course goals
- Learn the basics of audio signal processing
  - Basic operations and their underlying ideas and principles
  - Gives basic skills, although all the latest cutting-edge algorithms cannot be covered
- Learn the fundamentals of speech processing
  - Speech production and its computational modeling
  - Acoustic features to represent speech signals
  - Some applications: speech coding, synthesis
- Learn the basics of acoustics and human hearing
  - These form the foundation for technical applications

Lecture timeline (some changes may still take place)
- Sound, audio signals, acoustics
- Hearing
- Basic audio signal processing operations: AD/DA conversion, filters and filter banks, dynamic control, etc.
- Sound synthesis
- Audio coding
- Speech production anatomy, phonetics
- Linear prediction, MFCCs, and cepstrum
- Speech coding
- Speech synthesis

What is not covered by this course
- Speech recognition, audio content analysis, and acoustic pattern recognition
  - See the course SGN-24006 Analysis of Audio, Speech and Music Signals (period 4)
- Analog audio: electroacoustics, microphone and loudspeaker design
  - See the course Akustiikan mittaukset
- Hardware implementations

Practical arrangements
- Course homepage: http://www.cs.tut.fi/~sgn14006
- Lectures
  - Mondays 12-14 in TB219
  - Thursdays 14-16 in TB222
  - Pasi Pertilä, pasi.pertila @ tut.fi
- Lecture slides will be available as PDF on the course page
  - The course is not based on any individual textbook. Lectures, lecture notes and exercises will be sufficient to take the exam.
  - Some recommended textbooks are mentioned at the end of this introduction
- Requirements: exam and project work
- 5 cr

Exercises
- Exercises start one week after the lectures (2.9.2015)
- Assistants: Shriram Nandakumar, Emre Cakir
- Contents: math and Matlab exercises related to the lectures
- Two alternative groups
  - Tuesday 10-12 in TC303 (updated!)
  - Friday 12-14 in TC303
  - Register for either group online at 14:00 today: www.tut.fi/pop
- Math problems are to be solved in advance; Matlab exercises are done during the exercise sessions
- Active completion of the exercises and participation in the sessions is credited with up to 3 points in the exam (equivalent to one mark)
- Project work will be discussed at the exercises too

Project work
- Implementing an audio signal processing algorithm in Matlab
  - In two-person groups
- Topic(s) will be introduced later during the lectures
- Requirements:
  - Choosing the topic
  - Implementing the algorithm
  - Final report by 28.10.
- More detailed instructions will appear on the course home page

Reference material
- Gold, Morgan, Ellis. Speech and audio signal processing, Wiley, 2011.
- Zölzer. Digital audio signal processing, Wiley & Sons, 2nd ed., 2008.
  - Including AD/DA conversion, dynamic control, equalization, filter banks
- Quatieri. Discrete-time speech signal processing: principles and practice, Prentice Hall PTR, 2002.
- Rossing. The science of sound, Addison-Wesley, 1990.
  - Acoustics, hearing
- Brandenburg, Kahrs. Applications of digital signal processing to audio and acoustics, Kluwer Academic Publishers, 1998.
  - Chapter on perceptual audio coding
- Pulkki, Karjalainen. Communication acoustics, Wiley, 2015.

Introduction to audio signals and their representation

Audio signals
- Audio = related to sound or hearing
- The word sound may mean
  1. a sensation perceived by the auditory system, or
  2. longitudinal pressure waves in a material medium (such as air) that may cause a hearing sensation
  - Due to human hearing, we usually consider the frequency range 20 Hz to 20 kHz and air as the medium (although hearing also works underwater, for example)
- Sound signal, or audio signal
  - Numerical representation of sound
  - Sound pressure level as a function of time, measured using a microphone for example
- Note: audio signal is often understood as non-speech audio signal, although speech signals are audio too

Audio and speech processing
- Where is audio and speech processing needed?
- Examples:
  - Convert a musical piece into compressed mp3 format and store it on a hard disc for later playback (audio coding)
  - Encode a speech signal on a mobile phone before transmission
  - Add reverberation to a sound, correct the pitch of a singer (studio technology)
  - Enhance the quality of a speech signal (denoising, echo cancellation)
  - Compensate for loudspeaker non-idealities by digital equalization
- Typical digital signal processing system:
  1. Digitize a signal (sampling, quantization)
  2. Process in digital form (store, manipulate, etc.); the digital representation enables a variety of algorithms
  3. Convert back to an analog signal

Audio signal representations
- Different applications employ different representations
  - Time domain representation
  - Frequency domain representation
  - Time-frequency domain representation
- On this course we consider mainly music and speech
  - Music signals involve a wide variety of sounds, and billions of people listen to music worldwide
  - Speech signals are an important special category of sound signals due to their importance for communication
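As a rough illustration of steps 1 and 2 of the typical digital signal processing system listed above, the following Python sketch samples a tone, quantizes it uniformly, and applies a simple gain. The sampling rate, tone frequency, bit depth and the helper name quantize_uniform are assumptions chosen for illustration and do not come from the course material. Step 3 (conversion back to analog) needs DA hardware and is not shown.

```python
import numpy as np

def quantize_uniform(x, n_bits):
    """Round samples in [-1, 1) to a uniform grid with 2**n_bits levels."""
    step = 2.0 / (2 ** n_bits)           # quantization step size
    q = np.round(x / step) * step        # snap each sample to the nearest level
    return np.clip(q, -1.0, 1.0 - step)  # stay inside the representable range

fs = 8000                                # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                   # one second of sample times
x = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # tone evaluated at the sample times

x_q = quantize_uniform(x, n_bits=8)      # step 1: digitize (sampling + quantization)
y = 0.5 * x_q                            # step 2: process in digital form (gain)

# For an unclipped signal the rounding error is at most half a step.
max_err = np.max(np.abs(x_q - x))
```

With 8 bits the step is 2/256, so max_err stays below about 0.004; each extra bit halves that bound.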

Time domain signal
- Air pressure level as a function of time (zero level = normal air pressure) is a natural representation for audio
  - An analog signal is easy to record using a microphone and play back using a loudspeaker
- For music, typical sampling rates are 44.1 or 48 kHz
  - Allows for representing the frequency range of human hearing (approximately 20 Hz to 20 kHz)
- For speech
  - 8 kHz: narrowband, the conventional telephone rate (sibilants /s/, /f/ distorted)
  - 16 kHz: wideband, voice over IP, bandwidth extension
- Other rates are also widely used: 96, 32, 22.05 kHz, etc.
- Most of the energy (and information) of natural sounds is at low frequencies (around 200 Hz to 5 kHz)

Time domain signal (1)
- An analog signal (solid line) can be represented with discrete samples (dots) without loss of information if the sampling frequency is more than twice the highest frequency component in the signal
  - Remember from introductory signal processing courses

Time domain signal (2)
- A large time scale illustrates the sound amplitude envelope
- Example signal: one note from the oboe
  - Amplitude is zero before the sound starts
  - The oboe has continuous excitation, so the sound's amplitude envelope remains nearly constant throughout its duration

Time domain signal (3)
- Zoom-in of the same oboe signal at time t = 0.45 s
- A 90 ms frame illustrates the periodic waveform
  - Many sounds are periodic, for example most musical instrument sounds and vowels in speech
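The sampling condition on the "Time domain signal (1)" slide can be checked numerically. The sketch below is only an illustration: the 8 kHz rate matches the narrowband speech rate mentioned above, while the 5 kHz test tone and the variable names are assumptions. A cosine above half the sampling rate produces exactly the same samples as a lower-frequency cosine, so the two cannot be told apart after sampling.

```python
import numpy as np

fs = 8000.0                 # sampling rate in Hz (narrowband speech rate)
n = np.arange(64)           # sample indices

f_high = 5000.0             # tone above fs / 2, violates the sampling condition
f_alias = fs - f_high       # frequency it folds down to (3000 Hz)

x_high = np.cos(2 * np.pi * f_high * n / fs)
x_alias = np.cos(2 * np.pi * f_alias * n / fs)

# The two sample sequences are identical up to floating-point rounding.
same = np.allclose(x_high, x_alias)
```

This is why content above fs/2 is filtered out before AD conversion: once sampled, it is indistinguishable from a component inside the representable band.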

Frequency domain representation: spectrum
- Obtained by computing the discrete Fourier transform (for example) of the time-domain signal, usually in a short frame
- Many perceptually important properties are more clearly visible in the frequency domain
- A decibel scale for amplitude is useful from the viewpoint of human hearing and the dynamics of natural sounds
  - Due to Fechner's law (subjective sensation is proportional to the logarithm of the stimulus intensity)
- Phases are perceptually less important and are often omitted

Consider log-frequency and dB magnitude
- Linear scale: usually hard to see anything
- Log-frequency: each octave is approximately equally important perceptually
- Log-magnitude: the perceived change from 50 dB to 60 dB is about the same as from 60 dB to 70 dB

Time-frequency representation: spectrogram
- Shows sound intensity as a function of time and frequency
- Obtained by blocking the signal into short analysis frames and computing their spectra
- For audio, the frame size is typically 10 to 100 ms: sound spectra are often nearly stationary at that time scale

Example audio signals: guitar
- Sound decays gradually after the onset
- Instantaneous excitation: the string is plucked at onset
- Periodic sound (vibrating string, covered in the Acoustics lecture)
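The spectrogram recipe above (block the signal into short frames, compute each frame's spectrum, show magnitudes in decibels) can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the Hann window, the 32 ms frame with 50 % overlap, the 1 kHz test tone and the function name spectrogram_db are illustrative choices, not the course's reference implementation.

```python
import numpy as np

def spectrogram_db(x, fs, frame_len, hop):
    """Return dB magnitudes (frames x bins), bin frequencies and frame times."""
    window = np.hanning(frame_len)                    # taper to reduce leakage
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))         # phases are discarded
    db = 20.0 * np.log10(mag + 1e-12)                 # small offset avoids log(0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)    # bin center frequencies in Hz
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs
    return db, freqs, times

fs = 16000                                 # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                     # one second of samples
x = np.sin(2 * np.pi * 1000.0 * t)         # 1 kHz test tone

# 512 samples at 16 kHz is a 32 ms frame, inside the 10 to 100 ms range.
S, freqs, times = spectrogram_db(x, fs, frame_len=512, hop=256)
peak_hz = freqs[np.argmax(S[0])]           # strongest bin of the first frame
```

Each row of S is one short-time spectrum; plotting S with time on one axis and frequency on the other gives the spectrogram. A longer frame sharpens frequency resolution at the cost of time resolution.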

Example audio signals: snare drum (1)
- Instantaneous excitation, exponentially decaying amplitude envelope

Example audio signals: snare drum (2)
- Zoom-in of the snare drum waveform
- The signal also contains non-periodic components

Example audio signals: snare drum (3)
- The spectrum is noise-like too: its structure is not as clear as that of the oboe's spectrum

Example audio signals: snare drum (4)
- Spectrogram

Polyphonic music (1)
- Polyphonic music consists of a mix of several sound sources (linear superposition)

Polyphonic music (2)
- The spectrogram reveals e.g. the rhythmic structure

Speech: time domain signal (1)
- One sentence ("He knew what taboos he was violating.")
- Speech can be viewed as a sequence of phonemes

Speech: time domain signal (2)
- Zooming in to different phonemes
  - Left: vowel "e" in "He" (voiced: periodic)
  - Right: "t" in "taboos" (unvoiced: noisy)
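The linear superposition mentioned on the polyphonic music slide carries over to the frequency domain, because the Fourier transform is linear. The sketch below illustrates this with two synthetic "sources"; the frequencies, amplitudes and variable names are assumptions for illustration only.

```python
import numpy as np

fs = 8000                                        # sampling rate in Hz (assumed)
t = np.arange(1024) / fs                         # sample times of one frame

source_a = np.sin(2 * np.pi * 440.0 * t)         # first source
source_b = 0.5 * np.sin(2 * np.pi * 660.0 * t)   # second, quieter source
mix = source_a + source_b                        # linear superposition

spec_mix = np.fft.rfft(mix)                      # complex spectrum of the mix
spec_sum = np.fft.rfft(source_a) + np.fft.rfft(source_b)

# Complex spectra add exactly (up to rounding) when the signals add.
is_linear = np.allclose(spec_mix, spec_sum)
```

Only the complex spectra add. Magnitude spectra generally do not, because overlapping components can reinforce or cancel each other depending on their phases.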

Speech spectrogram
- Each phoneme has its characteristic spectral shape
- Transitions between phonemes are continuous rather than step-like

AD#1: NERDS MEET ARTISTS 2015-2016
Joint Course Module of Signal Processing, School of Architecture and Civil Engineering

GOAL: Help signal processing engineers to understand the needs of urban design, and help architects and civil engineers to understand the potential of modern ICT in the quantitative analysis of urban spaces. With the help of camera and microphone systems, automatic analysis is provided for quantitative urban space monitoring. The quantitative data is used for boosting the architectural and civil engineering design of future urban spaces.

COURSE: SGN-81006 Signal Processing Innovation Project

PARTICIPATION: Enroll in the above course and come to the Opening Session, August 25 2015, 10:00-12:00, RO104, where the overall description is given and the project groups will be formed. The work will be supervised by researchers from the Department of Signal Processing, the School of Architecture and the Department of Civil Engineering.

FOR MORE INFORMATION:
- Harry Edelman (School of Architecture / Dept. of Civil Engineering)
- Joni Kämäräinen (Dept. of Signal Processing, video processing)
- Tuomas Virtanen (Dept. of Signal Processing, audio processing)

AD#2: Invitation to Data Collection Campaign
Participate in a study, get a movie ticket!
- A project in the Department of Signal Processing needs speech data for research purposes.
- Your task is to read out simple English sentences from a script. Takes 25 minutes.
- Reward: a movie ticket.
- How to participate?
  - We need two persons per recording: come with a friend. If you are alone, we could try to pair you.
  - Sign up via email: aleksandr.diment@tut.fi
  - The sessions take place on 24-28.8 during office hours, or at a different time upon agreement.