Basic Characteristics of Speech Signal Analysis


www.ijird.com, March 2016, Vol 5, Issue 4, ISSN 2278-0211 (Online)

S. Poornima
Assistant Professor, VLB Janakiammal College of Arts and Science, Coimbatore, Tamil Nadu, India

Abstract: A speech signal has several fundamental characteristics, which can be divided into two types of features: time-domain features and frequency-domain features. Both are mainly used for segmenting speech signals. Three important characteristics of a speech signal are the short-time zero-crossing rate, short-time energy and short-time autocorrelation. Short-time energy and the short-time zero-crossing rate are important for detecting the end points of speech, and these two properties are used especially for voiced/unvoiced segmentation and classification.

Keywords: Short-time energy, short-time zero-crossing rate, autocorrelation, spectral centroid, spectral flux.

1. Introduction
Speech processing is an interesting area of signal processing, in which speaker identification and speaker recognition are widely used applications. Time-domain speech signal features include short-time energy, the short-time zero-crossing rate and short-time autocorrelation; frequency-domain features include the spectral centroid and spectral flux. Other characteristics of speech signals are pitch, stress, power spectral density, vowel duration, rhythm and intonation patterns. These characteristics are mainly used to segment speech into recognizable and meaningful units. Three important characteristics of a speech signal are the short-time zero-crossing rate, energy and autocorrelation [4].

2. Time-Domain Signal Features
Time-domain signal features can be used to extract speech segments, and the algorithms for computing them are straightforward to implement and efficient. There are three such features: short-time energy, the short-time zero-crossing rate and short-time autocorrelation.
Short-time energy and the short-time zero-crossing rate are the most important features for detecting voiced and unvoiced speech.

2.1. Short-Time Energy
Short-time energy is a basic and important characteristic in speech processing. Energy is a measure of the strength of the signal. The energy of a speech signal naturally varies over time, so it is estimated with short-time analysis, i.e. over short successive windows. In general, a speech signal consists of voiced, unvoiced, silence and noise regions, and analysing its energy is an effective way to identify these regions [4]. Voiced segments have high short-time energy, unvoiced segments have low energy, and silent segments have very low energy. Short-time energy therefore behaves in the opposite way to the zero-crossing rate, which is low for voiced speech and high for unvoiced speech. It is calculated by the equation

$$E_n = \sum_{m=-\infty}^{\infty} \left[ x(n+m)\, w(m) \right]^2 \qquad (1)$$

where E_n is the energy at sample n of the signal x, w is the analysis window, and m indexes the samples inside the window (w is zero outside its finite length, so the sum is finite). The short-time energy of a sample speech signal for the sentence "My first speech processing work" is shown in Figure 1.

Figure 1: Short time energy of sample speech signal
INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH & DEVELOPMENT Page 1
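Equation (1) can be sketched directly in Python using only the standard library. This is a minimal, illustrative sketch, not the author's implementation: the function names `short_time_energy` and `hamming` are hypothetical, frames are taken every `hop` samples starting at n = 0, and the window defaults to rectangular (a Hamming window is offered as one common, assumed alternative).

```python
import math


def hamming(length):
    """Hamming window w(m), m = 0..length-1 (one common, assumed choice)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * m / (length - 1)) for m in range(length)]


def short_time_energy(x, frame_len, hop, window=None):
    """E_n = sum_m [x(n+m) w(m)]^2 for n = 0, hop, 2*hop, ...

    Only frames that fit entirely inside x are evaluated; the window
    defaults to rectangular (w(m) = 1 for 0 <= m < frame_len).
    """
    w = window if window is not None else [1.0] * frame_len
    if len(w) != frame_len:
        raise ValueError("window length must equal frame_len")
    return [sum((x[n + m] * w[m]) ** 2 for m in range(frame_len))
            for n in range(0, len(x) - frame_len + 1, hop)]


if __name__ == "__main__":
    fs = 8000
    # Toy signal: 100 samples of silence followed by 100 samples of a 200 Hz tone.
    x = [0.0] * 100 + [math.sin(2 * math.pi * 200 * t / fs) for t in range(100)]
    print(short_time_energy(x, frame_len=50, hop=50))
    print(short_time_energy(x, frame_len=50, hop=50, window=hamming(50)))
```

The first two frames (silence) give zero energy and the last two (tone) give clearly positive values, which is the silence/voiced contrast described above.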

In Figure 1, the speech signal is plotted in blue and the short-time energy in red; the energy is high at only a few points of the speech signal [6]. Figure 2 shows the short-time energy of normal speech for the word "Already".

Figure 2: Short-time energy of normal speech word Already

Voiced speech has high energy and unvoiced speech has low energy; silence is identified where the energy is very low. Figure 3 shows the short-time energy of the stuttered word "Already".

Figure 3: Short-time energy of stuttering speech word Already

In stuttered speech, short-time energy locates the silent regions, and the voiced and unvoiced regions are found from the level of the energy.

2.2. Short-Time Zero Crossing
The short-time zero-crossing rate is another popular characteristic of speech signals. A zero crossing occurs where the signal changes sign, i.e. where its graph passes from positive to negative or from negative to positive. The zero-crossing rate is defined as the number of times the amplitude of the signal changes sign within a sound sample. Zero-crossing rates are used to identify voiced and unvoiced speech and for end-point detection. The zero-crossing value is high when unvoiced speech or silence occurs in the speech signal [2]. It is measured by counting how many times the amplitude of the speech signal passes through zero in a given time interval. The short-time zero-crossing rate is calculated by

$$Z_n = \frac{1}{2} \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(n+m)] - \operatorname{sgn}[x(n+m-1)] \right| w(m) \qquad (2)$$

where Z_n is the zero-crossing rate at sample n of the signal x, sgn is the signum function (sgn[x] = 1 for x >= 0 and -1 otherwise), w is the analysis window, and m indexes the samples inside the window. The short-time zero-crossing rate of a sample speech signal is shown in Figure 4.

Figure 4: Short time zero crossing rate of sample speech signal.
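A matching sketch of eq. (2), again hypothetical (`short_time_zcr` and `sgn` are illustrative names) and standard-library only. It assumes a rectangular window and the convention sgn(v) = +1 for v >= 0 and -1 otherwise, and it counts crossings only between consecutive samples inside each frame, so the edge term that reaches back to sample n - 1 in eq. (2) is ignored.

```python
import math
import random


def sgn(v):
    """Signum convention for eq. (2): +1 for v >= 0, -1 otherwise (assumed)."""
    return 1 if v >= 0 else -1


def short_time_zcr(x, frame_len, hop):
    """Zero crossings per frame: 1/2 * sum |sgn x(n+m) - sgn x(n+m-1)|.

    Uses a rectangular window and only the frame_len - 1 sample pairs
    inside each frame; divide by (frame_len - 1) for a per-sample rate.
    """
    counts = []
    for n in range(0, len(x) - frame_len + 1, hop):
        frame = x[n:n + frame_len]
        # Each sign change contributes |(+1) - (-1)| = 2, hence the halving.
        total = sum(abs(sgn(frame[m]) - sgn(frame[m - 1])) for m in range(1, frame_len))
        counts.append(total // 2)
    return counts


if __name__ == "__main__":
    fs = 8000
    rng = random.Random(0)
    voiced_like = [math.sin(2 * math.pi * 150 * t / fs) for t in range(400)]  # low-frequency tone
    unvoiced_like = [rng.uniform(-1.0, 1.0) for _ in range(400)]              # white noise
    print("tone :", short_time_zcr(voiced_like, frame_len=200, hop=200))
    print("noise:", short_time_zcr(unvoiced_like, frame_len=200, hop=200))
```

The tone gives only a handful of crossings per 200-sample frame while the noise gives many more, mirroring the voiced/unvoiced contrast in the text.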

In Figure 4, the short-time zero-crossing rate is shown in red; it is high where unvoiced sounds are concentrated [3]. Figure 5 shows the short-time zero-crossing rate of normal speech for the word "Already".

Figure 5: Short-time zero-crossing of normal speech word Already

The zero-crossing value is low during normal voiced speech. Figure 6 shows the short-time zero-crossing rate of the stuttered word "Already".

Figure 6: Short-time zero crossing rate of stuttering speech word Already

Silent regions have very low energy and very low zero-crossing values, so they can be removed; in these regions the number of zero crossings is nearly zero.

2.3. Autocorrelation
Autocorrelation, also called serial correlation, is a signal analysis function. It is computed by correlating a time series with its own past and future values in order to identify similarities between them. It is used for detecting the repetition or periodicity present in a signal. Past values are the values before the respective autocorrelation frame, and future values are the values after it [6]. The short-time autocorrelation is calculated by

$$R_n(k) = \sum_{m=-\infty}^{\infty} x(n+m)\, w(m)\, x(n+m+k)\, w(m+k) \qquad (3)$$

where R_n(k) is the short-time autocorrelation at sample n of the signal x, k is the lag, and w is the analysis window. The autocorrelation function is a very useful tool in speech processing for identifying how speech characteristics repeat over time. The short-time autocorrelation of a sample speech signal is shown in Figure 7.

Figure 7: Short-time autocorrelation of sample speech signal
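The sketch below evaluates eq. (3) for one frame starting at sample n and, as an assumed helper for the peak picking used for voiced speech, returns the lag of the largest value of R_n(k) at or beyond a minimum lag. The names and parameters (`short_time_autocorr`, `strongest_lag`, `min_lag`) are hypothetical. Note that R_n(0) equals the short-time energy E_n of eq. (1) for the same frame and window.

```python
import math


def short_time_autocorr(x, n, frame_len, max_lag, window=None):
    """R_n(k) = sum_m x(n+m) w(m) x(n+m+k) w(m+k) for k = 0..max_lag.

    w(m) is zero outside 0 <= m < frame_len, so only pairs where both
    m and m + k fall inside the window contribute. Requires
    n + frame_len <= len(x).
    """
    w = window if window is not None else [1.0] * frame_len  # rectangular by default
    seg = [x[n + m] * w[m] for m in range(frame_len)]        # windowed frame x(n+m) w(m)
    return [sum(seg[m] * seg[m + k] for m in range(frame_len - k))
            for k in range(max_lag + 1)]


def strongest_lag(r, min_lag):
    """Lag k >= min_lag with the largest R_n(k) (hypothetical peak-picking helper)."""
    return max(range(min_lag, len(r)), key=lambda k: r[k])


if __name__ == "__main__":
    fs = 8000
    x = [math.sin(2 * math.pi * 100 * t / fs) for t in range(400)]  # 100 Hz tone, period 80 samples
    r = short_time_autocorr(x, n=0, frame_len=400, max_lag=160)
    lag = strongest_lag(r, min_lag=20)
    print(lag, fs / lag)  # lag should be close to 80 samples, i.e. roughly 100 Hz
```

Skipping small lags (`min_lag`) avoids picking the trivial maximum at k = 0; the chosen value of 20 samples is an assumption for this toy example.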

In voiced segments the signal is periodic, so the autocorrelation function shows high peaks, and these peaks are used to decide between voiced and unvoiced speech [4].

3. Frequency-Domain Signal Features
Frequency-domain features analyse the signal with respect to frequency, using a spectrum obtained with the discrete Fourier transform (DFT). Two important frequency-domain features are the spectral centroid and spectral flux.

3.1. Spectral Centroid
The spectral centroid characterizes the spectrum and is generally correlated with the perceived brightness of a sound. It is the center of gravity of the spectrum, computed from the frequency and magnitude information of the fast Fourier transform (FFT): the sum of the bin frequencies weighted by their amplitudes, divided by the sum of the amplitudes. The spectral centroid is calculated by

$$C = \frac{\sum_{m=0}^{N-1} f(m)\, X(m)}{\sum_{m=0}^{N-1} X(m)} \qquad (4)$$

where f(m) is the center frequency of bin m, N is the number of frequency bins, and X(m) is the DFT magnitude (the weight) of bin m. In a speech signal, both the spectral centroid and the energy are low in silent regions [4]. Figure 8 shows the spectral centroid of the normal speech word "Already".

Figure 8: Spectral centroid of speech word Already

The original speech is shown in blue and the spectral centroid in red.

3.2. Spectral Flux
Spectral flux measures how the power spectrum of a signal changes over time, and it is computed by comparing the spectra of successive frames. It is one of the most important features for separating music from speech. It is defined as the squared difference between the normalized magnitude spectra of two successive frames:

$$F_i = \sum_{k=0}^{N-1} \left( \frac{|X_i(k)|}{\sum_{l=0}^{N-1} |X_i(l)|} - \frac{|X_{i-1}(k)|}{\sum_{l=0}^{N-1} |X_{i-1}(l)|} \right)^2 \qquad (5)$$

where X_i(k) is the k-th DFT coefficient of the i-th short-term frame of length N. Spectral flux is used to find out the tone of an audio signal [1].
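Equations (4) and (5) can be sketched with a direct DFT so that the example needs only the standard library; a practical system would use an FFT. This is an illustrative sketch under stated assumptions, not the author's code: the function names are hypothetical, the centroid uses only the non-negative-frequency bins 0..N/2 of an N-point DFT with f(m) = m * fs / N, and a silent (all-zero) frame is assigned a centroid of 0 by convention.

```python
import cmath
import math


def dft_magnitude(frame):
    """|X(k)| for k = 0..N-1 via a direct O(N^2) DFT (fine for short frames)."""
    n_pts = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n_pts)
                    for t in range(n_pts)))
            for k in range(n_pts)]


def spectral_centroid(frame, fs):
    """Eq. (4) over bins 0..N/2, with assumed bin frequency f(m) = m * fs / N."""
    mags = dft_magnitude(frame)
    n_pts = len(frame)
    bins = range(n_pts // 2 + 1)
    total = sum(mags[m] for m in bins)
    if total == 0:
        return 0.0  # silent frame: centroid undefined, report 0 by convention
    return sum(m * fs / n_pts * mags[m] for m in bins) / total


def spectral_flux(prev_frame, frame):
    """Eq. (5): squared difference of the two sum-normalized magnitude spectra."""
    def normalized(f):
        mags = dft_magnitude(f)
        total = sum(mags)
        return [v / total if total else 0.0 for v in mags]

    a, b = normalized(prev_frame), normalized(frame)
    return sum((bk - ak) ** 2 for ak, bk in zip(a, b))


if __name__ == "__main__":
    fs, n_pts = 8000, 64
    low = [math.cos(2 * math.pi * 4 * t / n_pts) for t in range(n_pts)]    # 500 Hz tone
    high = [math.cos(2 * math.pi * 20 * t / n_pts) for t in range(n_pts)]  # 2500 Hz tone
    # Approximately 500.0, 2500.0 and 1.0 (up to floating-point rounding).
    print(spectral_centroid(low, fs), spectral_centroid(high, fs), spectral_flux(low, high))
```

A brighter (higher-frequency) frame yields a larger centroid, and two frames with completely different spectra give a flux near its maximum, while identical frames give zero flux.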
4. Other Speech Signal Characteristics
Other basic speech signal characteristics, such as pitch and intonation, arise from the way speech is produced. Speech is produced by air pressure from the lungs passing through the vocal cords and the vocal tract. When the vocal cords do not vibrate, an unvoiced sound is produced; voiced sounds are produced when the vocal cords vibrate.

4.1. Pitch and Intonation
Pitch frequency is an important parameter in speech processing. The vocal cords produce voiced or unvoiced sounds depending on whether they vibrate. Voiced sounds are excited by glottal pulses, which have a fundamental frequency and harmonics. The fundamental frequency of the glottal pulses is called the pitch. Pitch is commonly equated with the frequency of a sound, although strictly it is the ear's perception of that frequency. A sound can be characterized by its pitch, loudness and quality, and in musical sounds pitch is described as high or low. Humans can hear sounds in the range of approximately 20 Hz to 20,000 Hz. Intonation in speech is essentially a matter of variation in the pitch level of the voice [7].

5. References
i. Abdullah I. Al-Shoshan, "Speech and Music Classification and Separation: A Review," J. King Saud Univ., Vol. 19, Eng. Sci. (1), pp. 95-133, Riyadh (147H./302).

ii. A. Milton, S. Ashitha Dayana, S. Tamil Selvi, "Voiced and unvoiced classification of speech signal using Average Zero Crossing Index Difference Function," International Journal of Advanced Information Science and Technology (IJAIST), ISSN: 1319:1281.
iii. R.G. Bachu, S. Kopparthi, B. Adapa, B.D. Barkana, "Separation of Voiced and Unvoiced using Zero Crossing Rate and Energy of the Speech Signal," Advanced Techniques in Computer Science and Software Engineering, 33, pp. 179-181.
iv. Md. Mijanur Rahman, Md. Al-Amin Bhuiyan, "Continuous Bangla Speech Segmentation using Short-term Speech Features Extraction Approaches," International Journal of Advanced Computer Science and Applications (IJACSA), Vol. 3, No. 11, 311.
v. Mojtaba Radnard, Mahdi Hadavi, Mohammad Mahdi Nayebi, "A new method of voiced and unvoiced classification based on clustering," Journal of Signal and Information Processing, 311, 1, 332-347.
vi. M.P. Paulraj, Sazali Bin Yaacob, Ahamad Nazri Abdullah, Sathees Kumar Natraj, "Segmentation of voice portion for voice pathology classification using Fuzzy logic," Challenges and Innovation in Information Technology, 33.
vii. http://www.physicsclassroom.com/class/sound/lesson-2/pitch-and-frequency