YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION


American Journal of Engineering and Technology Research, Vol. 3, 2013

YOUR WAVELET BASED PITCH DETECTION AND VOICED/UNVOICED DECISION

Yinan Kong, Department of Electronic Engineering, Macquarie University, Sydney, NSW, Australia

Abstract. This paper describes the property of the sudden change of a speech signal at its Glottal Closure Instant (GCI) and thereby discusses the principle of the localization of wavelets in both the time and frequency domains. Based on this discussion, an algorithm for voiced/unvoiced segment decision and pitch detection is presented.

Keywords: speech processing, pitch detection, wavelet, voiced/unvoiced decision.

Introduction

Since a reliable pitch value is the ground on which most speech coders are built, pitch detection and voiced/unvoiced decision are the most crucial steps of the encoding and decoding process in speech processing. Although many approaches to pitch detection have been proposed [1][2], none of them achieves a perfect result under all the various circumstances and requirements, such as speech from men, women, children or singers; neither does the one presented in this paper. This article proposes a wavelet-based way to accomplish pitch detection and voiced/unvoiced segment decision.

A normal speech signal can be divided into two categories of segments, namely voiced and unvoiced. The former comes from glottal vibration while the latter comes from airflow during phonation. Glottal vibration is periodic, and the period of the vibration is called the pitch, which is zero during unvoiced segments. It is known that the speech signal exhibits a sudden change at each Glottal Closure Instant (GCI) [3][4]; as a result, the period between two GCIs can be detected to approximate the pitch of the corresponding voiced segment.
Singularity Detection by Wavelets

The main reason for using wavelets to detect pitch and to separate voiced from unvoiced segments is their property of multi-resolution analysis and their localization in both the time and frequency domains. If a real function $\theta(x)$ satisfies

$$\int_{-\infty}^{+\infty} \theta(x)\,dx = 1, \qquad \theta(x) = O\!\left(\frac{1}{1+x^2}\right),$$

then it is called a smoothing function [5]. Since the energy of a smoothing function is concentrated at low frequencies, $\theta(x)$ can be seen as the impulse response of a low-pass filter; thus convolving a signal $f(x)$ with $\theta(x)$ attenuates the high-frequency part of $f(x)$ without changing its low-frequency part, and hence smooths $f(x)$. Define the wavelet functions, following [5], as

$$\psi^1(x) = \frac{d\theta(x)}{dx}, \qquad \psi^2(x) = \frac{d^2\theta(x)}{dx^2},$$

and let $\theta_a(x) = \frac{1}{a}\,\theta\!\left(\frac{x}{a}\right)$; then the wavelet transforms can be obtained as

$$W_a^1 f(x) = f * \psi_a^1(x) = f * \left(a\,\frac{d\theta_a}{dx}\right)(x) = a\,\frac{d}{dx}\,(f * \theta_a)(x),$$

$$W_a^2 f(x) = f * \psi_a^2(x) = f * \left(a^2\,\frac{d^2\theta_a}{dx^2}\right)(x) = a^2\,\frac{d^2}{dx^2}\,(f * \theta_a)(x).$$

It can be seen that $W_a^1 f$ and $W_a^2 f$ are directly proportional to the first- and second-order derivatives of $f * \theta_a(x)$, respectively, i.e. of the result of $\theta_a(x)$ smoothing $f(x)$. Since an extremum of the first-order derivative corresponds to a sudden change (singularity) in the original function, the point with the maximum value of $W_a^1 f$ corresponds to a singularity of $f(x)$. Consequently, if the first-order derivative of a smoothing function is chosen as the wavelet function, a singularity point $x_0$ of the original function $f(x)$ can be detected by seeking the maxima of the wavelet transform coefficients. Moreover, since $W_a^1 f$ reaches a maximum at $x_0$ for several dyadic scale factors $a = 2^j$, the discontinuity of the original function $f(x)$ is carried across different resolutions [6]. This is the property that wavelets are localized in both the time and frequency domains, which conforms perfectly to the requirement of speech pitch detection. When the Discrete Wavelet Transform (DWT) is used to detect the singularities within a speech signal, the period between two singularities is just the pitch value. The scale factors $a = 2^j$ indicate the degree of localization of the wavelets in both the time and frequency domains.
As the scale factor $a$ increases, the frequency support of the wavelet shifts towards low frequencies: the time resolution decreases but the frequency resolution increases. Owing to this localization of wavelets in both the time and frequency domains, by means of different scale factors, the locations of the local maxima of the DWT coefficients specify the locations of the GCI singularities, and thereby the pitch value.
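As a quick illustration (not from the paper), the following Python sketch uses a Gaussian as the smoothing function $\theta(x)$ and its first derivative as the wavelet $\psi^1(x)$: convolving a step signal with $\psi^1$ at two scales places the maximum of $|W_a^1 f|$ at the jump, at every scale.

```python
import numpy as np

def gauss_deriv_wavelet(a, width=4.0):
    """First derivative of a Gaussian smoothing function at scale a."""
    x = np.arange(-int(width * a), int(width * a) + 1, dtype=float)
    theta = np.exp(-x**2 / (2.0 * a**2))
    theta /= theta.sum()              # normalize the smoothing kernel
    return np.gradient(theta)         # psi1(x) = d theta(x) / dx

# A toy signal with a sudden change (singularity) at n = 300.
f = np.zeros(600)
f[300:] = 1.0

for a in (2.0, 4.0):
    w = np.convolve(f, gauss_deriv_wavelet(a), mode="same")
    print(a, int(np.argmax(np.abs(w))))   # maximum sits at the jump (n ~ 300)
```

The same maximum location appears at both scales, which is exactly the cross-scale transitivity of singularities that the paper exploits.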

Fig. 1. The algorithm of pitch detection and voiced/unvoiced segment decision (flowchart):

1. Start: set the frame length, the step length, $i = 1$ and the threshold $T_0 = 0.07$.
2. Read the $i$-th speech frame; if the end of the signal has been reached, stop.
3. Set $j = 1$ and calculate $W_{2^j} f$.
4. If the maximum of $W_{2^j} f < T_0$, this segment is unvoiced and pitch = 0; go to step 7.
5. While $j < 4$: set $j = j + 1$ and calculate $W_{2^j} f$; let the maximum of $W_{2^j} f$ be $M$, and locate all local maximum points of $W_{2^j} f$ and $W_{2^{j+1}} f$ that are bigger than a threshold derived from $M$.
6. If any local maximum points of $W_{2^j} f$ and $W_{2^{j+1}} f$ have similar or the same locations on the horizontal axis, pitch = the time distance between the two closest singularities; otherwise this segment is unvoiced and pitch = 0.
7. Shift the frame by one step length, set $i = i + 1$, and return to step 2.

The Algorithm for Pitch Detection and Voiced/Unvoiced Decision

Fig. 1 illustrates the algorithm for pitch detection and voiced/unvoiced decision.

Voiced/Unvoiced Decision. A speech segment is decided to be voiced or unvoiced by the amplitudes of its DWT coefficients, as shown in Fig. 1. If the maximum value of the DWT coefficients is smaller than T0, a threshold set at the beginning of the program, then the segment being processed is determined to be unvoiced. Otherwise, the segment is possibly voiced and is passed on to the pitch detection step; if the pitch value detected there turns out to be zero, the segment is still unvoiced, otherwise it is voiced. These two separate tests guarantee that most speech segments are correctly classified as voiced or unvoiced. In this process the most significant parameter is T0, whose appropriate value has to be determined by repeated trials and adjustment during debugging.

n      h0(n)     h1(n)
-3     0.0000   -0.00008
-2     0.0625   -0.0643
-1     0.2500   -0.087
 0     0.3750   -0.596
 1     0.2500    0.596
 2     0.0625    0.087
 3     0.0000    0.0643
 4     0.0000    0.00008

Fig. 2. Filter coefficients of the B-spline wavelet of order 3.

Pitch Detection. As shown in Fig. 1, pitch detection is more complicated and more important than the voiced/unvoiced decision. In the algorithm, the locations of the local maximum coefficients calculated under different scale factors are compared to give the locations of the singularities, which determine the pitch value, as precisely as possible. The process is to find pairs of singularities that lie in two different scales and have similar or the same location on the horizontal (t) axis. If any one location on the horizontal axis carries singularities at two different scales, it can safely be inferred that a singularity, i.e. a GCI, occurs at this location.
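The first-stage decision can be sketched as follows (hypothetical code, with T0 = 0.07 as quoted in Fig. 1; the coefficient values below are purely illustrative):

```python
import numpy as np

T0 = 0.07  # threshold from Fig. 1

def first_stage_vuv(coeffs, t0=T0):
    """False -> unvoiced outright; True -> candidate voiced, still to be
    confirmed (or rejected, pitch = 0) by the pitch-detection stage."""
    return float(np.max(np.abs(coeffs))) >= t0

# Toy frames of DWT coefficients (illustrative values only):
quiet = 0.03 * np.sin(np.linspace(0.0, 6.28, 160))   # all magnitudes < T0
spiky = quiet.copy()
spiky[80] = 0.5                                      # strong GCI-like maximum

print(first_stage_vuv(quiet))   # False
print(first_stage_vuv(spiky))   # True
```

The two-stage structure means a frame is only ever labelled voiced after a non-zero pitch has actually been found.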
Then the time distance between the two closest such pairs of singularities is the pitch period. Several wavelets were tried in the algorithm; the 3rd-order B-spline wavelet showed the best performance in singularity detection, and it is therefore selected for this algorithm to detect the singularities of a speech segment at its Glottal Closure Instants. Fig. 2 lists the sequences h0(n) and h1(n), i.e. the filter coefficients of the B-spline wavelet of order 3. The interval between two neighbouring GCIs is just the pitch period, and the frequency of normal human pitch lies within the range from 30 to 500 Hz.
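The cross-scale matching step can be sketched as a small helper (hypothetical code; the 8 kHz rate, the tolerance of a few samples, and the peak locations below are assumptions for illustration, not values prescribed by the paper):

```python
def pitch_period_from_peaks(peaks_a, peaks_b, fs=8000, tol=8):
    """Pair up local-maximum locations from two adjacent scales when they
    lie within `tol` samples of each other, average each matched pair into
    one GCI location, and return the period (in seconds) between the two
    closest GCIs; 0.0 means unvoiced (fewer than two confirmed GCIs)."""
    gcis = []
    for p in peaks_a:
        q = min(peaks_b, key=lambda x: abs(x - p), default=None)
        if q is not None and abs(q - p) <= tol:
            gcis.append((p + q) / 2.0)
    if len(gcis) < 2:
        return 0.0
    gcis.sort()
    return min(b - a for a, b in zip(gcis, gcis[1:])) / fs

# Illustrative peak locations at two adjacent scales:
period = pitch_period_from_peaks([203, 232], [203, 238])
print(period * 1000.0)   # 4.0 (ms)
```

Choosing `tol` is exactly the flexible "pair of similar locations" decision discussed in the text.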

Fig. 3. Wavelet transform of the speech segment "High" at different scales in one frame.

Fig. 4. Pitch detected from the sentence "Da Jia Dou Shuo Pu Tong Hua".

Fig. 3 demonstrates the wavelet transform coefficients at different scales of a speech segment sampled from the English word "High". The algorithm in Fig. 1 was used, with a sampling rate of 8 kHz. Considering d4 and d5 in Fig. 3, it can be seen that at the point n = 203 both sequences reach one of their local maxima, hence there is a singularity at this instant. Another singularity appears at n = 232 in sequence d4 and at n = 238 in sequence d5; the discrepancy of the two locations is 6, which is allowed by our experiment, so they can be deemed a pair of similar locations. The location of this singularity is then (232 + 238)/2 = 235. Since no other singularity exists between these two, the pitch period = (235 − 203)/8000 = 4.0 ms. The flexible nature of this algorithm lies in how close two local maxima must be before they can be seen as a pair of similar locations, i.e. in the definition of "a pair of similar locations". This usually has to be derived from repeated trials and also depends on the original signal, which means it can differ between an adult's speech and a child's speech. It also gives rise to some glitches in the pitch output, as in Fig. 4, which shows the raw pitch output obtained by applying this algorithm to the original signal, a Chinese sentence, "Da Jia Dou Shuo Pu Tong Hua" ("Everybody speaks Mandarin").

DWT Algorithm Applied

The DWT is computed with the well-known Mallat algorithm [7], an efficient fast DWT algorithm whose status matches that of the FFT in Fourier analysis.
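A minimal Python sketch of such an undecimated, à-trous-style cascade follows; at level j the filters are dilated by inserting zeros between taps so that every scale keeps the input length. This is not the paper's implementation, and the Haar-like filter pair is illustrative only, not the B-spline filters of Fig. 2.

```python
import numpy as np

def atrous_dwt(x, h0, h1, levels=4):
    """Undecimated (a trous) wavelet decomposition: at level j the filters
    are dilated with 2**j - 1 zeros between taps, so every detail sequence
    d(j) and the final approximation keep the length of the input."""
    def dilate(h, j):
        if j == 0:
            return np.asarray(h, dtype=float)
        up = np.zeros((len(h) - 1) * 2**j + 1)
        up[::2**j] = h                    # original taps, zeros in between
        return up

    a, details = np.asarray(x, dtype=float), []
    for j in range(levels):
        details.append(np.convolve(a, dilate(h1, j), mode="same"))
        a = np.convolve(a, dilate(h0, j), mode="same")
    return details, a

# Illustrative (Haar-like) filter pair:
h0 = [0.5, 0.5]
h1 = [0.5, -0.5]
d, approx = atrous_dwt(np.sin(np.arange(256) / 5.0), h0, h1)
print([len(s) for s in d], len(approx))   # [256, 256, 256, 256] 256
```

Keeping all scales at the input length is precisely what lets the algorithm compare local-maximum locations across scales sample by sample.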
The Mallat algorithm is actually composed of two-channel filter banks, and it fully embodies the multi-resolution property of wavelets [8]. Fig. 5 illustrates the hierarchical wavelet cascade decomposition performed by the Mallat algorithm, where h0(n) represents the Low-Pass Filter (LPF) and h1(n) represents the High-Pass Filter (HPF). The coefficients and specifications of both filters depend on the type of wavelet selected.

Fig. 5. The Mallat algorithm.

Fig. 6. The à trous algorithm.

The Mallat algorithm subsamples the output of the filters at each scale so that the same two-channel filter bank can be reused at the next scale. However, this has the undesirable consequence that the length of the output sequence is halved at each scale, while the critical requirement here is that the wavelet transform sequences at different scales have the same length; those sequences therefore need to be extended back to the original length. The à trous algorithm [7] is therefore

introduced as a practical algorithm evolved from the Mallat algorithm: it shifts all the subsampling steps to the end of every route, as shown in Fig. 6. In view of Figs. 5 and 6, the core algorithm implementing the wavelet decomposition can be summarized as:

j = 0
a(0) = x(n)
while j < J
    d(j+1) = a(j) * h1(j)
    a(j+1) = a(j) * h0(j)
    j = j + 1
end while

where h1(j) and h0(j) are derived from h1(n) and h0(n) by inserting 2^j − 1 zeros between each pair of adjacent coefficients.

Conclusions

A wavelet-based algorithm has been presented for pitch detection and voiced/unvoiced decision. The algorithm adopts the 3rd-order B-spline wavelet rather than the wavelets that are more often employed, e.g. the Daubechies family. The à trous algorithm is used to keep the wavelet transform sequences at different scales the same length. The results give useful ideas and experience in wavelet-based pitch detection that remain open for future work in this field; wavelet algorithms will also be attempted in other audio coding steps. The strategy for deciding which two local maxima of two wavelet sequences form a pair of similar locations is simple and flexible. It also leads to glitches in the resulting pitch output, and therefore it still has the potential to be improved, for example by a smoothing post-process.

Acknowledgement

This research was supported under the Macquarie University Research Development Grant (MQRDG) Scheme (Reference No. 9000087).

References

[1] R. Ansari, D. Kahn and M. J. Macchi, "Pitch modification of speech using a low-sensitivity inverse filter approach," IEEE Signal Processing Letters, vol. 5, no. 3, March 1998.

[2] E. B. George and M. J. T. Smith, "Analysis-by-synthesis/overlap-add sinusoidal modeling applied to the analysis and synthesis of musical tones," J. Audio Eng. Soc., vol. 40, pp. 497–516, June 1992.

[3] S. Kadambe, "Application of the wavelet transform for pitch detection of speech signals," IEEE Trans. on Information Theory, vol. 38, no. 2, March 1992.

[4] S. Molla and B. Torrésani, "Determining local transientness of audio signals," IEEE Signal Processing Letters, vol. 11, no. 7, July 2004.

[5] Y. Meyer, Wavelets: Algorithms and Applications, Chapter 1: "Wavelets from a historical perspective," Society for Industrial and Applied Mathematics, 1993.

[6] J. I. Agbinya, "Discrete wavelet transform techniques in speech processing," 1996 IEEE TENCON – Digital Signal Processing Applications.

[7] M. J. Shensa, "The discrete wavelet transform: wedding the à trous and Mallat algorithms," IEEE Trans. Signal Processing, vol. 40, pp. 2464–2482, Oct. 1992.

[8] O. Rioul and M. Vetterli, "Wavelets and signal processing," IEEE Signal Processing Magazine, vol. 8, pp. 14–38, Oct. 1991.