Journal of Next Generation Information Technology Volume 1, Number 2, August, 2010

Sub band Speech analysis using Gammatone Filter banks and optimal pitch extraction methods for each sub band using average magnitude difference function (AMDF) for LPC Speech Coders in Noisy Environments

Suma S.A.; Dr. K.S.Gurumurthy
University Visvesvaraya College of Engineering, K.R. Circle, Bangalore, India.
Email: suma855@yahoo.com; drksgurumurhy@gmail.com
doi:.456/jni.vol.issue2.2

Abstract

Modern speech processing applications require operation on a signal of interest that is contaminated by a high level of noise. These situations call for greater robustness in estimating the speech parameters under mismatched environments and low environmental SNR levels. In this paper the speech is analyzed with a Gammatone filter bank, which splits the full band speech signal s(n) into frequency bands (sub bands), and the pitch is extracted for each sub band speech signal. We determine the Signal to Noise Ratio for each sub band speech signal. The average of the pitch periods of the highest SNR sub bands is then used to obtain an optimal pitch value. This paper describes computationally simple pitch extraction algorithms using the Average Magnitude Difference Function (AMDF), following a new approach based on weighted Autocorrelation [2] that is very useful for accurate pitch period extraction. Both algorithms can be implemented in software and their performance evaluated. Both use center clipping for time domain processing. This paper also compares the effectiveness of the new AMDF method using weighted Autocorrelation with the existing Autocorrelation method, and discusses how it can be utilized further in Speech Enhancement Systems using the proposed new algorithms for its implementation.

Keywords: Speech, Pitch extraction, Linear predictive coding (LPC), Noisy Environments, Average Magnitude Difference Function (AMDF), Weighted Autocorrelation, Gammatone filter banks.

1. Introduction

Many principles have been proposed for the modeling of human pitch perception and for practical pitch determination of speech signals [1]-[3].
For regular signals with harmonic structure, such as clean speech of a single speaker, the problem is solved quite reliably. When the complexity increases further, e.g., when harmonic complexes of sounds or voices are mixed in a single signal channel, the determination of pitches is generally a difficult problem that has not been solved satisfactorily.

The concept of pitch refers to auditory perception and has a complex relationship to the physical properties of a signal. Thus, it is natural to distinguish it from the estimation of fundamental frequency and to apply methods that simulate human perception. Many such approaches have been proposed, and they generally follow one of two paradigms: place (or frequency) theory and timing (or periodicity) theory. Neither of these in pure form has been proven to show full compatibility with human pitch perception, and it is probable that a combination of the two approaches is needed.

Modern speech processing applications also require operation on a signal of interest that is contaminated by a high level of noise. These situations call for greater robustness in estimating the speech parameters under mismatched environments and low environmental SNR levels. In this paper the speech sound is filtered by a Gammatone filter bank, and the pitch is extracted for each sub band speech signal. We determine the Signal to Noise Ratio for each sub band speech signal. The average of the pitch periods of the highest SNR sub bands is then used to obtain an optimal pitch value.

The pitch period (i.e., fundamental frequency f0 and period T0 = 1/f0) is an important parameter of the speech signal, used in speech analysis, synthesis and recognition. For speech recognition applications, the pitch extraction algorithm provides the basis for the voiced/unvoiced classification decision. Beyond voicing information, the pitch extraction algorithm provides prosodic information such as stress and intonation. The accuracy of pitch extraction is directly related to the quality of speech. Thus, for most applications, we need to extract the pitch of speech signals in practical noisy environments.

Pitch extraction methods are classified into the following three categories: (a) waveform processing, (b) spectral processing and (c) correlation processing, which is known to be comparatively robust against noise. The autocorrelation function method of category (c) is one of the conventional methods used for pitch determination and provides the best performance in noisy environments. Since the Average Magnitude Difference Function (AMDF) has characteristics similar to those of the autocorrelation function, a new pitch extraction method, which uses an autocorrelation function weighted by the inverse of an AMDF [2], can be implemented in Linear Predictive Coding schemes. The AMDF produces a notch, while the autocorrelation function produces a peak. However, both functions essentially have the same periodicity. The new AMDF method using weighted autocorrelation [2] utilizes the feature that, in a noisy environment, the noise components included in the autocorrelation function and in the AMDF behave independently (and are uncorrelated with each other). Owing to this uncorrelated property, the peak of the autocorrelation function is emphasized in a noisy environment when the autocorrelation function is combined with the inverse AMDF. As a result, the accuracy of pitch extraction for the autocorrelation method (AUTOC) is expected to improve.

This paper describes two computationally simple pitch extraction algorithms using the new AMDF for pitch determination. For ease of presentation, these algorithms will be identified throughout this paper as #1 and #2.
Algorithm #1 uses center clipping and infinite peak clipping for time domain preprocessing before computing the AMDF, while Algorithm #2 nonlinearly distorts the speech signal before center clipping and AMDF computation. MATLAB results show that #2 provides a better pitch detection estimate than #1, and a comparison of the average gross pitch error rates likewise suggests that #2 is better than #1. Both methods are computationally simple and more reliable in noisy environments. With the growth of wireless DSP processors whose hardware is tailored to specific applications, such computationally simple algorithms are easy to implement and lead to cycle count gains while ensuring reliability in noisy environments.

The organization of the paper is as follows. Section 2 describes the LPC-10 speech coder. Section 3 describes the gammatone filter bank used for sub band speech analysis. Section 4 describes the subband speech processing. Section 5 describes the principle of the new weighted autocorrelation method by inverse of AMDF. Section 6 describes the proposed new computationally simple algorithms for the weighted autocorrelation method by inverse of AMDF. Section 7 describes the experiments and results. Finally, Section 8 concludes the paper.

2. LPC-10 Speech Coder

The LPC-10 speech coder is the US standard for linear predictive coding of speech at 2400 bits per second. It uses the analysis-by-synthesis technique, which is based on the 10th order lattice filter, to create the prediction parameters. It employs linear predictive coding (LPC), shown in Figure 1, which models the short-term spectral information as an all-pole filter that captures the power spectral density (PSD) of the speech signal. The speech output from the LPC model is not acceptable for many applications because it does not sound like human speech. It is usually applied in military applications, which do not require high quality speech but need a low bit rate. However, the operating principle of most modern speech coders is derived from the LPC model, with modifications to improve quality and coding effectiveness.
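As a rough illustration of the all-pole modelling mentioned above, the following sketch estimates 10th order prediction coefficients for a single frame using the autocorrelation method and the Levinson-Durbin recursion. The function name `lpc_coefficients` and the Hamming window are assumptions made for this example; the LPC-10 standard defines its own analysis procedure, which is not reproduced here.

```python
import numpy as np

def lpc_coefficients(frame, order=10):
    """Hypothetical helper: returns (a, err) where a[0] = 1 and
    A(z) = sum_i a[i] z^-i is the prediction-error filter, so the
    all-pole synthesis filter is 1 / A(z)."""
    # Window choice is an assumption, not taken from the paper.
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    # Short-time autocorrelation for lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        # Prediction error energy shrinks at every stage.
        err *= (1.0 - k * k)
    return a, err
```

The returned error energy `err`, scaled appropriately, is one way to obtain the gain parameter of the synthesis model; that use is also an assumption of this sketch.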
The LPC model is inspired by observations of the basic properties of speech signals and represents an attempt to mimic the human speech production mechanism, which is shown in Figure 1. The combined spectral contributions of the glottal flow, the vocal tract, and the radiation of the lips are represented by the synthesis filter. The driving input of the filter, or excitation signal, is modeled as either an impulse train (voiced speech) or random noise (unvoiced speech). Depending on the voiced or unvoiced state of the signal, the switch is therefore set to the proper location so that the appropriate input is selected. The energy level of the output is controlled by the gain parameter.

At the analysis stage (encoder), the four important parameters (pitch period, power, voicing information and LP coefficients) are computed, and only these parameters are appropriately coded and transmitted. With these parameters it is possible to reconstruct the speech signal at the decoder. Estimating the parameters precisely is the responsibility of the encoder. The decoder takes the estimated parameters and uses the speech production model to synthesize speech. Although the synthesized speech waveform is slightly different from the original waveform, the power spectral density (PSD) of the original speech is captured by the synthesis filter, so the PSD of the synthetic speech is close to that of the original speech.

Figure 1. LPC model that synthesizes the speech signal with four inputs: pitch period, voicing, gain and prediction coefficients.

3. Gammatone Filter Bank

Gammatone filters can be implemented using FIR or IIR filters or frequency domain techniques. In this research, FIR filters are employed in order to implement linear phase filters with identical delay in each critical band. The analysis filters have a length of 2N-1 coefficients and were obtained by convolving a sampled gammatone impulse response g(n) of length N with its time reverse, where

$$g(n) = a\,(nT)^{p-1}\, e^{-2\pi b\,\mathrm{ERB}(f_c)\,nT} \cos(2\pi f_c nT + \phi) \tag{1}$$

Here fc is the centre frequency, T is the sampling period, n is the discrete time sample index, a, b, p, φ are constants, and ERB(fc) is the equivalent rectangular bandwidth of an auditory filter. At a moderate power level, ERB(fc) = 24.7 + 0.108 fc. Examples of the impulse responses of these filters are shown in Figure 2.

Figure 2. Impulse responses of the (a) 3rd (centre frequency 250 Hz) and (b) 18th (centre frequency 4 kHz) critical band linear phase gammatone filters.

The gammatone filter bank employed in this approach contains gammatone filters Hi(z) whose centre frequencies and bandwidths match those of the critical bands. Thus, for an 8 kHz signal bandwidth, 21 filters were used.
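The construction of one linear phase analysis filter can be sketched as follows. This is a minimal example under stated assumptions: the gammatone impulse response is taken in its commonly used form, and the values p = 4, b = 1.019, N = 100 taps and the unit-gain normalisation at fc are illustrative choices that are not given in the text. The helper names `erb`, `gammatone_ir` and `linear_phase_gammatone` are hypothetical.

```python
import numpy as np

def erb(fc):
    # Equivalent rectangular bandwidth at a moderate level, in Hz.
    return 24.7 + 0.108 * fc

def gammatone_ir(fc, fs, n_taps=100, p=4, b=1.019, phi=0.0):
    # Commonly used sampled gammatone form (assumed here):
    # g(n) = (nT)^(p-1) * exp(-2*pi*b*ERB(fc)*nT) * cos(2*pi*fc*nT + phi)
    t = np.arange(n_taps) / fs
    envelope = t ** (p - 1) * np.exp(-2.0 * np.pi * b * erb(fc) * t)
    return envelope * np.cos(2.0 * np.pi * fc * t + phi)

def linear_phase_gammatone(fc, fs, n_taps=100):
    g = gammatone_ir(fc, fs, n_taps)
    # Convolving g with its time reverse gives a symmetric response of
    # length 2N-1, hence linear phase with the same delay in every band.
    h = np.convolve(g, g[::-1])
    # Normalise to unit gain at the centre frequency (assumed convention).
    n = np.arange(len(h))
    gain = abs(np.sum(h * np.exp(-2j * np.pi * fc * n / fs)))
    return h / gain
```

A sub band signal would then be obtained with something like `np.convolve(speech, linear_phase_gammatone(fc, fs))`, repeated for each critical band centre frequency.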

4. Subband Speech Processing

The speech signal s(t) is applied to the gammatone filter bank described in the previous section. The sub band speech signals s_1(t) to s_21(t) are the outputs of the gammatone filter bank, as shown in Figure 3.

Figure 3. Subband speech processing.

The SNR estimate is used for weighting the sub-bands. The noise power is estimated as an average over the non-speech frames in each utterance. Whether a frame is speech or non-speech is determined by comparing the current frame energy with the average frame energy of the first frames in the input test speech. The full band SNR of a frame can be simply computed as

$$SNR_{Full} = 10 \log_{10} \frac{\sum_{k=1}^{K} |s(k)|^2}{\sum_{k=1}^{K} |N(k)|^2} \tag{2}$$

$$s(k) = \max\big(x(k) - \alpha N(k),\; \beta N(k)\big) \tag{3}$$

where k, x(k), s(k), and N(k) are the frequency index, the magnitude spectrum of noisy speech, that of the estimated clean speech, and the average magnitude spectrum of noise, respectively. In order to compute the SNR, the magnitude spectrum of clean speech has to be estimated. We use the spectral subtraction method: the overestimating factor α subtracts an overestimate of the noise spectrum from the noisy speech spectrum in order to minimize the presence of residual noise, and the spectral flooring factor β prevents the spectral components of the estimated clean speech from falling below the lower bound βN(k). The values of the overestimating and spectral flooring factors are set empirically. From Eqs. (2) and (3), the sub-band SNR can be easily obtained as

$$SNR_i = 10 \log_{10} \frac{\sum_{k \in \text{Sub-band}_i} |s(k)|^2}{\sum_{k \in \text{Sub-band}_i} |N(k)|^2} \tag{4}$$

where i is the sub-band index. From the SNR obtained, the full-band or sub-band weight coefficient, ρ_Full or ρ_i, is calculated by applying a sigmoid function to the full-band or sub-band SNR as

$$\rho_{Full} = \frac{1}{1 + \exp[-0.5\,(SNR_{Full} - \eta)]} \tag{5}$$

$$\rho_i = \frac{1}{1 + \exp[-0.5\,(SNR_i - \eta)]} \tag{6}$$

Figure 4 shows the plots of the weights from Eqs. (5) and (6) depending on η.

Figure 4. Plots of weights depending on η.

The sub band SNR is determined for all the sub bands s_1(t) to s_21(t), and the average of the pitch periods of the highest SNR sub bands is used to obtain an optimal pitch value.

5. Weighted Autocorrelation Method by Inverse of AMDF

5.1. Principle

The autocorrelation function of a periodic signal is given by

$$\phi(k) = \frac{1}{N} \sum_{n=0}^{N-1} x(n)\,x(n+k) \tag{7}$$

where x(n) is the speech signal, k is the lag number, and n is the discrete time index.

The autocorrelation function representation of the signal is a convenient way of displaying certain properties of the signal. If the signal is periodic with period P samples, then

$$\phi(k) = \phi(k + P) \quad \text{for } k = 0, \pm P, \pm 2P, \ldots \tag{8}$$

It is also an even function, i.e.,

$$\phi(k) = \phi(-k) \tag{9}$$

and it attains its maximum value at k = 0, i.e.,

$$|\phi(k)| \le \phi(0) \quad \text{for all } k. \tag{10}$$

Considering the above properties of the autocorrelation for periodic signals, we can see that the function attains a maximum at samples 0, P, 2P, ..., where P is the pitch period. Let us assume that x(n) is a noisy speech signal given by

$$x(n) = s(n) + w(n) \tag{11}$$

where s(n) is a clean speech signal and w(n) is white Gaussian noise. Then

$$\phi(k) = \frac{1}{N} \sum_{n=0}^{N-1} [s(n) + w(n)][s(n+k) + w(n+k)] = \frac{1}{N} \sum_{n=0}^{N-1} [s(n)s(n+k) + s(n)w(n+k) + w(n)s(n+k) + w(n)w(n+k)] \tag{12}$$

Equation (12) contains the autocorrelations of s(n) and w(n) and the cross-correlations between them. For large N, if s(n) is not correlated with w(n), and w(n) is not correlated with itself except at k = 0, then

$$\phi(k) = \begin{cases} \dfrac{1}{N} \displaystyle\sum_{n=0}^{N-1} s(n)s(n+k), & k \neq 0 \\[2ex] \dfrac{1}{N} \displaystyle\sum_{n=0}^{N-1} [s(n)s(n+k) + w(n)w(n+k)], & k = 0 \end{cases} \tag{13}$$

Thus the autocorrelation function provides robust performance against noise.

Now let us turn to the AMDF. It is described by

$$\psi(k) = \frac{1}{N} \sum_{n=0}^{N-1} |x(n) - x(n+k)| \tag{14}$$

Considering equations (7) to (13), we can see that the maximum peak of the autocorrelation is located at k = P, apart from the peak at k = 0. In some cases, however, the peak located at k = 2P becomes larger than that at k = P, as shown in Figure 5, and a half pitch error occurs. There can also be a peak at k < P, which in some cases leads to a double pitch error. Thus the accuracy of pitch extraction using autocorrelation becomes higher if unnecessary peaks are suppressed. In the case of the AMDF in equation (14), ψ(k) becomes smaller when x(n) is similar to x(n+k); i.e., if x(n) has a period P, ψ(k) produces a notch at k = P as shown in Figure 5, so that 1/ψ(k) makes a peak at k = P.
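The peak and notch behaviour of the two functions can be checked numerically with a short sketch. The helper names `autocorrelation` and `amdf`, the synthetic three-harmonic signal, and the convention of reading x(n+k) from samples that follow the N-sample analysis window are assumptions made for this illustration.

```python
import numpy as np

def autocorrelation(x, N, max_lag):
    # phi(k) = (1/N) * sum_{n=0}^{N-1} x(n) x(n+k); x must hold N + max_lag samples.
    return np.array([np.dot(x[:N], x[k:k + N]) / N for k in range(max_lag + 1)])

def amdf(x, N, max_lag):
    # psi(k) = (1/N) * sum_{n=0}^{N-1} |x(n) - x(n+k)|
    return np.array([np.mean(np.abs(x[:N] - x[k:k + N])) for k in range(max_lag + 1)])

# Synthetic periodic test signal: f0 = 100 Hz at fs = 8000 Hz, so P = 80 samples.
fs, f0 = 8000, 100.0
N, max_lag = 240, 160
t = np.arange(N + max_lag) / fs
x = (np.sin(2 * np.pi * f0 * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
     + 0.3 * np.sin(2 * np.pi * 3 * f0 * t))

phi = autocorrelation(x, N, max_lag)
psi = amdf(x, N, max_lag)

# Search lags 20..120 (400 Hz down to about 67 Hz) for the peak and the notch.
peak_lag = 20 + int(np.argmax(phi[20:121]))
notch_lag = 20 + int(np.argmin(psi[20:121]))
```

For this clean signal both `peak_lag` and `notch_lag` fall on the period of 80 samples, which is the shared periodicity that the weighting by the inverse AMDF relies on.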
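Now if we substitute equation (11) into (14), ψ(k) becomes

$$\psi(k) = \frac{1}{N} \sum_{n=0}^{N-1} |s(n) + w(n) - s(n+k) - w(n+k)| \tag{15}$$

which, by the triangle inequality, is bounded above by the sum of two separate terms:

$$\psi(k) \le \frac{1}{N} \sum_{n=0}^{N-1} |s(n) - s(n+k)| + \frac{1}{N} \sum_{n=0}^{N-1} |w(n) - w(n+k)| \tag{16}$$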

That is, there is an AMDF term for the speech signal and an AMDF term for w(n). We see that the noise component here behaves differently from the one in the autocorrelation function seen in (13). Hence, using the autocorrelation function weighted by 1/ψ(k), it is expected that the true peak is emphasized and, as a result, the errors of pitch extraction are decreased. So we can define a new function, given by

$$\eta(k) = \frac{\phi(k)}{\psi(k) + \tau} \tag{17}$$

where τ is a fixed number (τ > 0). The AMDF in equation (14) at k = 0 is ψ(0) = 0, and therefore the denominator in equation (17) is stabilized by adding the number τ.

Figure 5. Autocorrelation function and proposed function. T corresponds to the pitch period.

Figure 5 shows the autocorrelation and proposed functions obtained for a speech signal corrupted by noise. In this case, when the maximum amplitude of each function is picked, the proposed function leads to the true pitch, while the autocorrelation function leads to an erroneous one. The new computationally simple algorithms #1 and #2 proposed to implement this are as follows.

6. Proposed New Algorithms of the New Weighted Autocorrelation Method by Inverse of AMDF

6.1 Algorithm #1

The technique of removing the formant structure by center clipping for reliable pitch detection, while retaining the periodicity (pitch period information), was shown by Sondhi. Figure 6 shows the block diagram of the pitch extraction algorithm.

Figure 6. Block diagram of the pitch extraction Algorithm #1

The speech signal s(n) is sectioned into overlapping frames of 20-30 ms duration with 50 percent overlap between adjacent frames. There is always a potential loss of information during voicing transitions and at voiced/unvoiced boundaries, and hence the signal needs to be clipped accurately. For this, the maximum absolute peak levels for the first and the last 10 ms sections of the speech are determined, and the clipping level c is set at 80 percent of the smaller of the two absolute peak values. Then each section of the speech signal is center clipped and infinite peak clipped, resulting in a signal s_c(n) that takes three possible values: -1 if the sample falls below the negative clipping level (-c), +1 if the sample exceeds the positive clipping level (+c), and 0 otherwise. The samples are thus reduced to three levels, so that the computational and hardware complexity is reduced.

Next, the AMDF-weighted autocorrelation is computed for each section and amplitude normalized. From equation (17) it is given by

$$\eta(k) = \frac{\dfrac{1}{N} \displaystyle\sum_{n=0}^{N-1} s_c(n)\, s_c(n+k)}{\dfrac{1}{N} \displaystyle\sum_{n=0}^{N-1} |s_c(n) - s_c(n+k)| + \tau}, \quad k = 0, 1, \ldots, M \tag{18}$$

where N is the number of samples in the speech section and k is the lag number. The normalized function is given by

$$\bar{\eta}(k) = \frac{\eta(k)}{\eta(0)} \tag{19}$$

Since each individual product and difference term can take only three values, a simple up/down counter and a differential combinational logic circuit are all that is needed to perform the computation in equation (18). The position of the maximum amplitude of this weighted function gives the pitch period. Usually in speech processing the signal is windowed (rectangular or Hamming window) so that the peaks are gradually tapered to zero, with the peak at the maximum level at the fundamental frequency and reduced amplitude levels at the harmonics.
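The steps of Algorithm #1 described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the value τ = 1, the 10 ms edge sections, the 0.8 clipping ratio, the 50 Hz to 400 Hz search range, and the choice to treat samples beyond the frame as zero are all assumptions of this example.

```python
import numpy as np

def three_level_clip(frame, fs, ratio=0.8, edge_ms=10.0):
    # Clipping level: a fraction of the smaller of the peak magnitudes
    # found in the first and last edge_ms of the frame (assumed values).
    edge = int(fs * edge_ms / 1000)
    c = ratio * min(np.max(np.abs(frame[:edge])), np.max(np.abs(frame[-edge:])))
    out = np.zeros(len(frame))
    out[frame > c] = 1.0    # sample exceeds +c
    out[frame < -c] = -1.0  # sample falls below -c
    return out

def weighted_autocorrelation(sc, max_lag, tau=1.0):
    # eta(k) = phi(k) / (psi(k) + tau), with samples past the frame taken as zero.
    N = len(sc)
    padded = np.concatenate([sc, np.zeros(max_lag)])
    eta = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        shifted = padded[k:k + N]
        phi = np.dot(sc, shifted) / N
        psi = np.mean(np.abs(sc - shifted))
        eta[k] = phi / (psi + tau)
    return eta

def pitch_period(frame, fs, fmin=50.0, fmax=400.0, tau=1.0):
    # Returns the estimated pitch period in samples.
    sc = three_level_clip(np.asarray(frame, dtype=float), fs)
    max_lag = int(fs / fmin)
    eta = weighted_autocorrelation(sc, max_lag, tau)
    if eta[0] > 0:
        eta = eta / eta[0]  # amplitude normalisation by the lag-0 value
    k_lo = int(fs / fmax)
    return k_lo + int(np.argmax(eta[k_lo:max_lag + 1]))
```

Because the clipped signal only takes the values -1, 0 and +1, the dot product and the absolute difference inside the loop reduce to counting operations, which is what makes a counter-based hardware realisation plausible.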
Besides locating the pitch period, each section can be classified as voiced or unvoiced by comparing the correlation peak value at the pitch period to a predetermined threshold value. If the value exceeds the threshold, the section is classified as voiced, otherwise as unvoiced. Based on peak signal levels, a silence level threshold is chosen in all LPC coders. The energy of each section is computed as

$$E = \sum_{n=0}^{N-1} s^2(n) \tag{20}$$

and compared with the silence threshold to classify each section as background noise or speech. The accuracy of this pitch determination in noisy environments can be further enhanced by interpolating over 3 points around the detected peak. The interpolation method used in this paper was Lagrange's method. The infinite peak clipping of the speech signal is done in the range of 50 Hz to 400 Hz, which corresponds to the region of fundamental frequencies for most men and women.

6.2 Algorithm #2

Pitch detection with the previous algorithm is sensitive to the setting of the clipping level threshold, which was chosen on the basis of an extensive computer simulation study. A nonlinear distortion of the speech sections eliminates the need to adjust the clipping level critically.

Figure 7. Block diagram of the pitch extraction Algorithm #2

The clipping level is set at 5 percent of the peak absolute value for each frame. The speech signal is again sectioned into overlapping frames of 30 ms duration with 50 percent overlap between adjacent frames. Figure 7 shows a general block structure of this algorithm. The amplitude is normalized to unity for each section, and the signal is then nonlinearly distorted by raising it to the third power. Raising the signal to a high power before applying the weighted autocorrelation method highlights the high amplitude peaks while suppressing the low amplitude peaks. Each signal section is then center clipped as in Algorithm #1 and windowed using a Hamming window before the weighted autocorrelation is performed. As in Algorithm #1, this algorithm also distinguishes voiced from unvoiced speech sections, and background noise from speech sections, in addition to detecting the pitch.

7. Experiments and Results

For our experiments, male and female speakers were selected from the TIMIT database. A universal background model (UBM) is also trained from other 5 male and 5 female speakers in the TIMIT database. For noise data, we down-sampled the TIMIT database from 16 kHz to 8 kHz and artificially added noise to clean test speech at various SNRs. The speech analysis frame length is set to 20 ms with a 10 ms interval. The UBM and the speaker models contain 6 Gaussian components each. The performance of speaker identification according to sampling rate in clean conditions is given in Table 1.

Table 1. Performance of speaker identification according to sampling rate in clean conditions

Sampling rate    8 kHz    16 kHz
Accuracy (%)     93.9     99.6

The tests suggested that using gammatone filtering for the sub-banding resulted in an improvement in the output SNR of up to 3 dB compared with the linear case, when speech was used as the signal. This improvement was observed using both white and speech shaped noise as the corrupting noise signal. In this test, six corrupting noise signals were used: white and speech shaped noise at high, medium and low SNRs. The pitch for each sub band was then determined using the weighted autocorrelation by inverse of AMDF. The speech data was divided into 30 ms frames with 50 percent overlap between adjacent frames, the amplitude was normalized, and the new AMDF method was applied. In simulation, both algorithms were able to detect the pitch peaks for different sections of frames, as the high amplitude peaks had suppressed the low amplitude peaks.

Results are given as percentage gross pitch error (%GPE). If an estimated pitch is not within 1 ms of the reference pitch, it is termed a gross error. %GPE is provided for both male and female speech. Tables 2 and 3 show the %GPE of the proposed pitch detection algorithms for female and male speech, respectively, at SNR = 20 dB, 15 dB, 10 dB, 5 dB and 0 dB. A conventional pitch extraction algorithm using autocorrelation was used for comparison with the new proposed algorithms. From the tables we can see that the proposed algorithms outperform the conventional pitch extraction algorithm with autocorrelation. We can also see that Algorithm #2 offers an improved %GPE compared to Algorithm #1. For example, at SNR = 0 dB, the proposed algorithms improve %GPE from 43.75%, obtained by the conventional algorithm, to 25.22% (Algorithm #1) and 8.34% (Algorithm #2) for female speech, and from 4.74% to 22.22% (Algorithm #1) and 6.53% (Algorithm #2) for male speech.

Table 2. Performance comparison of the proposed algorithms and the conventional algorithm using autocorrelation in terms of gross pitch error (%GPE) for female speech.

SNR                                             20 dB    15 dB    10 dB    5 dB     0 dB
Proposed Algorithm #1                           6.25     2.5      6.22     8.75     25.22
Proposed Algorithm #2                           5.23     9.2      2.45     5.3      8.34
Conventional algorithm using autocorrelation    2.5      25.32    3.25     38.      43.75

Table 3. Performance comparison of the proposed algorithms and the conventional algorithm using autocorrelation in terms of gross pitch error (%GPE) for male speech.

SNR                                             20 dB    15 dB    10 dB    5 dB     0 dB
Proposed Algorithm #1                           7.4      2.       4.8      8.5      22.22
Proposed Algorithm #2                           6.7      8.98     .2       4.23     6.53
Conventional algorithm using autocorrelation    .        4.8      8.5      25.92    4.74

An example test simulation carried out for female speech is shown in Figures 8, 9, 10, 11 and 12.

Figure 8. Speech data captured
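For reference, a percentage gross pitch error of the kind reported in the tables can be computed with a few lines. This is a sketch only: the function name `gross_pitch_error` is hypothetical, pitch is assumed to be expressed as a period in milliseconds per voiced frame, and the tolerance is exposed as a parameter whose 1 ms default is an assumption of this example.

```python
import numpy as np

def gross_pitch_error(est_periods_ms, ref_periods_ms, tol_ms=1.0):
    """Percentage of frames whose estimated pitch period deviates from
    the reference by more than tol_ms (tolerance value is an assumption)."""
    est = np.asarray(est_periods_ms, dtype=float)
    ref = np.asarray(ref_periods_ms, dtype=float)
    gross = np.abs(est - ref) > tol_ms
    return 100.0 * np.count_nonzero(gross) / len(ref)

# Four frames: the second (halved period) and fourth (doubled period) are gross errors.
example_gpe = gross_pitch_error([10.0, 5.0, 8.2, 20.0], [10.2, 10.0, 8.0, 10.0])
```

Halved and doubled period estimates, as in the second and fourth frames of the example, are the typical double and half pitch errors that this measure counts.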

Figure 9. Residual signal (applying the AMDF discussed). P refers to the pitch period.

The speech data was captured and noise was then added. It was then divided into 30 ms frames with 50 percent overlap between adjacent frames, the amplitude was normalized, and the AMDF was applied.

Figure 10. Pitch for a segment

It was observed that the method provides a pitch value for each segment. It is also assumed that the pitch is constant over a short segment of the speech signal.

Figure 11. Algorithm #1 result for three sections of speech frames

Figure 12. Algorithm #2 result for three sections of speech frames.

From these results, we conclude that the proposed pitch detection algorithms perform well even in very noisy conditions and under different environmental noises. It may be noted that some of the data contains babble noise, which is a major source of pitch error.

8. Conclusions

A robust pitch extraction method using gammatone filter banks for sub band speech analysis, with optimal pitch extraction for each sub band using weighted autocorrelation, is proposed. The proposed method showed better performance than the conventional autocorrelation method on both male and female speech corrupted with different colored noise.

9. References

[1] J. D. Gordy and R. A. Goubran, "A combined LPC-based speech coder and filtered-X LMS algorithm for acoustic echo cancellation," in Proc. IEEE ICASSP, vol. 4, pp. 25-28, May 2004.
[2] Tetsuya Shimamura and Hajime Kobayashi, "Weighted Autocorrelation for Pitch Extraction of Noisy Speech," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 7, October 2001.
[3] G. Muhammad, "Noise robust pitch detection based on extended AMDF," Proc. 8th IEEE Int. Symp. on Signal Processing and Information Technology (Sarajevo, Bosnia & Herzegovina, 2008), pp. 33-38.
[4] R. G. Amado and J. V. Filho, "Pitch detection algorithms based on zero-cross rate and autocorrelation function for musical notes," Proc. Int. Conf. on Audio, Language and Image Processing (Shanghai, China, 2008), pp. 449-454.
[5] X-D. Mei, J. Pan and S-H. Sun, "Efficient algorithms for speech pitch estimation," Proc. Int. Symp. on Intelligent Multimedia, Video and Speech Processing (Hong Kong, 2001), pp. 42-424.
[6] M. S. Rahman, H. Tanaka and T. Shimamura, "Pitch determination using aligned AMDF," Proc. INTERSPEECH 2006 (Pittsburgh, USA, 2006), pp. 74-77.
[7] W. Zhang, G. Xu and Y. Wang, "Pitch estimation based on circular AMDF," Proc. of Int. Conf. on Acoustics, Speech, and Signal Processing (Florida, USA, 2002), pp. 34-344.
[8] H. Boril and P. Pollak, "Direct Time Domain Fundamental Frequency Estimation of Speech in Noisy Conditions," Proc. European Signal Processing Conference (Vienna, Austria, 2004), pp. 3-6.