Single channel speech separation in modulation frequency domain based on a novel pitch range estimation method


RESEARCH Open Access

Single channel speech separation in modulation frequency domain based on a novel pitch range estimation method

Azar Mahmoodzadeh 1, Hamid Reza Abutalebi 1*, Hamid Soltanian-Zadeh 2,3 and Hamid Sheikhzadeh 4

Abstract

Computational Auditory Scene Analysis (CASA) has been the focus of recent literature on speech separation from monaural mixtures. The performance of current CASA systems on voiced speech separation strictly depends on the robustness of the algorithm used for pitch frequency estimation. We propose a new system that estimates the pitch (frequency) range of a target utterance and separates the voiced portions of the target speech. The algorithm first estimates the pitch range of the target speech in each frame of data in the modulation frequency domain, and then uses the estimated pitch range for segregating the target speech. The pitch range estimation method is based on an onset and offset algorithm. Speech separation is performed by filtering the mixture signal with a mask extracted from the modulation spectrogram. A systematic evaluation shows that the proposed system extracts the majority of the target speech signal with minimal interference and outperforms previous systems in both pitch extraction and voiced speech separation.

Keywords: acoustic frequency, modulation frequency, onset and offset algorithm, pitch range estimation, speech separation

1. Introduction

Speech separation, as a solution to the cocktail party problem, is a well-known challenge with important applications. For instance, telecommunication systems and Automatic Speech Recognition systems lose performance in the presence of interfering sounds [1,2]. An effective system that segregates speech from interference in monaural (single-microphone) situations would be rewarding in such problems. Many methods have been proposed for monaural speech enhancement; for example, see [3-7].
These methods usually assume certain statistical properties of the interference and tend to lack the capacity to deal with a variety of interferences. While machines handle monaural speech separation awkwardly, the human auditory system performs it proficiently. This perceptual process is known as Auditory Scene Analysis (ASA) [5]. Psychoacoustic research in ASA has inspired considerable work in developing Computational Auditory Scene Analysis (CASA) systems for speech separation (see [6,7] for a comprehensive review).

* Correspondence: habutalebi@yazduni.ac.ir
1 Speech Processing Research Lab (SPRL), Electrical and Computer Engineering Department, Yazd University, Yazd, Iran
Full list of author information is available at the end of the article

According to Bregman [5], the ASA procedure can be separated into two theoretical stages: segmentation and grouping. At the first stage, speech is transformed into a higher-dimensional space (such as a time-frequency two-dimensional representation) and then similar time-frequency (T-F) units are segmented in order to compose different regions [6]. In the second stage, these regions are combined into different streams based on the relevant acoustic information. The major computational goal of CASA is to separate the target speech signal from the interference for different purposes, via generating a binary or a soft T-F mask; see, e.g., [8-10]. Grouping itself consists of simultaneous and sequential organizations, which involve grouping of segments across frequency and time. The task of sequential grouping is to group the T-F regions belonging to the same sound source across time. Figure 1 illustrates this issue: the upper panel shows T-F regions grouped into one single stream, as they are close enough in both (time

© 2012 Mahmoodzadeh et al.; licensee Springer.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

and frequency) directions, while the lower panel illustrates the case of two streams of speech, grouped separately, as the T-F regions are sufficiently far from each other in the frequency direction. Temporal continuity is an effective cue for grouping T-F regions neighboring in time. However, it cannot handle T-F regions that do not overlap in time due to silence or interference segments. Therefore, sequential grouping of such T-F regions is a very challenging problem (see [11,12] for more details).

Natural speech includes both voiced and unvoiced portions. Voiced portions of speech are characterized by periodicity (or harmonicity), which has been used as an important feature in many CASA systems for segregating voiced speech (see, e.g., [13,14]). Despite considerable advances in voiced speech separation, the performance of current CASA systems is still limited by pitch frequency (F0) estimation errors and residual noise. Various methods have been proposed for robust pitch frequency estimation, see, e.g., [15,16]; however, robust pitch frequency estimation in low signal-to-noise ratio (SNR) situations still poses a significant challenge.

While mixed speech may have a great deal of overlap in the time domain, modulation frequency analysis provides an additional dimension that can present a greater degree of separation among sources. In other words, the original T-F representation obtained from transformations like the Short-Time Fourier Transform (STFT) can be augmented with a third dimension that represents modulation frequency. In [17], by assuming that the pitch frequency range is known and constant in each filter channel, modulation spectral analysis is used as a tool for producing the mask for speech separation in this higher-dimensional space. Based on the above observations, we propose a new system for single channel separation of voiced speech based on modulation filtering.
The idea is that, first, the target pitch (frequency) range is estimated in the modulation frequency domain, and then this range is used for producing the proper mask for speech separation. For the following reasons, provided in [18], modulation analysis and filtering are applied to the target speech separation problem. First, there is a general belief that the human ASA system processes sounds in the modulation frequency domain. Second, the energy from two co-channel talkers is largely non-overlapping in the modulation frequency domain. The method of modulation analysis and filtering has been studied extensively by many researchers in the field of single channel speech separation; Reference [19] provides a general discussion of this subject.

Figure 1 Segmentation and grouping of speech projected into T-F cells in a 2D representation [6].

At first, the proposed system performs a multipitch range estimation of target and interference speech based on the segmentation of the modulation spectrogram domain. The segmentation is done using an onset and offset algorithm similar to that proposed by Hu and Wang [20]. In the proposed method, the noisy signal is divided into 20 ms time frames and then the proposed speech separation algorithm is applied to each individual frame. The pitch range estimation method works in three stages. The first stage computes the modulation spectrogram; the second stage decomposes the modulation spectrogram into segments using an onset and offset algorithm. In this stage, first, the peaks and valleys of the derivative of the smoothed intensity of the modulation spectrogram are detected and marked as onset and offset candidates. Any onset bigger than a certain threshold is accepted, and for each accepted onset the smallest offset between two onsets is selected. Then, onset and offset fronts are produced by connecting the common onsets and offsets. Finally, the segments are formed by matching the onset and offset fronts.
The third stage determines the range of pitch frequency by selecting and grouping the desired segments. The separation part of the proposed system aims at obtaining a soft mask in the modulation spectrogram domain. Extending the soft mask suggested in [17], we propose a soft mask whose value depends on the estimated pitch range in each filter channel. To determine the soft mask in each filter channel, we first find and mutually compare the modulation spectrogram energies of target and interference in their pitch ranges estimated in the previous stage. Then, we transform the soft mask to the time domain and filter the mixture signal in order to obtain the separated target signal. Thus, a strategy is suggested that estimates the target pitch range and subsequently segregates the target signal from the interference. Finally, the separated target signal is obtained by arranging the separated signals from each frame in time order.

This article is organized as follows. Section 2 describes the modulation frequency analysis. In Section 3, first, a brief description of the present system is given and then the details of each stage are presented. In Section 4, a quantitative measure is proposed for evaluating the performance of speech separation and it is used for systematic

evaluation of pitch range estimation and speech separation. The article concludes with a discussion in Section 5.

2. Modulation frequency analysis

Decomposing a narrowband signal into a carrier and a modulator signal is an important problem in modulation analysis and filtering [18]. The modulator is a low-frequency signal that describes the amplitude modulation of the original signal; the carrier is a narrowband signal describing the frequency modulation of the signal. Consider a wideband discrete-time signal x(n), where n represents the discrete-time independent variable. The T-F transform of the signal x(n), denoted by X(m, k), is obtained using the Discrete STFT (DSTFT). X(m, k) is a T-F transformed narrowband signal (with time index m) coming out of the kth channel:

X(m, k) = \mathrm{DSTFT}\{x(n)\} = \sum_{n} x(n)\, w(mM - n)\, e^{-j 2\pi n k / K}, \quad k = 0, \ldots, K - 1,   (1)

where K is the DSTFT length (equal to the number of filter-bank channels), w(.) is the acoustic frequency analysis window with length L, and M is the decimation factor. The product model of the modulator signal M(m, k) and the carrier signal C(m, k) of the signal X(m, k) in the T-F domain is defined as

X(m, k) = M(m, k)\, C(m, k).   (2)

The modulator of the signal X(m, k) is found by applying an envelope detector to this signal, as

M(m, k) \triangleq D\{X(m, k)\},   (3)

where D is the envelope detector operator. With respect to Equation (2), the signal's carrier is described as

C(m, k) = X(m, k) / M(m, k).   (4)

A good choice for the envelope detector is the incoherent detector, since it creates a modulation spectrum that covers a large area in the modulation frequency domain. For the speech signal at hand, this property may be used to find the pitch frequency in the modulation frequency domain. The incoherent envelope detector is based on the Hilbert envelope (for real-valued subbands) or the magnitude operator (for complex-valued subbands) [21]. Therefore, the modulator of the complex signal X(m, k) is defined as

M(m, k) = |X(m, k)|.   (5)

The theory of modulation frequency analysis and filtering is best explained through the definition of modulation transforms, which are signal transformations defined based on the Fourier transform (FT) and the STFT. The discrete short-time modulation transform of the signal x(n) is defined as

X(k, i) = \mathrm{DFT}\{D\{\mathrm{DSTFT}\{x(n)\}\}\} = \sum_{m=0}^{I-1} M(m, k)\, e^{-j 2\pi m i / I}, \quad i = 0, \ldots, I - 1,   (6)

where I is the DFT length and i is the modulation frequency index. The modulation transform consists of a filter bank that uses the DSTFT, followed by a subband envelope detector and then a frequency analyzer of the subband envelopes (the DFT) [18]. The modulation analysis framework is described in Figure 2.

Figure 2 The modulation analysis framework and the modulation spectrogram [19].

The modulation spectrogram intensity, defined as |X(k, i)|, is generally sketched in a diagram in which the vertical axis displays the regular acoustic frequency index k and the horizontal axis the modulation frequency index i. A typical example of the modulation transform is illustrated in Figure 3, in which Figure 3a shows the mixture of a target and an interfering male speaker, and Figure 3b, c, respectively, depict the corresponding T-F representation and modulation spectrogram, with an overall SNR of 0 dB.

3. System description

The main goal of the current system is to produce a soft mask for single channel speech separation in the modulation spectrogram domain. In the proposed system, determining the pitch ranges of target and interference speech is necessary for producing the mask for speech separation. The value of this mask in each subband depends on the obtained pitch ranges of target and interference in that subband.
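As a concrete illustration, the pipeline of Eqs. (1)-(6), a uniform STFT filter bank, a magnitude (incoherent) envelope detector, and a DFT of each subband envelope, can be sketched as follows. This is a simplified sketch, not the authors' implementation; the parameter names follow the definitions above.

```python
import numpy as np

def modulation_spectrogram(x, K=128, M=16, L=64):
    """Sketch of the discrete short-time modulation transform:
    DSTFT filter bank (Eq. 1), incoherent envelope detector (Eq. 5),
    and a DFT of each subband envelope over the frame index m (Eq. 6)."""
    w = np.hanning(L)                        # acoustic-frequency analysis window
    n_frames = 1 + (len(x) - L) // M         # M is the decimation (hop) factor
    X = np.empty((n_frames, K), dtype=complex)
    for m in range(n_frames):
        seg = x[m * M : m * M + L] * w
        X[m] = np.fft.fft(seg, n=K)          # K-channel DSTFT frame
    env = np.abs(X)                          # magnitude envelope, Eq. (5)
    # DFT across frames: rows index modulation frequency i, columns channel k
    return np.fft.fft(env, axis=0)
```

With a 16 kHz input, this choice of parameters gives a channel spacing of f_s/K = 125 Hz and an envelope (modulation) sampling rate of f_s/M = 1 kHz.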

When the modulation spectrogram of the speech signal is computed, the pitch ranges of the target and interference speakers are determined, and then a proper mask is calculated for speech separation. The overall stages of our system are shown in Figure 4. To determine the mentioned pitch ranges, our proposed method uses an onset and offset detection algorithm [20] to find the distribution of modulation spectrogram energy in the modulation frequency domain, which is an important feature for determining the pitch range. When the modulation spectrogram energy is found, the modulation spectrogram is segmented, as described in the following subsections. Then, the resulting segments are grouped in order to estimate the pitch range of each speaker. A detailed description of the stages is as follows.

Figure 3 Sound mixture and its modulation spectrogram. (a) Mixture of speech signals. (b) T-F energy plot for a mixture of two utterances of a male speaker. The utterances are "eight" and "dos". For better display, energy is plotted as the square of the FT. (c) Modulation spectrogram of the mixture signal.

Figure 4 Block diagram of the proposed system.

3.1. T-F decomposition and modulation transform

At the T-F stage, the STFT (as a uniform filter bank) is used for decomposing a broadband signal into narrowband subband signals. The output of the T-F stage enters the modulation transform stage in order to calculate the modulation spectrogram.

3.2. Pitch range estimation in the modulation frequency domain

The pitch frequencies of target and interference speakers are both time-varying.
Occasionally, the pitch frequencies of the target and interference speakers are too close to each other, a fact that causes undesired errors in multipitch tracking algorithms and decreases the accuracy of speech separation methods. The algorithm of this article estimates the pitch ranges of the target and interference speakers of noisy speech in the modulation frequency domain. Estimating the pitch range in small time intervals (for example, 20 ms) decreases the error of the pitch range estimation method. In the pitch range estimation approach, first, the intensity of the modulation spectrogram is smoothed over the modulation frequency using a low-pass filter. Then, the partial derivative of the smoothed intensity over the modulation frequency is computed. By marking the peaks and valleys of the resulting signal, the onset and offset candidates are detected and the onset and offset fronts are formed. By matching the onset and offset fronts, the modulation spectrogram of the speech signal is segmented. The detailed description of the stages of the pitch range estimation is as follows.

3.2.1. Smoothing

Smoothing corresponds to low-pass filtering. The proposed system uses a low-pass filter to smooth the modulation spectrogram intensity over the modulation frequency. Considering the frequency channel k, the smoothed intensity for |X(k, i)| is found as follows:

X_s(k, i) = |X(k, i)| * g_s(i),   (7)

where g_s(i) is a low-pass FIR filter with a small number of coefficients and pass-band [0, f_s] in Hz. Here, * denotes the convolution operator (over the modulation frequency). The parameter f_s determines the degree of smoothing: the smaller f_s, the smoother X_s(k, i) would be. As an example, Figure 5 shows the original (Figure 5a) and the smoothed (Figure 5b-d) intensities of the modulation spectrum for the mixture input signal shown in Figure 3a, at three typical scales. To display more details, Figure 5e-h shows the original and the smoothed intensities at these three scales in a single frequency channel centered at 56 Hz. The intensity fluctuation is reduced by smoothing, as confirmed by Figure 5. Although the local details of onsets and offsets become blurred, the major intensity changes of the onsets and offsets are still preserved.
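The smoothing of Eq. (7) is an ordinary FIR low-pass convolution along the modulation-frequency axis. A minimal sketch follows; the windowed-sinc taps and cutoff here are illustrative choices, not the paper's filter:

```python
import numpy as np

def smooth_intensity(X_abs, num_taps=9, cutoff=0.1):
    """Low-pass smoothing of the modulation spectrogram intensity
    along the modulation-frequency axis, Eq. (7). X_abs has one row
    per filter channel k; cutoff is normalized to the envelope rate."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    g = np.sinc(2 * cutoff * n) * np.hamming(num_taps)  # windowed-sinc taps
    g /= g.sum()                                        # unit DC gain
    return np.array([np.convolve(row, g, mode='same') for row in X_abs])
```

Fewer taps or a higher cutoff give a lighter smoothing, mirroring the role of the scale parameter in the text.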

Figure 5 Smoothed intensity values at different scales. (a) Initial intensity for all channels. (b) Smoothed intensity at the scale 14. (c) Smoothed intensity at the scale 1. (d) Smoothed intensity at the scale 4. (e) Initial intensity in a channel centered at 56 Hz. (f) Smoothed intensity in the channel at the scale 14. (g) Smoothed intensity in the channel at the scale 1. (h) Smoothed intensity in the channel at the scale 4. The input is shown in Figure 3a.

3.2.2. Onset/offset detection and matching

Onsets and offsets correspond to sudden intensity changes. The partial derivative of the smoothed modulation spectrogram intensity over the modulation frequency is obtained as

\frac{\partial}{\partial i} X_s(k, i) = \frac{\partial}{\partial i} \left[ |X(k, i)| * g_s(i) \right].   (8)

Peaks and valleys of the resulting signal of Equation (8) are, respectively, marked as onset and offset candidates. Figure 6 illustrates this procedure, in which the onset candidates with peaks bigger than a threshold θ_on are accepted. The peaks corresponding to true onsets are usually significantly higher than other peaks. For this reason, θ_on = μ + σ is selected as the threshold, in which μ and σ are the mean and standard deviation of all the onset candidates (peaks of Equation (8)), respectively [20]. Hu and Wang [20] claim that the performance of the method using such a threshold choice is satisfactory. In every filter channel k, to determine the offset corresponding to each onset candidate, let f_on[k, l] represent the modulation frequency of the lth onset candidate in filter channel k. The corresponding offset, denoted by f_off[k, l], is located between f_on[k, l] and f_on[k, l+1]. If there are multiple offset candidates in this interval, the one with the largest intensity decrease (i.e., the smallest \partial X_s(k, i)/\partial i) is chosen.
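The candidate detection and thresholding can be sketched for a single channel as follows. Here `xs` stands for one row of the smoothed intensity, and the simple local-extremum peak picker is an assumption of this sketch, since the text does not prescribe a specific one:

```python
import numpy as np

def onset_offset_candidates(xs):
    """Mark peaks/valleys of the derivative of a smoothed intensity
    curve (one filter channel) as onset/offset candidates, and accept
    onsets above theta_on = mu + sigma, following Eq. (8) and [20]."""
    d = np.diff(xs)                          # discrete version of Eq. (8)
    peaks = [i for i in range(1, len(d) - 1)
             if d[i] > d[i - 1] and d[i] >= d[i + 1]]
    valleys = [i for i in range(1, len(d) - 1)
               if d[i] < d[i - 1] and d[i] <= d[i + 1]]
    if not peaks:
        return [], valleys
    vals = np.array([d[i] for i in peaks])
    theta_on = vals.mean() + vals.std()      # theta_on = mu + sigma
    onsets = [i for i in peaks if d[i] >= theta_on]
    return onsets, valleys
```

For a smooth intensity bump, this returns a single onset on the rising flank and a single offset candidate on the falling flank.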
After finding the onsets and offsets, those with close modulation frequencies are connected into onset and offset fronts, because frequency components of onsets and offsets with close modulation frequencies probably correspond to the same source. Onset and offset fronts are vertical contours across acoustic frequency in the modulation spectrogram domain. The proposed system connects an onset candidate from a filter channel to an onset candidate in the adjacent filter channel above, provided that their distance in modulation frequency is less than a certain threshold relative to the latter filter channel. In each filter channel, this threshold is defined as the mean of the distances, in the modulation frequency direction, between pairs of adjacent onsets. This definition of the threshold was derived from experiments and validated as a good choice on the data. The same applies to the offset candidates. Notice that a threshold with too small a value may prevent onsets or offsets from the same event from joining, while a threshold with too large a value may cause onsets from different events to connect together [20].

Figure 6 Onset and offset detection. The upper panel shows the response intensity and the lower panel shows the results of onset and offset detection using a low-pass filter. The threshold for onset detection is 0.5 and for offset detection is -0.5, indicated by the dashed lines. Detected onsets are marked by downward arrows, and offsets by upward arrows.

The next step is to form segments by matching individual onset and offset fronts. Consider (f_on[k, l_k], f_on[k+1, l_{k+1}], ..., f_on[k+r-1, l_{k+r-1}]) as an onset front with r consecutive filter channels, in which l_k denotes the number of the selected onset as an onset front member in filter channel k; and consider (f_off[k, l_k], f_off[k+1, l_{k+1}], ..., f_off[k+r-1, l_{k+r-1}]) as the corresponding offset modulation frequencies. For each offset modulation frequency, first, we find all those offset fronts that cross this offset; then, the offset front with the most crosses (with the offset modulation frequencies) is chosen as the matching offset front. Now, all the filter channels from k to k+r-1 occupied by the matching offset front (and their corresponding offset modulation frequencies on this matching offset front) are labeled as matched. If all the channels from k to k+r-1 are labeled as matched, the matching procedure finishes; otherwise, the matched channels are put aside and the procedure is repeated for the remaining unmatched channels. At last, in order to form the offset front relative to each onset front, we replace the offset modulation frequencies corresponding to the onset front with those of the matched offset fronts. The region between the onset front and its offset front yields a 2D segment in the acoustic-modulation frequency space; see Figure 7 for a schematic representation of the matching procedure.

Figure 7 Schematic representation of the matching procedure; the offsets corresponding to the onset front are determined, and the matching offset front members are found.

3.2.3. Segment selection and decision-making

By detecting the onsets and offsets and forming the onset and offset fronts, the modulation spectrogram domain of the speech signal is segmented. Since the speaker's pitch range is [60, 350] Hz (for men, women, and children), only the segments with modulation frequencies in this range are accepted. Now, we describe the grouping procedure for the segments. First, the modulation spectrogram energy of each selected segment is computed. The two almost disjoint segments with the most energy, i.e., those with the most modulation spectrogram energies and the least horizontal overlap in the modulation spectrogram, called segments A and B for simplicity, are selected (the case of speech interfered by a non-speaker noise has only one such segment). For any other segment (call it segment C), if its modulation frequency range overlaps at least 80% with that of segment A or segment B, segment C is grouped with that overlapping segment; otherwise, segment C is omitted from the grouping procedure. Figure 8 presents a typical example of the grouping procedure.
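The selection-and-grouping rule can be sketched on one-dimensional modulation-frequency ranges. Each segment is represented as an (energy, low, high) tuple; the "almost disjoint" test for picking segment B (overlap below 20% here) is an illustrative threshold of this sketch, since the text only requires the least horizontal overlap:

```python
def overlap_frac(a, b):
    """Fraction of range a = (lo, hi) covered by range b."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return inter / (a[1] - a[0])

def group_segments(segments):
    """Group segments given as (energy, lo, hi) tuples: pick the two
    most energetic, largely disjoint segments A and B, then attach any
    other segment whose range overlaps one of them by at least 80%."""
    segs = sorted(segments, key=lambda s: -s[0])   # by energy, descending
    A = segs[0]
    B = next((s for s in segs[1:]
              if overlap_frac((s[1], s[2]), (A[1], A[2])) < 0.2), None)
    groups = {'A': [A]} if B is None else {'A': [A], 'B': [B]}
    for s in segs[1:]:
        if s is B:
            continue
        for key in groups:
            ref = groups[key][0]
            if overlap_frac((s[1], s[2]), (ref[1], ref[2])) >= 0.8:
                groups[key].append(s)              # >= 80% overlap rule
                break                               # otherwise: omitted
    return groups
```

When only one dominant segment exists (a non-speaker intrusion), B is None and a single group is returned, matching the single-segment case in the text.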
As shown, in each filter channel, the onset and offset fronts of the resulting group determine the corresponding range of pitch frequency in that filter channel.

Figure 8 A graphical expression of the method of grouping segments in the modulation spectrogram domain, for one speaker and for two speakers.

3.3. Speech separation

In [17], a mask is presented for speech separation in the modulation spectrogram domain, assuming that the pitch ranges of the target and interference are known and that these ranges are the same in each subband. Our system extends this idea by allowing the value of the mask in each filter channel to depend on the estimated pitch range of that filter channel. Consider a given signal x(n) that is the sum of a target signal x_ts(n) and an interference signal x_is(n), sampled at f_s Hz, i.e., x(n) = x_ts(n) + x_is(n). A proper mask should be estimated for segregating the target signal from the interference signal. In each filter channel k, the pitch ranges of the target and interfering speakers (obtained from the previous stage) are denoted by PF_ts^k := [p_{ts,low}^k, p_{ts,high}^k] and PF_is^k := [p_{is,low}^k, p_{is,high}^k], respectively. Also,

Q^k := \{ i \in \{0, \ldots, I-1\} \text{ such that } (i \cdot f_s)/(I \cdot M) \in PF^k \}

is defined as the set of modulation frequency indices of PF^k, i.e., a pitch range in filter channel k. To produce a frequency mask in each filter channel k, define the mean of the modulation spectral energy relative to a pitch range as the energy normalized by the width of that pitch range:

E^k = \sum_{i \in Q^k} |X(k, i)|^2 \Big/ \left( p_{high}^k - p_{low}^k \right).   (9)
The frequency mask is calculated by comparing the means of the modulation spectral energy of the target and interference speakers in the following sense:

F^k = \frac{E_{ts}^k}{E_{ts}^k + E_{is}^k}.   (10)

Since there are artifacts associated with applying masks in the modulation frequency domain (see [22]), this domain is not preferable for modulation filtering in order to mask out the interference and reconstruct a time-domain signal. Instead, the frequency mask is transformed to the time domain. To this end, a filter

with linear phase is constructed whose magnitude is F^k and to which a linear phase φ^k(i) is assigned. Then, the inverse DFT is taken:

f^k(m) = \frac{1}{I} \sum_{i=0}^{I-1} F^k\, e^{j \varphi^k(i)}\, e^{j 2\pi m i / I}.   (11)

The separated target signal is estimated by the convolution (over the variable m) of the obtained filter f^k(m) with the modulator signal of the mixture signal x(n), and then multiplying by the carrier signal of the mixture:

\hat{X}(m, k) = \left[ M(m, k) * f^k(m) \right] C(m, k).   (12)

Finally, the separated target signal in the time domain is obtained by taking the inverse STFT of \hat{X}(m, k).

4. Evaluation

As mentioned earlier, our system estimates the pitch range and uses this range for speech separation. In this section, we evaluate the proposed system in the processes of pitch range estimation and speech separation.

4.1. Pitch range estimation

First, the proposed system is evaluated in the pitch range estimation process with utterances chosen from Lee's database [23] and a corpus of 100 mixtures of speech and interference [24], commonly used for CASA research; see, e.g., [13,25,26]. The corpus contains utterances from both male and female speakers. These utterances are mixed with a set of intrusions at different SNR levels. These intrusions are N0: 1 kHz pure tone; N1: white noise; N2: noise bursts; N3: cocktail party noise; N4: rock music; N5: siren; N6: trill telephone; N7: female speech; N8: male speech; and N9: female speech. These intrusions have considerable variety; for example, N3 is noise-like, while N5 contains strong harmonic sounds. They form a realistic corpus for evaluating the capacity of a CASA system when it deals with various types of interference.

The signal X(k, i) is the modulation spectrogram of an input signal digitized at a 16-kHz sampling rate. The parameters of the proposed system are set to M = 16 and K = 128. w(n) is a Hanning window with length L = 64 (refer to Section 2).
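The mask of Eqs. (9)-(10) and the reconstruction of Eqs. (11)-(12) can be sketched per filter channel as follows. This is a simplified version: the convolution with f^k(m) is performed circularly in the DFT domain, and the linear phase is parameterized as phi(i) = phase_slope * i, an assumed form since the exact phase assignment is not reproduced here.

```python
import numpy as np

def soft_mask(Xmag, Q_ts, Q_is, pf_ts, pf_is):
    """Eqs. (9)-(10): mean modulation-spectral energies of target and
    interference over their pitch-range bin sets Q, normalized by the
    width of each pitch range, combined into a soft gain in [0, 1]."""
    E_ts = np.sum(Xmag[Q_ts] ** 2) / (pf_ts[1] - pf_ts[0])
    E_is = np.sum(Xmag[Q_is] ** 2) / (pf_is[1] - pf_is[0])
    return E_ts / (E_ts + E_is)

def reconstruct_channel(Fmag, phase_slope, Mmod, C):
    """Eqs. (11)-(12), circular version: apply a linear-phase response
    built from the mask magnitude to the channel's modulator, then
    restore the carrier."""
    I = len(Mmod)
    H = Fmag * np.exp(1j * phase_slope * np.arange(I))
    filtered = np.fft.ifft(np.fft.fft(Mmod) * H)   # modulator filtered by f^k(m)
    return filtered * C                             # re-apply carrier, Eq. (12)
```

An all-pass mask (Fmag identically 1, zero phase) returns the channel unchanged; a mask near zero suppresses it.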
The STFT filter bank has 128 filter channels, for which the center frequency of the kth filter channel is ω_k = 2πk/K, k = 0, ..., K-1. Figure 9 shows the modulation spectrogram and the obtained segments for a typical speech frame when the proposed system is applied. The speech signal is a mixture of target and interference with an overall SNR of 0 dB. We select male speech, white noise, and a trill telephone as the interferences. The results show that although the powers of the speech and interference signals are equal, the proposed method is still able to estimate the pitch range of the target speaker with reasonable accuracy.

Figure 9 Modulation spectrogram and segments obtained for a mixture with (a) a male speaker, (b) white noise, and (c) a trill telephone as interference. The input is shown in Figure 3a.

Figure 10 shows the average error percentage of the pitch range estimation by the proposed system on the above mixtures at different SNR levels. To determine the error percentage, we assign a two-element vector to the margins of each pitch range and find the root mean square error distance between the vectors corresponding to the true and estimated pitch ranges. As shown in Figure 10, the proposed system is able to estimate 79.9% of the target pitch range even at -5 dB SNR. The estimation rate increases to about 96.1% as the SNR increases to 15 dB.

Figure 10 Percentage of pitch range estimation error for different SNR levels, for the LSH model, RAPT, the MAP estimator, and the proposed algorithm.

A reliable evaluation of the proposed system requires a reference range of the true pitch. However, such a reference is probably impossible to obtain from noisy speech. We find the reference pitch range by framing the clean speech signal and calculating the pitch frequency in each frame. The performance of the proposed method is compared with that of the Least Square Harmonic (LSH) technique [27], the Robust Algorithm for Pitch Tracking (RAPT) [28], and the Maximum A Posteriori (MAP) estimator [29]. RAPT and MAP are two standard pitch estimation algorithms. The LSH algorithm, derived in [27] for harmonic decomposition of a time-varying signal, estimates the harmonic amplitudes and phases by solving a set of linear equations that minimizes the mean square error. The RAPT algorithm estimates the pitch frequency by searching for local maxima in the autocorrelation function of the windowed speech signal and then using a dynamic programming technique (see [28] for more details). The MAP approach [29] considers a harmonic model for the voiced speech, so that each windowed signal is expressed with a generalized linear model whose basis functions depend on the fundamental frequency and the number of harmonic partials.

Figure 10 also provides a comparison between the results of pitch estimation using the four mentioned methods, in which the proposed system performs consistently better than the three standard methods at all SNR levels. Although the performance of the LSH model (the best performing among the mentioned standard algorithms) is good at SNR levels above 10 dB, it drops quickly as SNR decreases, which shows that the proposed system is more robust to interference than the LSH model. As mentioned in [29], MAP performs slightly better at low SNRs than at high SNRs. In addition, RAPT fails to estimate the desired pitch period at low SNRs, because it mistakenly chooses sub-harmonic and harmonic partials instead of the true pitch period.
The current scheme performs almost consistently at both high and low SNRs.

4.2. Voiced speech separation

A corpus of 100 mixtures, composed of 10 target utterances mixed with 10 intrusions, is used for assessing the performance of the system on voiced speech separation; these data are described in Section 4.1. For comparison, the Hu and Wang system [14] and the spectral subtraction method [3] are employed. Performance of the voiced speech separation is evaluated using two measures commonly used for this purpose [14]: the percentage of energy loss, P_EL, which measures the amount of the target speech excluded from the segregated speech, and the percentage of residual noise, P_NR, which measures the amount of the intrusion included in the segregated speech. P_EL and P_NR are error measures of a separation system and are complementary indices for assessing system performance. In addition, the SNR of the segregated voiced target (in dB) provides a good comparison between waveforms [14]:

\mathrm{SNR} = 10 \log_{10} \frac{\sum_n s^2(n)}{\sum_n \left[ s(n) - \hat{x}(n) \right]^2},   (13)

where \hat{x}(n) is the estimated signal and s(n) is the target signal before being mixed with the intrusion.

The results of our system are shown in Figure 11. Each point in the figures represents the average value over the 100 mixtures in the complete test corpus at a particular SNR level. Figure 11a, b shows the percentage of energy loss and the percentage of noise residue. Since the goal here is to segregate the voiced target, the P_EL values are only defined for the target energy at the voiced frames of the target. As shown in Figure 11, the proposed system segregates 78.9% of the voiced target energy at -5 dB SNR and 99% at 15 dB SNR. At the same time, at -5 dB, 15.9% of the segregated energy belongs to the intrusion. This number drops to 0.7% at 15 dB SNR. Figure 11c shows the SNR of the segregated target. The system obtains an average 7.5 dB gain in SNR when the mixture SNR is -5 dB. This gain increases to 14.3 dB when the mixture SNR is 15 dB.
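Equation (13) can be written down directly, together with rough stand-ins for P_EL and P_NR. Note that the P_EL/P_NR proxies below compare sample magnitudes, whereas [14] defines them through resynthesized masked signals, so they are simplified illustrations only:

```python
import numpy as np

def separation_scores(s, s_hat):
    """SNR of the segregated target, Eq. (13), plus simplified
    waveform-level proxies for the energy-loss (P_EL) and residual-noise
    (P_NR) percentages described in the text."""
    snr = 10 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))  # Eq. (13)
    missed = np.sum(np.maximum(np.abs(s) - np.abs(s_hat), 0) ** 2)
    residual = np.sum(np.maximum(np.abs(s_hat) - np.abs(s), 0) ** 2)
    p_el = missed / np.sum(s ** 2)        # fraction of target energy lost
    p_nr = residual / np.sum(s_hat ** 2)  # fraction of estimate that is intrusion
    return snr, p_el, p_nr
```

A uniformly attenuated estimate (s_hat = 0.9 s) loses 1% of the target energy, contains no residual intrusion, and scores 20 dB by Eq. (13).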
As shown in the figure, the segregated target loses more target energy (Figure 11a) but also contains less interference (Figure 11b).

Figure 11 Results of voiced speech separation. (a) Percentage of energy loss on the voiced target. (b) Percentage of noise residue. (c) SNR of the segregated voiced target.

Figure 11 also shows the performance of the system proposed by Hu and Wang [14] for voiced speech separation, which is representative of CASA systems. As shown in the figure, Hu and Wang's system yields a lower percentage of noise residue (Figure 11b) but a much higher percentage of target energy loss (Figure 11a, c). Nevertheless, it should be noted that our system significantly improves P_EL (Figure 11a; e.g., by around 11 and 1% at 0 and 15 dB, respectively), which leads to much less signal distortion. The price paid for this is a slight increase in P_NR, as depicted in Figure 11b (e.g., by around 6 and 0.5% at 0 and 15 dB, respectively).

The average SNR for each intrusion is shown for the proposed system in Figure 12, in comparison with that of the original mixtures, Hu and Wang's system, and a spectral subtraction method, which is a standard method for speech enhancement [3] (see also [14]). The proposed system performs consistently better than Hu and Wang's system and spectral subtraction. On average, the proposed system obtains an SNR gain that is about 1.92 dB better than Hu and Wang's system and 8.4 dB better than spectral subtraction. To help the reader recognize the real difference in performance, a file has been prepared that includes sample audio mixture signals (target speech signal + interference signal) and the separation results of the spectral subtraction, Hu and Wang, and proposed systems. The file is available at AM-SampleWaves.ppt.

Figure 12 SNR results for segregated speech and original mixtures for a corpus of voiced speech and various intrusions (intrusion types N0-N9).

5. Discussion and conclusions

One of the major challenges in speech enhancement is the separation of a target speech signal from an interference signal of the same type. The accuracy of CASA methods in single channel speech separation depends on the correctness of the pitch frequency estimation of the two simultaneous speakers, because the proper mask for speech separation in the T-F domain is produced in association with the estimated pitch frequency. In this article, a single channel speech separation system is proposed that estimates the pitch range of one or two speakers and segregates the target speech from the interference. The pitch range is estimated using the onset and offset algorithm, considering the distribution of speaker energy in the modulation spectrogram domain.

When the target and interference speakers are both male or both female, pitch frequency estimation methods encounter large errors because of the close pitch frequency values. Therefore, CASA methods that employ the pitch frequency as their main feature for speech separation face difficulties. In contrast, a main novelty of the present algorithm is the estimation of the pitch range based on short time frames of the mixture signal. The constructed mask for speech separation depends on the pitch range estimated independently in each subband. As shown by the evaluation results, major portions of the voiced target speech are separated from the interfering speech using this mask. In addition, the proposed system can separate the unvoiced portions that are quasi-periodic because of the proximity of voiced portions. The proposed algorithm is robust to interference and produces good estimates of both the pitch range and the voiced speech, even in the presence of strong interference. Systematic evaluation shows that the proposed algorithm performs significantly better than the mentioned CASA and speech enhancement systems.

Silent gaps and other interference-masked intervals are usually present in natural speech utterances. In practice, the utterance should be grouped across such time intervals. This is a sequential grouping problem [5,6] whose segments or masks can be obtained using speech recognition in a top-down manner (also limited to non-speech interference) [11] or speaker recognition trained with speaker models [31].
However, the proposed algorithm does not encounter this sequential grouping problem, because it operates in the modulation spectrogram domain.

In terms of computational complexity, the main cost of the proposed algorithm arises from determining segments in the modulation spectrogram for pitch range estimation. The estimation of the mask and the convolution for speech separation consume a small fraction of the overall cost. Both tasks (pitch range estimation and speech separation) are implemented in the frequency domain, so the computational complexity is O(N log N), where N is the number of samples in the input signal. These operations must be performed separately for each subband. On the other hand, since feature extraction takes place independently in different subbands, substantial speedup can be achieved through parallel computing.
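The O(N log N) frequency-domain filtering step can be illustrated generically (this is not the authors' implementation; the per-bin mask gain here is a hypothetical stand-in for the separation mask):

```python
import numpy as np

def apply_mask_fft(subband, mask_gain):
    """Filter one subband by a per-bin gain in the FFT domain; the FFT
    and inverse FFT account for the O(N log N) cost quoted in the text."""
    spec = np.fft.rfft(subband)
    return np.fft.irfft(spec * mask_gain, n=len(subband))

# sanity check: an all-pass mask leaves the subband signal unchanged
x = np.random.default_rng(0).standard_normal(1024)
y = apply_mask_fft(x, np.ones(513))   # a 1024-point rfft has 513 bins
```

Because each subband is filtered independently, the per-subband calls are embarrassingly parallel, which is the parallel-computing speedup mentioned above.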

For future work, the proposed algorithm can be improved by iterative estimation of the pitch range and speech separation. The algorithm can include a specific method to jump-start the iterative process, giving an initial estimate of both the pitch range and the mask with reasonable quality. In general, the performance of the algorithm depends on the initial estimate of the pitch range; better initial estimates lead to better performance. Even with a poor estimate of the pitch range, which is unavoidable in very low SNR conditions, the proposed algorithm improves the initial estimate during the iterative process.

Author details
1 Speech Processing Research Lab (SPRL), Electrical and Computer Engineering Department, Yazd University, Yazd, Iran. 2 Control and Intelligent Processing Center of Excellence (CIPCE), School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran. 3 Image Analysis Laboratory, Department of Radiology, Henry Ford Health System, Detroit, MI, USA. 4 Electrical Engineering Department, Amirkabir University of Technology, Tehran, Iran.

Competing interests
The authors declare that they have no competing interests.

Received: 7 May 2011 Accepted: 17 March 2012 Published: 17 March 2012

References
1. RP Lippmann, Speech recognition by machines and humans. Speech Commun. 22, 1-16 (1997) 2. JJ Sroka, LD Braida, Human and machine consonant recognition. Speech Commun. 45 (2005) 3. A de Cheveigne, in Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, ed. by GJ Brown, DL Wang (Wiley & IEEE, Hoboken, NJ, 2006) 4. S Dubnov, J Tabrikian, M Arnon-Targan, Speech source separation in convolutive environments using space-time-frequency analysis. EURASIP J Appl Signal Process. (2006) 5. AS Bregman, Auditory Scene Analysis (MIT Press, Cambridge, MA, 1990) 6. GJ Brown, DL Wang (eds.), Computational Auditory Scene Analysis: Principles, Algorithms, and Applications (Wiley & IEEE, Hoboken, NJ, 2006) 7.
M Buchler, S Allegro, S Launer, N Dillier, Sound classification in hearing aids inspired by auditory scene analysis. EURASIP J Appl Signal Process. 18 (2005) 8. G Hu, DL Wang, A tandem algorithm for pitch estimation and voiced speech segregation. IEEE Trans Audio Speech Lang Process. 18(8) (2010) 9. Y Shao, S Srinivasan, Z Jin, DL Wang, A computational auditory scene analysis system for speech segregation and robust speech recognition. Comput Speech Lang. 24 (2010) 10. MH Radfar, RM Dansereau, A Sayadiyan, A maximum likelihood estimation of vocal-tract-related filter characteristics for single channel speech separation. EURASIP J Audio Speech Music Process. Article ID 84186, 15 pages (2007) 11. J Barker, M Cooke, D Ellis, Decoding speech in the presence of other sources. Speech Commun. 45, 5-25 (2005) 12. Y Shao, DL Wang, Model-based sequential organization in cochannel speech. IEEE Trans Acoust Speech Signal Process. 14 (2005) 13. GJ Brown, M Cooke, Computational auditory scene analysis. Comput Speech Lang. 8 (1994) 14. G Hu, DL Wang, Monaural speech separation based on pitch tracking and amplitude modulation. IEEE Trans Neural Netw. 15 (2004) 15. M Wu, DL Wang, GJ Brown, A multipitch tracking algorithm for noisy speech. IEEE Trans Speech Audio Process. 11 (2003) 16. J Le Roux, H Kameoka, N Ono, A de Cheveigne, S Sagayama, Single and multiple F0 contour estimation through parametric spectrogram modeling of speech in noisy environments. IEEE Trans Audio Speech Lang Process. 15 (2007) 17. SM Schimmel, LE Atlas, K Nie, Feasibility of single channel speaker separation based on modulation frequency analysis, in Proc IEEE International Conference on Acoustics, Speech and Signal Processing, Hawaii, USA. 4 (2007) 18. SM Schimmel, Dissertation, University of Washington (2007) 19. L Atlas, SA Shamma, Joint acoustic and modulation frequency. EURASIP J Appl Signal Process. 2003(7) (2003)
20. G Hu, DL Wang, Auditory segmentation based on onset and offset analysis. IEEE Trans Audio Speech Lang Process. 15(2) (2007) 21. R Drullman, JM Festen, R Plomp, Effect of temporal envelope smearing on speech reception. J Acoust Soc Am. 95 (1994) 22. SM Schimmel, LE Atlas, Coherent envelope detection for modulation filtering of speech, in Proc IEEE International Conference on Acoustics, Speech and Signal Processing, Pennsylvania, USA (2005) 23. TW Lee, Blind source separation: audio examples (1998). edu/~tewon/blind/blind_audio.html. Accessed 4 May 24. MP Cooke, Modeling Auditory Processing and Organization (Cambridge University Press, Cambridge, 1993) 25. LA Drake, Dissertation, Northwestern University (2001) 26. DL Wang, GJ Brown, Separation of speech from interfering sounds based on oscillatory correlation. IEEE Trans Neural Netw. 10 (1999) 27. Q Li, L Atlas, Time-variant least-squares harmonic modeling, in Proc IEEE International Conference on Acoustics, Speech and Signal Processing, Hong Kong. 2 (2003) 28. D Talkin, A robust algorithm for pitch tracking (RAPT), in Speech Coding and Synthesis, ed. by KK Paliwal, WB Kleijn (Elsevier, New York, NY, 1995) 29. J Tabrikian, S Dubnov, Y Dickalov, Maximum a posteriori probability pitch tracking in noisy environments using harmonic model. IEEE Trans Speech Audio Process. 12 (2004) 30. X Huang, A Acero, HW Hon, Spoken Language Processing: A Guide to Theory, Algorithms, and System Development (Prentice Hall PTR, Upper Saddle River, NJ, 2001) 31. Y Shao, Dissertation, The Ohio State University (2007)

Cite this article as: Mahmoodzadeh et al.: Single channel speech separation in modulation frequency domain based on a novel pitch range estimation method. EURASIP Journal on Advances in Signal Processing 2012, 2012:67.

More information

Speaker Isolation in a Cocktail-Party Setting

Speaker Isolation in a Cocktail-Party Setting Speaker Isolation in a Cocktail-Party Setting M.K. Alisdairi Columbia University M.S. Candidate Electrical Engineering Spring Abstract the human auditory system is capable of performing many interesting

More information

Audio Restoration Based on DSP Tools

Audio Restoration Based on DSP Tools Audio Restoration Based on DSP Tools EECS 451 Final Project Report Nan Wu School of Electrical Engineering and Computer Science University of Michigan Ann Arbor, MI, United States wunan@umich.edu Abstract

More information

MFCC-based perceptual hashing for compressed domain of speech content identification

MFCC-based perceptual hashing for compressed domain of speech content identification Available online www.jocpr.com Journal o Chemical and Pharmaceutical Research, 014, 6(7):379-386 Research Article ISSN : 0975-7384 CODEN(USA) : JCPRC5 MFCC-based perceptual hashing or compressed domain

More information

Digital Signal Processing

Digital Signal Processing COMP ENG 4TL4: Digital Signal Processing Notes for Lecture #27 Tuesday, November 11, 23 6. SPECTRAL ANALYSIS AND ESTIMATION 6.1 Introduction to Spectral Analysis and Estimation The discrete-time Fourier

More information

DARK CURRENT ELIMINATION IN CHARGED COUPLE DEVICES

DARK CURRENT ELIMINATION IN CHARGED COUPLE DEVICES DARK CURRENT ELIMINATION IN CHARGED COUPLE DEVICES L. Kňazovická, J. Švihlík Department o Computing and Control Engineering, ICT Prague Abstract Charged Couple Devices can be ound all around us. They are

More information

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands

Audio Engineering Society Convention Paper Presented at the 110th Convention 2001 May Amsterdam, The Netherlands Audio Engineering Society Convention Paper Presented at the th Convention May 5 Amsterdam, The Netherlands This convention paper has been reproduced from the author's advance manuscript, without editing,

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner.

Perception of pitch. Definitions. Why is pitch important? BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb A. Faulkner. Perception of pitch BSc Audiology/MSc SHS Psychoacoustics wk 4: 7 Feb 2008. A. Faulkner. See Moore, BCJ Introduction to the Psychology of Hearing, Chapter 5. Or Plack CJ The Sense of Hearing Lawrence Erlbaum,

More information

The Research of Electric Energy Measurement Algorithm Based on S-Transform

The Research of Electric Energy Measurement Algorithm Based on S-Transform International Conerence on Energy, Power and Electrical Engineering (EPEE 16 The Research o Electric Energy Measurement Algorithm Based on S-Transorm Xiyang Ou1,*, Bei He, Xiang Du1, Jin Zhang1, Ling Feng1,

More information

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2

MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 MMSE STSA Based Techniques for Single channel Speech Enhancement Application Simit Shah 1, Roma Patel 2 1 Electronics and Communication Department, Parul institute of engineering and technology, Vadodara,

More information

1. Motivation. 2. Periodic non-gaussian noise

1. Motivation. 2. Periodic non-gaussian noise . Motivation One o the many challenges that we ace in wireline telemetry is how to operate highspeed data transmissions over non-ideal, poorly controlled media. The key to any telemetry system design depends

More information

New metallic mesh designing with high electromagnetic shielding

New metallic mesh designing with high electromagnetic shielding MATEC Web o Conerences 189, 01003 (018) MEAMT 018 https://doi.org/10.1051/mateccon/01818901003 New metallic mesh designing with high electromagnetic shielding Longjia Qiu 1,,*, Li Li 1,, Zhieng Pan 1,,

More information

Isolated Digit Recognition Using MFCC AND DTW

Isolated Digit Recognition Using MFCC AND DTW MarutiLimkar a, RamaRao b & VidyaSagvekar c a Terna collegeof Engineering, Department of Electronics Engineering, Mumbai University, India b Vidyalankar Institute of Technology, Department ofelectronics

More information

SEG/San Antonio 2007 Annual Meeting. Summary. Morlet wavelet transform

SEG/San Antonio 2007 Annual Meeting. Summary. Morlet wavelet transform Xiaogui Miao*, CGGVeritas, Calgary, Canada, Xiao-gui_miao@cggveritas.com Dragana Todorovic-Marinic and Tyler Klatt, Encana, Calgary Canada Summary Most geologic changes have a seismic response but sometimes

More information

Overexcitation protection function block description

Overexcitation protection function block description unction block description Document ID: PRELIMIARY VERSIO ser s manual version inormation Version Date Modiication Compiled by Preliminary 24.11.2009. Preliminary version, without technical inormation Petri

More information

I D I A P R E S E A R C H R E P O R T. June published in Interspeech 2008

I D I A P R E S E A R C H R E P O R T. June published in Interspeech 2008 R E S E A R C H R E P O R T I D I A P Spectral Noise Shaping: Improvements in Speech/Audio Codec Based on Linear Prediction in Spectral Domain Sriram Ganapathy a b Petr Motlicek a Hynek Hermansky a b Harinath

More information

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement

Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement Frequency Domain Analysis for Noise Suppression Using Spectral Processing Methods for Degraded Speech Signal in Speech Enhancement 1 Zeeshan Hashmi Khateeb, 2 Gopalaiah 1,2 Department of Instrumentation

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling

Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling Two-channel Separation of Speech Using Direction-of-arrival Estimation And Sinusoids Plus Transients Modeling Mikko Parviainen 1 and Tuomas Virtanen 2 Institute of Signal Processing Tampere University

More information

ECE 5655/4655 Laboratory Problems

ECE 5655/4655 Laboratory Problems Assignment #4 ECE 5655/4655 Laboratory Problems Make Note o the Following: Due Monday April 15, 2019 I possible write your lab report in Jupyter notebook I you choose to use the spectrum/network analyzer

More information

Change Point Determination in Audio Data Using Auditory Features

Change Point Determination in Audio Data Using Auditory Features INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 0, VOL., NO., PP. 8 90 Manuscript received April, 0; revised June, 0. DOI: /eletel-0-00 Change Point Determination in Audio Data Using Auditory Features

More information

Different Approaches of Spectral Subtraction Method for Speech Enhancement

Different Approaches of Spectral Subtraction Method for Speech Enhancement ISSN 2249 5460 Available online at www.internationalejournals.com International ejournals International Journal of Mathematical Sciences, Technology and Humanities 95 (2013 1056 1062 Different Approaches

More information

Implementation of an Intelligent Target Classifier with Bicoherence Feature Set

Implementation of an Intelligent Target Classifier with Bicoherence Feature Set ISSN: 39-8753 International Journal o Innovative Research in Science, (An ISO 397: 007 Certiied Organization Vol. 3, Issue, November 04 Implementation o an Intelligent Target Classiier with Bicoherence

More information

EE482: Digital Signal Processing Applications

EE482: Digital Signal Processing Applications Professor Brendan Morris, SEB 3216, brendan.morris@unlv.edu EE482: Digital Signal Processing Applications Spring 2014 TTh 14:30-15:45 CBC C222 Lecture 12 Speech Signal Processing 14/03/25 http://www.ee.unlv.edu/~b1morris/ee482/

More information

Frequency Hopped Spread Spectrum

Frequency Hopped Spread Spectrum FH- 5. Frequency Hopped pread pectrum ntroduction n the next ew lessons we will be examining spread spectrum communications. This idea was originally developed or military communication systems. However,

More information

A new zoom algorithm and its use in frequency estimation

A new zoom algorithm and its use in frequency estimation Waves Wavelets Fractals Adv. Anal. 5; :7 Research Article Open Access Manuel D. Ortigueira, António S. Serralheiro, and J. A. Tenreiro Machado A new zoom algorithm and its use in requency estimation DOI.55/wwaa-5-

More information

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta

Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification. Daryush Mehta Aspiration Noise during Phonation: Synthesis, Analysis, and Pitch-Scale Modification Daryush Mehta SHBT 03 Research Advisor: Thomas F. Quatieri Speech and Hearing Biosciences and Technology 1 Summary Studied

More information

High Speed Communication Circuits and Systems Lecture 10 Mixers

High Speed Communication Circuits and Systems Lecture 10 Mixers High Speed Communication Circuits and Systems Lecture Mixers Michael H. Perrott March 5, 24 Copyright 24 by Michael H. Perrott All rights reserved. Mixer Design or Wireless Systems From Antenna and Bandpass

More information

for Single-Tone Frequency Tracking H. C. So Department of Computer Engineering & Information Technology, City University of Hong Kong,

for Single-Tone Frequency Tracking H. C. So Department of Computer Engineering & Information Technology, City University of Hong Kong, A Comparative Study of Three Recursive Least Squares Algorithms for Single-Tone Frequency Tracking H. C. So Department of Computer Engineering & Information Technology, City University of Hong Kong, Tat

More information

CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS

CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 46 CHAPTER 3 SPEECH ENHANCEMENT ALGORITHMS 3.1 INTRODUCTION Personal communication of today is impaired by nearly ubiquitous noise. Speech communication becomes difficult under these conditions; speech

More information

NCCF ACF. cepstrum coef. error signal > samples

NCCF ACF. cepstrum coef. error signal > samples ESTIMATION OF FUNDAMENTAL FREQUENCY IN SPEECH Petr Motl»cek 1 Abstract This paper presents an application of one method for improving fundamental frequency detection from the speech. The method is based

More information

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012

Signal segmentation and waveform characterization. Biosignal processing, S Autumn 2012 Signal segmentation and waveform characterization Biosignal processing, 5173S Autumn 01 Short-time analysis of signals Signal statistics may vary in time: nonstationary how to compute signal characterizations?

More information

Binaural Hearing. Reading: Yost Ch. 12

Binaural Hearing. Reading: Yost Ch. 12 Binaural Hearing Reading: Yost Ch. 12 Binaural Advantages Sounds in our environment are usually complex, and occur either simultaneously or close together in time. Studies have shown that the ability to

More information

Study Of Sound Source Localization Using Music Method In Real Acoustic Environment

Study Of Sound Source Localization Using Music Method In Real Acoustic Environment International Journal of Electronics Engineering Research. ISSN 975-645 Volume 9, Number 4 (27) pp. 545-556 Research India Publications http://www.ripublication.com Study Of Sound Source Localization Using

More information

Multiple Sound Sources Localization Using Energetic Analysis Method

Multiple Sound Sources Localization Using Energetic Analysis Method VOL.3, NO.4, DECEMBER 1 Multiple Sound Sources Localization Using Energetic Analysis Method Hasan Khaddour, Jiří Schimmel Department of Telecommunications FEEC, Brno University of Technology Purkyňova

More information

Wavelet Speech Enhancement based on the Teager Energy Operator

Wavelet Speech Enhancement based on the Teager Energy Operator Wavelet Speech Enhancement based on the Teager Energy Operator Mohammed Bahoura and Jean Rouat ERMETIS, DSA, Université du Québec à Chicoutimi, Chicoutimi, Québec, G7H 2B1, Canada. Abstract We propose

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

PLL AND NUMBER OF SAMPLE SYNCHRONISATION TECHNIQUES FOR ELECTRICAL POWER QUALITY MEASURMENTS

PLL AND NUMBER OF SAMPLE SYNCHRONISATION TECHNIQUES FOR ELECTRICAL POWER QUALITY MEASURMENTS XX IMEKO World Congress Metrology or Green Growth September 9 14, 2012, Busan, Republic o Korea PLL AND NUMBER OF SAMPLE SYNCHRONISATION TECHNIQUES FOR ELECTRICAL POWER QUALITY MEASURMENTS Richárd Bátori

More information

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech

Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech INTERSPEECH 5 Synchronous Overlap and Add of Spectra for Enhancement of Excitation in Artificial Bandwidth Extension of Speech M. A. Tuğtekin Turan and Engin Erzin Multimedia, Vision and Graphics Laboratory,

More information

HCS 7367 Speech Perception

HCS 7367 Speech Perception HCS 7367 Speech Perception Dr. Peter Assmann Fall 212 Power spectrum model of masking Assumptions: Only frequencies within the passband of the auditory filter contribute to masking. Detection is based

More information