FORENSIC AUTOMATION SPEAKER RECOGNITION
1 FORENSIC AUTOMATION SPEAKER RECOGNITION
June 2, 2
Hirotaka Nakasone, Federal Bureau of Investigation, Quantico, VA 2235, hnakasone@fbiacademy.edu
Steven D. Beck, BAE SYSTEMS, 65 Tracor Ln. MS 27-6, Austin, TX, steve.beck@baesystems.com
2 PRESENTATION OUTLINE
- The Problem of Forensic Acoustical Analysis
- FBI Forensic Voice Database (FV)
- ASR Evaluation Results for FV
- Confidence Measures
- The FASR System
- Conclusions
3 Forensic ASR Problems
Every month, the FBI receives numerous criminal cases involving recorded voice samples. Most voice samples are recorded in uncontrolled environments, and there are many unknown sources of variability. Four primary sources of voice sample variation of interest to the forensic community are:
- Speech source characteristics
- Transmission channel characteristics
- Usable speech duration
- Signal-to-Noise Ratio
[Figure: two spectrograms (File=228st.wav and File=228s6t.wav) showing the same speakers in different sessions with different text; axes: time in seconds, frequency in Hz.]
4 FV Voice Database: Data Collection 1 & 2
The Forensic Voice Data Base was developed as part of Project CAVIS in cooperation with the LA County Sheriff's Department and NIJ/DOJ Grant 85-IJ-CX-24.
[Figure: recording setup with a B&K Model 455 Microphone, an In-House Telephone, a Body Microphone and Transmitter with Receiver, and a Fostex Model R8 Four Channel Reel-to-Reel Recorder. Remote Telephone is used for Data Collection 3.]
CAVIS Experiment, by collection:
- Number of Speakers:
- Number of Sessions: 2
- Samples per Session: 5
- Sample Length: 3 Seconds (Collection 1), 3 Seconds (Collection 2), 3 Seconds (Collection 3)
- Speaking Mode: Spontaneous, Reading, Prescribed (3 sec) (Collection 1); Spontaneous, Reading (Collection 2); Spontaneous 1, Spontaneous 2 (Collection 3)
- Transmission Mode: Microphone, Telephone, Body Transmitter (Collection 1); Microphone, Telephone, Body Transmitter (Collection 2); Telephone, Remote Call-in (Collection 3)
5 FV Voice Database: Speaking Modes
- Spontaneous: The speaker is shown a set of slides (one per session sample) and talks about each slide. Speech segments are 29 seconds long, and the text is independent.
- Reading: The speaker reads a passage of text. Speech segments are 29 seconds long. The text per session sample is independent for data collection 1 and dependent for data collection 2.
- Prescribed: The speaker says, "There is a bomb in the plant. Get Out!" The speech segments are 2-3 seconds long. This mode is text dependent and is available only for data collection 1.
- Spontaneous 1: The speaker is told to talk about a particular topic per session sample. Available only for collection 3 (remote telephone).
- Spontaneous 2: The speaker is told to talk about any topic. Available only for collection 3 (remote telephone).
6 FV Voice Database: Transmission Modes
- Body Transmitter: Electret microphone plus AM transmitter. Nominal bandwidth is 3 Hz - 36 Hz.
- Microphone: B&K Model 455. Nominal bandwidth is 2 Hz - 8 Hz.
- In-House Telephone: Nominal bandwidth is 3 Hz - 36 Hz.
- Remote Telephone (no periodogram): Nominal bandwidth is 3 Hz - 36 Hz.
[Figure: periodograms of simultaneously recorded voice samples for the Body Mic channel (File=sb.wav), Microphone channel (File=sm.wav), and Telephone channel (File=st.wav); axes: frequency in Hz, amplitude in dB.]
7 FV Voice Database: Conditional Data Set Breakdown
Speaking Mode (SM):
- P = Prescribed Text
- R = Reading
- S = Spontaneous
Transmission Channel (TM):
- M = B&K Microphone
- B = Body Mic & Transmitter
- T = Telephone (in-house)
- Tlgd = Telephone (remote)
[Table: number of files for each speaking mode (# S Files, # R Files, # P Files) and channel type (Total, M, T, B, Tlgd).]
8 FV Voice Database: Description
Columns: Speaking Mode, Text Dep., Length (sec), Trans. Mode, Number Speakers, Sessions, Samples
S TI 29 M,B,T 5
S TI 6 M,B,T 5 2
S TI 29 Tld
S TI 6 Tld 5 2
R TI 29 M,B,T 5
R TI 6 M,B,T 5 2
P TD 3 M,B,T 5
P TD 2 M,B,T 5 2
9 FV Voice Database: Histograms of SNR and Duration
[Figure: histograms of SNR (in dB) and signal duration (in sec.) for all 3 second files and for all 3 second S files; vertical axis: probability.]
10 FV Voice Database: Histograms for Spontaneous Speaking Mode
[Figure: SNR (in dB) and duration (in sec.) distributions for SM=S, 3 sec files, one panel per transmission mode: Microphone (TM=M), Telephone (TM=T), Body Transmitter (TM=B), and Telephone-Remote (TM=Tlgd); vertical axis: probability.]
11 FV Voice Database: Data Formats and Filenames
Data Formats, Evaluation
- Sampling Rate = 6, samples/sec.
- Resolution = 6 bits
- Word Format = Sun <MSB,LSB>
- File Header = 24 byte SPHERE
Filenames, Evaluation
The randomized filename format is FV_xxxx.sph, where FV signifies the FBI Forensic Voice Dataset, xxxx is a unique four-place number, and .sph is the file ending for SPHERE.
Data Formats, Analysis
- Sampling Rate = 8, samples/sec.
- Resolution = 6 bits
- Word Format = PC <LSB,MSB>
- File Header = MS WAV
Filenames, Analysis
- File Name Example = R4T.wav
- Speaker =
- Speech Mode = Reading
- Sample = 4 out of
- Transmission = Telephone
12 ASR Evaluation on FV
Purpose of Blind Test and Evaluation: Assess the maturity of Automatic Speaker Recognition technology for application in the field of forensic science.
Time Frame for Test
Participants:
- GTE/BBN
- MIT Lincoln Laboratory
- Oregon Graduate Institute of Sciences and Technology
- T-Netix
- Wagner Associates
- U.S. Air Force Research Laboratory, Rome, NY
- U.S. Air Force Research Laboratory, Wright-Patterson, OH
13 ASR Evaluation on FV: Multiple Levels of Difficulty
The speech samples are assigned to one of four Levels of Difficulty, which represent different testing criteria. Each level is further divided into 12 separate trials, giving a total of 48 independent classifier tests.
Level of Difficulty    Text Dependence    Channel Dependence
I                      Independent        Independent
II                     Dependent          Independent
III                    Independent        Dependent
IV                     Dependent          Dependent
Columns: Level of Difficulty, Test Number, File Length (sec.), Speaking Mode
I S
I R
I 7-9 4*29 S
I -2 4*29 R
II -3 3 P
II P
II 7-9 4*3 P
II -2 4*3 P
III S
III R
III 7-9 4*29 S
III -2 4*29 R
IV -3 3 P
IV P
IV 7-9 4*3 P
IV -2 4*3 P
14 ASR Evaluation on FV: Example Training and Testing Sets
Example training and testing sets for Level 1. Levels 2-4 are similar.
[Table: Level 1 training sets by trial number, with CD-ROM volume (FVTRN, FVTRN2), data directory, files per speaker, number of speakers, total files, and file length (sec).]
[Table: Level 1 testing sets by trial number, with CD-ROM volume (FVTST,2), directory (LTST), number of speakers, total files, and file length (sec).]
15 ASR Evaluation on FV
Open Set Speaker Verification
Compare a voice test segment with a single target voice model. If the resulting score exceeds a detection threshold, declare a match.
- DET Curve: Plot of $P_{miss}$ vs. $P_{fa}$.
- EER: Operating point where $P_{miss} = P_{fa}$.
- Neyman-Pearson: Operating point that minimizes $P_{miss}$ for a fixed $P_{fa}$.
- DCF: Operating point based on the relative cost of making Type I and Type II errors:
$$C_{Det} = C_{Miss} \cdot P_{Miss|Target} \cdot P_{Target} + C_{FalseAlarm} \cdot P_{FalseAlarm|NonTarget} \cdot P_{NonTarget}$$
Closed Set Speaker Identification
Compare a voice test segment with a set of target voice models. Decide which target voice model best matches the test segment.
- Rank-1: The correct model is the best match.
- Rank-3: The correct model is among the top 3 matches.
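The verification operating points defined on this slide can be computed directly from two lists of scores. The sketch below is illustrative only and is not the evaluation code used in the study: the function names, the synthetic Gaussian scores, and the cost and prior values (`c_miss=10`, `c_fa=1`, `p_target=0.01`) are all assumptions chosen for the example.

```python
import numpy as np

def det_points(target_scores, nontarget_scores):
    """Sweep a threshold over every observed score; accept when score >= threshold."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([tgt, non]))
    p_miss = np.array([(tgt < t).mean() for t in thresholds])   # targets rejected
    p_fa = np.array([(non >= t).mean() for t in thresholds])    # non-targets accepted
    return thresholds, p_miss, p_fa

def equal_error_rate(thresholds, p_miss, p_fa):
    """Operating point where P_miss and P_fa are closest to equal."""
    i = int(np.argmin(np.abs(p_miss - p_fa)))
    return thresholds[i], 0.5 * (p_miss[i] + p_fa[i])

def neyman_pearson(thresholds, p_miss, p_fa, max_fa):
    """Smallest P_miss among thresholds whose P_fa does not exceed max_fa."""
    ok = p_fa <= max_fa
    if not ok.any():
        raise ValueError("no threshold satisfies the false alarm constraint")
    i = int(np.argmin(np.where(ok, p_miss, np.inf)))
    return thresholds[i], p_miss[i], p_fa[i]

def min_dcf(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Minimum of C_Det over the swept thresholds (cost values are assumptions)."""
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return float(c_det.min())

# Synthetic scores standing in for true-model and false-model trials.
rng = np.random.default_rng(0)
target = rng.normal(2.0, 1.0, 1000)
nontarget = rng.normal(0.0, 1.0, 1000)

thr, pm, pfa = det_points(target, nontarget)
eer_threshold, eer = equal_error_rate(thr, pm, pfa)
np_threshold, np_miss, np_fa = neyman_pearson(thr, pm, pfa, max_fa=0.01)
dcf = min_dcf(pm, pfa)
print(f"EER={100 * eer:.1f}%  P_miss at P_fa<=1%: {100 * np_miss:.1f}%  min DCF={dcf:.4f}")
```

Plotting `pm` against `pfa` on normal-deviate axes would give the DET curve; the sketch stops at the operating points.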
16 ASR Evaluation on FV: Speaker Verification Test Results
Level I - Text Independent, Transmission Independent
- Test 1: TRN Set Desc. SM=S, TM=M, Len=3
- Test 2: TRN Set Desc. SM=S, TM=T, Len=3
- Test 3: TRN Set Desc. SM=S, TM=B, Len=3
[Figure: MITLL verification DET curves for Level/Test=LTST, LTST2, and LTST3 (miss probability vs. false alarm probability, in %); test conditions are SM=S, R, and P on the two channels not used for training, with Tsec=3.]
[Table: per test, SM, TM, LEN (sec), EER %, and the operating point at a fixed Pfa.]
17 ASR Evaluation on FV: Speaker Verification Test Results
Level II - Text Dependent, Transmission Independent
- Test 1: TRN Set Desc. SM=P, TM=M, Len=3
- Test 2: TRN Set Desc. SM=P, TM=T, Len=3
- Test 3: TRN Set Desc. SM=P, TM=B, Len=3
[Figure: MITLL verification DET curves for Level/Test=L2TST, L2TST2, and L2TST3 (miss probability vs. false alarm probability, in %); test conditions are SM=P on the two channels not used for training, with Tsec=3 and Tsec=2.]
[Table: per test, SM, TM, LEN (sec), EER %, and the operating point at a fixed Pfa.]
18 ASR Evaluation on FV: Speaker Verification Test Results
Level III - Text Independent, Transmission Dependent
- Test 1: TRN Set Desc. SM=S, TM=M, Len=3
- Test 2: TRN Set Desc. SM=S, TM=T, Len=3
- Test 3: TRN Set Desc. SM=S, TM=B, Len=3
[Figure: MITLL verification DET curves for Level/Test=L3TST, L3TST2, and L3TST3 (miss probability vs. false alarm probability, in %); test conditions are SM=S, R, and P on the training channel, with Tsec=3 and Tsec=2.]
[Table: per test, SM, TM, LEN (sec), EER %, and the operating point at a fixed Pfa.]
19 ASR Evaluation on FV: Speaker Verification Test Results
Level IV - Text Dependent, Transmission Dependent
- Test 1: TRN Set Desc. SM=P, TM=M, Len=3
- Test 2: TRN Set Desc. SM=P, TM=T, Len=3
- Test 3: TRN Set Desc. SM=P, TM=B, Len=3
[Figure: MITLL verification DET curves for Level/Test=L4TST, L4TST2, and L4TST3 (miss probability vs. false alarm probability, in %); test conditions are SM=P on the training channel, with Tsec=3 and Tsec=2.]
[Table: per test, SM, TM, LEN (sec), EER %, and the operating point at a fixed Pfa.]
20 ASR Evaluation on FV: Equal Error Rate (EER) Comparison
[Table: EER % per developer for Level 1 and Level 2, by channel TRN/TST pair: M/T, M/B, T/M, T/B, B/M, B/T.]
[Table: EER % per developer for Level 3 (M/M, T/T, B/B, Tlgd/Tlgd) and Level 4 (M/M, T/T, B/B).]
* The Developer Numbers have been randomized.
Conclusions: Lower SNR (channel B), channel mismatch (Level 1), and session variations (Tlgd) all contribute to worse detection performance.
21 ASR Evaluation on FV: Closed Set ID Results, Level 1
Only Tests 1, 2, 3 out of 12 for one participant are shown.
Columns: LEVEL, Transmission TRN / TST, Speaking TRN / TST, Length (sec) TRN / TST, RANK1 % Correct, RANK3 % Correct
I M / T S / S 3 / 3 84/93 = /93 = 98.4
I M / T S / S 3 / 2 37/ 4 = / 4 = .
I M / T S / R 3 / 3 3/44 = 9. 4/44 = 97.2
I M / T S / R 3 / 2 27/ 29 = / 29 = .
I M / T S / P 3 / 3 27/ 85 = 3.8 5/ 85 = 58.8
I M / T S / P 3 / 2 8/ 7 = 47. 2/ 7 = 7.6
I M / B S / S 3 / 3 73/94 = /94 = 92.8
I M / B S / S 3 / 2 24/ 4 = 6. 35/ 4 = 87.5
I M / B S / R 3 / 3 2/43 = /43 = 88.
I M / B S / R 3 / 2 / 29 = / 29 = 79.3
I M / B S / P 3 / 3 44/ 9 = / 9 = 66.7
I M / B S / P 3 / 2 8/ 8 = / 8 = 77.8
I T / M S / S 3 / 3 66/94 = /94 = 93.8
I T / M S / S 3 / 2 3/ 4 = / 4 = 87.5
I T / M S / R 3 / 3 96/43 = 67. 4/43 = 79.7
I T / M S / R 3 / 2 4/ 29 = / 29 = 72.4
I T / M S / P 3 / 3 34/ 89 = / 89 = 49.4
I T / M S / P 3 / 2 8/ 8 = 44.4 / 8 = 55.6
I T / B S / S 3 / 3 64/98 = /98 = 88.9
I T / B S / S 3 / 2 32/ 4 = / 4 = 9
I T / B S / R 3 / 3 27/48 = /48 = 9.5
I T / B S / R 3 / 2 2/ 3 = / 3 = 9.
I T / B S / P 3 / 3 28/ 95 = / 95 = 43.2
I T / B S / P 3 / 2 7/ 9 = / 9 = 47.4
I B / M S / S 3 / 3 53/94 = /94 = 89.7
I B / M S / S 3 / 2 29/ 4 = / 4 = 87.5
I B / M S / R 3 / 3 87/42 = 6.3 2/42 = 78.9
I B / M S / R 3 / 2 2/ 28 = / 28 = 78.6
I B / M S / P 3 / 3 47/ 9 = / 9 = 6.
I B / M S / P 3 / 2 / 8 = 55.6 / 8 = 55.6
I B / T S / S 3 / 3 59/9 = /9 = 88.5
I B / T S / S 3 / 2 34/ 4 = / 4 = 9
I B / T S / R 3 / 3 99/49 = /49 = 8.9
I B / T S / R 3 / 2 2/ 3 = / 3 = 8.
I B / T S / P 3 / 3 38/ 87 = / 87 = 59.8
I B / T S / P 3 / 2 8/ 7 = 47. / 7 =
22 ASR Evaluation on FV: Closed Set Identification Results, Level 3
Columns: LEVEL, Transmission TRN / TST, Speaking TRN / TST, Length (sec) TRN / TST, RANK1 % Correct, RANK3 % Correct
III M / M S / S 3 / 3 94/94=. 94/94=.
III M / M S / S 3 / 2 39/ 39 =. 39/ 39 =.
IIIa M / M S / R 3 / 3 34/4 = 95. 4/4 = 99.3
III M / M S / R 3 / 2 26/ 27 = / 27 =.
IIIa M / M S / P 3 / 3 58/ 89 = / 89 = 8.9
III M / M S / P 3 / 2 3/ 8 = / 8 = 83.3
III T / T S / S 3 / 3 96/96=. 96/96=.
III T / T S / S 3 / 2 4/ 4=. 4/ 4=.
IIIa T / T S / R 3 / 3 3/34 = /34=.
III T / T S / R 3 / 2 24/ 26 = / 26 =.
IIIa T / T S / P 3 / 3 45/ 83 = / 83 = 79.5
III T / T S / P 3 / 2 / 8 = / 8 = 77.8
IIIa B / B S / S 3 / 3 86/95 = /95 = 99.
III B / B S / S 3 / 2 22/ 4 = / 4 = 73.2
IIIa B / B S / R 3 / 3 27/47 = /47 = 94.6
III B / B S / R 3 / 2 2/ 3 = 4. 5/ 3 = 5.
IIIa B / B S / P 3 / 3 5/ 93 = / 93 = 63.4
III B / B S / P 3 / 2 / 9 = / 9 = 63.2
IIIa Tall / Tall S / S 3 / 3 329/425 = /425 = 88.5
III Tall / Tall S / S 3 / 2 54/ 86 = / 86 = 74.4
IIIa Tlgd / Tlgd S / S 3 / 3 34/229 = /229 = 79.
III Tlgd / Tlgd S / S 3 / 2 6/ 46= / 46=
23 ASR Evaluation on FV: Closed Set Identification (ID) Comparison
[Table: Level 1 Rank-1 ID performance (%) for Dev. 1, Dev. 2, and Dev. 5, with transmission Trn/Tst = T/M and speech Trn/Tst = S/S, S/R, S/P.]
[Table: Level 3 Rank-1 ID performance (%) for Dev. 1, Dev. 2, and Dev. 5, with transmission Trn/Tst = T/T (speech S/S, S/R, S/P) and Tld/Tld (speech S/S).]
* The Developer Numbers have been randomized.
Conclusions: Channel mismatch (Level 1), signal duration mismatch (S/P), and session variations (Tlgd and S/R) all contribute to worse ID performance. Lack of channel normalization (CMS or RASTA) can result in random performance.
24 Confidence Measures: FBI Forensic Voice Database
Detection Error Trade-off (DET): Trades off the miss error probability against the false alarm error probability.
False/True Score PDFs: Display the PDFs of ASR false model scores and true model scores for a relatively large population. The Equal Error Rate (EER) or the Decision Cost Function (DCF) operating point can be calculated and plotted.
The decision compares the likelihood ratio of a score $x$ with a threshold $T$:
$$\frac{P(x \mid H_T)}{P(x \mid H_F)} \; \underset{H_F}{\overset{H_T}{\gtrless}} \; T, \qquad T = \frac{(C_{TF} - C_{FF})\,P(H_F)}{(C_{FT} - C_{TT})\,P(H_T)}$$
where $C_{ij}$ denotes the cost of deciding $H_i$ when $H_j$ is true.
[Figure: DET curve for MITLL, Ver, Level/Test=L3TST, SM=S, TM=M (Level III, SM3); axes: false alarm probability vs. miss probability, in %.]
[Figure: PDFs of TRUE and FALSE GMM LLRT scores for Test=L3TST with the EER threshold marked (EER=.5464% Thresh=353).]
25 Confidence Measures
For a given GMM LRT score, find the confidence in a True decision based on a sample True/False population:
$$P(H_T \mid x) = \frac{P(H_T)\,P(x \mid H_T)}{P(H_T)\,P(x \mid H_T) + P(H_F)\,P(x \mid H_F)}$$
Example: False Distribution N(.,.35), True Distribution N(.,.35), Score = .7, Confidence = 84.6%.
[Figure: true, false, and test score probability densities over GMM output scores, and the confidence curve $P(H_T \mid x)$ with the test score's confidence value marked.]
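The posterior above is straightforward to evaluate once the true and false score distributions are modeled. The following is a minimal sketch assuming both are Gaussian. The means used here (0.0 for false, 1.0 for true) are hypothetical placeholders, since the slide's values did not survive transcription, so the printed confidence is not expected to reproduce the slide's figure.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def confidence(score, true_mean, true_std, false_mean, false_std, prior_true=0.5):
    """Posterior P(H_T | x) from Bayes' rule with Gaussian score models."""
    num = prior_true * gaussian_pdf(score, true_mean, true_std)
    den = num + (1.0 - prior_true) * gaussian_pdf(score, false_mean, false_std)
    return num / den

# Hypothetical distribution parameters (assumed, not taken from the slide).
TRUE_MEAN, FALSE_MEAN, STD = 1.0, 0.0, 0.35

conf_mid = confidence(0.5, TRUE_MEAN, STD, FALSE_MEAN, STD)
conf_high = confidence(0.7, TRUE_MEAN, STD, FALSE_MEAN, STD)
print(f"confidence at 0.5: {conf_mid:.3f}, at 0.7: {conf_high:.3f}")
```

With equal priors and equal spreads, a score halfway between the two means gives a confidence of one half, and the confidence rises as the score moves toward the true-model mean.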
26 Confidence Measures: The Logistic Model
The posterior probability, p, is a curvilinear function that can be modeled with the form:
$$p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}$$
Using the logistic transformation:
$$p' = \ln\left(\frac{p}{1 - p}\right)$$
we get the following linear form:
$$p' = \beta_0 + \beta_1 X$$
In matrix form, the linear coefficients are solved using a pseudo-inverse:
$$\beta = (X^T X)^{-1} X^T p'$$
For a given score, first compute the linear model confidence measure:
$$CM' = \beta_0 + \text{Score} \cdot \beta_1$$
Then compute the natural confidence measure:
$$CM = \frac{\exp(CM')}{1 + \exp(CM')}$$
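As an illustration of the steps on this slide, the sketch below fits the linear form to logit-transformed posteriors with a pseudo-inverse and then maps a score back to a confidence. The function names and the synthetic data (generated from assumed coefficients -2 and 4) are hypothetical and serve only to show that the procedure recovers the line.

```python
import numpy as np

def fit_logit_line(scores, posteriors):
    """Least-squares fit of p' = b0 + b1 * X, where p' = ln(p / (1 - p))."""
    x = np.asarray(scores, dtype=float)
    # Clip so the logit stays finite if a posterior is exactly 0 or 1.
    p = np.clip(np.asarray(posteriors, dtype=float), 1e-6, 1.0 - 1e-6)
    p_logit = np.log(p / (1.0 - p))
    X = np.column_stack([np.ones_like(x), x])   # design matrix [1, score]
    return np.linalg.pinv(X) @ p_logit           # beta = pinv(X) p'

def confidence_measure(score, beta):
    """CM' = b0 + score * b1, then CM = exp(CM') / (1 + exp(CM'))."""
    cm_linear = beta[0] + score * beta[1]
    return 1.0 / (1.0 + np.exp(-cm_linear))      # same value as exp/(1+exp)

# Synthetic posteriors generated from assumed coefficients.
scores = np.linspace(0.0, 1.0, 11)
posteriors = 1.0 / (1.0 + np.exp(-(-2.0 + 4.0 * scores)))

beta = fit_logit_line(scores, posteriors)
cm = confidence_measure(0.5, beta)
print("beta =", beta, " CM(0.5) =", cm)
```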
27 Confidence Measures: Empirical Detection Data
The logit transformation can be used when the dependent variable is binary, e.g. True/False:
$$E(Y) = p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}, \qquad Y \in \{0, 1\}$$
Use the linear regression model:
$$p'_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
Use weighted least squares to ensure constant error variance terms for an optimal solution:
$$\beta = (X^T W X)^{-1} X^T W p'$$
The diagonal weight matrix is:
$$w_i = \frac{1}{\sigma^2_{\varepsilon_i}}$$
The weights can be estimated using
$$\hat{w}_i = n_i\, p_i (1 - p_i)$$
where $p_i$ is the sample proportion in bin $i$ and $n_i$ is the total number of test scores in bin $i$.
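A minimal sketch of the weighted fit described above, assuming the scores have already been grouped into bins with a count of true trials and a total count per bin. The binning, the function name, and the synthetic counts are assumptions made for illustration. Bins whose sample proportion is exactly 0 or 1 are dropped here because their logit is undefined, which is a choice of this sketch rather than something the slide specifies.

```python
import numpy as np

def fit_logit_wls(bin_centers, n_true, n_total):
    """Weighted least squares on binned proportions: beta = (X^T W X)^-1 X^T W p'."""
    x = np.asarray(bin_centers, dtype=float)
    k = np.asarray(n_true, dtype=float)
    n = np.asarray(n_total, dtype=float)
    p = k / n                                   # sample proportion per bin
    keep = (p > 0.0) & (p < 1.0)                # logit is undefined at 0 and 1
    x, n, p = x[keep], n[keep], p[keep]
    p_logit = np.log(p / (1.0 - p))
    w = n * p * (1.0 - p)                       # estimated weights w_i
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                               # X^T W for a diagonal W
    return np.linalg.solve(XtW @ X, XtW @ p_logit)

# Synthetic bins whose proportions follow assumed coefficients (-3, 6).
centers = np.linspace(0.1, 0.9, 9)
totals = np.full(centers.shape, 1000.0)
trues = totals / (1.0 + np.exp(-(-3.0 + 6.0 * centers)))

beta_wls = fit_logit_wls(centers, trues, totals)
print("beta_wls =", beta_wls)
```

Bins near proportions of 0 or 1, or with few scores, receive small weights, so the noisiest logit values influence the fitted line least.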
28 Confidence Measures: Confidence Measures for True-False Data
Could fit a least squares line. Problems:
- Data doesn't look linear
- Line would not stay in the interval [0,1]
Use empirical estimates as the starting point for an iterative procedure (Newton method or Levenberg-Marquardt).
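One way to realize the iterative procedure mentioned here is a Newton iteration on the logistic log-likelihood of the raw True/False labels. The sketch below is one such realization under stated assumptions, not necessarily the procedure the authors used: the function name, the synthetic data, and the starting point are all hypothetical. The `beta_init` argument is where an empirical estimate, such as a least-squares fit, would be supplied.

```python
import numpy as np

def fit_logistic_newton(x, y, beta_init=None, n_iter=25, tol=1e-8):
    """Newton iteration for p = 1 / (1 + exp(-(b0 + b1 * x))) on binary labels y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2) if beta_init is None else np.asarray(beta_init, dtype=float).copy()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                      # gradient of the log-likelihood
        hess = (X.T * (p * (1.0 - p))) @ X        # negative Hessian, X^T diag(p(1-p)) X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic True/False outcomes drawn from assumed coefficients (-0.5, 2.0).
rng = np.random.default_rng(1)
x_demo = rng.normal(0.0, 1.0, 4000)
p_demo = 1.0 / (1.0 + np.exp(-(-0.5 + 2.0 * x_demo)))
y_demo = (rng.random(4000) < p_demo).astype(float)

beta_hat = fit_logistic_newton(x_demo, y_demo, beta_init=[0.0, 1.0])
print("beta_hat =", beta_hat)
```

Unlike a straight least-squares line, the fitted curve stays inside the interval [0,1] for every score.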
29 Confidence Measures
Score distribution and confidence measures from the NIST 1999 eval, using a balanced mixture of electret and carbon-button telephone handsets.
Operating points and confidence measures have been derived for:
- gender dependent male models
- gender dependent female models
- gender independent models
[Table: BKG Model (Male Bal, Female Bal, Gender Indep Bal), EER %, EER Threshold.]
[Figure: DET curves for 3 models (Male Balanced, Female Balanced, Gender Independent), NIST99, Test Set=NIST Balanced All, SM=S, TM=T; axes: false alarm probability vs. miss probability, in %.]
[Figure: example for Male, Balanced, NIST99 (Test=Male-bal:mal-nist-all-t.out): PDFs of TRUE and FALSE scores and the estimated confidence measure with equal priors, plotted against the GMM log likelihood ratio score.]
30 Confidence Measures: Multivariate Logistic Model
The logistic model can be extended to use more than one independent variable. The multiple regression model:
$$p' = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$
31 FASR Description
The Forensic Automatic Speaker Recognition (FASR) Program
- Developed by U.S. Air Force Research Laboratories, Rome NY
- With inputs from MITLL, FBI ERF, and BAE SYSTEMS
FASR is a PC-based stand-alone workstation with an efficient GUI supporting:
- Data acquisition and playback
- Signal and spectrographic display
- Speech segmentation and labeling
- Tone detection and removal
- Speech quality measures (SNR, duration, bandwidth)
- Speaker Identification and Verification
FASR uses robust speaker recognition algorithms:
- Mel cepstral coefficients
- Cepstral mean subtraction or RASTA filtering
- Gaussian mixture models with Universal Background Models
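Of the algorithms listed, cepstral mean subtraction (CMS) is simple enough to sketch in a few lines. A fixed linear channel appears approximately as a constant offset added to every cepstral frame, so subtracting the per-utterance mean removes it. The example below is illustrative and is not FASR code: the feature matrix is random data standing in for mel cepstral coefficients, and the channel offset is a made-up vector.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over all frames of one utterance.

    cepstra: array of shape (num_frames, num_coefficients).
    """
    c = np.asarray(cepstra, dtype=float)
    return c - c.mean(axis=0, keepdims=True)

# Random stand-in for cepstral features: 200 frames, 12 coefficients.
rng = np.random.default_rng(2)
clean = rng.normal(0.0, 1.0, (200, 12))

# A hypothetical channel modeled as a constant cepstral offset per coefficient.
channel_offset = rng.normal(0.0, 3.0, 12)
through_channel = clean + channel_offset

cms_clean = cepstral_mean_subtraction(clean)
cms_channel = cepstral_mean_subtraction(through_channel)

# After CMS the two versions match, so the constant channel term is gone.
print("max difference after CMS:", np.max(np.abs(cms_clean - cms_channel)))
```

The constant-offset model is an approximation; it does not cover noise or a channel that changes within the utterance.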
32 Conclusions
The FBI is using a PC-based forensic automatic speaker recognition (FASR) system.
Project Conclusions
- FASR has been extensively tested on NIST single speaker and FV speech corpora.
- The outputs are based on statistics with known error rates from large sample populations.
Future Directions
- Improve on existing channel normalization techniques.
- Integrate automatic or manual pre-screening based upon quantifiable signal quality measures.
- Provide for a "no decision" rule when signal quality does not meet predefined conditions.
- Address the issue of using different background models for detected differences in the voice samples.
- @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu
More informationSinging Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection
Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation
More informationNIST SRE 2008 IIR and I4U Submissions. Presented by Haizhou LI, Bin MA and Kong Aik LEE NIST SRE08 Workshop, Montreal, Jun 17-18, 2008
NIST SRE 2008 IIR and I4U Submissions Presented by Haizhou LI, Bin MA and Kong Aik LEE NIST SRE08 Workshop, Montreal, Jun 17-18, 2008 Agenda IIR and I4U System Overview Subsystems & Features Fusion Strategies
More informationJoint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events
INTERSPEECH 2013 Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events Rupayan Chakraborty and Climent Nadeu TALP Research Centre, Department of Signal Theory
More informationImproving Meetings with Microphone Array Algorithms. Ivan Tashev Microsoft Research
Improving Meetings with Microphone Array Algorithms Ivan Tashev Microsoft Research Why microphone arrays? They ensure better sound quality: less noises and reverberation Provide speaker position using
More informationRASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991
RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response
More informationEfficiency and detectability of random reactive jamming in wireless networks
Efficiency and detectability of random reactive jamming in wireless networks Ni An, Steven Weber Modeling & Analysis of Networks Laboratory Drexel University Department of Electrical and Computer Engineering
More informationAutomotive three-microphone voice activity detector and noise-canceller
Res. Lett. Inf. Math. Sci., 005, Vol. 7, pp 47-55 47 Available online at http://iims.massey.ac.nz/research/letters/ Automotive three-microphone voice activity detector and noise-canceller Z. QI and T.J.MOIR
More informationComparison of Spectral Analysis Methods for Automatic Speech Recognition
INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering
More informationRelative phase information for detecting human speech and spoofed speech
Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University
More informationSpeech/Music Discrimination via Energy Density Analysis
Speech/Music Discrimination via Energy Density Analysis Stanis law Kacprzak and Mariusz Zió lko Department of Electronics, AGH University of Science and Technology al. Mickiewicza 30, Kraków, Poland {skacprza,
More informationSupplementary Materials for
advances.sciencemag.org/cgi/content/full/1/11/e1501057/dc1 Supplementary Materials for Earthquake detection through computationally efficient similarity search The PDF file includes: Clara E. Yoon, Ossian
More informationSpeech and Music Discrimination based on Signal Modulation Spectrum.
Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we
More information4.5.1 Mirroring Gain/Offset Registers GPIO CMV Snapshot Control... 14
Thank you for choosing the MityCAM-C8000 from Critical Link. The MityCAM-C8000 MityViewer Quick Start Guide will guide you through the software installation process and the steps to acquire your first
More informationMultiple Sound Sources Localization Using Energetic Analysis Method
VOL.3, NO.4, DECEMBER 1 Multiple Sound Sources Localization Using Energetic Analysis Method Hasan Khaddour, Jiří Schimmel Department of Telecommunications FEEC, Brno University of Technology Purkyňova
More informationDetection of Compound Structures in Very High Spatial Resolution Images
Detection of Compound Structures in Very High Spatial Resolution Images Selim Aksoy Department of Computer Engineering Bilkent University Bilkent, 06800, Ankara, Turkey saksoy@cs.bilkent.edu.tr Joint work
More informationForced Oscillation Detection Fundamentals Fundamentals of Forced Oscillation Detection
Forced Oscillation Detection Fundamentals Fundamentals of Forced Oscillation Detection John Pierre University of Wyoming pierre@uwyo.edu IEEE PES General Meeting July 17-21, 2016 Boston Outline Fundamental
More informationA JOINT MODULATION IDENTIFICATION AND FREQUENCY OFFSET CORRECTION ALGORITHM FOR QAM SYSTEMS
A JOINT MODULATION IDENTIFICATION AND FREQUENCY OFFSET CORRECTION ALGORITHM FOR QAM SYSTEMS Evren Terzi, Hasan B. Celebi, and Huseyin Arslan Department of Electrical Engineering, University of South Florida
More informationCampus Location Recognition using Audio Signals
1 Campus Location Recognition using Audio Signals James Sun,Reid Westwood SUNetID:jsun2015,rwestwoo Email: jsun2015@stanford.edu, rwestwoo@stanford.edu I. INTRODUCTION People use sound both consciously
More informationSignal Processing First Lab 20: Extracting Frequencies of Musical Tones
Signal Processing First Lab 20: Extracting Frequencies of Musical Tones Pre-Lab and Warm-Up: You should read at least the Pre-Lab and Warm-up sections of this lab assignment and go over all exercises in
More informationDimension Reduction of the Modulation Spectrogram for Speaker Verification
Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland tkinnu@cs.joensuu.fi
More informationIMPROVEMENT OF SPEECH SOURCE LOCALIZATION IN NOISY ENVIRONMENT USING OVERCOMPLETE RATIONAL-DILATION WAVELET TRANSFORMS
1 International Conference on Cyberworlds IMPROVEMENT OF SPEECH SOURCE LOCALIZATION IN NOISY ENVIRONMENT USING OVERCOMPLETE RATIONAL-DILATION WAVELET TRANSFORMS Di Liu, Andy W. H. Khong School of Electrical
More informationUsing the Time Dimension to Sense Signals with Partial Spectral Overlap. Mihir Laghate and Danijela Cabric 5 th December 2016
Using the Time Dimension to Sense Signals with Partial Spectral Overlap Mihir Laghate and Danijela Cabric 5 th December 2016 Outline Goal, Motivation, and Existing Work System Model Assumptions Time-Frequency
More informationCombining Voice Activity Detection Algorithms by Decision Fusion
Combining Voice Activity Detection Algorithms by Decision Fusion Evgeny Karpov, Zaur Nasibov, Tomi Kinnunen, Pasi Fränti Speech and Image Processing Unit, University of Eastern Finland, Joensuu, Finland
More informationLong Range Acoustic Classification
Approved for public release; distribution is unlimited. Long Range Acoustic Classification Authors: Ned B. Thammakhoune, Stephen W. Lang Sanders a Lockheed Martin Company P. O. Box 868 Nashua, New Hampshire
More informationVoiced/nonvoiced detection based on robustness of voiced epochs
Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies
More informationSymbol Timing Recovery for Low-SNR Partial Response Recording Channels
Symbol Timing Recovery for Low-SNR Partial Response Recording Channels Jingfeng Liu, Hongwei Song and B. V. K. Vijaya Kumar Data Storage Systems Center Carnegie Mellon University 5 Forbes Ave Pittsburgh,
More informationSPEECH ENHANCEMENT USING SPARSE CODE SHRINKAGE AND GLOBAL SOFT DECISION. Changkyu Choi, Seungho Choi, and Sang-Ryong Kim
SPEECH ENHANCEMENT USING SPARSE CODE SHRINKAGE AND GLOBAL SOFT DECISION Changkyu Choi, Seungho Choi, and Sang-Ryong Kim Human & Computer Interaction Laboratory Samsung Advanced Institute of Technology
More informationNon-coherent pulse compression - concept and waveforms Nadav Levanon and Uri Peer Tel Aviv University
Non-coherent pulse compression - concept and waveforms Nadav Levanon and Uri Peer Tel Aviv University nadav@eng.tau.ac.il Abstract - Non-coherent pulse compression (NCPC) was suggested recently []. It
More informationStatistical Signal Processing. Project: PC-Based Acoustic Radar
Statistical Signal Processing Project: PC-Based Acoustic Radar Mats Viberg Revised February, 2002 Abstract The purpose of this project is to demonstrate some fundamental issues in detection and estimation.
More informationSpeech Enhancement using Wiener filtering
Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing
More informationEnhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis
Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins
More informationLow Power Microphone Acquisition and Processing for Always-on Applications Based on Microcontrollers
Low Power Microphone Acquisition and Processing for Always-on Applications Based on Microcontrollers Architecture I: standalone µc Microphone Microcontroller User Output Microcontroller used to implement
More informationRobust Low-Resource Sound Localization in Correlated Noise
INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem
More informationUniversity of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ICCE.2012.
Zhu, X., Doufexi, A., & Koçak, T. (2012). A performance enhancement for 60 GHz wireless indoor applications. In ICCE 2012, Las Vegas Institute of Electrical and Electronics Engineers (IEEE). DOI: 10.1109/ICCE.2012.6161865
More informationA Three-Microphone Adaptive Noise Canceller for Minimizing Reverberation and Signal Distortion
American Journal of Applied Sciences 5 (4): 30-37, 008 ISSN 1546-939 008 Science Publications A Three-Microphone Adaptive Noise Canceller for Minimizing Reverberation and Signal Distortion Zayed M. Ramadan
More informationDigital Speech Processing and Coding
ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/
More informationNonuniform multi level crossing for signal reconstruction
6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven
More informationSIGNAL DETECTION IN NON-GAUSSIAN NOISE BY A KURTOSIS-BASED PROBABILITY DENSITY FUNCTION MODEL
SIGNAL DETECTION IN NON-GAUSSIAN NOISE BY A KURTOSIS-BASED PROBABILITY DENSITY FUNCTION MODEL A. Tesei, and C.S. Regazzoni Department of Biophysical and Electronic Engineering (DIBE), University of Genoa
More informationFeature Extraction Using 2-D Autoregressive Models For Speaker Recognition
Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition Sriram Ganapathy 1, Samuel Thomas 1 and Hynek Hermansky 1,2 1 Dept. of ECE, Johns Hopkins University, USA 2 Human Language Technology
More informationQuantitative Assessment of the Individuality of Friction Ridge Patterns
Quantitative Assessment of the Individuality of Friction Ridge Patterns Sargur N. Srihari with H. Srinivasan, G. Fang, P. Phatak, V. Krishnaswamy Department of Computer Science and Engineering University
More informationHomework Assignment 13
Question 1 Short Takes 2 points each. Homework Assignment 13 1. Classify the type of feedback uses in the circuit below (i.e., shunt-shunt, series-shunt, ) 2. True or false: an engineer uses series-shunt
More informationA Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image
Science Journal of Circuits, Systems and Signal Processing 2017; 6(2): 11-17 http://www.sciencepublishinggroup.com/j/cssp doi: 10.11648/j.cssp.20170602.12 ISSN: 2326-9065 (Print); ISSN: 2326-9073 (Online)
More informationPerceptive Speech Filters for Speech Signal Noise Reduction
International Journal of Computer Applications (975 8887) Volume 55 - No. *, October 22 Perceptive Speech Filters for Speech Signal Noise Reduction E.S. Kasthuri and A.P. James School of Computer Science
More informationSIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS
SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,
More informationJerry Reiter Department of Statistical Science Information Initiative at Duke Duke University
Jerry Reiter Department of Statistical Science Information Initiative at Duke Duke University jreiter@duke.edu 1 Acknowledgements Research supported by National Science Foundation ACI 14-43014, SES-11-31897,
More informationAugmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data
INTERSPEECH 2013 Augmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data Cong-Thanh Do 1, Claude Barras 1, Viet-Bac Le 2, Achintya K. Sarkar
More informationOnline Signature Verification by Using FPGA
Online Signature Verification by Using FPGA D.Sandeep Assistant Professor, Department of ECE, Vignan Institute of Technology & Science, Telangana, India. ABSTRACT: The main aim of this project is used
More informationComputational Complexity of Multiuser. Receivers in DS-CDMA Systems. Syed Rizvi. Department of Electrical & Computer Engineering
Computational Complexity of Multiuser Receivers in DS-CDMA Systems Digital Signal Processing (DSP)-I Fall 2004 By Syed Rizvi Department of Electrical & Computer Engineering Old Dominion University Outline
More informationTime-of-arrival estimation for blind beamforming
Time-of-arrival estimation for blind beamforming Pasi Pertilä, pasi.pertila (at) tut.fi www.cs.tut.fi/~pertila/ Aki Tinakari, aki.tinakari (at) tut.fi Tampere University of Technology Tampere, Finland
More informationStudents: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa
Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Spring 2008 Introduction Problem Formulation Possible Solutions Proposed Algorithm Experimental Results Conclusions
More informationRecent Advances in Acoustic Signal Extraction and Dereverberation
Recent Advances in Acoustic Signal Extraction and Dereverberation Emanuël Habets Erlangen Colloquium 2016 Scenario Spatial Filtering Estimated Desired Signal Undesired sound components: Sensor noise Competing
More informationExperiments with An Improved Iris Segmentation Algorithm
Experiments with An Improved Iris Segmentation Algorithm Xiaomei Liu, Kevin W. Bowyer, Patrick J. Flynn Department of Computer Science and Engineering University of Notre Dame Notre Dame, IN 46556, U.S.A.
More informationSpectral Reconstruction and Noise Model Estimation based on a Masking Model for Noise-Robust Speech Recognition
Circuits, Systems, and Signal Processing manuscript No. (will be inserted by the editor) Spectral Reconstruction and Noise Model Estimation based on a Masking Model for Noise-Robust Speech Recognition
More informationShort Paper: The Softwater Modem A Software Modem for Underwater Acoustic Communication
Short Paper: The Softwater Modem A Software Modem for Underwater Acoustic Communication Brian Borowski and Dan Duchamp Department of Computer Science Stevens Institute of Technology Castle Point on Hudson,
More information