FORENSIC AUTOMATIC SPEAKER RECOGNITION

1 FORENSIC AUTOMATIC SPEAKER RECOGNITION
June 2, 2
Hirotaka Nakasone, Federal Bureau of Investigation, Quantico, VA 2235, hnakasone@fbiacademy.edu
Steven D. Beck, BAE SYSTEMS, 65 Tracor Ln. MS 27-6, Austin, TX, steve.beck@baesystems.com

2 PRESENTATION OUTLINE
- The Problem of Forensic Acoustical Analysis
- FBI Forensic Voice Database (FV)
- ASR Evaluation Results for FV
- Confidence Measures
- The FASR System
- Conclusions

3 Forensic ASR Problems
Every month, the FBI receives numerous criminal cases involving recorded voice samples. Most voice samples are recorded in uncontrolled environments, and there are many unknown sources of variability. Four primary sources of voice sample variation of interest to the forensic community are:
- Speech source characteristics
- Transmission channel characteristics
- Usable speech duration
- Signal-to-noise ratio
[Figure: two spectrograms (frequency in Hz vs. time in seconds) of the same speaker in different sessions with different text; files 228st.wav and 228s6t.wav.]

4 FV Voice Database: Data Collection 1 & 2
The Forensic Voice Data Base was developed as part of Project CAVIS in cooperation with the LA County Sheriff's Department and NIJ/DOJ Grant 85-IJ-CX-24.
Recording equipment: B&K Model 455 microphone; in-house telephone; body microphone with transmitter and receiver; Fostex Model R8 four-channel reel-to-reel recorder; remote telephone (data collection 3).
[Table: CAVIS experiment summary per collection (number of speakers, number of sessions, samples per session, sample length, speaking mode, transmission mode); numeric values not recoverable.]
Speaking and transmission modes by collection:
- Collection 1: Spontaneous, Reading, Prescribed (3 sec); Microphone, Telephone, Body Transmitter
- Collection 2: Spontaneous, Reading; Microphone, Telephone, Body Transmitter
- Collection 3: Spontaneous 1, Spontaneous 2; Telephone (remote call-in)

5 FV Voice Database: Speaking Modes
- Spontaneous: The speaker is shown a set of slides (one per session sample) and talks about each slide. Speech segments are 29 seconds long, and the text is independent.
- Reading: The speaker reads a passage of text. Speech segments are 29 seconds long. The text per session sample is independent for data collection 1 and dependent for data collection 2.
- Prescribed: The speaker says, "There is a bomb in the plant. Get out!" The speech segments are 2-3 seconds long. This mode is available only for data collection 1 and is text dependent.
- Spontaneous 1: The speaker is told to talk about a particular topic per session sample. Available only for collection 3 (remote telephone).
- Spontaneous 2: The speaker is told to talk about any topic. Available only for collection 3 (remote telephone).

6 FV Voice Database: Transmission Modes
- Body Transmitter: Electret microphone plus AM transmitter. Nominal bandwidth is 3 Hz - 36 Hz.
- Microphone: B&K Model 455. Nominal bandwidth is 2 Hz - 8 Hz.
- In-House Telephone: Nominal bandwidth is 3 Hz - 36 Hz.
- Remote Telephone (no periodogram): Nominal bandwidth is 3 Hz - 36 Hz.
[Figure: periodograms (amplitude in dB vs. frequency in Hz) of simultaneously recorded voice samples for the body mic channel (sb.wav), the microphone channel (sm.wav), and the telephone channel (st.wav).]

7 FV Voice Database: Conditional Data Set Breakdown
Speaking Mode (SM):
- P = Prescribed Text
- R = Reading
- S = Spontaneous
Transmission Channel (TM):
- M = B&K Microphone
- B = Body Mic & Transmitter
- T = Telephone (in-house)
- Tlgd = Telephone (remote)
[Table: number of files for each speaking mode (S, R, P) and channel type (Total, M, T, B, Tlgd); counts not recoverable.]

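8 FV Voice Database: Description
[Table: FV Voice Database description. Columns: speaking mode, text dependence, length (sec), transmission mode, number of speakers, sessions, samples. Surviving rows (mode, text dependence, length, transmission mode): S TI 29 M,B,T; S TI 6 M,B,T; S TI 29 Tld; S TI 6 Tld; R TI 29 M,B,T; R TI 6 M,B,T; P TD 3 M,B,T; P TD 2 M,B,T. Speaker, session, and sample counts are incomplete.]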

9 FV Voice Database: Histograms of SNR and Duration
[Figure: probability histograms of SNR (in dB) and signal duration (in sec.) for all 3 Second files and for all 3 Second S (spontaneous) files.]

10 FV Voice Database: Histograms for Spontaneous Speaking Mode
[Figure: probability distributions of SNR (in dB) and duration (in sec.) for SM=S (3 sec) on four channels: Microphone (TM=M), Telephone (TM=T), Body Transmitter (TM=B), and Telephone-Remote (TM=Tlgd).]

11 FV Voice Database: Data Formats and Filenames
Data Formats (Evaluation):
- Sampling Rate = 6, samples/sec.
- Resolution = 6 bits
- Word Format = Sun <MSB,LSB>
- File Header = 24 byte SPHERE
Filenames (Evaluation): The randomized filename format is FV_xxxx.sph, where FV signifies FBI Forensic Voice Dataset, xxxx is a unique four-place number, and .sph is the file ending for SPHERE.
Data Formats (Analysis):
- Sampling Rate = 8, samples/sec.
- Resolution = 6 bits
- Word Format = PC <LSB,MSB>
- File Header = MS WAV
Filenames (Analysis): File Name Example = R4T.wav
- Speaker =
- Speech Mode = Reading
- Sample = 4 out of
- Transmission = Telephone
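The evaluation files are described as MSB-first samples behind a SPHERE header. As a minimal sketch of how such a file could be read, the Python below parses a NIST SPHERE header (an ASCII block that starts with `NIST_1A` and the header size, conventionally 1024 bytes) and unpacks 16-bit samples. The function names, the synthetic in-memory file, and the 16 kHz / 16-bit demo values are assumptions for illustration, not details taken from the FV release.

```python
import io
import struct


def parse_sphere_header(stream):
    """Read a SPHERE header from a binary stream; return (fields, header_size)."""
    if stream.readline().strip() != b"NIST_1A":
        raise ValueError("not a SPHERE file")
    header_size = int(stream.readline().strip())
    fields = {}
    while True:
        raw = stream.readline()
        if not raw:
            raise ValueError("header ended before end_head")
        line = raw.decode("ascii").strip()
        if line == "end_head":
            break
        if not line:
            continue
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": string of N characters
            fields[name] = value
    return fields, header_size


def read_sphere(data):
    """Return (fields, samples) for 16-bit PCM SPHERE data held in memory."""
    stream = io.BytesIO(data)
    fields, header_size = parse_sphere_header(stream)
    stream.seek(header_size)
    # "10" means most significant byte first (Sun order), "01" the reverse.
    order = ">" if fields.get("sample_byte_format", "10") == "10" else "<"
    count = fields["sample_count"]
    samples = struct.unpack(f"{order}{count}h", stream.read(2 * count))
    return fields, list(samples)


def make_demo_sphere(samples, rate):
    """Build a small synthetic SPHERE file (hypothetical demo data only)."""
    head = (
        "NIST_1A\n   1024\n"
        f"sample_rate -i {rate}\n"
        f"sample_count -i {len(samples)}\n"
        "sample_n_bytes -i 2\n"
        "sample_byte_format -s2 10\n"
        "channel_count -i 1\n"
        "end_head\n"
    ).encode("ascii")
    return head.ljust(1024, b" ") + struct.pack(f">{len(samples)}h", *samples)


demo = make_demo_sphere([0, 1, -2, 32767], 16000)
fields, samples = read_sphere(demo)
print(fields["sample_rate"], samples)
```

Reading the byte order from the header rather than hard-coding it lets the same reader handle both MSB-first and LSB-first sample data.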

12 ASR Evaluation on FV: Purpose of Blind Test and Evaluation
Purpose: Assess the maturity of Automatic Speaker Recognition technology for application in the field of forensic science.
Time Frame for Test:
Participants:
- GTE/BBN
- MIT Lincoln Laboratory
- Oregon Graduate Institute of Sciences and Technology
- T-Netix
- Wagner Associates
- U.S. Air Force Research Laboratory, Rome, NY
- U.S. Air Force Research Laboratory, Wright-Patterson, OH

13 ASR Evaluation on FV: Multiple Levels of Difficulty
The speech samples are assigned to one of four Levels of Difficulty, which represent different testing criteria. Each level is further divided into 12 separate trials, giving a total of 48 independent classifier tests.
Level of Difficulty | Text Dependence | Channel Dependence
I | Independent | Independent
II | Dependent | Independent
III | Independent | Dependent
IV | Dependent | Dependent
[Table: test number, file length (sec.), and speaking mode per level. Levels I and III use speaking modes S and R, with tests 7-9 and later built from 4*29-second files; Levels II and IV use speaking mode P with 3-second and 4*3-second files.]

14 ASR Evaluation on FV: Level 1 Training and Testing Sets
Example training and testing sets. Levels 2-4 are similar.
[Table: Level 1 training sets by trial number: CD-ROM volume (FVTRN, FVTRN2), data directory (L3TRN, LTRN), files per speaker, number of speakers, total files, and file length (sec); numeric values not recoverable.]
[Table: Level 1 testing sets by trial number: CD-ROM volume (FVTST,2), directory (LTST), number of speakers, total files, and file length (sec); numeric values not recoverable.]

15 ASR Evaluation on FV
Open Set Speaker Verification: Compare a voice test segment with a single target voice model. If the resulting score exceeds a detection threshold, then declare a match.
- DET Curve: Plot of Pmiss vs. Pfa.
- EER: Operating point where Pmiss = Pfa.
- Neyman-Pearson: Operating point that minimizes Pmiss for a fixed Pfa.
- DCF: Operating point based on the relative cost of making Type I and Type II errors:
\[ C_{Det} = C_{Miss} \cdot P_{Miss|Target} \cdot P_{Target} + C_{FalseAlarm} \cdot P_{FalseAlarm|NonTarget} \cdot P_{NonTarget} \]
Closed Set Speaker Identification: Compare a voice test segment with a set of target voice models. Decide which target voice model best matches the test segment.
- Rank-1: The correct model is the best match.
- Rank-3: The correct model is among the top 3 matches.
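The verification operating points defined on this slide can be illustrated with a short Python sketch. It sweeps the observed scores as candidate thresholds to approximate the EER and evaluates the detection cost at that point. The function names, the toy scores, and the cost and prior values (C_Miss = 10, C_FalseAlarm = 1, P_Target = 0.01) are assumptions chosen for illustration; the slides do not state which values were used.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (P_miss, P_fa); a trial is accepted when its score exceeds the threshold."""
    p_miss = sum(s <= threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s > threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def equal_error_rate(target_scores, nontarget_scores):
    """Return (eer, threshold) at the observed score where |P_miss - P_fa| is smallest."""
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss, p_fa = error_rates(target_scores, nontarget_scores, t)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2.0, t)
    return best[1], best[2]


def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """C_Det = C_Miss * P_miss * P_target + C_FA * P_fa * (1 - P_target)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


# Hypothetical scores: higher means "more likely the target speaker".
targets = [2.1, 1.7, 0.9, 2.5, 0.2]
nontargets = [-1.0, 0.3, -0.4, 1.0, -2.2]

eer, threshold = equal_error_rate(targets, nontargets)
p_miss, p_fa = error_rates(targets, nontargets, threshold)
print("EER:", eer, "threshold:", threshold)
print("DCF at EER threshold:", detection_cost(p_miss, p_fa))
```

With real score files the threshold sweep is usually done on sorted arrays for speed; the brute-force loop here is kept for readability.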

16 ASR Evaluation on FV: Speaker Verification Test Results, Level I (Text Independent, Transmission Independent)
Training set descriptions:
- Test 1: SM=S, TM=M, Len=3
- Test 2: SM=S, TM=T, Len=3
- Test 3: SM=S, TM=B, Len=3
[Figure: MITLL verification DET curves (miss probability vs. false alarm probability, in %) for tests LTST, LTST2, and LTST3; each panel plots test conditions SM=S, R, and P on the two channels that differ from the training channel.]
[Table: SM, TM, LEN (sec), EER %, and the operating point for a fixed Pfa for each test condition; values not recoverable.]

17 ASR Evaluation on FV: Speaker Verification Test Results, Level II (Text Dependent, Transmission Independent)
Training set descriptions:
- Test 1: SM=P, TM=M, Len=3
- Test 2: SM=P, TM=T, Len=3
- Test 3: SM=P, TM=B, Len=3
[Figure: MITLL verification DET curves (miss probability vs. false alarm probability, in %) for tests L2TST, L2TST2, and L2TST3; each panel plots SM=P test conditions at two test lengths on the two channels that differ from the training channel.]
[Table: SM, TM, LEN (sec), EER %, and the operating point for a fixed Pfa for each test condition; values not recoverable.]

18 ASR Evaluation on FV: Speaker Verification Test Results, Level III (Text Independent, Transmission Dependent)
Training set descriptions:
- Test 1: SM=S, TM=M, Len=3
- Test 2: SM=S, TM=T, Len=3
- Test 3: SM=S, TM=B, Len=3
[Figure: MITLL verification DET curves (miss probability vs. false alarm probability, in %) for tests L3TST, L3TST2, and L3TST3; each panel plots test conditions SM=S, R, and P at two test lengths on the same channel as training.]
[Table: SM, TM, LEN (sec), EER %, and the operating point for a fixed Pfa for each test condition; values not recoverable.]

19 ASR Evaluation on FV: Speaker Verification Test Results, Level IV (Text Dependent, Transmission Dependent)
Training set descriptions:
- Test 1: SM=P, TM=M, Len=3
- Test 2: SM=P, TM=T, Len=3
- Test 3: SM=P, TM=B, Len=3
[Figure: MITLL verification DET curves (miss probability vs. false alarm probability, in %) for tests L4TST, L4TST2, and L4TST3; each panel plots SM=P test conditions at two test lengths on the same channel as training.]
[Table: SM, TM, LEN (sec), EER %, and the operating point for a fixed Pfa for each test condition; values not recoverable.]

20 ASR Evaluation on FV: Equal Error Rate (EER) Comparison
[Table: EER % by developer for each training/testing channel pair, Levels 1 to 4. Two tables cover mismatched pairs (M/T, M/B, T/M, T/B, B/M, B/T) and two cover matched pairs (M/M, T/T, B/B, and in one table Tlgd/Tlgd); values not recoverable.]
* The Developer Numbers have been randomized.
Conclusions: Lower SNR (channel B), channel mismatch (Level 1), and session variations (Tlgd) all contribute to worse detection performance.

21 ASR Evaluation on FV: Closed Set ID Results, Level 1
Only Tests 1, 2, 3 out of 12 for one participant are shown.
Columns: LEVEL, Transmission TRN / TST, Speaking TRN / TST, Length (sec) TRN / TST, RANK1 % Correct, RANK3 % Correct
I M / T S / S 3 / 3 84/93 = /93 = 98.4
I M / T S / S 3 / 2 37/ 4 = / 4 =.
I M / T S / R 3 / 3 3/44 = 9. 4/44 = 97.2
I M / T S / R 3 / 2 27/ 29 = / 29 =.
I M / T S / P 3 / 3 27/ 85 = 3.8 5/ 85 = 58.8
I M / T S / P 3 / 2 8/ 7 = 47. 2/ 7 = 7.6
I M / B S / S 3 / 3 73/94 = /94 = 92.8
I M / B S / S 3 / 2 24/ 4 = 6. 35/ 4 = 87.5
I M / B S / R 3 / 3 2/43 = /43 = 88.
I M / B S / R 3 / 2 / 29 = / 29 = 79.3
I M / B S / P 3 / 3 44/ 9 = / 9 = 66.7
I M / B S / P 3 / 2 8/ 8 = / 8 = 77.8
I T / M S / S 3 / 3 66/94 = /94 = 93.8
I T / M S / S 3 / 2 3/ 4 = / 4 = 87.5
I T / M S / R 3 / 3 96/43 = 67. 4/43 = 79.7
I T / M S / R 3 / 2 4/ 29 = / 29 = 72.4
I T / M S / P 3 / 3 34/ 89 = / 89 = 49.4
I T / M S / P 3 / 2 8/ 8 = 44.4 / 8 = 55.6
I T / B S / S 3 / 3 64/98 = /98 = 88.9
I T / B S / S 3 / 2 32/ 4 = / 4 = 9
I T / B S / R 3 / 3 27/48 = /48 = 9.5
I T / B S / R 3 / 2 2/ 3 = / 3 = 9.
I T / B S / P 3 / 3 28/ 95 = / 95 = 43.2
I T / B S / P 3 / 2 7/ 9 = / 9 = 47.4
I B / M S / S 3 / 3 53/94 = /94 = 89.7
I B / M S / S 3 / 2 29/ 4 = / 4 = 87.5
I B / M S / R 3 / 3 87/42 = 6.3 2/42 = 78.9
I B / M S / R 3 / 2 2/ 28 = / 28 = 78.6
I B / M S / P 3 / 3 47/ 9 = / 9 = 6.
I B / M S / P 3 / 2 / 8 = 55.6 / 8 = 55.6
I B / T S / S 3 / 3 59/9 = /9 = 88.5
I B / T S / S 3 / 2 34/ 4 = / 4 = 9
I B / T S / R 3 / 3 99/49 = /49 = 8.9
I B / T S / R 3 / 2 2/ 3 = / 3 = 8.
I B / T S / P 3 / 3 38/ 87 = / 87 = 59.8
I B / T S / P 3 / 2 8/ 7 = 47. / 7 =

22 ASR Evaluation on FV: Closed Set Identification Results, Level 3
Columns: LEVEL, Transmission TRN / TST, Speaking TRN / TST, Length (sec) TRN / TST, RANK1 % Correct, RANK3 % Correct
III M / M S / S 3 / 3 94/94=. 94/94=.
III M / M S / S 3 / 2 39/ 39 =. 39/ 39 =.
IIIa M / M S / R 3 / 3 34/4 = 95. 4/4 = 99.3
III M / M S / R 3 / 2 26/ 27 = / 27 =.
IIIa M / M S / P 3 / 3 58/ 89 = / 89 = 8.9
III M / M S / P 3 / 2 3/ 8 = / 8 = 83.3
III T / T S / S 3 / 3 96/96=. 96/96=.
III T / T S / S 3 / 2 4/ 4=. 4/ 4=.
IIIa T / T S / R 3 / 3 3/34 = /34=.
III T / T S / R 3 / 2 24/ 26 = / 26 =.
IIIa T / T S / P 3 / 3 45/ 83 = / 83 = 79.5
III T / T S / P 3 / 2 / 8 = / 8 = 77.8
IIIa B / B S / S 3 / 3 86/95 = /95 = 99.
III B / B S / S 3 / 2 22/ 4 = / 4 = 73.2
IIIa B / B S / R 3 / 3 27/47 = /47 = 94.6
III B / B S / R 3 / 2 2/ 3 = 4. 5/ 3 = 5.
IIIa B / B S / P 3 / 3 5/ 93 = / 93 = 63.4
III B / B S / P 3 / 2 / 9 = / 9 = 63.2
IIIa Tall / Tall S / S 3 / 3 329/425 = /425 = 88.5
III Tall / Tall S / S 3 / 2 54/ 86 = / 86 = 74.4
IIIa Tlgd / Tlgd S / S 3 / 3 34/229 = /229 = 79.
III Tlgd / Tlgd S / S 3 / 2 6/ 46= / 46=

23 ASR Evaluation on FV: Closed Set Identification (ID) Comparison
* The Developer Numbers have been randomized.
[Table: Level 1 Rank-1 ID performance (Dev. 1, Dev. 2, Dev. 5, in %) for T/M transmission with S/S, S/R, and S/P speech, and Level 3 Rank-1 ID performance for T/T (S/S, S/R, S/P) and Tld/Tld (S/S); values not recoverable.]
Conclusions: Channel mismatch (Level 1), signal duration mismatch (S/P), and session variations (Tlgd and S/R) all contribute to worse ID performance. Lack of channel normalization (CMS or RASTA) can result in random performance.
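The Rank-1 and Rank-3 percentages used in the closed-set identification tables can be computed as in the following Python sketch. Each trial scores one test segment against every enrolled model and records where the true speaker lands in the ordering. The function names, speaker labels, and scores are hypothetical and are not drawn from the FV results.

```python
def rank_of_true_speaker(scores, true_speaker):
    """Return the 1-based rank of the true speaker; higher scores are better matches."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_speaker) + 1


def rank_k_accuracy(trials, k):
    """Percent of trials in which the true speaker is among the top k models."""
    hits = sum(rank_of_true_speaker(scores, truth) <= k for scores, truth in trials)
    return 100.0 * hits / len(trials)


# Hypothetical closed set of four enrolled speakers (A-D) and four test segments.
trials = [
    ({"A": 3.0, "B": 1.0, "C": 0.5, "D": -1.0}, "A"),  # true speaker ranked 1st
    ({"A": 0.2, "B": 0.9, "C": 0.4, "D": 0.1}, "C"),   # ranked 2nd
    ({"A": 1.5, "B": 1.2, "C": 0.3, "D": 2.0}, "C"),   # ranked 4th
    ({"A": -0.5, "B": 0.8, "C": 0.1, "D": 0.0}, "B"),  # ranked 1st
]

print("Rank-1 % correct:", rank_k_accuracy(trials, 1))
print("Rank-3 % correct:", rank_k_accuracy(trials, 3))
```

Rank-3 can never be lower than Rank-1 on the same trials, which is a quick sanity check when reading tables of this kind.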

24 Confidence Measures: FBI Forensic Voice Database
Detection Error Trade-off (DET): Trades off the miss error probability against the false alarm error probability.
False/True Score PDFs: Displays the PDFs of ASR false-model scores and true-model scores for a relatively large population. The Equal Error Rate (EER) or the Decision Cost Function (DCF) operating point can be calculated and plotted. The decision rule is a likelihood ratio test against a threshold set by the decision costs and priors:
\[ \frac{P(x \mid H_T)}{P(x \mid H_F)} \underset{H_F}{\overset{H_T}{\gtrless}} \frac{(C_{TF} - C_{FF})\,P(H_F)}{(C_{FT} - C_{TT})\,P(H_T)} = \text{Threshold} \]
where C_{ij} is the cost of deciding H_i when H_j is true.
[Figure: DET curve (miss probability vs. false alarm probability, in %) for MITLL verification, Level/Test=L3TST, SM=S, TM=M (Level III, SM3).]
[Figure: PDFs of TRUE and FALSE scores for Test=L3TST vs. GMM LLRT score, with the EER threshold marked (EER=.5464%, Thresh=353).]

25 Confidence Measures
For a given GMM LRT score, find the confidence in a True decision based on a sample True/False population:
\[ P(H_T \mid x) = \frac{P(H_T)\,P(x \mid H_T)}{P(H_T)\,P(x \mid H_T) + P(H_F)\,P(x \mid H_F)} \]
Example: False Distribution N(.,.35); True Distribution N(.,.35); Score =.7; Confidence = 84.6%.
[Figure: probability densities of true, false, and test scores vs. GMM output score, and the confidence curve P(H_T | x) with the test score's confidence value marked.]
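The posterior confidence on this slide can be sketched in Python by modeling the true and false score populations as Gaussians and applying Bayes' rule. The function names and the distribution parameters below are assumptions for illustration; they are not the values from the slide's example, whose parameters are incomplete.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def confidence(score, true_mean, true_std, false_mean, false_std, prior_true=0.5):
    """P(H_T | x): posterior probability that the score came from the true population."""
    numerator = prior_true * gaussian_pdf(score, true_mean, true_std)
    denominator = numerator + (1.0 - prior_true) * gaussian_pdf(score, false_mean, false_std)
    return numerator / denominator


# Hypothetical score populations: true scores ~ N(1.0, 0.5), false scores ~ N(0.0, 0.5),
# where the second parameter is a standard deviation, with equal priors.
for score in (0.0, 0.5, 0.9):
    print(score, round(confidence(score, 1.0, 0.5, 0.0, 0.5), 4))
```

When the two Gaussians share a variance, this posterior reduces to a logistic function of the score, which is one motivation for a logistic confidence model.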

26 Confidence Measures: The Logistic Model
The posterior probability, p, is a curvilinear function that can be modeled with the form:
\[ p = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)} \]
Using the logistic transformation:
\[ p' = \ln\frac{p}{1 - p} \]
we get the following linear form:
\[ p' = \beta_0 + \beta_1 x \]
In matrix form, the linear coefficients are solved using a pseudo-inverse:
\[ \beta = (X^T X)^{-1} X^T p' \]
For a given score, first compute the linear model confidence measure:
\[ CM' = \beta_0 + \text{Score} \cdot \beta_1 \]
Then compute the natural confidence measure:
\[ CM = \frac{\exp(CM')}{1 + \exp(CM')} \]
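A minimal Python sketch of the logistic model procedure follows: transform binned posterior estimates with the logit, fit a straight line by ordinary least squares (for a single predictor, the pseudo-inverse solution has the closed form used here), then map a new score back to a confidence value. The function names and the synthetic bin data are assumptions for illustration.

```python
import math


def logit(p):
    """Logistic transformation ln(p / (1 - p)); p must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def confidence_measure(score, b0, b1):
    """Linear confidence b0 + b1 * score mapped through exp(c) / (1 + exp(c))."""
    linear = b0 + b1 * score
    return 1.0 / (1.0 + math.exp(-linear))  # algebraically equal to exp(c) / (1 + exp(c))


# Synthetic posterior estimates at five score bins, generated from b0 = 0.5, b1 = 1.5.
bin_scores = [-2.0, -1.0, 0.0, 1.0, 2.0]
bin_props = [1.0 / (1.0 + math.exp(-(0.5 + 1.5 * x))) for x in bin_scores]

b0, b1 = fit_line(bin_scores, [logit(p) for p in bin_props])
print("b0, b1:", b0, b1)
print("confidence at score 0.7:", confidence_measure(0.7, b0, b1))
```

Bins whose proportion is exactly 0 or 1 have no finite logit and would need to be merged or dropped before fitting.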

27 Confidence Measures: Empirical Detection Data
The logit transformation can be used when the dependent variable is binary, e.g. True/False:
\[ E(Y) = p = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}, \qquad Y \in \{0, 1\} \]
Use the linear regression model:
\[ p'_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
Use weighted least squares to ensure constant error variance terms for an optimal solution:
\[ \beta = (X^T W X)^{-1} X^T W p' \]
The diagonal weight matrix is:
\[ w_i = \frac{1}{\sigma^2(\varepsilon_i)} \]
The weights can be estimated using
\[ \hat{w}_i = n_i\, p_i (1 - p_i) \]
where p_i is the sample proportion in bin i and n_i is the total number of test scores in bin i.
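The weighted least squares step can be sketched in Python as follows. Scores are grouped into bins, each bin's true-trial proportion is logit-transformed, and the line is fitted with weights n_i * p_i * (1 - p_i), so that sparsely populated or near-saturated bins count for less. The function names, bin centers, and counts are hypothetical.

```python
import math


def logit(p):
    """Logistic transformation ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def logit_weights(counts, props):
    """Estimated weights n_i * p_i * (1 - p_i) for each bin."""
    return [n * p * (1.0 - p) for n, p in zip(counts, props)]


def fit_line_weighted(xs, ys, ws):
    """Weighted least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical bins: score centers, number of trials, and number of true-speaker trials.
centers = [-1.0, 0.0, 1.0, 2.0]
counts = [40, 80, 60, 20]
true_counts = [4, 24, 42, 18]

props = [t / n for t, n in zip(true_counts, counts)]
weights = logit_weights(counts, props)
b0, b1 = fit_line_weighted(centers, [logit(p) for p in props], weights)
print("weights:", weights)
print("b0, b1:", b0, b1)
```

The closed form above is the two-parameter case of the matrix solution with a diagonal weight matrix; a bin with proportion 0 or 1 gets zero weight and should be excluded before taking the logit.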

28 Confidence Measures: Confidence Measures for True-False Data
Could fit a least squares line. Problems:
- Data doesn't look linear
- Line would not stay in the interval [0, 1]
Use empirical estimates as the starting point for an iterative procedure (Newton method or Levenberg-Marquardt).
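One way to realize the iterative procedure mentioned on this slide is Newton's method on the logistic log-likelihood, fitted directly to individual True/False outcomes rather than to binned proportions. The Python sketch below is an illustration under that assumption, not the presenters' implementation; the function names and toy data are hypothetical, and the starting values could be the least squares estimates.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_newton(scores, labels, b0=0.0, b1=0.0, iterations=25):
    """Maximize the logistic log-likelihood for p = sigmoid(b0 + b1 * x) by Newton steps."""
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(scores, labels):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient with respect to b0
            g1 += (y - p) * x    # gradient with respect to b1
            h00 += w             # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


# Hypothetical scores with overlapping True (1) and False (0) outcomes.
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic_newton(scores, labels)
print("b0, b1:", b0, b1)
print("confidence at score 1.0:", sigmoid(b0 + b1 * 1.0))
```

The fitted curve stays inside [0, 1] by construction, which addresses the second problem listed for the straight-line fit. If the True and False scores are perfectly separable, the maximum-likelihood slope is unbounded and the iteration will not settle.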

29 Confidence Measures: Score distribution and confidence measures from the NIST 1999 evaluation
Balanced mixture of electret and carbon-button telephone handsets.
Operating points and confidence measures have been derived for:
- gender-dependent male models
- gender-dependent female models
- gender-independent models
[Figure: DET curves (miss probability vs. false alarm probability, in %) for 3 models (Male Balanced, Female Balanced, Gender Independent), NIST99 test set, SM=S, TM=T.]
[Table: EER % and EER threshold per background model (Male Bal, Female Bal, Gender Indep Bal); values not recoverable.]
[Figure: example for Male, Balanced, NIST99: PDFs of TRUE and FALSE scores and the estimated confidence measure (equal priors) vs. GMM log likelihood ratio score, Test=Male-bal:mal-nist-all-t.out.]

30 Confidence Measures: Multivariate Logistic Model
The logistic model can be extended to use more than one independent variable. The multiple regression model:
\[ p' = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots \]

31 FASR Description
The Forensic Automatic Speaker Recognition (FASR) Program:
- Developed by U.S. Air Force Research Laboratories, Rome, NY
- With inputs from MITLL, FBI ERF, and BAE SYSTEMS
FASR is a PC-based stand-alone workstation with an efficient GUI supporting:
- Data acquisition and playback
- Signal and spectrographic display
- Speech segmentation and labeling
- Tone detection and removal
- Speech quality measures (SNR, duration, bandwidth)
- Speaker Identification and Verification
FASR uses robust speaker recognition algorithms:
- Mel cepstral coefficients
- Cepstral mean subtraction or RASTA filtering
- Gaussian mixture models with Universal Background Models
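Of the algorithms listed, cepstral mean subtraction is the simplest to illustrate. A fixed linear channel multiplies the spectrum, so it adds a roughly constant vector to every cepstral frame; subtracting the per-coefficient mean over the utterance removes that offset. The Python sketch below shows the idea on made-up two-dimensional frames; the function name and data are hypothetical and do not describe FASR's implementation.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    count = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / count for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]


# Hypothetical cepstral frames, and the same frames seen through a channel
# that adds a constant offset to each coefficient.
clean = [[1.0, -2.0], [3.0, 0.0], [2.0, 2.0]]
channel_offset = [5.0, -1.5]
shifted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

print(cepstral_mean_subtraction(clean))
print(cepstral_mean_subtraction(shifted))
```

Both calls give the same normalized frames up to floating-point rounding, which is why this kind of normalization helps when training and test recordings come from different channels.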

32 Conclusions
Project Conclusions:
- The FBI is using a PC-based forensic automatic speaker recognition (FASR) system.
- FASR has been extensively tested on NIST single-speaker and FV speech corpora.
- The outputs are based on statistics with known error rates from large sample populations.
Future Directions:
- Improve on existing channel normalization techniques.
- Integrate automatic or manual pre-screening based upon quantifiable signal quality measures.
- Provide for a "no decision" rule when signal quality does not meet predefined conditions.
- Address the issue of using different background models for detected differences in the voice samples.


More information

A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification

A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification A Correlation-Maximization Denoising Filter Used as An Enhancement Frontend for Noise Robust Bird Call Classification Wei Chu and Abeer Alwan Speech Processing and Auditory Perception Laboratory Department

More information

The fundamentals of detection theory

The fundamentals of detection theory Advanced Signal Processing: The fundamentals of detection theory Side 1 of 18 Index of contents: Advanced Signal Processing: The fundamentals of detection theory... 3 1 Problem Statements... 3 2 Detection

More information

High-speed Noise Cancellation with Microphone Array

High-speed Noise Cancellation with Microphone Array Noise Cancellation a Posteriori Probability, Maximum Criteria Independent Component Analysis High-speed Noise Cancellation with Microphone Array We propose the use of a microphone array based on independent

More information

CO-CHANNEL SPEECH DETECTION APPROACHES USING CYCLOSTATIONARITY OR WAVELET TRANSFORM

CO-CHANNEL SPEECH DETECTION APPROACHES USING CYCLOSTATIONARITY OR WAVELET TRANSFORM CO-CHANNEL SPEECH DETECTION APPROACHES USING CYCLOSTATIONARITY OR WAVELET TRANSFORM Arvind Raman Kizhanatham, Nishant Chandra, Robert E. Yantorno Temple University/ECE Dept. 2 th & Norris Streets, Philadelphia,

More information

UNEQUAL POWER ALLOCATION FOR JPEG TRANSMISSION OVER MIMO SYSTEMS. Muhammad F. Sabir, Robert W. Heath Jr. and Alan C. Bovik

UNEQUAL POWER ALLOCATION FOR JPEG TRANSMISSION OVER MIMO SYSTEMS. Muhammad F. Sabir, Robert W. Heath Jr. and Alan C. Bovik UNEQUAL POWER ALLOCATION FOR JPEG TRANSMISSION OVER MIMO SYSTEMS Muhammad F. Sabir, Robert W. Heath Jr. and Alan C. Bovik Department of Electrical and Computer Engineering, The University of Texas at Austin,

More information

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

PDF hosted at the Radboud Repository of the Radboud University Nijmegen PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this

More information

Pulse Code Modulation

Pulse Code Modulation Pulse Code Modulation EE 44 Spring Semester Lecture 9 Analog signal Pulse Amplitude Modulation Pulse Width Modulation Pulse Position Modulation Pulse Code Modulation (3-bit coding) 1 Advantages of Digital

More information

Speaker verification in a time-feature space

Speaker verification in a time-feature space Oregon Health & Science University OHSU Digital Commons Scholar Archive 3-1-1999 Speaker verification in a time-feature space Sarel Van Vuuren Follow this and additional works at: http://digitalcommons.ohsu.edu/etd

More information

Individuality of Fingerprints

Individuality of Fingerprints Individuality of Fingerprints Sargur N. Srihari Department of Computer Science and Engineering University at Buffalo, State University of New York srihari@cedar.buffalo.edu IAI Conference, San Diego, CA

More information

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE

ROBUST PITCH TRACKING USING LINEAR REGRESSION OF THE PHASE - @ Ramon E Prieto et al Robust Pitch Tracking ROUST PITCH TRACKIN USIN LINEAR RERESSION OF THE PHASE Ramon E Prieto, Sora Kim 2 Electrical Engineering Department, Stanford University, rprieto@stanfordedu

More information

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection

Singing Voice Detection. Applications of Music Processing. Singing Voice Detection. Singing Voice Detection. Singing Voice Detection Detection Lecture usic Processing Applications of usic Processing Christian Dittmar International Audio Laboratories Erlangen christian.dittmar@audiolabs-erlangen.de Important pre-requisite for: usic segmentation

More information

NIST SRE 2008 IIR and I4U Submissions. Presented by Haizhou LI, Bin MA and Kong Aik LEE NIST SRE08 Workshop, Montreal, Jun 17-18, 2008

NIST SRE 2008 IIR and I4U Submissions. Presented by Haizhou LI, Bin MA and Kong Aik LEE NIST SRE08 Workshop, Montreal, Jun 17-18, 2008 NIST SRE 2008 IIR and I4U Submissions Presented by Haizhou LI, Bin MA and Kong Aik LEE NIST SRE08 Workshop, Montreal, Jun 17-18, 2008 Agenda IIR and I4U System Overview Subsystems & Features Fusion Strategies

More information

Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events

Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events INTERSPEECH 2013 Joint recognition and direction-of-arrival estimation of simultaneous meetingroom acoustic events Rupayan Chakraborty and Climent Nadeu TALP Research Centre, Department of Signal Theory

More information

Improving Meetings with Microphone Array Algorithms. Ivan Tashev Microsoft Research

Improving Meetings with Microphone Array Algorithms. Ivan Tashev Microsoft Research Improving Meetings with Microphone Array Algorithms Ivan Tashev Microsoft Research Why microphone arrays? They ensure better sound quality: less noises and reverberation Provide speaker position using

More information

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991

RASTA-PLP SPEECH ANALYSIS. Aruna Bayya. Phil Kohn y TR December 1991 RASTA-PLP SPEECH ANALYSIS Hynek Hermansky Nelson Morgan y Aruna Bayya Phil Kohn y TR-91-069 December 1991 Abstract Most speech parameter estimation techniques are easily inuenced by the frequency response

More information

Efficiency and detectability of random reactive jamming in wireless networks

Efficiency and detectability of random reactive jamming in wireless networks Efficiency and detectability of random reactive jamming in wireless networks Ni An, Steven Weber Modeling & Analysis of Networks Laboratory Drexel University Department of Electrical and Computer Engineering

More information

Automotive three-microphone voice activity detector and noise-canceller

Automotive three-microphone voice activity detector and noise-canceller Res. Lett. Inf. Math. Sci., 005, Vol. 7, pp 47-55 47 Available online at http://iims.massey.ac.nz/research/letters/ Automotive three-microphone voice activity detector and noise-canceller Z. QI and T.J.MOIR

More information

Comparison of Spectral Analysis Methods for Automatic Speech Recognition

Comparison of Spectral Analysis Methods for Automatic Speech Recognition INTERSPEECH 2013 Comparison of Spectral Analysis Methods for Automatic Speech Recognition Venkata Neelima Parinam, Chandra Vootkuri, Stephen A. Zahorian Department of Electrical and Computer Engineering

More information

Relative phase information for detecting human speech and spoofed speech

Relative phase information for detecting human speech and spoofed speech Relative phase information for detecting human speech and spoofed speech Longbiao Wang 1, Yohei Yoshida 1, Yuta Kawakami 1 and Seiichi Nakagawa 2 1 Nagaoka University of Technology, Japan 2 Toyohashi University

More information

Speech/Music Discrimination via Energy Density Analysis

Speech/Music Discrimination via Energy Density Analysis Speech/Music Discrimination via Energy Density Analysis Stanis law Kacprzak and Mariusz Zió lko Department of Electronics, AGH University of Science and Technology al. Mickiewicza 30, Kraków, Poland {skacprza,

More information

Supplementary Materials for

Supplementary Materials for advances.sciencemag.org/cgi/content/full/1/11/e1501057/dc1 Supplementary Materials for Earthquake detection through computationally efficient similarity search The PDF file includes: Clara E. Yoon, Ossian

More information

Speech and Music Discrimination based on Signal Modulation Spectrum.

Speech and Music Discrimination based on Signal Modulation Spectrum. Speech and Music Discrimination based on Signal Modulation Spectrum. Pavel Balabko June 24, 1999 1 Introduction. This work is devoted to the problem of automatic speech and music discrimination. As we

More information

4.5.1 Mirroring Gain/Offset Registers GPIO CMV Snapshot Control... 14

4.5.1 Mirroring Gain/Offset Registers GPIO CMV Snapshot Control... 14 Thank you for choosing the MityCAM-C8000 from Critical Link. The MityCAM-C8000 MityViewer Quick Start Guide will guide you through the software installation process and the steps to acquire your first

More information

Multiple Sound Sources Localization Using Energetic Analysis Method

Multiple Sound Sources Localization Using Energetic Analysis Method VOL.3, NO.4, DECEMBER 1 Multiple Sound Sources Localization Using Energetic Analysis Method Hasan Khaddour, Jiří Schimmel Department of Telecommunications FEEC, Brno University of Technology Purkyňova

More information

Detection of Compound Structures in Very High Spatial Resolution Images

Detection of Compound Structures in Very High Spatial Resolution Images Detection of Compound Structures in Very High Spatial Resolution Images Selim Aksoy Department of Computer Engineering Bilkent University Bilkent, 06800, Ankara, Turkey saksoy@cs.bilkent.edu.tr Joint work

More information

Forced Oscillation Detection Fundamentals Fundamentals of Forced Oscillation Detection

Forced Oscillation Detection Fundamentals Fundamentals of Forced Oscillation Detection Forced Oscillation Detection Fundamentals Fundamentals of Forced Oscillation Detection John Pierre University of Wyoming pierre@uwyo.edu IEEE PES General Meeting July 17-21, 2016 Boston Outline Fundamental

More information

A JOINT MODULATION IDENTIFICATION AND FREQUENCY OFFSET CORRECTION ALGORITHM FOR QAM SYSTEMS

A JOINT MODULATION IDENTIFICATION AND FREQUENCY OFFSET CORRECTION ALGORITHM FOR QAM SYSTEMS A JOINT MODULATION IDENTIFICATION AND FREQUENCY OFFSET CORRECTION ALGORITHM FOR QAM SYSTEMS Evren Terzi, Hasan B. Celebi, and Huseyin Arslan Department of Electrical Engineering, University of South Florida

More information

Campus Location Recognition using Audio Signals

Campus Location Recognition using Audio Signals 1 Campus Location Recognition using Audio Signals James Sun,Reid Westwood SUNetID:jsun2015,rwestwoo Email: jsun2015@stanford.edu, rwestwoo@stanford.edu I. INTRODUCTION People use sound both consciously

More information

Signal Processing First Lab 20: Extracting Frequencies of Musical Tones

Signal Processing First Lab 20: Extracting Frequencies of Musical Tones Signal Processing First Lab 20: Extracting Frequencies of Musical Tones Pre-Lab and Warm-Up: You should read at least the Pre-Lab and Warm-up sections of this lab assignment and go over all exercises in

More information

Dimension Reduction of the Modulation Spectrogram for Speaker Verification

Dimension Reduction of the Modulation Spectrogram for Speaker Verification Dimension Reduction of the Modulation Spectrogram for Speaker Verification Tomi Kinnunen Speech and Image Processing Unit Department of Computer Science University of Joensuu, Finland tkinnu@cs.joensuu.fi

More information

IMPROVEMENT OF SPEECH SOURCE LOCALIZATION IN NOISY ENVIRONMENT USING OVERCOMPLETE RATIONAL-DILATION WAVELET TRANSFORMS

IMPROVEMENT OF SPEECH SOURCE LOCALIZATION IN NOISY ENVIRONMENT USING OVERCOMPLETE RATIONAL-DILATION WAVELET TRANSFORMS 1 International Conference on Cyberworlds IMPROVEMENT OF SPEECH SOURCE LOCALIZATION IN NOISY ENVIRONMENT USING OVERCOMPLETE RATIONAL-DILATION WAVELET TRANSFORMS Di Liu, Andy W. H. Khong School of Electrical

More information

Using the Time Dimension to Sense Signals with Partial Spectral Overlap. Mihir Laghate and Danijela Cabric 5 th December 2016

Using the Time Dimension to Sense Signals with Partial Spectral Overlap. Mihir Laghate and Danijela Cabric 5 th December 2016 Using the Time Dimension to Sense Signals with Partial Spectral Overlap Mihir Laghate and Danijela Cabric 5 th December 2016 Outline Goal, Motivation, and Existing Work System Model Assumptions Time-Frequency

More information

Combining Voice Activity Detection Algorithms by Decision Fusion

Combining Voice Activity Detection Algorithms by Decision Fusion Combining Voice Activity Detection Algorithms by Decision Fusion Evgeny Karpov, Zaur Nasibov, Tomi Kinnunen, Pasi Fränti Speech and Image Processing Unit, University of Eastern Finland, Joensuu, Finland

More information

Long Range Acoustic Classification

Long Range Acoustic Classification Approved for public release; distribution is unlimited. Long Range Acoustic Classification Authors: Ned B. Thammakhoune, Stephen W. Lang Sanders a Lockheed Martin Company P. O. Box 868 Nashua, New Hampshire

More information

Voiced/nonvoiced detection based on robustness of voiced epochs

Voiced/nonvoiced detection based on robustness of voiced epochs Voiced/nonvoiced detection based on robustness of voiced epochs by N. Dhananjaya, B.Yegnanarayana in IEEE Signal Processing Letters, 17, 3 : 273-276 Report No: IIIT/TR/2010/50 Centre for Language Technologies

More information

Symbol Timing Recovery for Low-SNR Partial Response Recording Channels

Symbol Timing Recovery for Low-SNR Partial Response Recording Channels Symbol Timing Recovery for Low-SNR Partial Response Recording Channels Jingfeng Liu, Hongwei Song and B. V. K. Vijaya Kumar Data Storage Systems Center Carnegie Mellon University 5 Forbes Ave Pittsburgh,

More information

SPEECH ENHANCEMENT USING SPARSE CODE SHRINKAGE AND GLOBAL SOFT DECISION. Changkyu Choi, Seungho Choi, and Sang-Ryong Kim

SPEECH ENHANCEMENT USING SPARSE CODE SHRINKAGE AND GLOBAL SOFT DECISION. Changkyu Choi, Seungho Choi, and Sang-Ryong Kim SPEECH ENHANCEMENT USING SPARSE CODE SHRINKAGE AND GLOBAL SOFT DECISION Changkyu Choi, Seungho Choi, and Sang-Ryong Kim Human & Computer Interaction Laboratory Samsung Advanced Institute of Technology

More information

Non-coherent pulse compression - concept and waveforms Nadav Levanon and Uri Peer Tel Aviv University

Non-coherent pulse compression - concept and waveforms Nadav Levanon and Uri Peer Tel Aviv University Non-coherent pulse compression - concept and waveforms Nadav Levanon and Uri Peer Tel Aviv University nadav@eng.tau.ac.il Abstract - Non-coherent pulse compression (NCPC) was suggested recently []. It

More information

Statistical Signal Processing. Project: PC-Based Acoustic Radar

Statistical Signal Processing. Project: PC-Based Acoustic Radar Statistical Signal Processing Project: PC-Based Acoustic Radar Mats Viberg Revised February, 2002 Abstract The purpose of this project is to demonstrate some fundamental issues in detection and estimation.

More information

Speech Enhancement using Wiener filtering

Speech Enhancement using Wiener filtering Speech Enhancement using Wiener filtering S. Chirtmay and M. Tahernezhadi Department of Electrical Engineering Northern Illinois University DeKalb, IL 60115 ABSTRACT The problem of reducing the disturbing

More information

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis

Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Enhancement of Speech Signal Based on Improved Minima Controlled Recursive Averaging and Independent Component Analysis Mohini Avatade & S.L. Sahare Electronics & Telecommunication Department, Cummins

More information

Low Power Microphone Acquisition and Processing for Always-on Applications Based on Microcontrollers

Low Power Microphone Acquisition and Processing for Always-on Applications Based on Microcontrollers Low Power Microphone Acquisition and Processing for Always-on Applications Based on Microcontrollers Architecture I: standalone µc Microphone Microcontroller User Output Microcontroller used to implement

More information

Robust Low-Resource Sound Localization in Correlated Noise

Robust Low-Resource Sound Localization in Correlated Noise INTERSPEECH 2014 Robust Low-Resource Sound Localization in Correlated Noise Lorin Netsch, Jacek Stachurski Texas Instruments, Inc. netsch@ti.com, jacek@ti.com Abstract In this paper we address the problem

More information

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ICCE.2012.

University of Bristol - Explore Bristol Research. Peer reviewed version. Link to published version (if available): /ICCE.2012. Zhu, X., Doufexi, A., & Koçak, T. (2012). A performance enhancement for 60 GHz wireless indoor applications. In ICCE 2012, Las Vegas Institute of Electrical and Electronics Engineers (IEEE). DOI: 10.1109/ICCE.2012.6161865

More information

A Three-Microphone Adaptive Noise Canceller for Minimizing Reverberation and Signal Distortion

A Three-Microphone Adaptive Noise Canceller for Minimizing Reverberation and Signal Distortion American Journal of Applied Sciences 5 (4): 30-37, 008 ISSN 1546-939 008 Science Publications A Three-Microphone Adaptive Noise Canceller for Minimizing Reverberation and Signal Distortion Zayed M. Ramadan

More information

Digital Speech Processing and Coding

Digital Speech Processing and Coding ENEE408G Spring 2006 Lecture-2 Digital Speech Processing and Coding Spring 06 Instructor: Shihab Shamma Electrical & Computer Engineering University of Maryland, College Park http://www.ece.umd.edu/class/enee408g/

More information

Nonuniform multi level crossing for signal reconstruction

Nonuniform multi level crossing for signal reconstruction 6 Nonuniform multi level crossing for signal reconstruction 6.1 Introduction In recent years, there has been considerable interest in level crossing algorithms for sampling continuous time signals. Driven

More information

SIGNAL DETECTION IN NON-GAUSSIAN NOISE BY A KURTOSIS-BASED PROBABILITY DENSITY FUNCTION MODEL

SIGNAL DETECTION IN NON-GAUSSIAN NOISE BY A KURTOSIS-BASED PROBABILITY DENSITY FUNCTION MODEL SIGNAL DETECTION IN NON-GAUSSIAN NOISE BY A KURTOSIS-BASED PROBABILITY DENSITY FUNCTION MODEL A. Tesei, and C.S. Regazzoni Department of Biophysical and Electronic Engineering (DIBE), University of Genoa

More information

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition

Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition Feature Extraction Using 2-D Autoregressive Models For Speaker Recognition Sriram Ganapathy 1, Samuel Thomas 1 and Hynek Hermansky 1,2 1 Dept. of ECE, Johns Hopkins University, USA 2 Human Language Technology

More information

Quantitative Assessment of the Individuality of Friction Ridge Patterns

Quantitative Assessment of the Individuality of Friction Ridge Patterns Quantitative Assessment of the Individuality of Friction Ridge Patterns Sargur N. Srihari with H. Srinivasan, G. Fang, P. Phatak, V. Krishnaswamy Department of Computer Science and Engineering University

More information

Homework Assignment 13

Homework Assignment 13 Question 1 Short Takes 2 points each. Homework Assignment 13 1. Classify the type of feedback uses in the circuit below (i.e., shunt-shunt, series-shunt, ) 2. True or false: an engineer uses series-shunt

More information

A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image

A Method for Voiced/Unvoiced Classification of Noisy Speech by Analyzing Time-Domain Features of Spectrogram Image Science Journal of Circuits, Systems and Signal Processing 2017; 6(2): 11-17 http://www.sciencepublishinggroup.com/j/cssp doi: 10.11648/j.cssp.20170602.12 ISSN: 2326-9065 (Print); ISSN: 2326-9073 (Online)

More information

Perceptive Speech Filters for Speech Signal Noise Reduction

Perceptive Speech Filters for Speech Signal Noise Reduction International Journal of Computer Applications (975 8887) Volume 55 - No. *, October 22 Perceptive Speech Filters for Speech Signal Noise Reduction E.S. Kasthuri and A.P. James School of Computer Science

More information

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS

SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS SIMULATION VOICE RECOGNITION SYSTEM FOR CONTROLING ROBOTIC APPLICATIONS 1 WAHYU KUSUMA R., 2 PRINCE BRAVE GUHYAPATI V 1 Computer Laboratory Staff., Department of Information Systems, Gunadarma University,

More information

Jerry Reiter Department of Statistical Science Information Initiative at Duke Duke University

Jerry Reiter Department of Statistical Science Information Initiative at Duke Duke University Jerry Reiter Department of Statistical Science Information Initiative at Duke Duke University jreiter@duke.edu 1 Acknowledgements Research supported by National Science Foundation ACI 14-43014, SES-11-31897,

More information

Augmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data

Augmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data INTERSPEECH 2013 Augmenting Short-term Cepstral Features with Long-term Discriminative Features for Speaker Verification of Telephone Data Cong-Thanh Do 1, Claude Barras 1, Viet-Bac Le 2, Achintya K. Sarkar

More information

Online Signature Verification by Using FPGA

Online Signature Verification by Using FPGA Online Signature Verification by Using FPGA D.Sandeep Assistant Professor, Department of ECE, Vignan Institute of Technology & Science, Telangana, India. ABSTRACT: The main aim of this project is used

More information

Computational Complexity of Multiuser. Receivers in DS-CDMA Systems. Syed Rizvi. Department of Electrical & Computer Engineering

Computational Complexity of Multiuser. Receivers in DS-CDMA Systems. Syed Rizvi. Department of Electrical & Computer Engineering Computational Complexity of Multiuser Receivers in DS-CDMA Systems Digital Signal Processing (DSP)-I Fall 2004 By Syed Rizvi Department of Electrical & Computer Engineering Old Dominion University Outline

More information

Time-of-arrival estimation for blind beamforming

Time-of-arrival estimation for blind beamforming Time-of-arrival estimation for blind beamforming Pasi Pertilä, pasi.pertila (at) tut.fi www.cs.tut.fi/~pertila/ Aki Tinakari, aki.tinakari (at) tut.fi Tampere University of Technology Tampere, Finland

More information

Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa

Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Students: Avihay Barazany Royi Levy Supervisor: Kuti Avargel In Association with: Zoran, Haifa Spring 2008 Introduction Problem Formulation Possible Solutions Proposed Algorithm Experimental Results Conclusions

More information

Recent Advances in Acoustic Signal Extraction and Dereverberation

Recent Advances in Acoustic Signal Extraction and Dereverberation Recent Advances in Acoustic Signal Extraction and Dereverberation Emanuël Habets Erlangen Colloquium 2016 Scenario Spatial Filtering Estimated Desired Signal Undesired sound components: Sensor noise Competing

More information

Experiments with An Improved Iris Segmentation Algorithm

Experiments with An Improved Iris Segmentation Algorithm Experiments with An Improved Iris Segmentation Algorithm Xiaomei Liu, Kevin W. Bowyer, Patrick J. Flynn Department of Computer Science and Engineering University of Notre Dame Notre Dame, IN 46556, U.S.A.

More information

Spectral Reconstruction and Noise Model Estimation based on a Masking Model for Noise-Robust Speech Recognition

Spectral Reconstruction and Noise Model Estimation based on a Masking Model for Noise-Robust Speech Recognition Circuits, Systems, and Signal Processing manuscript No. (will be inserted by the editor) Spectral Reconstruction and Noise Model Estimation based on a Masking Model for Noise-Robust Speech Recognition

More information

Short Paper: The Softwater Modem A Software Modem for Underwater Acoustic Communication

Short Paper: The Softwater Modem A Software Modem for Underwater Acoustic Communication Short Paper: The Softwater Modem A Software Modem for Underwater Acoustic Communication Brian Borowski and Dan Duchamp Department of Computer Science Stevens Institute of Technology Castle Point on Hudson,

More information