FORENSIC AUTOMATION SPEAKER RECOGNITION



1 FORENSIC AUTOMATION SPEAKER RECOGNITION
June 2, 2
Hirotaka Nakasone, Federal Bureau of Investigation, Quantico, VA 2235, hnakasone@fbiacademy.edu
Steven D. Beck, BAE SYSTEMS, 65 Tracor Ln. MS 27-6, Austin, TX, steve.beck@baesystems.com

2 PRESENTATION OUTLINE
- The Problem of Forensic Acoustical Analysis
- FBI Forensic Voice Database (FV)
- ASR Evaluation Results for FV
- Confidence Measures
- The FASR System
- Conclusions


3 Forensic ASR Problems
Every month, the FBI receives numerous criminal cases involving recorded voice samples. Most voice samples are recorded in uncontrolled environments, and there are many unknown sources of variability.
Four primary sources of voice sample variation of interest to the forensic community:
- Speech source characteristics
- Transmission channel characteristics
- Usable speech duration
- Signal-to-Noise Ratio
[Figure: two spectrograms (File=228st.wav and File=228s6t.wav) of the same speakers in different sessions with different text; axes: time in seconds vs. frequency in Hz.]

4 FV Voice Database: Data Collection 1 & 2
The Forensic Voice Data Base was developed as part of Project CAVIS in cooperation with the LA County Sheriff's Department and NIJ/DOJ Grant 85-IJ-CX-24.
Recording equipment: B&K Model 455 microphone, in-house telephone, body microphone with transmitter and receiver, and a Fostex Model R8 four-channel reel-to-reel recorder. Data Collection 3 is the remote telephone data collection.
CAVIS Experiment, by collection:
- Sample Length: 3 Seconds (Collection 1); 3 Seconds (Collection 2); 3 Seconds (Collection 3)
- Speaking Mode: Spontaneous, Reading, Prescribed (3 sec) (Collection 1); Spontaneous, Reading (Collection 2); Spontaneous 1, Spontaneous 2 (Collection 3)
- Transmission Mode: Microphone, Telephone, Body Transmitter (Collections 1 and 2); Telephone (Remote Call-in) (Collection 3)
- Number of Speakers, Number of Sessions, Samples per Session: only the values 2 and 5 survive in this transcript

5 FV Voice Database: Speaking Modes
- Spontaneous: The speaker is shown a set of slides (one per session sample) and talks about each slide in turn. Speech segments are 29 seconds long, and the text is independent.
- Reading: The speaker reads a passage of text. Speech segments are 29 seconds long. The text per session sample is independent for data collection 1 and dependent for data collection 2.
- Prescribed: The speaker says, "There is a bomb in the plant. Get out!" The speech segments are 2-3 seconds long. This mode is available only for data collection 1 and is text dependent.
- Spontaneous 1: The speaker is told to talk about a particular topic per session sample. Available only for collection 3 (remote telephone).
- Spontaneous 2: The speaker is told to talk about any topic. Available only for collection 3 (remote telephone).

6 FV Voice Database: Transmission Modes
- Body Transmitter: Electret microphone plus AM transmitter. Nominal bandwidth is 3 Hz - 36 Hz.
- Microphone: B&K Model 455. Nominal bandwidth is 2 Hz - 8 Hz.
- In-House Telephone: Nominal bandwidth is 3 Hz - 36 Hz.
- Remote Telephone (no periodogram): Nominal bandwidth is 3 Hz - 36 Hz.
[Figure: periodograms of simultaneously recorded voice samples, one panel per channel: Body Mic Channel (File=sb.wav), Microphone Channel (File=sm.wav), Telephone Channel (File=st.wav); axes: frequency in Hz vs. amplitude in dB.]

7 FV Voice Database: Conditional Data Set Breakdown
Speaking Mode (SM):
- P = Prescribed Text
- R = Reading
- S = Spontaneous
Transmission Channel (TM):
- M = B&K Microphone
- B = Body Mic & Transmitter
- T = Telephone (in-house)
- Tlgd = Telephone (remote)
[Table: number of files for each speaking mode and channel type. Columns: SM / TM, Total, M, T, B, Tlgd. Rows: # S Files, # R Files, # P Files.]

8 FV Voice Database: Description
Speaking Mode | Text Dep. | Length (sec) | Trans. Mode | Number Speakers / Sessions / Samples
S | TI | 29 | M,B,T | 5
S | TI | 6 | M,B,T | 5 2
S | TI | 29 | Tlgd |
S | TI | 6 | Tlgd | 5 2
R | TI | 29 | M,B,T | 5
R | TI | 6 | M,B,T | 5 2
P | TD | 3 | M,B,T | 5
P | TD | 2 | M,B,T | 5 2

9 FV Voice Database: Histograms of SNR and Duration
[Figure: probability histograms of SNR (in dB) and signal duration (in sec.) for all 3 second files and for all 3 second S files.]

10 FV Voice Database: Histograms for Spontaneous Speaking Mode
[Figure: probability histograms of SNR (in dB) and duration (in sec.) for SM=S, 3 sec files, one panel pair per transmission mode: Microphone (TM=M), Telephone (TM=T), Body Transmitter (TM=B), Telephone-Remote (TM=Tlgd).]

11 FV Voice Database: Data Formats and Filenames
Data Formats, Evaluation:
- Sampling Rate = 6, samples/sec.
- Resolution = 6 bits
- Word Format = Sun <MSB,LSB>
- File Header = 24 byte SPHERE
Filenames, Evaluation: The randomized filename format is FV_xxxx.sph, where
- FV signifies FBI Forensic Voice Dataset
- xxxx is a unique four place number
- .sph is the file ending for SPHERE
Data Formats, Analysis:
- Sampling Rate = 8, samples/sec.
- Resolution = 6 bits
- Word Format = PC <LSB,MSB>
- File Header = MS WAV
Filenames, Analysis: File Name Example = R4T.wav
- Speaker =
- Speech Mode = Reading
- Sample = 4 out of
- Transmission = Telephone

12 ASR Evaluation on FV
Purpose of Blind Test and Evaluation: Assess the maturity of Automatic Speaker Recognition technology for application in the field of forensic science.
Time Frame for Test
Participants:
- GTE/BBN
- MIT Lincoln Laboratory
- Oregon Graduate Institute of Science and Technology
- T-Netix
- Wagner Associates
- U.S. Air Force Research Laboratory, Rome, NY
- U.S. Air Force Research Laboratory, Wright-Patterson, OH

13 ASR Evaluation on FV: Multiple Levels of Difficulty
The speech samples are assigned to one of four Levels of Difficulty, which represent different testing criteria. Each level is further divided into 12 separate trials, giving a total of 48 independent classifier tests (4 levels x 12 trials).
Level of Difficulty | Text Dependence | Channel Dependence
I | Independent | Independent
II | Dependent | Independent
III | Independent | Dependent
IV | Dependent | Dependent
[Table: Level of Difficulty, Test Number, File Length (sec.), Speaking Mode. Each level lists four groups of tests; the surviving entries are:
- Levels I and III: speaking modes S, R, S, R; tests 7-9 and the final group use file length 4*29.
- Levels II and IV: speaking mode P throughout; the first group uses file length 3; tests 7-9 and the final group use file length 4*3.]

14 ASR Evaluation on FV: Level 1 Training and Testing Sets
Example training and testing sets. Levels 2-4 are similar.
[Table, Level 1 Training Sets: one row per trial, with columns Trial Number, CD-ROM Volume, Data Directory, Files per Speaker, Number of Speakers, Total Files. Surviving entries: CD-ROM volumes FVTRN and FVTRN2 (six trials each); data directories LTRN and L3TRN.]
[Table, Level 1 Testing Sets: one row per trial, with columns Trial Number, CD-ROM Volume, Directory, Number of Speakers, Total Files, File Len. (sec). Surviving entries, identical in every row: CD-ROM volume FVTST,2; directory LTST; file lengths 2, 29, 6.]

15 ASR Evaluation on FV
Open Set Speaker Verification: Compare a voice test segment with a single target voice model. If the resulting score exceeds a detection threshold, then declare a match.
- DET Curve: Plot Pmiss vs. Pfa
- EER: Operating point where Pmiss = Pfa
- Neyman-Pearson: Operating point that minimizes Pmiss for a fixed Pfa
- DCF: Operating point based on the relative cost of making Type I and Type II errors:
$$C_{Det} = C_{Miss} \cdot P_{Miss|Target} \cdot P_{Target} + C_{FalseAlarm} \cdot P_{FalseAlarm|NonTarget} \cdot P_{NonTarget}$$
Closed Set Speaker Identification: Compare a voice test segment with a set of target voice models. Decide which target voice model best matches the test segment.
- Rank-1: The correct model is the best match.
- Rank-3: The correct model is among the top 3 matches.
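The verification operating points above can be illustrated with a short sketch. It is not the evaluation's scoring software: the function names, the threshold sweep over observed scores, and the default cost values (c_miss, c_fa, p_target) are assumptions chosen only for illustration.

```python
def error_rates(target_scores, nontarget_scores, threshold):
    """Return (p_miss, p_fa) when scores >= threshold are declared a match."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def equal_error_rate(target_scores, nontarget_scores):
    """Sweep every observed score as a threshold and return (eer, threshold)
    at the point where |Pmiss - Pfa| is smallest."""
    best = None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        p_miss, p_fa = error_rates(target_scores, nontarget_scores, t)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2.0, t)
    return best[1], best[2]


def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """C_Det = C_Miss*P_Miss*P_Target + C_FalseAlarm*P_FalseAlarm*P_NonTarget.
    The default costs and prior are illustrative assumptions."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


# Toy scores (hypothetical): higher means "more like the target speaker".
targets = [2.0, 1.5, 1.0, 0.2]
nontargets = [0.5, 0.0, -0.5, -1.0]
eer, eer_threshold = equal_error_rate(targets, nontargets)
```

With real score sets the sweep is usually done on sorted scores in one pass; the quadratic loop here is kept for readability.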

16 ASR Evaluation on FV: Speaker Verification Test Results
Level I - Text Independent, Transmission Independent
- Test 1: TRN Set Desc. SM=S, TM=M, Len=3
- Test 2: TRN Set Desc. SM=S, TM=T, Len=3
- Test 3: TRN Set Desc. SM=S, TM=B, Len=3
[Figure: three DET plots (miss probability in % vs. false alarm probability in %), MITLL, Ver, Level/Test=LTST, LTST2, LTST3. Each plot shows test conditions SM=S, R, P at Tsec=3 on the two channels not used for training: TM=T and B for Test 1, TM=M and B for Test 2, TM=M and T for Test 3.]
[Table, one per test: SM, TM, LEN (Sec), EER %, and the error rate at a fixed Pfa for each test condition.]

17 ASR Evaluation on FV: Speaker Verification Test Results
Level II - Text Dependent, Transmission Independent
- Test 1: TRN Set Desc. SM=P, TM=M, Len=3
- Test 2: TRN Set Desc. SM=P, TM=T, Len=3
- Test 3: TRN Set Desc. SM=P, TM=B, Len=3
[Figure: three DET plots (miss probability in % vs. false alarm probability in %), MITLL, Ver, Level/Test=L2TST, L2TST2, L2TST3. Each plot shows test condition SM=P at Tsec=3 and Tsec=2 on the two channels not used for training: TM=T and B for Test 1, TM=M and B for Test 2, TM=M and T for Test 3.]
[Table, one per test: SM, TM, LEN (Sec), EER %, and the error rate at a fixed Pfa for each test condition.]

18 ASR Evaluation on FV: Speaker Verification Test Results
Level III - Text Independent, Transmission Dependent
- Test 1: TRN Set Desc. SM=S, TM=M, Len=3
- Test 2: TRN Set Desc. SM=S, TM=T, Len=3
- Test 3: TRN Set Desc. SM=S, TM=B, Len=3
[Figure: three DET plots (miss probability in % vs. false alarm probability in %), MITLL, Ver, Level/Test=L3TST, L3TST2, L3TST3. Each plot shows test conditions SM=S, R, P at Tsec=3 and Tsec=2 on the training channel: TM=M for Test 1, TM=T for Test 2, TM=B for Test 3.]
[Table, one per test: SM, TM, LEN (Sec), EER %, and the error rate at a fixed Pfa for each test condition.]

19 ASR Evaluation on FV: Speaker Verification Test Results
Level IV - Text Dependent, Transmission Dependent
- Test 1: TRN Set Desc. SM=P, TM=M, Len=3
- Test 2: TRN Set Desc. SM=P, TM=T, Len=3
- Test 3: TRN Set Desc. SM=P, TM=B, Len=3
[Figure: three DET plots (miss probability in % vs. false alarm probability in %), MITLL, Ver, Level/Test=L4TST, L4TST2, L4TST3. Each plot shows test condition SM=P at Tsec=3 and Tsec=2 on the training channel: TM=M for Test 1, TM=T for Test 2, TM=B for Test 3.]
[Table, one per test: SM, TM, LEN (Sec), EER %, and the error rate at a fixed Pfa for each test condition.]

20 ASR Evaluation on FV: Equal Error Rate (EER) Comparison
[Tables: EER % per developer for each train/test channel pair (Channel TRN/TST), one table per level (Level 1 to Level 4). The channel-independent levels list the mismatched pairs M/T, M/B, T/M, T/B, B/M, B/T; the channel-dependent levels list the matched pairs M/M, T/T, B/B, with Tlgd/Tlgd in one table. Developers shown: Developer 1, 2, and 3 in two tables; Developer 1, 2, and 4 in the other two.]
* The Developer Numbers have been randomized.
Conclusions: Lower SNR (channel B), channel mismatch (Level 1), and session variations (Tlgd) all contribute to worse detection performance.

21 ASR Evaluation on FV: Closed Set ID Results, Level 1
Only Tests 1, 2, 3 out of 12 for one participant are shown.
LEVEL | Transmission TRN / TST | Speaking TRN / TST | Length (sec) TRN / TST | RANK 1 % Correct | RANK 3 % Correct
I | M / T | S / S | 3 / 3 | 84/93 = | /93 = 98.4
I | M / T | S / S | 3 / 2 | 37/ 4 = | / 4 =.
I | M / T | S / R | 3 / 3 | 3/44 = 9. | 4/44 = 97.2
I | M / T | S / R | 3 / 2 | 27/ 29 = | / 29 =.
I | M / T | S / P | 3 / 3 | 27/ 85 = 3.8 | 5/ 85 = 58.8
I | M / T | S / P | 3 / 2 | 8/ 7 = 47. | 2/ 7 = 7.6
I | M / B | S / S | 3 / 3 | 73/94 = | /94 = 92.8
I | M / B | S / S | 3 / 2 | 24/ 4 = 6. | 35/ 4 = 87.5
I | M / B | S / R | 3 / 3 | 2/43 = | /43 = 88.
I | M / B | S / R | 3 / 2 | / 29 = | / 29 = 79.3
I | M / B | S / P | 3 / 3 | 44/ 9 = | / 9 = 66.7
I | M / B | S / P | 3 / 2 | 8/ 8 = | / 8 = 77.8
I | T / M | S / S | 3 / 3 | 66/94 = | /94 = 93.8
I | T / M | S / S | 3 / 2 | 3/ 4 = | / 4 = 87.5
I | T / M | S / R | 3 / 3 | 96/43 = 67. | 4/43 = 79.7
I | T / M | S / R | 3 / 2 | 4/ 29 = | / 29 = 72.4
I | T / M | S / P | 3 / 3 | 34/ 89 = | / 89 = 49.4
I | T / M | S / P | 3 / 2 | 8/ 8 = 44.4 | / 8 = 55.6
I | T / B | S / S | 3 / 3 | 64/98 = | /98 = 88.9
I | T / B | S / S | 3 / 2 | 32/ 4 = | / 4 = 9
I | T / B | S / R | 3 / 3 | 27/48 = | /48 = 9.5
I | T / B | S / R | 3 / 2 | 2/ 3 = | / 3 = 9.
I | T / B | S / P | 3 / 3 | 28/ 95 = | / 95 = 43.2
I | T / B | S / P | 3 / 2 | 7/ 9 = | / 9 = 47.4
I | B / M | S / S | 3 / 3 | 53/94 = | /94 = 89.7
I | B / M | S / S | 3 / 2 | 29/ 4 = | / 4 = 87.5
I | B / M | S / R | 3 / 3 | 87/42 = 6.3 | 2/42 = 78.9
I | B / M | S / R | 3 / 2 | 2/ 28 = | / 28 = 78.6
I | B / M | S / P | 3 / 3 | 47/ 9 = | / 9 = 6.
I | B / M | S / P | 3 / 2 | / 8 = 55.6 | / 8 = 55.6
I | B / T | S / S | 3 / 3 | 59/9 = | /9 = 88.5
I | B / T | S / S | 3 / 2 | 34/ 4 = | / 4 = 9
I | B / T | S / R | 3 / 3 | 99/49 = | /49 = 8.9
I | B / T | S / R | 3 / 2 | 2/ 3 = | / 3 = 8.
I | B / T | S / P | 3 / 3 | 38/ 87 = | / 87 = 59.8
I | B / T | S / P | 3 / 2 | 8/ 7 = 47. | / 7 =

22 ASR Evaluation on FV: Closed Set Identification Results, Level 3
LEVEL | Transmission TRN / TST | Speaking TRN / TST | Length (sec) TRN / TST | RANK 1 % Correct | RANK 3 % Correct
III | M / M | S / S | 3 / 3 | 94/94=. | 94/94=.
III | M / M | S / S | 3 / 2 | 39/ 39 =. | 39/ 39 =.
IIIa | M / M | S / R | 3 / 3 | 34/4 = 95. | 4/4 = 99.3
III | M / M | S / R | 3 / 2 | 26/ 27 = | / 27 =.
IIIa | M / M | S / P | 3 / 3 | 58/ 89 = | / 89 = 8.9
III | M / M | S / P | 3 / 2 | 3/ 8 = | / 8 = 83.3
III | T / T | S / S | 3 / 3 | 96/96=. | 96/96=.
III | T / T | S / S | 3 / 2 | 4/ 4=. | 4/ 4=.
IIIa | T / T | S / R | 3 / 3 | 3/34 = | /34=.
III | T / T | S / R | 3 / 2 | 24/ 26 = | / 26 =.
IIIa | T / T | S / P | 3 / 3 | 45/ 83 = | / 83 = 79.5
III | T / T | S / P | 3 / 2 | / 8 = | / 8 = 77.8
IIIa | B / B | S / S | 3 / 3 | 86/95 = | /95 = 99.
III | B / B | S / S | 3 / 2 | 22/ 4 = | / 4 = 73.2
IIIa | B / B | S / R | 3 / 3 | 27/47 = | /47 = 94.6
III | B / B | S / R | 3 / 2 | 2/ 3 = 4. | 5/ 3 = 5.
IIIa | B / B | S / P | 3 / 3 | 5/ 93 = | / 93 = 63.4
III | B / B | S / P | 3 / 2 | / 9 = | / 9 = 63.2
IIIa | Tall / Tall | S / S | 3 / 3 | 329/425 = | /425 = 88.5
III | Tall / Tall | S / S | 3 / 2 | 54/ 86 = | / 86 = 74.4
IIIa | Tlgd / Tlgd | S / S | 3 / 3 | 34/229 = | /229 = 79.
III | Tlgd / Tlgd | S / S | 3 / 2 | 6/ 46= | / 46=

23 ASR Evaluation on FV: Closed Set Identification (ID) Comparison
[Tables: Rank-1 ID performance (% correct for Dev. 1, Dev. 2, Dev. 5) by Trans. Trn/Tst, Speech Trn/Tst, and Length Trn/Tst. Level 1 rows: T/M with S/S, S/R, S/P. Level 3 rows: T/T with S/S, S/R, S/P, and Tlgd/Tlgd with S/S. Training length is 29 in every row.]
* The Developer Numbers have been randomized.
Conclusions: Channel mismatch (Level 1), signal duration mismatch (S/P), and session variations (Tlgd and S/R) all contribute to worse ID performance. Lack of channel normalization (CMS or RASTA) can result in random performance.

24 Confidence Measures: FBI Forensic Voice Database
Detection Error Trade-off (DET): Trades off the miss error probability against the false alarm error probability.
False/True Score PDFs: Display the PDFs of the ASR false-model scores and true-model scores for a relatively large population. The Equal Error Rate (EER) or the Decision Cost Function (DCF) operating point can be calculated and plotted.
The decision rule is a likelihood ratio test against a cost-weighted threshold:
$$\frac{P(x \mid H_T)}{P(x \mid H_F)} \underset{H_F}{\overset{H_T}{\gtrless}} \frac{(C_{T|F} - C_{F|F})\, P(H_F)}{(C_{F|T} - C_{T|T})\, P(H_T)} = \text{Threshold}$$
Here $C_{i|j}$ is the cost of deciding $H_i$ when $H_j$ is true; the cost subscripts are not legible in the slide and follow the standard Bayes criterion.
[Figure: DET curve (miss probability in % vs. false alarm probability in %) and PDFs for TRUE and FALSE scores, MITLL, Ver, Level/Test=L3TST, SM=S, TM=M (Level III); x axis: GMM LLRT score, with the EER threshold marked (EER=.5464%, Thresh=353).]
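A minimal sketch of the likelihood ratio test follows, assuming (for illustration only) that the true and false score densities are Gaussian and that correct decisions cost nothing. The function names, model parameters, and cost defaults are hypothetical and are not taken from the FASR system.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def bayes_threshold(p_true, cost_false_accept=1.0, cost_miss=1.0):
    """Likelihood-ratio threshold C_fa * P(H_F) / (C_miss * P(H_T)),
    assuming correct decisions have zero cost."""
    return (cost_false_accept * (1.0 - p_true)) / (cost_miss * p_true)


def decide_true(x, true_model, false_model, p_true=0.5,
                cost_false_accept=1.0, cost_miss=1.0):
    """Return True when P(x|H_T) / P(x|H_F) exceeds the Bayes threshold.
    Each model is a (mean, std) pair."""
    ratio = gaussian_pdf(x, *true_model) / gaussian_pdf(x, *false_model)
    return ratio > bayes_threshold(p_true, cost_false_accept, cost_miss)


# Hypothetical score models: false scores near 0, true scores near 1.
TRUE_MODEL = (1.0, 0.5)
FALSE_MODEL = (0.0, 0.5)
accept_high = decide_true(0.8, TRUE_MODEL, FALSE_MODEL)
accept_low = decide_true(0.2, TRUE_MODEL, FALSE_MODEL)
```

Raising cost_false_accept or lowering p_true raises the threshold, so a higher score is needed before a match is declared.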

25 Confidence Measures
For a given GMM LRT score, find the confidence in a True decision based on a sample True/False population:
$$P(H_T \mid x) = \frac{P(H_T)\, P(x \mid H_T)}{P(H_T)\, P(x \mid H_T) + P(H_F)\, P(x \mid H_F)}$$
Example: False Distribution N(.,.35), True Distribution N(.,.35), Score =.7, Confidence = 84.6%
[Figure: left panel "True, False, and Test Scores" shows the probability density of the false scores and true scores with the test score marked, x axis: GMM output scores; right panel "Confidence Measure" shows the confidence curve P(H_T|x) with the test score and its confidence value marked.]
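The posterior above can be evaluated directly once the two score densities are modeled. The sketch below assumes Gaussian true and false score distributions; the parameters are hypothetical and are not the values from the slide, whose digits are incomplete in this transcript.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def confidence(score, true_model, false_model, p_true=0.5):
    """Posterior P(H_T | score) by Bayes' rule.
    Each model is a (mean, std) pair; p_true is the prior P(H_T)."""
    true_term = p_true * gaussian_pdf(score, *true_model)
    false_term = (1.0 - p_true) * gaussian_pdf(score, *false_model)
    return true_term / (true_term + false_term)


# Hypothetical score models, for illustration only.
TRUE_MODEL = (1.0, 0.4)
FALSE_MODEL = (0.0, 0.4)
conf_mid = confidence(0.5, TRUE_MODEL, FALSE_MODEL)   # halfway between the means
conf_high = confidence(0.9, TRUE_MODEL, FALSE_MODEL)  # close to the true mean
```

With equal priors and equal spreads, a score midway between the two means gives a confidence of one half, and the confidence rises monotonically as the score moves toward the true-score mean.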

26 Confidence Measures: The Logistic Model
The posterior probability, p, is a curvilinear function that can be modeled with the form:
$$p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}$$
Using the logistic transformation:
$$p' = \ln\!\left(\frac{p}{1 - p}\right)$$
we get the following linear form:
$$p' = \beta_0 + \beta_1 X$$
In matrix form, the linear coefficients are solved using a pseudo-inverse:
$$\beta = (X^T X)^{-1} X^T p'$$
For a given score, first compute the linear model confidence measure:
$$CM' = \beta_0 + \beta_1 \cdot \text{Score}$$
Then compute the natural confidence measure:
$$CM = \frac{\exp(CM')}{1 + \exp(CM')}$$
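The two-step procedure above can be sketched as follows. For a single predictor, the pseudo-inverse solution reduces to the ordinary closed-form least-squares line, which is what this sketch uses; the function names and sample values are illustrative assumptions.

```python
import math


def logit(p):
    """Logistic transformation p' = ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(v):
    """exp(v) / (1 + exp(v)), written in the equivalent form 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + math.exp(-v))


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def fit_logistic(scores, posteriors):
    """Fit beta_0, beta_1 by regressing logit(posterior) on the score."""
    return fit_line(scores, [logit(p) for p in posteriors])


def confidence_measure(score, b0, b1):
    """CM' = b0 + b1 * score, then CM = inverse_logit(CM')."""
    return inverse_logit(b0 + b1 * score)


# Hypothetical posteriors generated from a known model (b0 = -2, b1 = 4).
scores = [0.0, 0.25, 0.5, 0.75, 1.0]
posteriors = [inverse_logit(-2.0 + 4.0 * s) for s in scores]
b0, b1 = fit_logistic(scores, posteriors)
```

Because the sample posteriors lie exactly on a logistic curve, the fit recovers the generating coefficients; empirical posteriors would only be approximated.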

27 Confidence Measures: Empirical Detection Data
The logit transformation can be used when the dependent variable is binary, e.g. True/False, with Y in {0, 1}:
$$E(Y) = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)} = p$$
Use the linear regression model:
$$p'_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$
Use weighted least squares to ensure constant error variance terms for an optimal solution:
$$\beta = (X^T W X)^{-1} X^T W p'$$
The diagonal weight matrix is:
$$w_i = \frac{1}{\sigma^2(\varepsilon_i)}$$
The weights can be estimated using
$$\hat{w}_i = n_i\, p_i\, (1 - p_i)$$
where $p_i$ is the sample proportion in bin $i$ and $n_i$ is the total number of test scores in bin $i$.
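A sketch of the weighted fit on binned detection data is given below. It assumes scores have already been grouped into bins with a representative score per bin, and it skips bins whose sample proportion is 0 or 1 because the logit is undefined there; that skipping rule and all names are assumptions made for the example.

```python
import math


def weighted_logit_fit(bin_scores, bin_counts, bin_true_counts):
    """Weighted least squares fit of logit(p_i) = b0 + b1 * x_i.

    bin_scores      representative score x_i of each bin
    bin_counts      n_i, number of test scores in each bin
    bin_true_counts number of True trials in each bin
    Weights are the estimates w_i = n_i * p_i * (1 - p_i).
    """
    xs, ys, ws = [], [], []
    for x, n, k in zip(bin_scores, bin_counts, bin_true_counts):
        p = k / n
        if p <= 0.0 or p >= 1.0:
            continue  # logit undefined; such bins carry zero weight anyway
        xs.append(x)
        ys.append(math.log(p / (1.0 - p)))
        ws.append(n * p * (1.0 - p))
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical bins: proportions 0.2, 0.5, 0.8 at scores 0, 1, 2,
# plus one all-True bin at score 5 that is skipped.
b0, b1 = weighted_logit_fit([0.0, 1.0, 2.0, 5.0], [10, 10, 10, 10], [2, 5, 8, 10])
```

The weights give more influence to bins with many scores and proportions near one half, where the sample logit is estimated most reliably.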

28 Confidence Measures: Confidence Measures for True-False Data
A least squares line could be fitted to the data, but there are problems:
- The data doesn't look linear
- The line would not stay in the interval [0,1]
Use the empirical estimates as the starting point for an iterative procedure (Newton method or Levenberg-Marquardt).
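One way such an iterative procedure could look is Newton's method applied to the logistic log-likelihood of the binary True/False labels. This is a sketch under that assumption, not the procedure used in the original work; the function name, stopping tolerance, and sample data are hypothetical.

```python
import math


def newton_logistic(scores, labels, b0=0.0, b1=0.0, iterations=25):
    """Maximum-likelihood fit of p = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method.

    labels are 1 for True trials and 0 for False trials.
    b0, b1 are the starting point, e.g. a least-squares estimate.
    """
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of X^T W X (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break                  # singular system; keep the current estimate
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


# Hypothetical overlapping True/False scores, symmetric about zero.
scores = [-2.0, -1.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
b0, b1 = newton_logistic(scores, labels)
```

Unlike a straight line, the fitted curve always stays inside [0,1]. If the True and False scores do not overlap at all, the likelihood has no finite maximum and the iteration would need a step limit or regularization.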

29 Confidence Measures
Score distribution and confidence measures from the NIST 1999 eval, using a balanced mixture of electret and carbon-button telephone handsets.
Operating points and confidence measures have been derived for:
- gender dependent male models
- gender dependent female models
- gender independent models
[Table: BKG Model (Male Bal, Female Bal, Gender Indep Bal), EER %, EER Threshold.]
[Figure: DET curves for 3 models (Male Balanced, Female Balanced, Gender Independent), NIST99, Test Set=NIST Balanced All, SM=S, TM=T; axes: miss probability in % vs. false alarm probability in %.]
[Figure: example for Male, Balanced, NIST99: PDFs of TRUE and FALSE scores and the estimated confidence measure (equal priors) for Test=Male-bal:mal-nist-all-t.out; x axis: GMM log likelihood ratio score.]

30 Confidence Measures: Multivariate Logistic Model
The logistic model can be extended to use more than one independent variable. The multiple regression model:
$$p' = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

31 FASR Description
The Forensic Automatic Speaker Recognition (FASR) Program:
- Developed by U.S. Air Force Research Laboratories, Rome, NY
- With inputs from MITLL, FBI ERF, and BAE SYSTEMS
FASR is a PC-based stand-alone workstation with an efficient GUI supporting:
- Data acquisition and playback
- Signal and spectrographic display
- Speech segmentation and labeling
- Tone detection and removal
- Speech quality measures (SNR, duration, bandwidth)
- Speaker Identification and Verification
FASR uses robust speaker recognition algorithms:
- Mel cepstral coefficients
- Cepstral mean subtraction or RASTA filtering
- Gaussian mixture models with Universal Background Models
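Cepstral mean subtraction, one of the channel normalization steps listed above, can be illustrated in a few lines. This is a generic sketch of the technique rather than FASR code: the function name and the toy feature values are assumptions, and real systems operate on mel cepstral vectors computed per frame.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames is a list of equal-length cepstral vectors (one per frame).
    A fixed linear channel adds a constant offset to every cepstral frame,
    so removing the utterance mean removes that offset.
    """
    count = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / count for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]


# Toy two-coefficient features, and the same features seen through a
# hypothetical channel that adds a constant cepstral offset.
clean = [[1.0, 2.0], [3.0, 6.0]]
channel_offset = [5.0, -3.0]
shifted = [[c + o for c, o in zip(frame, channel_offset)] for frame in clean]

normalized_clean = cepstral_mean_subtraction(clean)
normalized_shifted = cepstral_mean_subtraction(shifted)
```

Both inputs normalize to the same features, which is why the technique helps when training and test recordings come from different channels. It also removes any speaker information carried in the long-term cepstral mean, so it works best on utterances long enough for the mean to be dominated by the channel.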

32 Conclusions
Project Conclusions:
- The FBI is using a PC-based forensic automatic speaker recognition (FASR) system.
- FASR has been extensively tested on NIST single speaker and FV speech corpora.
- The outputs are based on statistics with known error rates from large sample populations.
Future Directions:
- Improve on existing channel normalization techniques.
- Integrate automatic or manual pre-screening based upon quantifiable signal quality measures.
- Provide for a "no decision" rule when signal quality does not meet predefined conditions.
- Address the issue of using different background models for detected differences in the voice samples.
