THE DET CURVE IN ASSESSMENT OF DETECTION TASK PERFORMANCE


A. Martin*, G. Doddington#, T. Kamm+, M. Ordowski+, M. Przybocki*
*National Institute of Standards and Technology, Bldg. 225-Rm. A216, Gaithersburg, MD 20899, USA
#SRI International/Department of Defense, 1566 Forest Villa Lane, McLean, VA 22101, USA
+Department of Defense, Ft. Meade, MD 20755, USA

ABSTRACT

We introduce the DET Curve as a means of representing performance on detection tasks that involve a tradeoff of error types. We discuss why we prefer it to the traditional ROC Curve and offer several examples of its use in speaker recognition and language recognition. We explain why it is likely to produce approximately linear curves. We also note special points that may be included on these curves, how they are used with multiple targets, and possible further applications.

INTRODUCTION

Detection tasks can be viewed as involving a tradeoff between two error types: missed detections and false alarms. An example of a speech processing task is to recognize the person who is speaking, or to recognize the language being spoken. A recognition system may fail to detect a target speaker or language known to the system, or it may declare such a detection when the target is not present. When there is a tradeoff of error types, a single performance number is inadequate to represent the capabilities of a system. Such a system has many operating points and is best represented by a performance curve. The ROC Curve has traditionally been used for this purpose. Here ROC has been taken to denote either the Receiver Operating Characteristic [2,3,4] or, alternatively, the Relative Operating Characteristic [1]. Generally, the false alarm rate is plotted on the horizontal axis, while the correct detection rate is plotted on the vertical.

We have found it useful in speech applications to use a variant of this which we call the DET (Detection Error Tradeoff) Curve, described below. In the DET curve we plot error rates on both axes, giving uniform treatment to both types of error, and use a scale for both axes which spreads out the plot, better distinguishes different well-performing systems, and usually produces plots that are close to linear. Figure 1 gives an example of DET curves, while Figure 2 contrasts this with traditional ROC-type curves for the same data. Note the near linearity of the curves in the DET plot and how much better spread out they are, permitting easy observation of system contrasts.

Figure 1: Plot of DET Curves for a speaker recognition evaluation.

GENERAL EVALUATION PROTOCOL

Our evaluations of speech processing systems are comparable to fundamental detection tasks. Participants are given a set of known targets (speakers or languages) for which their systems have trained models, and a set of unknown speech segments. During the evaluation the speech processing system must determine whether or not the unknown segment is one of the known targets. The system output is a likelihood that the segment is an instance of the target. The scale of the likelihood is arbitrary, but should be consistent across all decisions, with larger values indicating greater likelihood of being a target. These likelihoods are used to generate the performance curve displaying the range of possible operating characteristics.

Figure 2 shows a traditional ROC curve for a NIST-coordinated speaker recognition evaluation task. The abscissa shows the false alarm rate and the ordinate shows the detection rate, both on linear scales. The optimal point is at the upper left of the plot, and the curves of well-performing systems tend to bunch together near this corner.
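The protocol above reduces to sweeping a decision threshold across the pooled likelihood scores and recording both error rates at each setting. A minimal sketch in Python (the function name and the score >= threshold acceptance rule are our own illustrative choices, not part of the evaluation specification):

```python
def error_tradeoff(target_scores, nontarget_scores):
    """Sweep a decision threshold over every observed likelihood score and
    return one (p_miss, p_false_alarm) operating point per threshold.

    A trial is accepted as a target when its score is >= the threshold, so
    raising the threshold trades false alarms for missed detections.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for t in thresholds:
        # Miss: a true target scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        points.append((p_miss, p_fa))
    return points
```

Plotting p_fa against 1 - p_miss on linear axes gives the ROC-style view; transforming both error rates to normal deviates gives the DET view.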

[Report Documentation Page, Standard Form 298 (Rev. 8-98): Report Date 1997; Title: The DET Curve in Assessment of Detection Task Performance; Performing Organization: National Institute of Standards and Technology (NIST), Gaithersburg, MD 20899; Distribution: Approved for public release, distribution unlimited; Number of Pages: 4.]

(In the figures we omit the keys identifying the individual systems.)

Figure 2: Plot of ROC Curves for the same evaluation data as in Figure 1.

NORMAL DEVIATE SCALE

Let us suppose that the likelihood distributions for non-targets and targets are both normally distributed with respective means u0 and u1. This is illustrated in Figure 3, where the variances of the distributions are taken to be equal.

Figure 3: Normal Distributions. The choice of an operating point c is shown by a bold line, and the two error types are represented by the areas of the shaded regions.

Now suppose that when we go to plot the miss versus the false alarm probabilities, rather than plotting the probabilities themselves, we plot instead the normal deviates that correspond to those probabilities. This is displayed in Figure 4, where we show probabilities on the bottom and left and standard deviations on the top and right. The standard deviations are omitted from subsequent plots.

Figure 4: Normal Deviate Scale.

Note that the linearity of the plot is a result of the assumed normality of the likelihood distributions, and the unit slope is a consequence of the equal variances. Also note that on the diagonal scale indicated we have d' = u1 - u0.

DET EXAMPLES

Figure 1 presents the DET curve for the same data as Figure 2. Note that the use of the normal deviate scale moves the curves away from the lower left when performance is high, making comparisons easier. We also see, as we typically do, that the resulting curves are approximately straight lines, corresponding to normal likelihood distributions, over at least a wide portion of their range.

There are two items to note about the DET curve. First, if the resulting curves are straight lines, this provides visual confirmation that the underlying likelihood distributions from the system are normal. Second, the diagonal y = -x on the normal deviate scale represents random performance.
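The normal deviate mapping just described is the probit transform, and the equal-variance normal model makes the DET curve a straight line of unit slope in deviate coordinates: z_miss = -z_fa - d'. A minimal sketch using Python's standard library (function names are ours, for illustration of the model only):

```python
from statistics import NormalDist

N = NormalDist()  # standard normal; inv_cdf is the normal deviate (probit)

def deviate(p):
    """Map a probability to its standard normal deviate, as on the DET axes."""
    return N.inv_cdf(p)

def miss_given_fa(p_fa, d_prime):
    """Miss probability paired with a given false alarm probability when both
    likelihood distributions are normal with equal variance and the means are
    separated by d' standard deviations: z_miss = -z_fa - d'."""
    return N.cdf(-deviate(p_fa) - d_prime)
```

With d_prime = 0 this reduces to p_miss = 1 - p_fa, the diagonal of random performance noted above; larger d' shifts the whole line toward the lower left of the plot.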
If performance is reasonably good, we limit the curves to the lower left quadrant, as in Figure 1. We also somewhat arbitrarily limit the error rates plotted to 0.05%, or a bit over three standard deviations. Figure 5 shows another set of typical DET Curves, in this case for a language recognition task. Once again, by visual inspection of the DET curve we can verify that the underlying likelihood distributions are close to normal.

Figure 5: Plot of DET Curves from a language recognition evaluation.

Further examples of speaker recognition and language recognition DET Curves may be viewed at the NIST web site [7].

SPECIAL POINTS

A number of special points may be included on the DET curve. These points are not limited to speech processing tasks and can be applied to the fundamental detection task. For example, it may be of interest to designate points corresponding to a fixed false alarm rate or a fixed missed detection rate, perhaps a performance objective for an evaluation. The grid lines on the example curves may be viewed this way. Confidence intervals, or a confidence box, around such points may also be included.

A weighted average of the missed detection and false alarm rates may be used as a kind of figure of merit or cost function. The point on the DET Curve where such an average is minimized may be indicated. In Figures 1 and 5, these points are indicated by o's. (The error type weighting in Figure 1 is 10:1, corresponding to a cost of 10 for a missed detection and a cost of 1 for a false alarm. In Figure 5, the error type weighting is 1:1.)

In our evaluations, the speech processing systems must also provide a hard yes-or-no decision along with a likelihood score for each trial. The operating points of the hard decisions may be indicated on the curves. These are designated by *'s in Figures 1 and 5. The proximity of these points to the weighted average points described above is an indication of how appropriately the system implementers chose the hard decision operating points to optimize the chosen cost function.

AVERAGING ACROSS TARGETS

The DET curves presented all involved multiple targets and required systems to provide likelihood scores on the same scale for all targets. For some applications, requiring a common scale may be considered undesirable. Furthermore, if all targets do not occur with about equal frequency, it is arguable that combining data from multiple targets may present a misleading indication of performance. The alternative is to generate separate curves for each target, and then generate an average curve across targets from these.

If the same non-targets are used with each target, then the ordinate values may be averaged for each abscissa value. This situation will not hold, however, if each target example also serves as a non-target example for each of the other targets. In this case, interpolation may be used to obtain a common set of abscissa values for the individual target curves, which may then be averaged.

We prefer, however, to combine data from multiple targets directly. This requires systems to develop a common likelihood scale for all targets, which we believe is desirable for many applications. We believe that with a large number of targets and roughly equal occurrences of all targets, overall performance is effectively represented.

OTHER APPLICATIONS

The DET curve form of presentation is relevant to any detection task where a tradeoff of error types is involved. In previous years we have coordinated keyword and topic spotting evaluations involving such tasks. We have also used the DET curve concept in large-vocabulary speech recognition tasks where participants are asked to rate their confidence in the correctness of the words they hypothesize. A DET curve then shows the tradeoffs obtainable in the partial transcripts that result from setting thresholds on the confidence required to include hypothesized words. Figure 6 shows an example. Since performance at this task is poor at this point, all four quadrants are included in the curves.

CONCLUSION

The DET Curve has distinct advantages over the standard ROC-type curve for presenting performance results where tradeoffs of two error types are involved. We have made it our standard way of presenting performance results of speaker and language recognition evaluations.
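As a final concrete note on the special points discussed above, the minimum of the weighted error average can be located directly from a list of swept operating points. A sketch (the function and its default 10:1 weighting mirror the Figure 1 convention; both the name and the interface are our own illustration):

```python
def min_cost_point(points, miss_weight=10.0, fa_weight=1.0):
    """Given (p_miss, p_fa) operating points, return the one minimizing the
    weighted error average, e.g. the 10:1 weighting used for Figure 1."""
    return min(points, key=lambda pt: miss_weight * pt[0] + fa_weight * pt[1])
```

Comparing this point with a system's hard-decision operating point shows how well its yes/no threshold was tuned to the chosen cost function.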

Figure 6: Plot of DET Curves from confidence scores in a large vocabulary speech recognition evaluation.

REFERENCES

[1] Swets, John A., "The Relative Operating Characteristic in Psychology," Science, Vol. 182, pp. 990-1000, 1973.
[2] Swets, John A., ed., Signal Detection and Recognition by Human Observers, John Wiley & Sons, Inc., pp. 611-648, 1964.
[3] Green, David M. and Swets, John A., Signal Detection Theory and Psychophysics, John Wiley & Sons, Inc., 1966.
[4] Egan, James P., Signal Detection Theory and ROC Analysis, Academic Press, 1975.
[5] Speaker Recognition Workshop Notebook, Linthicum, MD, March 1996, unpublished.
[6] Language Recognition Workshop Notebook, Linthicum, MD, June 1996, unpublished.
[7] NIST web site: http://www.nist.gov/speech/