How cryptographic benchmarking goes wrong

Transcription:

How cryptographic benchmarking goes wrong

Daniel J. Bernstein

Thanks to NIST 60NANB12D261 for funding this work, and for not reviewing these slides in advance.

PRESERVE, ending 2015.06.30, was a European project "Preparing Secure Vehicle-to-X Communication Systems". Project cost: 5383431 EUR, including 3850000 EUR from the European Commission.

About PRESERVE: The mission of PRESERVE is to design, implement, and test a secure and scalable V2X Security Subsystem for realistic deployment scenarios. ... [Expected Results:] 1. Harmonized V2X Security Architecture. 2. Implementation of V2X Security Subsystem. 3. Cheap and scalable security ASIC for V2X. 4. Testing results VSS under realistic conditions. 5. Research results for deployment challenges.

Cars already include many CPUs. Why build an ASIC?

PRESERVE deliverable 1.1, Security Requirements of Vehicle Security Architecture, 2011: Processing 1,000 packets per second and processing each in 1 ms can hardly be met by current hardware. As discussed in [32], a Pentium D 3.4 GHz processor needs about 5 times as long for a verification ... a dedicated cryptographic co-processor is likely to be necessary.
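As a rough check of the quoted requirement, the arithmetic can be sketched in Python. The function name and the model (one verification per packet, run back to back on a single core) are assumptions made for illustration. Only the 1,000 packets per second, the 1 ms budget, and the factor of about 5 come from the quoted deliverable.

```python
# Back-of-envelope check of the quoted requirement.
# Assumption (not from the deliverable): one signature verification per
# packet, with verifications run back to back on a single core.

def verifications_per_second(latency_ms: float) -> float:
    """Throughput of one core that needs latency_ms per verification."""
    return 1000.0 / latency_ms

budget_ms = 1.0          # quoted: process each packet in 1 ms
required_rate = 1000.0   # quoted: 1,000 packets per second
slowdown = 5.0           # quoted: "about 5 times as long"

measured_ms = budget_ms * slowdown
rate = verifications_per_second(measured_ms)
shortfall = required_rate / rate

print(f"one core: {rate:.0f} verifications/second")
print(f"short of the requirement by a factor of {shortfall:.0f}")
```

Under these assumptions the whole case for an ASIC depends on how trustworthy the "5 times as long" measurement is.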

PRESERVE deliverable 5.4, Deployment Issues Report V4, 2016: the number of ECC signature verifications per second is the key performance factor for ASICs in a C2C environment ... [On a 4mm × 4mm chip] the 180nm technology may only yield enough space for one ECC core, whereas 90nm will allow for up to ten ECC cores and 55nm will allow for even more.

For the 180nm core, the report says: max 100MHz, 100 verif/second.

Compare to, e.g., IAIK NIST P-256 ECC Module: 858 scalarmult/second in 111620 GE at 192 MHz at 180nm ("UMC L180GII technology using Faraday f180 standard cell library (FSA0A_C), 9.3744 µm²/GE; worst case conditions (temperature 125 °C, core voltage 1.62V)").

Signature verification will be somewhat slower than scalarmult. Still close to 100× more efficient than the PRESERVE estimates.
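One way to see where a figure of that order can come from is to compare throughput per unit of chip area. The sketch below is a guess at the kind of estimate involved, not the speaker's stated calculation. It assumes the whole 4mm × 4mm die could be filled with copies of the IAIK module and treats one scalar multiplication as a stand-in for one verification. All variable names are illustrative.

```python
# Rough area-efficiency comparison at 180nm.
# Assumptions (not stated on the slides): the whole 4mm x 4mm die is
# filled with copies of the IAIK module, and one scalar multiplication
# stands in for one signature verification.

GE_AREA_UM2 = 9.3744     # quoted: area of one gate equivalent, in um^2
MODULE_GE = 111_620      # quoted: size of the IAIK P-256 module
MODULE_RATE = 858        # quoted: scalarmult/second at 192 MHz
DIE_MM2 = 4.0 * 4.0      # quoted chip size from the PRESERVE report
PRESERVE_RATE = 100      # quoted: verif/second for the one 180nm core

module_mm2 = MODULE_GE * GE_AREA_UM2 / 1e6   # um^2 -> mm^2
modules_per_die = DIE_MM2 / module_mm2       # fractional, area only
die_rate = modules_per_die * MODULE_RATE
ratio = die_rate / PRESERVE_RATE

print(f"module area: {module_mm2:.3f} mm^2")
print(f"modules per die: {modules_per_die:.1f}")
print(f"throughput ratio vs PRESERVE estimate: {ratio:.0f}x")
```

The ratio comes out somewhat above 100 for scalar multiplications. Allowing for verification being slower than a single scalarmult pulls it back toward the "close to 100×" on the slide.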

Let's go back to PRESERVE's core argument for an ASIC.

Central claim: As discussed in [32], a Pentium D 3.4 GHz processor needs about 5ms (i.e., 17 million CPU cycles) for signature verification.

[32] is Petit, J., Mammeri, Z., Analysis of authentication overhead in vehicular networks, Third Joint IFIP Wireless and Mobile Networking Conference (WMNC), 2010.
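The conversion from 5ms to 17 million cycles is just time multiplied by clock frequency. A minimal sketch, with a hypothetical helper name:

```python
def cycles(time_ms: float, clock_ghz: float) -> int:
    """CPU cycles elapsed in time_ms on a core clocked at clock_ghz."""
    # 1 ms at 1 GHz is 1e6 cycles.
    return round(time_ms * clock_ghz * 1e6)

# Quoted figures: about 5 ms on a 3.4 GHz Pentium D.
print(cycles(5.0, 3.4))
```

Stating costs in cycles rather than milliseconds makes measurements from CPUs with different clock speeds easier to compare.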

[32] says: 1. Introduction. Due to the huge life losses and the economic impacts resulting from vehicular collisions, many governments, automotive companies, and industry consortia have made the reduction of vehicular fatalities a top priority [1]. On average, vehicular collisions cause 102 deaths and 7900 injuries daily in the United States, leaving an economic impact of $230 billion [2]. ... [Similar story for EU:] costing €160 billion annually [3].

Vehicles will communicate safety information. All implementations of IEEE1609.2 standard [7] shall support the Elliptic Curve Digital Signature Algorithm (ECDSA) [8] over the two NIST curves P-224 and P-256. ... In this paper, we assess the processing and communication overhead of the authentication mechanism provided by ECDSA. ... Table II. Signature generation and verification times on a Pentium D 3.4Ghz workstation [10]

[10] (in [32]) is Petit, J., Analysis of ECDSA Authentication Processing in VANETs, 3rd IFIP International Conference on New Technologies, Mobility and Security (NTMS), Cairo, December 2009.

[10] says ECDSA was implemented using MIRACL and following the Fig.1. For NIST P-224/P-256 on Pentium D 3.4GHz workstation: 2.50ms/3.33ms to sign, 4.97ms/6.63ms to verify.

[10] (in [32]) is Petit 9 Compare to, e.g., Ed25519 10 J., Analysis of ECDSA speeds reported for single core Authentication Processing in of 14nm 3.31GHz Skylake VANETs, 3rd IFIP International ( 2015 Intel Core i5-6600 ) on Conference on New Technologies, https://bench.cr.yp.to: Mobility and Security (NTMS), Cairo, December 2009. 0.015ms to sign (49840 cycles), 0.049ms to verify (163206 cycles). [10] says ECDSA was implemented using MIRACL and following the Fig.1. For NIST P-224/P-256 on Pentium D 3.4GHz workstation : 2.50ms/3.33ms to sign, 4.97ms/6.63ms to verify.

Compare to, e.g., Ed25519 speeds reported for single core of 14nm 3.31GHz Skylake ("2015 Intel Core i5-6600") on https://bench.cr.yp.to: 0.015ms to sign (49840 cycles), 0.049ms to verify (163206 cycles).

This chip didn't exist in 2009. Compare instead to single core of 65nm 2.4GHz Core 2 ("2007 Intel Core 2 Quad Q6600"): 0.065ms to sign (156843 cycles), 0.232ms to verify (557082 cycles).
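The millisecond figures quoted for Ed25519 follow from dividing the cycle counts by the clock frequency. A minimal Python sketch of that conversion, using only the cycle counts and clock speeds stated above; the function name `cycles_to_ms` is illustrative and not part of any benchmarking tool:

```python
def cycles_to_ms(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to milliseconds at the given clock frequency."""
    return cycles / clock_hz * 1000.0


# (label, cycles, clock in Hz), all taken from the figures quoted in the slide
MEASUREMENTS = [
    ("Skylake 3.31GHz, Ed25519 sign", 49840, 3.31e9),
    ("Skylake 3.31GHz, Ed25519 verify", 163206, 3.31e9),
    ("Core 2 2.4GHz, Ed25519 sign", 156843, 2.4e9),
    ("Core 2 2.4GHz, Ed25519 verify", 557082, 2.4e9),
]

for label, cycles, hz in MEASUREMENTS:
    print(f"{label}: {cycles_to_ms(cycles, hz):.3f} ms")
```

Reporting cycles alongside milliseconds, as the slide does, makes it easier to compare chips that run at different clock speeds.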

2012 Bernstein-Schwabe on 720MHz ARM Cortex-A8: 0.9ms to verify (650102 cycles).

ARM Cortex-A8 cores were in
1000MHz Apple A4 in iPad 1, iPhone 4 (2010);
1000MHz Samsung Exynos 3110 in Samsung Galaxy S (2010);
1000MHz TI OMAP3630 in Motorola Droid X (2010);
800MHz Freescale i.MX50 in Amazon Kindle 4 (2011); ...

Today: in CPUs costing 2 EUR. Cortex-A7 is even more popular.

180nm 32-bit 2GHz Willamette ("2001 Intel Pentium 4"): 0.46ms (0.9 million cycles) for Curve25519 scalarmult using floating-point multiplier. Integer multiplier is much slower!

Nobody has ever bothered adapting this to signatures. Would be 0.6ms for verify.

3.4GHz Pentium D (dual core): same basic microarchitecture, more instructions, faster clock. Ed25519 would be >10× faster on one core than Petit's software.

Bad ECDSA-NIST-P-256 design certainly has some impact: can't use fastest mulmods; can't use fastest curve formulas; need an annoying inversion; etc. Typical estimate: 2× slower.

2000 Brown-Hankerson-López-Menezes on 400MHz Pentium II: 4.0ms/6.4ms (1.6/2.6 million cycles) for double scalarmult inside NIST P-224/P-256 verif.

2001 Bernstein, 1.6× faster: 0.7 million cycles on Pentium II for NIST P-224 scalarmult.

2000 Brown-Hankerson-López-Menezes software uses many more cycles on P4 than on PII. E.g., P-224 scalarmult:
1.2 million cycles on Pentium II.
2.7 million cycles on Pentium 4.

2001 Bernstein P-224 scalarmult:
0.7 million cycles on Pentium II.
0.8 million cycles on Pentium 4.
0.9 million cycles on Pentium 4 using compressed keys.

OpenSSL 1.0.1, P-224 verif:
2.0 million cycles on Pentium D.

How did Petit manage to use 17 million cycles for P-224 verif, 22 million cycles for P-256 verif? Presumably some combination of bad mulmod and bad curve ops.

Why did Petit reimplement ECDSA, using MIRACL for the underlying arithmetic?

Why did Petit not simply cite previous speed literature?

Why did Petit choose Pentium D? Why did BHLM choose PII?
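The cycle counts attributed to Petit can be checked by multiplying the reported verification times (4.97ms and 6.63ms) by the 3.4GHz Pentium D clock. The sketch below does that arithmetic and compares the P-224 result with the 2.0 million cycles quoted earlier in the talk for OpenSSL 1.0.1 P-224 verification on a Pentium D. The function and constant names are illustrative.

```python
def ms_to_cycles(ms: float, clock_hz: float) -> float:
    """Convert a time in milliseconds to a cycle count at the given clock."""
    return ms / 1000.0 * clock_hz


PENTIUM_D_HZ = 3.4e9                 # clock speed stated in Petit's paper
OPENSSL_P224_VERIFY_CYCLES = 2.0e6   # figure quoted earlier in the talk

petit_p224 = ms_to_cycles(4.97, PENTIUM_D_HZ)  # about 16.9 million cycles
petit_p256 = ms_to_cycles(6.63, PENTIUM_D_HZ)  # about 22.5 million cycles

# How many times more cycles Petit's P-224 verification used than OpenSSL's
slowdown = petit_p224 / OPENSSL_P224_VERIFY_CYCLES

print(f"P-224 verify: {petit_p224 / 1e6:.1f} million cycles")
print(f"P-256 verify: {petit_p256 / 1e6:.1f} million cycles")
print(f"P-224 slowdown vs. OpenSSL 1.0.1: {slowdown:.1f}x")
```

On these numbers, Petit's P-224 verification used more than eight times the cycles of OpenSSL on the same processor family.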

Petit: "There are three main cryptographic libraries: MIRACL, OpenSSL and Crypto++. Authors in [21] proposed a comparison and concluded that MIRACL has the best performance for operations on elliptic curves over binary fields."

But NIST P-224 and NIST P-256 are defined over prime fields!

[21] says "For elliptic curves over prime fields, OpenSSL has the best performance under all platforms."

More general situation: Paper analyzes impact of crypto upon an application.

If the crypto sounds fast: Why is the paper interesting? Why should it be published?

If the crypto sounds slower: Paper is more interesting. "Look, here's a speed problem!" More likely to be published. More likely to motivate funding to fix the problem.

Obvious question whenever an application considers crypto deployment: Is it fast enough?

Many random methodologies for answering this question. Which CPU to test? What to take from literature and libraries? Reuse mulmod, or curve ops, or more?

Slowest, least competent answers are most likely to be published.

Situation is fully explainable by randomness + natural selection. There's no evidence that Petit deliberately slowed down crypto.
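The "randomness + natural selection" argument can be illustrated with a toy simulation. Everything in this sketch is an assumption made for illustration: the lognormal distribution of slowdown factors, its parameters, the number of studies, and the rule that the slowest-sounding result is the one that gets published. None of it is measured data.

```python
import random
import statistics
from typing import Tuple


def simulate_published_slowdown(num_studies: int, seed: int) -> Tuple[float, float]:
    """Return (median slowdown, published slowdown) for one toy research field.

    Each study picks its methodology at random, so its slowdown relative to
    the best known software is drawn from an assumed lognormal distribution
    (clamped at 1.0, since nobody beats the best known software here).
    The "published" study is simply the one reporting the slowest crypto.
    """
    rng = random.Random(seed)
    slowdowns = [max(1.0, rng.lognormvariate(0.5, 0.8)) for _ in range(num_studies)]
    return statistics.median(slowdowns), max(slowdowns)


median, published = simulate_published_slowdown(num_studies=50, seed=1)
print(f"median study slowdown:    {median:.1f}x")
print(f"published study slowdown: {published:.1f}x")
```

No study in this model deliberately slows anything down, yet the published figure is the most pessimistic one in the sample.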

Paper introducing new crypto software or hardware has same incentive to report older crypto as slow, and analogous incentive to report its own crypto as fast.

Paper will naturally select functions, parameters, input lengths, platforms, I/O format, timing mechanism, etc. that maximize reported improvement from old to new.

This is not the same as selecting what matters most for the users.

Paper introducing new crypto 19 Bit operations per bit of plaintext 20 software or hardware has same (assuming precomputed subkeys), incentive to report older crypto as as listed in recent Skinny paper: slow, and analogous incentive to report its own crypto as fast. key ops/bit cipher Paper will naturally select functions, parameters, input lengths, platforms, I/O format, timing mechanism, etc. that maximize reported improvement from old to new. This is not the same as selecting what matters most for the users. 128 88 Simon: 60 ops broken 128 100 NOEKEON 128 117 Skinny 256 144 Simon: 106 ops broken 128 147.2 PRESENT 256 156 Skinny 128 162.75 Piccolo 128 202.5 AES 256 283.5 AES

Bit operations per bit of plaintext (assuming precomputed subkeys), not entirely listed in Skinny paper:

key  ops/bit  cipher
256   54      Salsa20/8
256   78      Salsa20/12
128   88      Simon: 60 ops broken
128  100      NOEKEON
128  117      Skinny
256  126      Salsa20
256  144      Simon: 106 ops broken
128  147.2    PRESENT
256  156      Skinny
128  162.75   Piccolo
128  202.5    AES
256  283.5    AES
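To make the effect of the selection concrete, here is a minimal Python sketch that looks up the cheapest cipher per key size, once with only the rows that appear in the Skinny paper's list and once with the Salsa20 rows added. The numbers are copied from the table above; the names `ROWS` and `cheapest` and the `listed` flag are illustrative assumptions, not part of any paper or toolkit.

```python
# Each row: (key bits, bit operations per bit of plaintext, cipher,
#            listed: True if the row appears in the Skinny paper's list).
# Values are copied from the slide; the flag marks the Salsa20 rows,
# which the slide adds on top of the Skinny paper's list.
ROWS = [
    (256, 54.0, "Salsa20/8", False),
    (256, 78.0, "Salsa20/12", False),
    (128, 88.0, "Simon", True),
    (128, 100.0, "NOEKEON", True),
    (128, 117.0, "Skinny", True),
    (256, 126.0, "Salsa20", False),
    (256, 144.0, "Simon", True),
    (128, 147.2, "PRESENT", True),
    (256, 156.0, "Skinny", True),
    (128, 162.75, "Piccolo", True),
    (128, 202.5, "AES", True),
    (256, 283.5, "AES", True),
]


def cheapest(rows, key_bits, include_unlisted):
    """Return the row with the fewest ops/bit for the given key size."""
    candidates = [
        row for row in rows
        if row[0] == key_bits and (include_unlisted or row[3])
    ]
    return min(candidates, key=lambda row: row[1])


if __name__ == "__main__":
    for include_unlisted in (False, True):
        for key_bits in (128, 256):
            _, ops, cipher, _ = cheapest(ROWS, key_bits, include_unlisted)
            label = "all rows" if include_unlisted else "listed rows only"
            print(f"{label}, {key_bits}-bit key: {cipher} at {ops} ops/bit")
```

Which cipher looks cheapest at 256-bit keys depends entirely on which rows the comparison includes.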

Many bad examples to imitate, backed by tons of misinformation.

e.g. Do we bother searching for optimized implementations of the older crypto? Take any code! Rely on optimizing compiler! "We come so close to optimal on most architectures that we can't do much more without using NP complete algorithms instead of heuristics. We can only try to get little niggles here and there where the heuristics get slightly wrong answers."

Reality is more complicated:

SUPERCOP benchmarking toolkit includes 2155 implementations of 595 cryptographic primitives. >20 implementations of Salsa20.

Haswell: Reasonably simple ref implementation compiled with gcc -O3 -fomit-frame-pointer is 6.15× slower than fastest Salsa20 implementation.

merged implementation with machine-independent optimizations and best of 121 compiler options: 4.52× slower.
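The comparison described here (many implementations, many compiler options, report the fastest) can be sketched roughly as follows. This is not SUPERCOP's code or interface; the function names are hypothetical, and the cycle counts are invented placeholders, not measurements.

```python
from statistics import median


def best_configuration(timings):
    """Return (config, median_cycles) for the fastest configuration.

    timings maps (implementation, compiler_options) to a list of cycle
    counts from repeated runs of the same primitive on the same input.
    """
    medians = {config: median(runs) for config, runs in timings.items()}
    best = min(medians, key=medians.get)
    return best, medians[best]


def slowdown(timings, config):
    """How many times slower config is than the fastest configuration."""
    _, best_cycles = best_configuration(timings)
    return median(timings[config]) / best_cycles


# Placeholder cycle counts, invented for illustration only.
timings = {
    ("ref", "gcc -O3 -fomit-frame-pointer"): [900, 880, 910],
    ("ref", "gcc -O2"): [1000, 990, 1020],
    ("hand-optimized", "gcc -O3 -fomit-frame-pointer"): [300, 310, 290],
}

if __name__ == "__main__":
    best, cycles = best_configuration(timings)
    print("fastest:", best, cycles)
    for config in timings:
        print(config, f"{slowdown(timings, config):.2f}x")
```

A paper that times only one implementation with one set of compiler options is reporting a single entry of such a table, not its minimum.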

Another interesting example: lattice-based signing typically means generating a huge number of random Gaussian samples.

2017.03 Brannigan, Smyth, Oder, Valencia, O'Sullivan, Güneysu, Regazzoni, "An investigation of sources of randomness within discrete Gaussian sampling": benchmarks for RNGs, samplers.

Qualitatively large impacts: choice of RNG → cost of sampling → cost of signing.

Two examples of speed reported in this 2017 paper for a 3.4GHz Skylake (Intel Core i7-6700): 383.69 MByte/sec (8.86 cycles/byte) for AES CTR-DRBG using AES-NI; 106.07 MByte/sec (32 cycles/byte) for ChaCha20.

But wait. eBACS reports 0.92 cycles/byte for AES-256-CTR, 1.18 cycles/byte for ChaCha20.

Author non-response: "essential for us to examine standard open implementations". Slow ones?
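The two units used here are related by cycles/byte = clock frequency / throughput. The Python sketch below redoes that arithmetic with the numbers quoted above, assuming 1 MByte = 10^6 bytes (the interpretation that reproduces the quoted 8.86 cycles/byte). The function and variable names are illustrative, and the eBACS figure for AES is for AES-256-CTR, not for the full CTR-DRBG construction, so the AES ratio is only indicative.

```python
def cycles_per_byte(clock_hz, mbyte_per_sec):
    """Convert throughput in MByte/sec (assumed 10^6 bytes) to cycles/byte."""
    return clock_hz / (mbyte_per_sec * 1e6)


CLOCK_HZ = 3.4e9  # 3.4GHz Skylake, as stated above

# Throughput reported in the 2017 paper, converted to cycles/byte.
paper = {
    "AES": cycles_per_byte(CLOCK_HZ, 383.69),       # AES CTR-DRBG with AES-NI
    "ChaCha20": cycles_per_byte(CLOCK_HZ, 106.07),
}

# Cycles/byte reported by eBACS, as quoted above.
ebacs = {
    "AES": 0.92,       # AES-256-CTR
    "ChaCha20": 1.18,
}

# Ratio between the paper's cost and the eBACS cost for each primitive.
ratio = {name: paper[name] / ebacs[name] for name in paper}

if __name__ == "__main__":
    for name in paper:
        print(f"{name}: paper {paper[name]:.2f} cycles/byte, "
              f"eBACS {ebacs[name]:.2f} cycles/byte, "
              f"ratio {ratio[name]:.1f}x")
```

Under these assumptions the paper's AES figure is between 9 and 10 times the eBACS figure, and its ChaCha20 figure is about 27 times the eBACS figure.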
