
How cryptographic benchmarking goes wrong

Daniel J. Bernstein

Thanks to NIST 60NANB12D261 for funding this work, and for not reviewing these slides in advance.

PRESERVE, ending …, was a European project "Preparing Secure Vehicle-to-X Communication Systems". Project cost: … EUR, including … EUR from the European Commission.

About PRESERVE: "The mission of PRESERVE is to design, implement, and test a secure and scalable V2X Security Subsystem for realistic deployment scenarios. … [Expected Results:] 1. Harmonized V2X Security Architecture. 2. Implementation of V2X Security Subsystem. 3. Cheap and scalable security ASIC for V2X. 4. Testing results VSS under realistic conditions. 5. Research results for deployment challenges."

Cars already include many CPUs. Why build an ASIC?

PRESERVE deliverable 1.1, "Security Requirements of Vehicle Security Architecture", 2011: "Processing 1,000 packets per second and processing each in 1 ms can hardly be met by current hardware. As discussed in [32], a Pentium D 3.4 GHz processor needs about 5 times as long for a verification … a dedicated cryptographic co-processor is likely to be necessary."

PRESERVE deliverable 5.4, "Deployment Issues Report V4", 2016: "the number of ECC signature verifications per second is the key performance factor for ASICs in a C2C environment … [On a 4mm × 4mm chip] the 180nm technology may only yield enough space for one ECC core, whereas 90nm will allow for up to ten ECC cores and 55nm will allow for even more."

For the 180nm core it says: max 100MHz, 100 verif/second.

Compare to, e.g., IAIK NIST P-256 ECC Module: 858 scalarmult/second in … GE at 192 MHz at 180nm ("UMC L180GII technology using Faraday f180 standard cell library (FSA0A_C), … µm²/GE; worst case conditions (temperature 125 °C, core voltage 1.62V)").

Signature verification will be somewhat slower than scalarmult. Still close to 100× more efficient than the PRESERVE estimates.

Let's go back to PRESERVE's core argument for an ASIC.

Central claim: "As discussed in [32], a Pentium D 3.4 GHz processor needs about" 5ms (i.e., 17 million CPU cycles) for signature verification.

[32] is Petit, J., Mammeri, Z., "Analysis of authentication overhead in vehicular networks", Third Joint IFIP Wireless and Mobile Networking Conference (WMNC), 2010.
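The "17 million CPU cycles" figure follows directly from the quoted time and clock speed. A minimal sketch of that conversion, assuming only the 5ms and 3.4 GHz values from the slide (the function name `cycles_for` is made up for illustration):

```python
def cycles_for(time_ms: float, clock_ghz: float) -> float:
    """Number of clock cycles that elapse in time_ms at clock_ghz.

    1 GHz is 1e6 cycles per millisecond.
    """
    return time_ms * clock_ghz * 1e6


# Values quoted on the slide: about 5 ms on a 3.4 GHz Pentium D.
cycles = cycles_for(5.0, 3.4)
print(f"{cycles / 1e6:.0f} million cycles")
```

Working in cycles rather than milliseconds makes measurements from CPUs with different clock speeds easier to line up, although cycle counts on different microarchitectures are still not directly interchangeable.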

[32] says: "1. Introduction. Due to the huge life losses and the economic impacts resulting from vehicular collisions, many governments, automotive companies, and industry consortia have made the reduction of vehicular fatalities a top priority [1]. On average, vehicular collisions cause 102 deaths and 7900 injuries daily in the United States, leaving an economic impact of $230 billion [2]. … [Similar story for EU:] costing €160 billion annually [3]."

"Vehicles will communicate safety information. All implementations of IEEE … standard [7] shall support the Elliptic Curve Digital Signature Algorithm (ECDSA) [8] over the two NIST curves P-224 and P-256. … In this paper, we assess the processing and communication overhead of the authentication mechanism provided by ECDSA. … Table II. Signature generation and verification times on a Pentium D 3.4Ghz workstation [10]"

[10] (in [32]) is Petit, J., "Analysis of ECDSA Authentication Processing in VANETs", 3rd IFIP International Conference on New Technologies, Mobility and Security (NTMS), Cairo, December 2009.

[10] says ECDSA was "implemented using MIRACL" and "following the Fig.1".

For NIST P-224/P-256 on "Pentium D 3.4GHz workstation": 2.50ms/3.33ms to sign, 4.97ms/6.63ms to verify.
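To see how these timings feed the ASIC argument, one can turn the verification times into per-core throughput and compare against the 1,000 packets per second that PRESERVE deliverable 1.1 asks for. This is only a rough sketch: the variable names are illustrative, and it assumes verifications are independent and spread perfectly across cores. It does not address the separate 1 ms per-packet latency requirement.

```python
import math

# Verification times quoted from [10] for a Pentium D 3.4GHz workstation.
VERIFY_MS = {"P-224": 4.97, "P-256": 6.63}

# Throughput target quoted from PRESERVE deliverable 1.1.
TARGET_PER_SECOND = 1000

results = {}
for curve, ms in VERIFY_MS.items():
    per_second = 1000.0 / ms  # verifications per second on one core
    # Assumption: perfect scaling across cores, no other overhead.
    cores_needed = math.ceil(TARGET_PER_SECOND * ms / 1000.0)
    results[curve] = (per_second, cores_needed)
    print(f"{curve}: {per_second:.0f} verif/second per core, "
          f"{cores_needed} cores for {TARGET_PER_SECOND}/second")
```

With software this slow, the target looks out of reach for a single CPU core, which is the premise the rest of the talk questions.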

Compare to, e.g., Ed25519 speeds reported for single core of 14nm 3.31GHz Skylake ("2015 Intel Core i…") on bench.cr.yp.to: 0.015ms to sign (49840 cycles), 0.049ms to verify (… cycles).

This chip didn't exist in …. Compare instead to single core of 65nm 2.4GHz Core 2 ("2007 Intel Core 2 Quad Q6600"): 0.065ms to sign (… cycles), … ms to verify (… cycles).
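The Skylake signing time can be checked from the cycle count, and the gap to the Pentium D figures quoted from [10] can be put as a ratio. A small sketch under stated assumptions: `ms_for` is a made-up helper, and the ratio compares different signature systems on different chips, so it illustrates the size of the gap rather than a like-for-like benchmark.

```python
def ms_for(cycles: float, clock_ghz: float) -> float:
    """Milliseconds needed for `cycles` at `clock_ghz` (1 GHz = 1e6 cycles/ms)."""
    return cycles / (clock_ghz * 1e6)


# Ed25519 signing on one 3.31GHz Skylake core: 49840 cycles.
ed25519_sign_ms = ms_for(49840, 3.31)
print(f"Ed25519 sign: {ed25519_sign_ms:.3f} ms")

# Verification times as quoted on the slides.
PETIT_P256_VERIFY_MS = 6.63  # ECDSA P-256, Pentium D 3.4GHz, from [10]
ED25519_VERIFY_MS = 0.049    # Ed25519, Skylake 3.31GHz

ratio = PETIT_P256_VERIFY_MS / ED25519_VERIFY_MS
print(f"verify time ratio: {ratio:.0f}x")
```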

2012 Bernstein–Schwabe on 720MHz ARM Cortex-A8: 0.9ms to verify (… cycles).

ARM Cortex-A8 cores were in 1000MHz Apple A4 in iPad 1, iPhone 4 (2010); 1000MHz Samsung Exynos 3110 in Samsung Galaxy S (2010); 1000MHz TI OMAP3630 in Motorola Droid X (2010); 800MHz Freescale i.MX50 in Amazon Kindle 4 (2011); …

Today: in CPUs costing 2 EUR. Cortex-A7 is even more popular.

180nm 32-bit 2GHz Willamette ("2001 Intel Pentium 4"): 0.46ms (0.9 million cycles) for Curve25519 scalarmult using floating-point multiplier. Integer multiplier is much slower!

Nobody has ever bothered adapting this to signatures. Would be 0.6ms for verify.

3.4GHz Pentium D (dual core): same basic microarchitecture, more instructions, faster clock. Ed25519 would be >10× faster on one core than Petit's software.
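The ">10×" claim can be sanity-checked against the P-256 verification time quoted from [10]. A rough sketch, assuming the 0.6ms estimate refers to the 2GHz Willamette; the clock-scaled variant additionally assumes the same cycle count on the 3.4GHz Pentium D, which is a simplification introduced here and not something the slides state.

```python
PETIT_P256_VERIFY_MS = 6.63      # ECDSA P-256 verify, Pentium D 3.4GHz, from [10]
ED25519_VERIFY_MS_AT_2GHZ = 0.6  # slide's estimate, taken here as a 2GHz figure

# Conservative comparison: ignore the Pentium D's faster clock entirely.
speedup_unscaled = PETIT_P256_VERIFY_MS / ED25519_VERIFY_MS_AT_2GHZ

# Hypothetical: same cycle count, clock raised from 2.0 GHz to 3.4 GHz.
ed25519_verify_ms_at_3_4ghz = ED25519_VERIFY_MS_AT_2GHZ * 2.0 / 3.4
speedup_scaled = PETIT_P256_VERIFY_MS / ed25519_verify_ms_at_3_4ghz

print(f"unscaled: {speedup_unscaled:.1f}x, clock-scaled: {speedup_scaled:.1f}x")
```

Either way the ratio stays above 10, consistent with the slide.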

Bad ECDSA-NIST-P-256 design certainly has some impact:
can't use fastest mulmods;
can't use fastest curve formulas;
need an annoying inversion; etc.
Typical estimate: 2× slower.
2000 Brown–Hankerson–López–Menezes on 400MHz Pentium II: 4.0ms/6.4ms (1.6/2.6 million cycles) for double scalarmult inside NIST P-224/P-256 verif.
2001 Bernstein, 1.6× faster: 0.7 million cycles on Pentium II for NIST P-224 scalarmult.
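
The timings on these slides switch between milliseconds and cycle counts; the two are tied together by the clock frequency. A minimal sketch of that arithmetic, using only figures quoted on the slides (the helper name `cycles` is an illustrative choice, not something from the talk):

```python
def cycles(milliseconds: float, clock_mhz: float) -> float:
    """Convert a time in milliseconds at a given clock rate (MHz) to cycles."""
    return milliseconds * 1e-3 * clock_mhz * 1e6

# Curve25519 scalarmult on the 2GHz Willamette Pentium 4: 0.46ms
p4 = cycles(0.46, 2000)
# Brown-Hankerson-Lopez-Menezes on the 400MHz Pentium II: 4.0ms and 6.4ms
pii_p224 = cycles(4.0, 400)
pii_p256 = cycles(6.4, 400)

for label, c in [("P4 Curve25519", p4), ("PII P-224", pii_p224), ("PII P-256", pii_p256)]:
    print(f"{label}: {c / 1e6:.1f} million cycles")
```

Reporting cycles rather than milliseconds is what makes numbers from a 400MHz and a 2GHz machine comparable at all, although, as the next slide shows, cycle counts for the same software still differ between microarchitectures.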

2000 Brown–Hankerson–López–Menezes software uses many more cycles on P4 than on PII.
e.g., P-224 scalarmult:
1.2 million cycles on Pentium II.
2.7 million cycles on Pentium 4.
2001 Bernstein P-224 scalarmult:
0.7 million cycles on Pentium II.
0.8 million cycles on Pentium 4.
0.9 million cycles on Pentium 4 using compressed keys.
OpenSSL 1.0.1, P-224 verif:
2.0 million cycles on Pentium D.

How did Petit manage to use 17 million cycles for P-224 verif, 22 million cycles for P-256 verif?
Presumably some combination of bad mulmod and bad curve ops.
Why did Petit reimplement ECDSA, using MIRACL for the underlying arithmetic?
Why did Petit not simply cite previous speed literature?
Why did Petit choose Pentium D?
Why did BHLM choose PII?
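
To put the 17 and 22 million cycle figures in perspective, the sketch below compares them with the OpenSSL 1.0.1 P-224 figure quoted earlier and converts them to time on the 3.4GHz Pentium D. All numbers come from the slides; the constant and function names are illustrative.

```python
PETIT_P224_VERIF = 17e6      # cycles on Pentium D, as quoted for Petit
PETIT_P256_VERIF = 22e6      # cycles on Pentium D, as quoted for Petit
OPENSSL_P224_VERIF = 2.0e6   # cycles on Pentium D, OpenSSL 1.0.1
PENTIUM_D_HZ = 3.4e9         # 3.4GHz clock

def slowdown(slow_cycles: float, fast_cycles: float) -> float:
    """How many times more cycles the slower measurement uses."""
    return slow_cycles / fast_cycles

def milliseconds(cycle_count: float, clock_hz: float) -> float:
    """Time taken by a given number of cycles at a given clock rate."""
    return cycle_count / clock_hz * 1e3

print(f"Petit vs OpenSSL, P-224 verif: {slowdown(PETIT_P224_VERIF, OPENSSL_P224_VERIF):.1f}x")
print(f"P-224 verif at 3.4GHz: {milliseconds(PETIT_P224_VERIF, PENTIUM_D_HZ):.1f} ms")
print(f"P-256 verif at 3.4GHz: {milliseconds(PETIT_P256_VERIF, PENTIUM_D_HZ):.1f} ms")
```

The comparison is only for P-224, since that is the one curve for which the slides quote an OpenSSL verification figure on the same CPU.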

Petit: "There are three main cryptographic libraries: MIRACL, OpenSSL and Crypto++. Authors in [21] proposed a comparison and concluded that MIRACL has the best performance for operations on elliptic curves over binary fields."
But NIST P-224 and NIST P-256 are defined over prime fields!
[21] says "For elliptic curves over prime fields, OpenSSL has the best performance under all platforms."

More general situation: Paper analyzes impact of crypto upon an application.
If the crypto sounds fast:
Why is the paper interesting?
Why should it be published?
If the crypto sounds slower:
Paper is more interesting. "Look, here's a speed problem!"
More likely to be published.
More likely to motivate funding to fix the problem.

Obvious question whenever an application considers crypto deployment: Is it fast enough?
Many random methodologies for answering this question. Which CPU to test? What to take from literature and libraries? Reuse mulmod, or curve ops, or more?
Slowest, least competent answers are most likely to be published.
Situation is fully explainable by randomness + natural selection.
There's no evidence that Petit deliberately slowed down crypto.
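
The "randomness + natural selection" explanation can be illustrated with a toy simulation. Everything in it is a hypothetical assumption made for illustration: the distribution of reported slowdowns, the publication rule, and the function name `simulate`. No study deliberately slows anything down, yet the published average still ends up worse than the average over all studies.

```python
import random
import statistics

def simulate(n_studies=10_000, seed=1):
    """Return (mean slowdown over all studies, mean over published ones)."""
    rng = random.Random(seed)
    measured, published = [], []
    for _ in range(n_studies):
        # Hypothetical: arbitrary methodology choices put the reported cost
        # somewhere at or above 1x the best known software.
        slowdown = 1.0 + rng.expovariate(1.0)
        measured.append(slowdown)
        # Hypothetical publication rule: the slower the crypto sounds,
        # the more likely the paper is to be accepted.
        if rng.random() < min(1.0, slowdown / 10.0):
            published.append(slowdown)
    return statistics.mean(measured), statistics.mean(published)

mean_all, mean_published = simulate()
print(f"all studies: {mean_all:.2f}x slower than best, published: {mean_published:.2f}x")
```

The gap between the two averages comes entirely from the selection step, which is the point of the slide: a biased literature does not require any biased author.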

Paper introducing new crypto software or hardware has same incentive to report older crypto as slow, and analogous incentive to report its own crypto as fast.
Paper will naturally select functions, parameters, input lengths, platforms, I/O format, timing mechanism, etc. that maximize reported improvement from old to new.
This is not the same as selecting what matters most for the users.

Bit operations per bit of plaintext (assuming precomputed subkeys), as listed in recent Skinny paper:
key  ops/bit  cipher
Simon: 60 ops broken
NOEKEON
Skinny
Simon: 106 ops broken
PRESENT
Skinny
Piccolo
AES
AES

Bit operations per bit of plaintext (assuming precomputed subkeys), not entirely listed in Skinny paper:
key  ops/bit  cipher
Salsa20/8
Salsa20/12
Simon: 60 ops broken
NOEKEON
Skinny
Salsa20
Simon: 106 ops broken
PRESENT
Skinny
Piccolo
AES
AES

Many bad examples to imitate, backed by tons of misinformation.
e.g. Do we bother searching for optimized implementations of the older crypto?
"Take any code! Rely on optimizing compiler!"
"We come so close to optimal on most architectures that we can't do much more without using NP complete algorithms instead of heuristics. We can only try to get little niggles here and there where the heuristics get slightly wrong answers."

Reality is more complicated:

SUPERCOP benchmarking toolkit includes 2155 implementations of 595 cryptographic primitives.
>20 implementations of Salsa20.
Haswell: Reasonably simple ref implementation compiled with gcc -O3 -fomit-frame-pointer is 6.15× slower than fastest Salsa20 implementation.
merged implementation with "machine-independent" optimizations and best of 121 compiler options: 4.52× slower.
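
The comparison on this slide, best compiler options per implementation and then each implementation against the fastest one, can be sketched as follows. This is not SUPERCOP's interface: the function names, implementation labels, compiler option strings other than `gcc -O3 -fomit-frame-pointer`, and cycle counts are all made up, with the counts chosen only so that the ratios match the 6.15× and 4.52× quoted above.

```python
from statistics import median

def best_by_implementation(timings):
    """timings maps (implementation, compiler_options) -> list of cycle counts.

    Returns implementation -> (best options, median cycles under those options).
    """
    best = {}
    for (impl, opts), samples in timings.items():
        m = median(samples)  # median damps outliers from interrupts etc.
        if impl not in best or m < best[impl][1]:
            best[impl] = (opts, m)
    return best

def slowdowns(best):
    """Cost of each implementation relative to the fastest one."""
    fastest = min(m for _, m in best.values())
    return {impl: m / fastest for impl, (_, m) in best.items()}

# Hypothetical measurements (cycles for some fixed Salsa20 workload).
timings = {
    ("ref", "gcc -O3 -fomit-frame-pointer"): [615, 620, 615],
    ("merged", "gcc -O2"): [500, 510, 505],
    ("merged", "gcc -O3 -funroll-loops"): [452, 452, 460],
    ("asm", "gcc -O3"): [100, 100, 101],
}

for impl, factor in slowdowns(best_by_implementation(timings)).items():
    print(f"{impl}: {factor:.2f}x")
```

A paper that benchmarks only the `ref` row is measuring the compiler's handling of reference code, not the speed of the primitive.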

Another interesting example: lattice-based signing typically means generating a huge number of random Gaussian samples.

Brannigan, Smyth, Oder, Valencia, O'Sullivan, Güneysu, Regazzoni, "An investigation of sources of randomness within discrete Gaussian sampling": benchmarks for RNGs, samplers.

Qualitatively large impacts: choice of RNG → cost of sampling → cost of signing.
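A toy cost model shows how the chain from RNG choice to signing cost can propagate. This is a sketch under loud assumptions: the additive model and every parameter name are hypothetical, and nothing here is taken from the cited paper.

```python
def signing_cycles(samples_per_signature, random_bytes_per_sample,
                   rng_cycles_per_byte, sampler_cycles_per_sample,
                   other_cycles):
    """Hypothetical additive model of the cycles spent on one signature.

    RNG cost scales with the number of random bytes consumed; sampler
    overhead scales with the number of Gaussian samples; other_cycles
    covers everything else in signing.
    """
    rng_cost = (samples_per_signature * random_bytes_per_sample
                * rng_cycles_per_byte)
    sampling_cost = rng_cost + samples_per_signature * sampler_cycles_per_sample
    return sampling_cost + other_cycles
```

In this model, swapping in an RNG that costs k times more cycles per byte multiplies only the RNG term by k. How much the total moves therefore depends on how much of signing is spent drawing random bytes, which is why the RNG benchmark numbers matter.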


Two examples of speed reported in this 2017 paper for a 3.4GHz Skylake (Intel Core i7-6700): 8.86 cycles/byte for AES CTR-DRBG using AES-NI; 106.07 MByte/sec (32 cycles/byte) for ChaCha20.

But wait. eBACS reports 0.92 cycles/byte for AES-256-CTR, 1.18 cycles/byte for ChaCha20.

Author non-response: "essential for us to examine standard open implementations". Slow ones?
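The unit conversion behind these figures, and the size of the gap, can be checked with a few lines of arithmetic. This sketch assumes 1 MByte = 10^6 bytes and a fixed 3.4GHz clock. The helper names are illustrative. AES CTR-DRBG and plain AES-256-CTR are not the same construction, so the AES ratio is only a rough comparison.

```python
CLOCK_HZ = 3.4e9  # 3.4GHz Skylake, as stated on the slide


def mbyte_per_sec(cycles_per_byte, clock_hz=CLOCK_HZ):
    """Convert cycles/byte to MByte/sec (assuming 1 MByte = 10**6 bytes)."""
    return clock_hz / cycles_per_byte / 1e6


def cycles_per_byte(mbyte_per_sec_value, clock_hz=CLOCK_HZ):
    """Convert MByte/sec back to cycles/byte at the given clock."""
    return clock_hz / (mbyte_per_sec_value * 1e6)


# Figures quoted on the slide, in cycles/byte.
paper = {"AES": 8.86, "ChaCha20": 32.0}
ebacs = {"AES": 0.92, "ChaCha20": 1.18}

# How many times slower the paper's numbers are than the eBACS numbers.
ratios = {name: paper[name] / ebacs[name] for name in paper}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x slower than the eBACS figure")
```

At 3.4GHz, 32 cycles/byte corresponds to roughly 106 MByte/sec, while 1.18 cycles/byte corresponds to nearly 2900 MByte/sec. The gap is more than a factor of 27 for ChaCha20 and more than a factor of 9 for AES.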


How cryptographic benchmarking goes wrong. Thanks to NIST 60NANB12D261 for funding this work, and for not reviewing these slides in advance.

How cryptographic benchmarking goes wrong 1 Daniel J. Bernstein Thanks to NIST 60NANB12D261 for funding this work, and for not reviewing these slides in advance. PRESERVE, ending 2015.06.30, was a European