Meltdown & Spectre. Side-channels considered harmful. Qualcomm Mobile Security Summit May, San Diego, CA. Moritz Lipp
Meltdown & Spectre: Side-channels considered harmful. Qualcomm Mobile Security Summit, May, San Diego, CA. Moritz Lipp (@mlqxyz), Michael Schwarz (@misc0110)
Flashback
Flashback: Side-channel attacks. Safe software infrastructure does not mean safe execution: information leaks because of the underlying hardware. Side-channel attacks exploit unintentional information leakage through side effects such as power consumption, execution time, or the CPU cache. Performance optimizations often induce side-channel leakage.
Outline. Last year: leaking keystroke timings via the cache; leaking AES keys from the cache; covertly sending data through the cache; Rowhammer, flipping bits in DRAM. This year: more leakage! We build upon the tools of last year to leak arbitrary memory content and privileged register content.
Whoami: Moritz Lipp, PhD student at Graz University of Technology, mail@mlq.me
Whoami: Michael Schwarz, PhD student at Graz University of Technology, michael.schwarz91@gmail.com
Let's Read Kernel Memory from User Space!
Building the Code: find something human-readable, e.g., the Linux version. Running "sudo grep linux_banner /proc/kallsyms" prints "ffffffff81a000e0 R linux_banner", the kernel address of the version string.
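Looking up the address programmatically is just a scan of /proc/kallsyms for the symbol name. The following is a minimal C sketch, assuming the usual "address type name" line layout; kallsyms_match_line and kallsyms_lookup are hypothetical helper names. Without sufficient privileges (see the kptr_restrict sysctl), the kernel typically reports all addresses as zero, which is presumably why the slide uses sudo.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/kallsyms line of the form "<hex address> <type> <name>"
 * (optionally followed by "[module]"). If <name> equals symbol, store the
 * address in *addr and return 1; otherwise return 0. */
int kallsyms_match_line(const char *line, const char *symbol, uint64_t *addr) {
    unsigned long long address;
    char type;
    char name[256];
    if (sscanf(line, "%llx %c %255s", &address, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = (uint64_t)address;
    return 1;
}

/* Scan a kallsyms-formatted stream line by line;
 * returns 1 and sets *addr on the first match, 0 if the symbol is absent. */
int kallsyms_lookup(FILE *f, const char *symbol, uint64_t *addr) {
    char line[512];
    while (fgets(line, sizeof line, f) != NULL)
        if (kallsyms_match_line(line, symbol, addr))
            return 1;
    return 0;
}
```

For example, pass the stream from fopen("/proc/kallsyms", "r") together with "linux_banner".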
Building the Code: read a byte from that address and print it: char data = *(char*) 0xffffffff81a000e0; printf("%c\n", data);
Building the Code: compile and run. The program crashes with "segfault at ffffffff81a000e0 ip sp 00007ffce4a80610 error 5 in reader". Kernel addresses are, of course, not accessible: any invalid access throws an exception, a segmentation fault.
Building the Code: just catch the segmentation fault! We can simply install a signal handler, and if an exception occurs, jump back and continue. Then we can read the value. Sounds like a good idea.
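The "catch it and continue" idea can be sketched with POSIX sigsetjmp/siglongjmp: a SIGSEGV handler jumps back to a saved context instead of letting the process die. This is a minimal sketch, assuming Linux and a 64-bit address space; try_read is a hypothetical helper name. It only survives the fault; it does not make the privilege check pass.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose sigaction and sigsetjmp */
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;

/* SIGSEGV handler: instead of terminating, jump back into try_read(). */
static void segv_handler(int sig) {
    (void)sig;
    siglongjmp(recovery_point, 1);
}

/* Try to read the byte at addr. Returns 1 and stores it in *out on success,
 * 0 if the access raised a segmentation fault. */
int try_read(const volatile char *addr, char *out) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = segv_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    volatile int ok = 0;
    if (sigsetjmp(recovery_point, 1) == 0) {  /* 1: also restore the signal mask */
        *out = *addr;                         /* may fault */
        ok = 1;
    }
    sigaction(SIGSEGV, &old, NULL);           /* restore the previous handler */
    return ok;
}
```

Calling try_read on the linux_banner address returns 0 instead of crashing, which matches the next slide: the program survives, but no kernel data comes back.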
Building the Code: still no kernel memory. Privilege checks seem to work; maybe it is not that straightforward. Back to the drawing board.
23 Caches and Cache Attacks
CPU Cache: the same statement, printf("%d", i);, runs twice. The first time, i is not cached (cache miss): the CPU requests it from DRAM and waits for the response, which is slow. The second time, i is in the cache (cache hit): no DRAM access, much faster.
Memory Access Latency [Figure: histogram of the number of accesses (×10^4) over the measured access time in CPU cycles, shown separately for cache hits and cache misses.]
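Flush+Reload: the attacker and the victim share memory. The attacker flushes a shared cache line, lets the victim run, and then reloads (accesses) the line while timing it: the access is fast if the victim accessed the data in the meantime, slow otherwise.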
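A minimal Flush+Reload sketch for x86-64 with GCC or Clang intrinsics. This platform is an assumption: the talk's targets also include ARM, where flushing and high-resolution timing need different instructions. time_load, flush, and reload are hypothetical names, and the threshold passed to reload is machine-specific; it would be chosen between the hit and miss peaks of a latency histogram like the one above.

```c
#include <stdint.h>
#include <x86intrin.h>   /* GCC/Clang on x86-64: _mm_clflush, _mm_mfence, _mm_lfence, __rdtsc */

/* Time one load from addr in time-stamp-counter cycles. The fences keep the
 * load from being reordered around the two timestamp reads. */
static inline uint64_t time_load(const volatile uint8_t *addr) {
    _mm_mfence();
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    (void)*addr;                      /* the access being timed */
    _mm_lfence();
    uint64_t end = __rdtsc();
    return end - start;
}

/* Flush: evict the shared line from all cache levels. */
static inline void flush(const volatile uint8_t *addr) {
    _mm_clflush((const void *)addr);
    _mm_mfence();
}

/* Reload: time the access, then flush again for the next round. Returns 1 if
 * the access was faster than threshold, i.e. the victim touched the line. */
int reload(const volatile uint8_t *addr, uint64_t threshold) {
    uint64_t t = time_load(addr);
    flush(addr);
    return t < threshold;
}
```

An attack loop calls flush once, lets the victim run, then calls reload and counts how often it returns 1.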
41 Operating Systems 101
Memory Isolation: the kernel is isolated from user space. This isolation is a combination of hardware and software: user applications cannot access anything from the kernel, and there is only a well-defined interface, called system calls. [Figure: applications in userspace and the operating system in kernelspace, both mapped onto memory.]
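On Linux, that interface can also be entered directly through glibc's raw syscall() wrapper. A minimal sketch, assuming Linux with glibc; raw_getpid is a hypothetical name. The libc function getpid() performs the same system call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc declares syscall() under this macro */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Enter the kernel through the raw system-call interface and ask for the
 * process ID. This is the only sanctioned way for user code to get
 * information out of the kernel. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```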
43 Out-of-order Execution
Out-of-order Execution: int width = 10, height = 5; float diagonal = sqrt(width * width + height * height); int area = width * height; printf("area %d x %d = %d\n", width, height, area);. Computing diagonal and computing area do not depend on each other, so the CPU can parallelize them; only the printf has to wait for area.
Out-of-order Execution: instructions are fetched and decoded in the front-end, dispatched to the back-end, and processed by individual execution units. [Figure: simplified CPU core. Front-end: L1 instruction cache, ITLB, branch predictor, instruction fetch and pre-decode, instruction queue, 4-way decode, µop cache, allocation queue. Execution engine: reorder buffer, scheduler, execution units (ALU, AES, FMA, vector, branch, load data, store data, AGU). Memory subsystem: load buffer, store buffer, DTLB, STLB, L1 data cache, L2 cache.]
Out-of-order Execution: instructions are executed out of order. They wait until their dependencies are ready, so later instructions might execute before earlier ones. Instructions retire in order, and only then does their state become architecturally visible.
Reading memory: if an application reads memory, permissions are checked and the data is loaded. If an application tries to read inaccessible memory, an error occurs and the application is stopped. But what happens if the checks are reordered? Would we know?
Building the Code: adapted code: *(volatile char*)0; array[84 * 4096] = 0;. The volatile is there because the compiler was not happy: "warning: statement with no effect [-Wunused-value] *(char*)0;". The static code analyzer is still not happy: "warning: Dereference of null pointer *(volatile char*)0;".
Building the Code: Flush+Reload over all pages of the array [Figure: access time in cycles per page]. The unreachable code line was actually executed; the exception was only thrown afterwards.
Building the Code: out-of-order instructions leave microarchitectural traces, which we can see, for example, in the cache. We give such instructions a name: transient instructions. We can indirectly observe the execution of transient instructions.
Building the Code: maybe there is no permission check in transient instructions... or it is only done when committing them. Add another layer of indirection to test this: char data = *(char*) 0xffffffff81a000e0; array[data * 4096] = 0;. Then check whether any part of the array is cached.
Building the Code: Flush+Reload over all pages of the array [Figure: access time in cycles per page]. The index of the cache hit reveals the data: the permission check is, in some cases, not fast enough.
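Once the reload time of every probe page has been measured with Flush+Reload, recovering the byte is an arg-min over 256 entries. A minimal sketch; decode_byte and the threshold are hypothetical, and the threshold has to be calibrated per machine. The stride of 4096 gives every byte value its own page, which the Meltdown paper motivates with keeping the hardware prefetcher from loading neighbouring entries.

```c
#include <stdint.h>

#define NUM_VALUES 256   /* one probe page per possible byte value */

/* times[v] is the Flush+Reload access time of array[v * 4096].
 * Return the byte value whose page was cached, i.e. the fastest access that
 * is also below threshold, or -1 if no page was a cache hit. */
int decode_byte(const uint64_t times[NUM_VALUES], uint64_t threshold) {
    int best = -1;
    uint64_t best_time = threshold;
    for (int v = 0; v < NUM_VALUES; v++) {
        if (times[v] < best_time) {
            best_time = times[v];
            best = v;
        }
    }
    return best;
}
```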
Meltdown: using out-of-order execution, we can read data at any address. Privilege checks are sometimes too slow, which allows leaking kernel memory. The entire physical memory is typically also accessible in the kernel address space. Works on Intel CPUs and ARM Cortex-A75.
69 Demo
70 Can we fix that?
Take the kernel addresses...: kernel addresses in user space are a problem. Why don't we take the kernel addresses...
...and remove them if not needed? The user-accessible check in hardware is not reliable.
Idea: let's just unmap the kernel in user space. Kernel addresses are then no longer present, and memory which is not mapped cannot be accessed at all.
[Figure: without isolation, applications (userspace) and the operating system (kernelspace) share one view of memory. With isolation, there are separate kernel and user views, and a context switch changes between them.]
Kernel Address Space Isolation: we published KAISER in July 2017. Intel and others improved it and merged it into Linux as KPTI (Kernel Page Table Isolation). Kernel patches are available for arm64. Microsoft and Apple implemented similar concepts for x86 and ARM. All share the same idea: switching address spaces on context switch.
85 But wait, what about privileged registers?
Meltdown for Registers (Variant 3a): ARM found a closely related Meltdown variant, which reads system registers that are not accessible from the current exception level. ARM Cortex-A15, Cortex-A57, and Cortex-A72 are vulnerable. Impact: breaking KASLR and pointer authentication.
90 Demo
Meltdown for Registers (Variant 3a): solution? Substitute registers with dummy values on context switch. This is only necessary for registers holding virtual addresses and secrets, so only a few registers are affected and the performance overhead should be minimal.
95 Where are Variant 1 and 2?
Speculative Execution: the CPU tries to predict the future (branch predictor) based on events learned in the past, and speculatively executes instructions. If the prediction was correct, this is very fast; otherwise, the results are discarded. Are there measurable side effects?
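Spectre: for a branch such as if <access in bounds>, the CPU predicts the outcome (true or false) from the outcomes it has seen in the past and speculatively executes the predicted path before the condition is resolved.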
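How a predictor "learns from the past" can be illustrated with the classic 2-bit saturating counter. This is a textbook model, not how any particular CPU in the talk predicts branches; bp_counter, bp_predict, and bp_update are hypothetical names. After a run of true outcomes, the first false outcome is still mispredicted, which is exactly the window Spectre uses.

```c
#include <stdbool.h>

/* Textbook 2-bit saturating counter for a single branch. States 0 and 1
 * predict "false" (not taken), states 2 and 3 predict "true" (taken).
 * An illustrative model only: real predictors also use branch history. */
typedef struct { unsigned state; } bp_counter;

bool bp_predict(const bp_counter *c) {
    return c->state >= 2;
}

/* Record the actual outcome; returns true if it was mispredicted. */
bool bp_update(bp_counter *c, bool outcome) {
    bool mispredicted = bp_predict(c) != outcome;
    if (outcome && c->state < 3)
        c->state++;
    else if (!outcome && c->state > 0)
        c->state--;
    return mispredicted;
}
```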
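Spectre (Variant 1: Bounds-check bypass): index = 0; char* data = "textkey"; if (index < 4) then LUT[data[index] * 4096] else 0. For index = 0 to 3, the bounds check passes; the branch predictor learns to predict the then-branch, and the CPU speculates along it and executes LUT[data[index] * 4096]. For index = 4 to 6, the bounds check fails, but the trained predictor still makes the CPU speculate and execute LUT[data[index] * 4096] transiently, so a LUT entry indexed by a character of "key" ends up in the cache.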
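The victim on these slides, written as plain C. A minimal sketch under the slide's setup (data = "textkey", bound 4, 4096-byte stride); the names victim, lut, and data_size are hypothetical. Architecturally it behaves correctly for every index; the leak only happens through speculative execution, which this code cannot show by itself and which the attacker observes with Flush+Reload on lut.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE 4096

static const char data[] = "textkey";   /* "text" is public, "key" is the secret */
static const size_t data_size = 4;      /* the bounds check only admits "text" */
static uint8_t lut[256 * PAGE];         /* probe array: one page per byte value */

/* Victim from the slide. Architecturally it never reads data[4..6]. After
 * calls with index 0..3 have trained the branch, a call with index >= 4 may
 * still run the body speculatively and cache lut[data[index] * 4096]. */
uint8_t victim(size_t index) {
    if (index < data_size)
        return lut[(uint8_t)data[index] * PAGE];
    return 0;
}
```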
136 Demo
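Spectre (Variant 2: Branch target injection): Animal* a = bird; a->move() calls fly() for a bird and swim() for a fish, and the branch target predictor guesses which one from previous calls. While the predictor still holds swim(), a->move() on a bird speculatively starts swim() before the correct target fly() executes; the predictor then learns fly(). Once trained with a bird, a->move() on a fish speculatively executes fly() before swim() runs. Whatever the mispredicted target does, e.g., LUT[data[index] * 4096], is executed transiently and can leave a trace in the cache.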
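In C, the slide's a->move() corresponds to an indirect call through a function pointer, which is what the branch target predictor has to guess. A minimal sketch; the animal type and the function names are hypothetical stand-ins for the slide's C++-style code. Architecturally, every call reaches the right function; the misprediction described above only exists transiently.

```c
#include <string.h>

/* C stand-in for the slide's Animal* a; a->move(): an indirect call through a
 * function pointer. Its target is only known at run time, so the CPU's branch
 * target predictor guesses it from previous executions of this call site. */
typedef struct animal animal;
struct animal {
    const char *(*move)(const animal *self);
};

static const char *fly(const animal *self)  { (void)self; return "fly";  }
static const char *swim(const animal *self) { (void)self; return "swim"; }

const animal bird = { fly };
const animal fish = { swim };

const char *animal_move(const animal *a) {
    return a->move(a);   /* indirect branch */
}
```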
149 Demo
Spectre: we can influence the CPU to mispredict the future and convince other programs to reveal their secrets. It can even be triggered from the browser: untrusted code can convince trusted code to reveal secrets.
Spectre: demonstrated on Intel, AMD, and ARM CPUs. It affects processors with branch target speculation and is much harder to fix; KAISER does not help.
157 Can we fix that?
Spectre: the trivial approach is to disable speculative execution, since there is no wrong speculation if there is no speculation. Problem: a massive performance hit! Also, how would we disable it? Speculative execution is deeply integrated into the CPU.
More things which do not work: preventing access to the high-resolution timer (an attacker can build their own timer using a timing thread, shown last year); making the flush instruction privileged only (cache eviction through memory accesses works instead, shown last year); just moving secrets into the secure world (Spectre works on secure enclaves).
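The "own timer" from last year can be sketched as a thread that increments a shared counter in a tight loop, while other code reads the counter as a timestamp. A minimal sketch with C11 atomics and POSIX threads (compile with -pthread); timer_start, timer_now, and timer_stop are hypothetical names, and the resolution depends on the machine and on both threads running on separate cores.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* A counting thread as a replacement timer: one thread increments a shared
 * counter as fast as it can, other threads read it as a timestamp. */
static atomic_uint_fast64_t ticks;
static atomic_bool running;
static pthread_t counter_thread;

static void *count_loop(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&running, memory_order_relaxed))
        atomic_fetch_add_explicit(&ticks, 1, memory_order_relaxed);
    return NULL;
}

/* Start the counter thread and wait until it has begun ticking. */
int timer_start(void) {
    atomic_store(&running, true);
    if (pthread_create(&counter_thread, NULL, count_loop, NULL) != 0)
        return -1;
    while (atomic_load(&ticks) == 0)
        sched_yield();
    return 0;
}

/* Current "time": the number of increments performed so far. */
uint64_t timer_now(void) {
    return atomic_load_explicit(&ticks, memory_order_relaxed);
}

void timer_stop(void) {
    atomic_store(&running, false);
    pthread_join(counter_thread, NULL);
}
```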
Spectre Variant 1 Mitigations: as a workaround, insert instructions that stop speculation after every bounds check. On ARM, this means a conditional select or conditional move plus the new barrier CSDB; the alternative, DSB SYS + ISB, has a greater performance hit. The new barrier is retrofitted to existing ARMv7 and ARMv8.
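The conditional-select idea can be approximated in portable C by clamping the index with a branchless mask, modelled on the generic array_index_mask_nospec() helper in the Linux kernel. A minimal sketch, assuming size <= LONG_MAX and an arithmetic right shift of negative values (as GCC and Clang implement it); index_mask_nospec and load_checked are hypothetical names. A compiler could in principle turn this back into a branch, which is why the arm64 kernel computes the mask in inline assembly and follows it with CSDB.

```c
#include <limits.h>
#include <stddef.h>

/* Branchless bounds mask: all ones if index < size, 0 otherwise.
 * For index < size, neither index nor (size - 1 - index) has the top bit
 * set; for index >= size, the subtraction wraps and sets it. */
static inline unsigned long index_mask_nospec(unsigned long index, unsigned long size) {
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >> (sizeof(long) * CHAR_BIT - 1));
}

/* Bounds-checked load. Even if the CPU speculates past the check with an
 * out-of-bounds index, the data-dependent mask clamps the index to 0. */
unsigned char load_checked(const unsigned char *arr, size_t size, size_t index) {
    if (index < size) {
        index &= index_mask_nospec(index, size);
        return arr[index];
    }
    return 0;
}
```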
Spectre Variant 1 Mitigations: the speculation barrier requires compiler support, which is already implemented in GCC, LLVM, and MSVC. Insertion can be automated (MSVC), but this is not really reliable. Alternatively, the programmer uses it explicitly: __builtin_load_no_speculate.
186 Spectre Variant 1 Mitigations. A speculation barrier only works if the affected code constructs are known. The programmer has to fully understand the vulnerability. Automatic detection is not reliable. Barriers have a non-negligible performance overhead.
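To make "affected code constructs" concrete, here is the bounds-check-bypass pattern as presented in the Spectre paper, sketched in C with the paper's illustrative names (array1, array2, victim_function). Nothing in this code is architecturally wrong, which is why automatic detection struggles. The leak only happens if two things occur together. First, the branch is mispredicted for an out-of-bounds x. Second, the dependent array2 load leaves a cache footprint indexed by the out-of-bounds byte array1[x].

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative globals, following the naming used in the Spectre paper. */
uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
size_t array1_size = 16;
uint8_t array2[256 * 4096]; /* one 4 KiB page per possible byte value */

/* Architecturally safe, yet a Spectre variant 1 gadget: if the branch is
 * predicted taken for an out-of-bounds x, array1[x] is read speculatively
 * and selects which page of array2 is brought into the cache. */
uint8_t victim_function(size_t x)
{
    if (x < array1_size)
        return array2[array1[x] * 4096];
    return 0;
}
```

A mitigation such as CSDB or an index mask belongs between the bounds check and the array1[x] load. Recognizing such spots is exactly the part that requires understanding the vulnerability.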
191 Spectre Variant 2 Mitigations (Software). ARM provides hardened Linux kernel and ARM Trusted Firmware patches. They clear the branch-predictor state on context switch, either via an instruction (BPIALL) or via a workaround (disabling and re-enabling the MMU). Google's Retpoline does not work on ARM.
199 Learn from it. We have ignored software side channels for many, many years. Attacks on crypto: "software should be fixed". Attacks on ASLR: "ASLR is broken anyway". Attacks on SGX and TrustZone: "not part of the threat model". For years we solely optimized for performance.
202 When you read the manuals... After learning about a side channel, you realize that the side channels were documented in the processor manual; only now do we understand the implications.
203 What do we learn from it? [Figure: Motor Vehicle Deaths in U.S. by Year]
204 A unique chance. This is a unique chance to rethink processor design: grow up, like other fields (car industry, construction industry), and find good trade-offs between security and performance.
205 Conclusion. Microarchitectural attacks were underestimated for a long time. Meltdown and Spectre exploit performance optimizations and allow leaking arbitrary memory. Countermeasures come with a performance impact. Find trade-offs between security and performance.
206 Meltdown & Spectre Side-channels considered harmful Qualcomm Mobile Security Summit May, San Diego, CA Moritz Lipp (@mlqxyz) Michael Schwarz (@misc0110)