APPENDIX B PARETO PLOTS PER BENCHMARK

IEEE TRANSACTIONS ON COMPUTERS, VOL., NO., SEPTEMBER 1 APPENDIX B PARETO PLOTS PER BENCHMARK Appendix B contains all Pareto frontiers for the SPEC CPU benchmarks as calculated by the model (green curve) and simulated by Sniper (blue curve). The red points are the configurations for which our model predicted they are Pareto-optimal, but shown with their simulated performance and power consumption. The difference between the blue and green curves shows the error of the model, while the difference between the red points and the blue curve indicates how well we can predict actual Pareto-optimal configurations. Next to the visual matching, we show various metrics underneath each figure: the average absolute error for performance and power, as well as sensitivity, specificity and the Hypervolume Ratio (HVR) [31]. Sensitivity and specificity quantify the fraction of predicted actual Pareto-optimal and non-pareto-optimal designs, respectively. HVR quantifies how well we can predict the range of solutions across the entire frontier. Put together, these metrics denote how good each predicted Pareto frontier is. The average values over the whole design space are %, 7% and 97% for specificity, sensitivity and HVR, respectively. Hence, our model is very good at predicting the actual range of the Pareto frontier (HVR) and also at filtering out the non-pareto optimal solutions (sensitivity), but performs less good on detecting all Pareto-optimal designs (specificity). However, a visual inspection of the Pareto frontiers shows us that either we find only a few designs in a large cluster of Pareto-optimal designs that are very close to each other, which leads to lower sensitivity but which we deem acceptable or we miss some Pareto-optimal designs that are not useful to implement (e.g., the designs on the left vertical tail of bzip2: large power increase with only a small performance gain). We include additional explanation for some of the Pareto frontiers: bzip2, h2ref, gobmk and soplex: The model misses the top-left tail of the Pareto frontier, which appears to be almost vertical. However, these designs are less interesting to find because they represent a marginal increase in performance while power increases substantially. Furthermore, this tail is always comprised of less than % of the total designs... 1. 1. 2. 2. 3. astar abs. error performance / power:.% /.% Sensitivity:.% - Specificity: 9.% HVR: 99.3% gromacs: As shown in the phase graph (see Appendix A), we make a systematic error across all configurations. However, this still leads to good relative accuracy when changing the processor configuration. This systematic error is shown in the Pareto frontier: the green curve is a shifted version of the blue curve where the error for all CPI values is indeed around 22% to the left. Due to the good relative accuracy, the designs on the frontier are almost exactly the same as the ones on the frontier (almost all red points are part of the blue curve). hmmer: There is a tail on the right that we do not predict accurately. This is similar to bzip2 etc. for which we not accurately predict the tail on the left. We still see most of the Pareto-optimal designs on that right tail, but not all Pareto-optimal designs in the knees of the curve, leading to lower sensitivity. perlbench: We do not predict the left vertical tail, and we predict two dense clusters of Pareto-optimal designs. However, this is not an issue since there are actually no Pareto optimal designs in between those clusters. The frontier actually connects the Paretooptimal clusters. sjeng: The model does not find any of the designs on the left vertical tail because it cannot properly estimate the decrease in branch misprediction rate of using the gshare branch predictor. The model classifies all branch predictors as performing approximately the same, while in fact, the gshare branch predictor outperforms the others for the larger dispatch widths. sphinx3: We do not see the left vertical tail, which in this case is actually built up out around designs. However, those designs are all clustered on places on the vertical tail, and are less interesting because they double power consumption for a gain in performance of less than %. xalancbmk: Here we observe designs which are not actually Pareto-optimal. However, these points are still close to being Pareto-optimal. 2 22 1. 1. 1.2 1. 1. bwaves abs. error performance / power: 7.% / 2.% Sensitivity: 27.% - Specificity: 77.7% HVR: 9.7%

IEEE TRANSACTIONS ON COMPUTERS, VOL., NO., SEPTEMBER 2 3...7..9 1. 1.1 1.2 1.3 3 bzip2 abs. error performance / power:.% / 2.% Sensitivity: 1.9% - Specificity:.% HVR: 9.%.3....7..9 1. 3 calculix abs. error performance / power: 2.9% / 2.1% Sensitivity: 31.% - Specificity:.1% HVR: 99.%....7..9 gamess abs. error performance / power:. / 1.9% Sensitivity:.% - Specificity: 93.% HVR: 97.9%..7..9 1. 1.1 1.2 1.3 1. cactusadm abs. error performance / power:.1% /.7% Sensitivity: 2.% - Specificity:.% HVR: 99.%..7..9 1. 1.1 1.2 1.3 1. 1 dealii abs. error performance / power:.% / 2.1% Sensitivity:.% - Specificity:.% HVR: 9.% 1. 1. 2. 2. 3. 3..... gcc abs. error performance / power: 13.2% / 2.% Sensitivity:.9% - Specificity:.2% HVR: 99.%

IEEE TRANSACTIONS ON COMPUTERS, VOL., NO., SEPTEMBER 3.. 1. 1.2 1. 1. 1. 2. 2.2 GemsFDTD abs. error performance / power:.% /.2% Sensitivity: 2.1% - Specificity: 9.2% HVR: 99.%...7..9 1. 1.1 1.2 1.3 gromacs abs. error performance / power: 22.9% / % Sensitivity:.% - Specificity: 92.% HVR: 99.2% 3.......7.7. hmmer abs. error performance / power:.3% / 3.% Sensitivity: 33.3% - Specificity: 91.3% HVR: 9.7%...7..9 1. 1.1 3 gobmk abs. error performance / power:.3% / 2.% Sensitivity: 1.3% - Specificity: 93.% HVR: 9.3%......7.7.. h2ref abs. error performance / power: 2.7% / 1.% Sensitivity:.7% - Specificity: 92.3% HVR: 9.%. 1. 1.2 1. 1. 1. 2. 2.2 lbm abs. error performance / power:.9% /.% Sensitivity:.% - Specificity:.% HVR: 9.3%

IEEE TRANSACTIONS ON COMPUTERS, VOL., NO., SEPTEMBER.. 1. 1.2 1. 1. 1. leslie3d abs. error performance / power: 17.% /.3% Sensitivity: 31.1% - Specificity: 7.% HVR: 9.1% 2 3 7 9 mcf abs. error performance / power:.9% / 2.% Sensitivity: 2.% - Specificity: 7.% HVR: 9.9% 3...3.......7 namd abs. error performance / power:. /.% Sensitivity: 2.2% - Specificity: 97.% HVR: 99.% 1 1. 1. 2. 2. 3. 3.... libquantum abs. error performance / power: 11.% /.1% Sensitivity: 33.3% - Specificity: 93.% HVR: 97.% 22 1 1. 1. 2. 2. 3. 3. milc abs. error performance / power:.% / 3.1% Sensitivity:.3% - Specificity: 9.% HVR:.%. 1. 1. 2. 2. 3. 3. omnetpp abs. error performance / power: 9.2% /.2% Sensitivity:.7% - Specificity:.9% HVR: 99.2%

IEEE TRANSACTIONS ON COMPUTERS, VOL., NO., SEPTEMBER 3.3....7..9 3 perlbench abs. error performance / power: 3.% / 2.1% Sensitivity: 3.% - Specificity: 79.% HVR: 93.9%....7..9 3 sjeng abs. error performance / power:.3% / 3.% Sensitivity: 3.% - Specificity: 9.3% HVR: 7.%.. 1. 1. 2. 2. 3. sphinx3 abs. error performance / power:.% /.% Sensitivity:.% - Specificity:.1% HVR: 9.% 3....7..9 1. povray abs. error performance / power:.% / 2.% Sensitivity: 9.% - Specificity: 9.1% HVR: 9.% 1 2 3 soplex abs. error performance / power:.7% /.% Sensitivity: 2.% - Specificity: 9.2% HVR: 99.7% 3....7. tonto abs. error performance / power:.% / 3.% Sensitivity: 7.9% - Specificity: 9.% HVR: 99.%

IEEE TRANSACTIONS ON COMPUTERS, VOL., NO., SEPTEMBER 3... 1. 1.2 1. 1. 1. wrf abs. error performance / power: 11.7% -.% Sensitivity: 7.% - Specificity: 3.3% HVR: 9.%... 1. 1.2 1. xalancbmk abs. error performance / power:.9% -.% Sensitivity: 7.% - Specificity: 1.% HVR: 9.3%.. 1. 1.2 1. 1. 1. zeusmp abs. error performance / power: 7.% /.% Sensitivity: 1.% - Specificity: 9.% HVR: 99.3%