Greetings from Georgia Tech
Machine Learning and its Application to Integrated Systems
Madhavan Swaminathan
John Pippin Chair in Microsystems Packaging & Electromagnetics
School of Electrical and Computer Engineering
Director, Center for Co-Design of Chip, Package, System (C3PS)

Machine Learning & Hardware Design
- Behavioral Modeling
- Optimization
- Uncertainty Quantification
- BER Estimation
- Trusted Platform for IoT
- FPGA Compilation

Outline
- Problem Definition & Motivation
- Bayesian Optimization
- Two-Stage Bayesian Optimization
- Examples
  - Clock Skew Minimization of 3D ICs
  - Co-Optimization of Embedded Inductor and IVR
  - Wireless Power Transfer for IoT
- Summary

3D ICs and Systems
Figure label: Multi-scale Coupled Equations (Multi-physics)
- The trend in systems is toward miniaturization.
- 3D ICs enable miniaturization by stacking chips on each other with through-silicon vias (TSVs).
- The major problems are the thermal gradients and hot spots that arise because heat cannot readily escape.
- This is a multi-physics problem: thermal + electrical + circuit.
- The finite volume method (FVM) is used for discretization, with domain decomposition to capture multi-scale geometries.
Jianyong Xie, M. Swaminathan, "3D transient thermal solver using non-conformal domain decomposition approach," International Conference on Computer-Aided Design (ICCAD), pp. 333-340, 2012.

Motivation
- Uncertainty arises because the control knobs cannot be tuned cost-effectively to reach the required performance: each simulation is expensive.
- The objective is to minimize the clock skew of the clock distribution.
- Clock skew is affected by temperature (magnitude and gradient).
- Temperature is controlled by five input (control) parameters.
- The coupled thermal and electrical equations are solved using a non-uniform grid and domain decomposition.
- The objective is therefore to tune these parameters to minimize skew.
- We need a solution that is non-intrusive and applicable to high dimensions!
S. J. Park, B. Bae, J. Kim and M. Swaminathan, "Application of Machine-Learning for Optimization of 3-D Integrated Circuits and Systems," TVLSI 2017.

A Few Optimization Methods: Peaks Function
Figure: optimization applied to the peaks function.
- (a) Multi-start
- (b) Global search (273 iterations)
- (c) Pattern search (272 iterations)
- (d) Genetic algorithm (2650 iterations)
- (e) Bayesian optimization (100 iterations)
Observations:
- ML (Bayesian optimization) converges faster.
- Fewer iterations lead to fewer samples and shorter run time.
- This is the motivation for our research.
S. J. Park, B. Bae, J. Kim and M. Swaminathan, "Application of Machine-Learning for Optimization of 3-D Integrated Circuits and Systems," TVLSI 2017.
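To make the iteration counts above concrete, here is a minimal sketch of a peaks-style test surface together with a brute-force grid search as a reference point. The slide does not define the function; the formula below assumes the standard definition popularized by MATLAB's `peaks`, and the helper `grid_search`, its domain of [-3, 3] and its grid size are illustrative choices, not taken from the talk.

```python
import math

def peaks(x, y):
    """Two-variable test surface with several local maxima and minima.

    Assumption: the standard 'peaks' formula (as in MATLAB's peaks).
    """
    return (3 * (1 - x) ** 2 * math.exp(-x ** 2 - (y + 1) ** 2)
            - 10 * (x / 5 - x ** 3 - y ** 5) * math.exp(-x ** 2 - y ** 2)
            - math.exp(-(x + 1) ** 2 - y ** 2) / 3)

def grid_search(f, lo=-3.0, hi=3.0, steps=121):
    """Brute-force reference: evaluate f on a steps x steps grid.

    Returns (best_value, x, y) for the largest value found.
    """
    pts = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max((f(x, y), x, y) for x in pts for y in pts)

best_value, best_x, best_y = grid_search(peaks)
n_evals = 121 * 121
print(f"grid maximum {best_value:.3f} at x={best_x:.2f}, y={best_y:.2f} "
      f"using {n_evals} evaluations")
```

The exhaustive grid needs 14,641 function evaluations to locate the main peak. When every evaluation is an expensive coupled simulation, that cost is what the sample-efficient methods compared on this slide are trying to avoid.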

Solution: Machine Learning based Bayesian Optimization
Difficulties in Optimization
- No gradient information available
- Slow convergence rate
- Local maxima/minima problem
- CPU-intensive simulations
- Non-convex objective functions
- Large sample space
- High dimensionality
Machine Learning Based Optimization
- Surrogate based
- Global optima
- Faster convergence
- Capable of handling non-convex problems
- Can handle high dimensionality
- Active learning

Two-Stage Bayesian Optimization
H. Torun & M. Swaminathan, EPEPS 17
- Distinctive Hierarchical Partitioning Scheme: reduces the number of simulations required
- Learning Acquisition Functions: extends applicability to various designs
- Fast Exploration and Pure Exploitation Stages: separate coarse and fine tuning

Two-Stage Bayesian Optimization: Hierarchical Partitioning Tree
Figure panels: Conventional [1], Modified [2], TSBO
Figure legend: Candidate Points (branch not expanded), Sampled Points (branch expanded), Skipped Points (branch not expanded)
- Tree-based BO eliminates the auxiliary optimization of the acquisition function!
- The conventional tree expands every branch.
- The modified tree skips branches that are sub-optimal with high probability.
- In TSBO, the child nodes are candidate points, one of which is chosen at each iteration.
- This overcomes the limit on the number of branches generated and allows more rapid coverage of the sample space.
[1] R. Munos, "Optimistic optimization of a deterministic function without the knowledge of its smoothness," in Advances in Neural Information Processing Systems, 2011.
[2] Z. Wang, B. Shakibi, L. Jin, and N. de Freitas, "Bayesian multi-scale optimistic optimization," in Artificial Intelligence and Statistics, 2014.
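As a rough illustration of how a partitioning tree can replace the auxiliary optimization step, the sketch below implements a simplified one-dimensional version of simultaneous optimistic optimization in the spirit of [1]. It is not TSBO and not the modified tree of [2]: there is no surrogate model and no branch skipping. The function name `soo_maximize`, the ternary split and the budget are assumptions made for this example.

```python
def soo_maximize(f, lo, hi, budget=60):
    """Simplified 1-D optimistic optimization on a ternary partition tree.

    leaves[depth] holds (value_at_centre, cell_lo, cell_hi) for every
    unexpanded cell at that depth. Each sweep visits the depths from the
    root downwards and expands the best leaf of a depth only if it is at
    least as good as every leaf expanded at shallower depths in the sweep.
    """
    leaves = {0: [(f(0.5 * (lo + hi)), lo, hi)]}
    evals = 1
    while evals + 2 <= budget:
        v_max = float("-inf")
        for depth in sorted(leaves):          # snapshot of current depths
            if not leaves[depth] or evals + 2 > budget:
                continue
            best = max(leaves[depth])         # tuples compare by value first
            if best[0] < v_max:
                continue
            v_max = best[0]
            leaves[depth].remove(best)
            value, a, b = best
            w = (b - a) / 3.0
            # Split into three children; the middle child shares the
            # parent's centre, so only two new evaluations are needed.
            leaves.setdefault(depth + 1, []).extend([
                (f(a + 0.5 * w), a, a + w),
                (value, a + w, b - w),
                (f(b - 0.5 * w), b - w, b),
            ])
            evals += 2
    value, a, b = max(cell for cells in leaves.values() for cell in cells)
    return 0.5 * (a + b), value

# Toy objective with its maximum at x = 0.3 (illustrative only).
x_best, y_best = soo_maximize(lambda x: -(x - 0.3) ** 2, 0.0, 1.0, budget=60)
print(f"best x = {x_best:.4f}, f(x) = {y_best:.2e}")
```

The next sample is always the centre of a cell chosen by comparing stored values, so no inner optimizer is run. The Bayesian variants on this slide keep that property and use a surrogate model to decide which branches are worth expanding.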

Two-Stage Bayesian Optimization: Learning Acquisition Functions
Conventional BO:
- An auxiliary optimization of the acquisition function is used to find where to sample next.
- Only one acquisition function is used.
Learning Acquisition Functions:
- During optimization, actively learn which strategy is best for the current problem.
- After learning is completed, continue the remaining iterations with the learned function.
- Sequential selection makes the algorithm deterministic.
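For reference, the conventional BO loop described above can be sketched in a few lines. This is a minimal, assumption-laden example rather than the authors' implementation: it uses a zero-mean Gaussian process with a fixed squared-exponential kernel, a single acquisition function (upper confidence bound), and an exhaustive scan over a candidate grid standing in for the auxiliary optimization. All names (`fit_gp`, `bayes_opt`, `kappa`, `length`) and all numeric settings are hypothetical.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel with unit prior variance."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, jitter=1e-4):
    """Fit the surrogate; returns predict(x) -> (posterior mean, posterior std)."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))
    def predict(x):
        k = [rbf(a, x) for a in xs]
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        v = forward(L, k)
        var = max(1.0 - sum(vi * vi for vi in v), 0.0)
        return mean, math.sqrt(var)
    return predict

def bayes_opt(f, candidates, n_iter=15, kappa=2.0):
    """Maximize f over the candidates with a GP surrogate and a UCB acquisition."""
    xs = [candidates[0], candidates[-1]]      # two initial samples
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        predict = fit_gp(xs, ys)
        def ucb(x):
            mean, std = predict(x)
            return mean + kappa * std
        # "Auxiliary optimization": scan every unsampled candidate.
        x_next = max((c for c in candidates if c not in xs), key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best], len(ys)

grid = [i / 50 for i in range(51)]            # 51 candidates on [0, 1]
x_best, y_best, n_evals = bayes_opt(lambda x: -(x - 0.3) ** 2, grid)
print(f"best x = {x_best:.2f}, f(x) = {y_best:.4f}, evaluations = {n_evals}")
```

In one dimension the candidate scan is cheap, but it grows quickly with the number of parameters. That cost is why the tree-based selection and the choice of acquisition function matter for the 10 to 12 dimensional problems later in the talk.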

Two-Stage Bayesian Optimization: Fast Exploration and Pure Exploitation
Figure panels: starting point (t = 1); t = 20; end of first stage (t = 42); end of second stage (t = 100)

Clock Skew Minimization for Clock Distribution: Optimization Setup
Thermal-electrical simulations are used together with Bayesian optimization to tune the system input parameters.

Clock Skew Minimization for Clock Distribution: Results
~4X faster

                           Non-Linear Solver   Previous Work [1]   This Work
(unlabeled in source)      25.2 (+9.4%)        23.8 (+4.7%)        23.5
Skew [ps]                  96.6 (+12.3%)       88.0 (+2.3%)        86.0
CPU Time (Normalized) *    3.96                3.76                1.00

[1] S. J. Park, B. Bae, J. Kim and M. Swaminathan, "Application of Machine-Learning for Optimization of 3-D Integrated Circuits and Systems," TVLSI 2017.

Co-Optimization of Embedded Inductor and IVR
Figure labels: Overall SiP IVR Architecture; Embedded Solenoidal Inductor; Control Parameters
- Integrated voltage regulators (IVRs) are used to increase efficiency and conserve power in microprocessors (e.g., Intel Gen 4).
- The objective is to maximize IVR efficiency while minimizing inductor area.
- IVR efficiency is affected by the inductor and the buck converter; the LDO, PDN and load are assumed fixed.
- Solenoidal inductors with magnetic cores are used.
- Multiple trade-offs: ESR, DC resistance, inductance, lateral area.
- Tune the inductor control parameters to maximize efficiency (10 to 12 dimensions).

Co-Optimization of Embedded Inductor and IVR: Optimization Setup
H. M. Torun, M. Swaminathan, A. K. Davis, M. L. F. Belladredj, "A Machine Learning based Global Optimization Algorithm and its Application to Integrated Systems," TVLSI (Under Review).

Co-Optimization of Embedded Inductor and IVR: Results
Figure panels: Objective Function; Peak Efficiency; Inductor Area

                  Non-Linear            GP-UCB                IMGPO                 TSBO
Area              25.19 mm² (+39.7%)    5.18 mm² (+0.4%)      6.64 mm² (+28.1%)     5.16 mm²
Peak Efficiency   78.6%                 84.9%                 84.4%                 85.1%
CPU Time          >185 min (+72.9%)     117.33 min (+57.4%)   115.6 min (+56.7%)    50.1 min

TSBO reduces the CPU time to reach the error tolerance by 57.4% and 56.7% compared to GP-UCB and IMGPO!
H. M. Torun, M. Swaminathan, A. K. Davis, M. L. F. Belladredj, "A Machine Learning based Global Optimization Algorithm and its Application to Integrated Systems," TVLSI (Under Review).
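The CPU-time percentages on this slide appear to be reductions measured relative to each baseline's own run time, that is, (t_baseline - t_TSBO) / t_baseline. The short check below recomputes them from the reported minutes; the function name `reduction_percent` is only for illustration, and the Non-Linear entry uses the reported lower bound of 185 min.

```python
def reduction_percent(baseline_min, tsbo_min):
    """Share of the baseline CPU time that is saved, in percent."""
    return 100.0 * (baseline_min - tsbo_min) / baseline_min

tsbo_minutes = 50.1
baseline_minutes = {"Non-Linear": 185.0, "GP-UCB": 117.33, "IMGPO": 115.6}

reductions = {name: reduction_percent(t, tsbo_minutes)
              for name, t in baseline_minutes.items()}
for name, pct in reductions.items():
    print(f"{name}: TSBO needs {pct:.1f}% less CPU time")
```

Under this reading the recomputed values agree with the quoted 72.9%, 57.4% and 56.7% to within about 0.1 percentage point, which is consistent with rounding of the reported run times.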

Machine Learning Driven System Miniaturization of IoT
Figure labels: Embedded Mismatched WPT Coils; Control Parameters; Two-layer spiral inductor with screen printed magnetic material; Buck Converter
- Stage 1: TX Resonance
- Stage 2: Wireless Power Transfer
- Stage 3: RX Resonance and Rectifier
- Stage 4: DC regulation stage with embedded inductor
Objective: stage-by-stage optimization that maximizes the RF to regulated DC conversion efficiency while minimizing the area of the RX coil and the embedded spiral inductor.

ML Driven System Miniaturization of IoT for WPT - Results

WPT Coils Co-Optimization (12 Parameters)

                          Hand Tuned   Balanced Optimized RF Coils   Miniature Optimized RF Coils
RX Coil Size              100 mm²      50.4 mm² (-49.6%)             20.1 mm² (-79.1%)
WPT Power Transfer Eff.   94.2%        94.8%                         94.9%
Coupling Coefficient      0.32         0.25                          0.15
Peak RF-DC Eff.           61.7%        68.4%                         57.1%
System Efficiency         51.7%        64.1% (+12.4%)                51.8% (-0.2%)

Embedded Spiral Inductor

                     Hand Tuned Spiral, 32 mil Board   Optimized, 32 mil Board   Optimized, 8 mil Board
L                    536 nH                            1.37 µH                   1.83 µH
Area                 56.25 mm²                         40.48 mm² (-28.0%)        22.09 mm² (-60.7%)
Peak Q               33 @ 12 MHz                       34.5 @ 10 MHz             27.3 @ 10 MHz
DC-DC Efficiency     83.8%                             90.7% (+6.9%)             91.7% (+7.4%)

CPU Time

            Particle Swarm           IMGPO              TSBO
CPU Time    >606.7 min (>+74.1%)     423 min (+62.8%)   157.3 min

H. M. Torun, C. Pardue, A. K. Davis, M. L. F. Belladredj, M. Swaminathan, "Machine Learning Driven Advanced Packaging and System Miniaturization of IoT for Wireless Power Transfer Solutions," ECTC 2018 (Under Review).

Summary
- Bayesian optimization is shown to require fewer system simulations to reach the global optimum.
- Three new techniques, namely Learning Acquisition Functions, a distinctive hierarchical partitioning tree, and Fast Exploration and Pure Exploitation Stages, are presented and used in TSBO.
- TSBO has been applied to problems with up to 12 dimensions, showing significant performance improvements compared to other techniques.
- Some of the popular non-ML optimization techniques do NOT produce good solutions.
- The goal is to get to 30+ dimensions.
- Compared to hand-tuned designs, ML-based designs have been shown to be significantly better (performance and area), with major time savings!

Thank you
www.c3ps.gatech.edu