The Evolution of Waveform Relaxation for Circuit and Electromagnetic Solvers
Albert Ruehli, Missouri S&T EMC Laboratory, University of Science & Technology, Rolla, MO
with contributions by Giulio Antonini, Univ. L'Aquila, Italy
July 2010
OUTLINE
- Evolution of Waveform Relaxation (WR)
- WR in the Circuit Domain
- WR for Transmission Lines
- WR for Electromagnetic Solvers
SYSTEMS TO BE SOLVED
Time Domain Solution for Large Systems
- May contain non-linear parts
- Heterogeneous: VLSI circuits
- Problems with many transmission lines
- Homogeneous: EM circuit PEEC models
Sparse MNA System
- Solution time: $O(n^{1.5})$
- $C\dot{x}(t) + G x(t) = B u$
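To make the MNA form $C\dot{x} + Gx = Bu$ concrete, here is a minimal sketch of a backward Euler time stepper for a two-node RC ladder. The circuit values, the step size, and the helper names `solve2` and `backward_euler` are assumptions chosen for illustration; the dense 2x2 solve stands in for the sparse solver a real MNA code would use.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def backward_euler(C, G, B, u, x0, h, steps):
    """Integrate C x' + G x = B u(t) with fixed step h.

    Each step solves (C/h + G) x_new = (C/h) x_old + B u(t_new).
    """
    A = [[C[i][j] / h + G[i][j] for j in range(2)] for i in range(2)]
    x = list(x0)
    history = [list(x)]
    for n in range(1, steps + 1):
        t = n * h
        rhs = [
            sum(C[i][j] / h * x[j] for j in range(2)) + B[i] * u(t)
            for i in range(2)
        ]
        x = solve2(A, rhs)
        history.append(list(x))
    return history


# Hypothetical example: source -> R1 -> node 1 -> R2 -> node 2,
# with a unit capacitor from each node to ground and R1 = R2 = 1.
C = [[1.0, 0.0], [0.0, 1.0]]
G = [[2.0, -1.0], [-1.0, 1.0]]
B = [1.0, 0.0]
history = backward_euler(C, G, B, lambda t: 1.0, [0.0, 0.0], 0.05, 400)
```

With a unit step input, both node voltages in `history` settle toward 1, the dc solution of $Gx = Bu$.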
BEGINNING OF WAVEFORM RELAXATION
Time Domain Solution for Large Systems
- Logic circuits are almost one-way
- Forward coupling from left to right
- Miller capacitances introduce back coupling!
WAVEFORM RELAXATION SOLUTION
Example for Weak One-Way Coupled Subsystems
- Solve SSy 1 for a window in time, then solve SSy 2, ...
[Figure: subsystems 1 and 2 with weak feedback coupling ε]
$C_1[x_1(t), \varepsilon x_2(t)]\,\dot{x}_1(t) + g_1[x_1(t), \varepsilon x_2(t)] = b u$
$C_2[x_1(t), x_2(t)]\,\dot{x}_2(t) + g_2[x_1(t), x_2(t)] = 0$
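The two-subsystem iteration can be sketched on a linear stand-in for the equations above. The model used here ($\dot{x}_1 = -x_1 + \varepsilon x_2 + u$, $\dot{x}_2 = -x_2 + x_1$), the backward Euler discretization, and the function names are all assumptions made for illustration, not the circuit of the slide.

```python
def solve_subsystem(a, coupling, drive, h, x_init):
    """Backward Euler for x' = -a*x + coupling(t) + drive(t) over one window."""
    x = [x_init]
    for n in range(1, len(drive)):
        x.append((x[-1] + h * (coupling[n] + drive[n])) / (1.0 + h * a))
    return x


def waveform_relaxation(eps, h, steps, tol=1e-10, max_iter=50):
    """Iterate whole waveforms between SSy 1 and SSy 2 until they stop changing."""
    u = [1.0] * (steps + 1)       # unit step input driving SSy 1
    zero = [0.0] * (steps + 1)
    x1 = list(zero)               # initial waveform guesses
    x2 = list(zero)
    for k in range(1, max_iter + 1):
        # SSy 1 sees the previous x2 waveform through the weak feedback eps.
        x1_new = solve_subsystem(1.0, [eps * v for v in x2], u, h, 0.0)
        # SSy 2 sees the freshly computed x1 waveform.
        x2_new = solve_subsystem(1.0, x1_new, zero, h, 0.0)
        change = max(abs(p - q) for p, q in zip(x1_new + x2_new, x1 + x2))
        x1, x2 = x1_new, x2_new
        if change < tol:
            return x1, x2, k
    return x1, x2, max_iter


x1, x2, iterations = waveform_relaxation(eps=0.1, h=0.01, steps=500)
```

Each pass solves one subsystem over the full time window while the other subsystem's waveform is held fixed, which is what distinguishes WR from a point-by-point coupled solve.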
BEGINNING OF WAVEFORM TECHNIQUES
Start was Work on One-Way Systems and WR
- 1980: Work on one-way systems (paper: Ruehli, Sangiovanni-Vincentelli, Rabbat)
- 1981: First ideas on WR (Lelarasmee, Ruehli, Sangiovanni-Vincentelli)
- 1982: Trans. on CAD paper on WR
- First application goal: large logic circuits
OUTLINE OF GENERAL WR SOLUTION
Assume System Partitioned into Subsystems (SSy)
- Partition circuit into SSy
- Most logic SSy are one-way forward
Other Fundamental Steps in WR Approach
- Ordering of SSy (make labels)
- Scheduling of SSy for solver
- Solve an SSy for a window in time
- Store waveform for time window segment
PARTITIONING FOR CIRCUITS
Circuits are Heterogeneous Systems!
- First, exploit hierarchy from top down
- Short feedback loops are in one SSy
- Weak coupling allows cutting at circuit inputs
- Break circuits depending on strong coupling
PARTITIONING STRATEGY
Non-uniform Structure of Circuits
- Circuit level partition at detail level
- Assemble SSy bottom up, branch by branch
- Assemble nodes into strongly coupled SSy
[Figure: example circuit with $R_2 = 1$, $R_4 = 1$, $R_6 = 1$]
Assemble SSy according to coupling
- $R_2$ only: EigenV = 0.19
- $R_2$ and $R_6$: EigenV = 0.25
- $R_2$ and $R_4$ and $R_6$: EigenV = 0.37
RAPID CONVERGENCE FOR RC CIRCUITS
CONVERGENCE IN SMALL TIME WINDOW
Analytic Expression for Convergence
$\|E^{(k)}\|_T \le \Bigl[1 - e^{-2\alpha t} \sum_{m=0}^{k-1} \frac{(2\alpha t)^m}{m!}\Bigr] \|E^{(0)}\|_T$
[Figure: RC ladder with resistors $R_1, R_2, \dots, R_N$ and node capacitors C]
Rapid convergence for window T: once k exceeds roughly 2eT
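PARTITIONING DIFFICULT FOR SOME CKTs
High Pass Connection, Strong Coupling
- Short circuit between nodes 1 and 2
- Needs more advanced partitioning approach
$\dot{v}_1^{(k)}(t) + \alpha v_1^{(k)}(t) = \dot{v}_2^{(k-1)}(t) + i_1(t)/C_2$
$\dot{v}_2^{(k)}(t) + \beta v_2^{(k)}(t) = \dot{v}_1^{(k-1)}(t)$
[Figure: circuit with nodes 1 and 2, capacitor $C_2$, resistors $R_1$ and $R_3$]
$\kappa = \begin{bmatrix} 0 & \frac{s}{s+1} \\ \frac{s}{s+1} & 0 \end{bmatrix}$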
SSy ORDERING
Time Domain Solution for Large Systems
- Start at inputs (sources)
- Logic circuits: levelize the graph
- SSy labeling according to graph results
[Figure: levelized graph with SSy labeled 1 to 10]
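Levelizing can be sketched as a longest-path labeling of the subsystem graph, starting from the inputs. The dictionary-of-fanouts representation and the node names below are assumptions for illustration; the slide does not specify a data structure.

```python
from collections import deque


def levelize(fanout):
    """Give each node the length of the longest path from any source node."""
    indegree = {node: 0 for node in fanout}
    for successors in fanout.values():
        for succ in successors:
            indegree[succ] += 1
    level = {node: 0 for node in fanout}
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for succ in fanout[node]:
            level[succ] = max(level[succ], level[node] + 1)
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)
    if visited != len(fanout):
        # A cycle means feedback: cut it or keep the loop inside one SSy.
        raise ValueError("feedback loop found in subsystem graph")
    return level


def ordering(fanout):
    """Order subsystems by level, breaking ties by name."""
    level = levelize(fanout)
    return sorted(fanout, key=lambda node: (level[node], node))


# Hypothetical graph: one input driving two gates that feed a third.
graph = {"in": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
labels = ordering(graph)
```

The sketch rejects graphs with cycles, which matches the earlier partitioning rule that short feedback loops belong inside a single SSy.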
SCHEDULING OF SSy
Scheduling After Ordering
- Assignment of SSy sequence
- Scheduling is based on the ordering
- The schedule can, however, differ from the ordering
SCHEDULING OF SSy SOLVER
Basic Scheduling of SSy
- Simple chain example
- Follow ordering = basic schedule
- Visit all SSy until all voltages and currents have converged
Order: {1, 2, 3, 4, 5, 6, 7, 8, ...}
[Figure: chain of SSy 1 to 6]
Basic schedule: 1 2 3 4 5 6 7 8 9 | 1 2 3 4 5 6 7 8 9 | 1 2 3 4 5 6 7 8 9
ENHANCED SCHEDULING
ε Theorem Scheduling
- Assume that the SSy have directionality (logic ckts with ε feedback signal)
- The error of cutting the feedback at SSy k results in a back direction error $O(\varepsilon^{(N-k)})$
- The error propagating in the forward direction is $O(\varepsilon)$
Epsilon schedule: 1 2 1 2 3 1 2 3 2 3 4 3 4 5 4 5 6 5 6 7
SSy SOLVE STEP
Given:
- Partitioning done: we have SSys
- Ordering done, static schedule known
- Solve SSys according to static schedule
Use of Updated Waveforms?
- Gauss-Jacobi: update at end of solving all SSy; converges more slowly
- Gauss-Seidel: new waveforms after each SSy solve
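The difference between the two update rules can be seen on a small chain. This is a sketch under assumptions: each SSy is modeled as $\dot{x}_i = -x_i + x_{i-1} + \varepsilon x_{i+1}$ with a unit step driving the first one, solved by backward Euler; the model, tolerances, and names are illustrative only.

```python
def solve_ssy(inp, h):
    """Backward Euler for x' = -x + inp(t), x(0) = 0, over one time window."""
    x = [0.0]
    for n in range(1, len(inp)):
        x.append((x[-1] + h * inp[n]) / (1.0 + h))
    return x


def relax(num_ssy, eps, h, steps, mode, tol=1e-9, max_iter=200):
    """Sweep a chain of SSy in order; mode is "jacobi" or "seidel"."""
    u = [1.0] * (steps + 1)
    waves = [[0.0] * (steps + 1) for _ in range(num_ssy)]
    for k in range(1, max_iter + 1):
        # Gauss-Jacobi reads a frozen copy of the previous sweep;
        # Gauss-Seidel reads the live list and so sees fresh waveforms.
        src = [list(w) for w in waves] if mode == "jacobi" else waves
        change = 0.0
        for i in range(num_ssy):
            left = u if i == 0 else src[i - 1]
            right = src[i + 1] if i + 1 < num_ssy else [0.0] * (steps + 1)
            inp = [a + eps * b for a, b in zip(left, right)]
            new = solve_ssy(inp, h)
            change = max(change, max(abs(p - q) for p, q in zip(new, waves[i])))
            waves[i] = new
        if change < tol:
            return waves, k
    return waves, max_iter


waves_gs, sweeps_gs = relax(6, 0.05, 0.05, 100, "seidel")
waves_gj, sweeps_gj = relax(6, 0.05, 0.05, 100, "jacobi")
```

In this example both variants reach the same waveforms, and Gauss-Seidel needs fewer sweeps because the forward signal crosses the whole chain within a single sweep.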
SSy LATENCY (DORMANCY)
Avoid Solve Compute Time for Latent SSy
- A subsystem is latent if all external waveforms $x_E$ do not change:
$\|x_E^{(w)} - x_E^{(w-1)}\| \le \varepsilon_A + \varepsilon_R \max |x_E^{(w-1)}|$
[Figure: latency map over (waveform) time and (system) space for SSy 1 to 8, with regions marked latent, active, latent]
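The latency test is a mixed absolute and relative tolerance check on the external waveforms. A minimal sketch, assuming sampled waveforms, a maximum norm, and illustrative default tolerances (none of these choices come from the slide):

```python
def is_latent(x_new, x_old, eps_abs=1e-6, eps_rel=1e-3):
    """Return True if the external waveform changed by less than the tolerance.

    x_new and x_old are samples of the same external waveform x_E on the same
    time grid; eps_abs and eps_rel play the roles of eps_A and eps_R.
    """
    diff = max(abs(a - b) for a, b in zip(x_new, x_old))
    scale = max(abs(b) for b in x_old)
    return diff <= eps_abs + eps_rel * scale


unchanged = is_latent([1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
changed = is_latent([1.0, 1.1, 1.0], [1.0, 1.0, 1.0])
```

A scheduler would skip the solve for an SSy only when this check passes for every one of its external waveforms.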
WAVEFORM EXCHANGE, STORAGE
Example: Waveforms for two SSy
- Solve SSy 1 using waveforms from SSy 2
- WFs are divided into time windows; store by windows
[Figure: SSy 1 and SSy 2 exchanging waveforms v,i(t) versus t]
OVER AND UNDER WAVEFORM
Over (Under) Relaxation Factor $0 < \beta < 2$
- Scale the update by β
- Under-relaxation (β < 1), over-relaxation (β > 1)
- Approximate β(t)
$\dot{y}_1^{(k+1)}(t) = f_1[y_1^{(k+1)}(t), x_2^{(k)}(t)]$
$x_1^{(k+1)}(t) = \beta(t)\, y_1^{(k+1)}(t) + (1 - \beta(t))\, y_1^{(k)}(t)$
$\dot{y}_2^{(k+1)}(t) = f_2[x_1^{(k+1)}(t), y_2^{(k+1)}(t)]$
$x_2^{(k+1)}(t) = \beta(t)\, y_2^{(k+1)}(t) + (1 - \beta(t))\, y_2^{(k)}(t)$
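The relaxed update blends the new and the previous solved waveform point by point. A minimal sketch, assuming both waveforms and β are sampled on the same time grid (the function name and sample values are hypothetical):

```python
def relaxed_update(y_new, y_old, beta):
    """Blend two iterates: x = beta * y_new + (1 - beta) * y_old per time point."""
    return [b * yn + (1.0 - b) * yo for yn, yo, b in zip(y_new, y_old, beta)]


y_old = [1.0, 2.0, 2.0]
y_new = [2.0, 4.0, 4.0]
# beta = 1 keeps the plain update, beta < 1 damps it, beta > 1 extrapolates.
blended = relaxed_update(y_new, y_old, [1.0, 0.5, 1.5])
```

Here `blended` is `[2.0, 3.0, 5.0]`: the first point takes the new value unchanged, the second is the average, and the third overshoots the new value.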
PARALLEL PROCESSING FOR WR
Suitability for Parallel Processing
- Used for large circuits with many SSys
- Need algorithms which keep most processors busy
- Would like the number of processors to be smaller than the number of SSy
Experience with Parallel Circuit Solver
- Faster with more aggressive partitioning that allows non-uniform iterations
- Best approach for a parallel (Spice) circuit solver?
COMPARE WR TO SPICE
SPICE: Time Point by Point Computations
- Computations are localized in a single matrix
- Short compute times only
- Cannot tolerate delays (latency) in processor communication; less suitable for parallel processing
WR: Compute All Points for a Time Window
- Put a small Spice on each processor as solver
- Processors exchange waveforms rather than single-point data
- Can tolerate larger communication latency
PARALLEL WR FOR LARGE CIRCUITS
- Circuits with up to 186k transistors
- 256 processor WR circuit solver speedup
[Figure: speedup (axis ticks 100 and 200) versus number of transistors ($10^4$ to $10^6$)]
SUMMARY OF VLSI CIRCUIT PART
WR for Pure Circuit Problems
- Many interesting circuit specific algorithms
- Good enhancement of WR performance
- High efficiency for parallel processing
State of the Art
- Cheap parallel processors are widely available
- Makes WR more useful
- Still much work needs to be done
APPLICATION TO COMBINED ELMAG./CIRCUIT PROBLEMS
EM/Ckt Problems
- General: EM and Ckt interactions are challenging
- 3D EM solutions are much different from Ckt solutions
- Full wave solution adds challenges
- Nonlinear combined solvers are difficult
Observations About Partitioning
- EM problems can be systematically partitioned
- Homogeneous structures with fixed partitioning
- Partitioning and convergence can be controlled
PARTITIONING FOR TLs
Transverse Partitioning for Multi-Lines
- Modeling with many TLs is very time expensive
- Use transverse WR partitioning for the problem
PARTITIONING FOR TLs
Excessive Spice Compute Time
- Modeling with many TLs is very time expensive
[Figure: compute time without WR]
PARTITIONING FOR TL RESULTS
Excessive Spice Compute Time
- Modeling for multi TLs is very time expensive
- Compute time with transverse WR is linear!
PEEC for 3D WR-EM SOLUTION
PEEC: Transforms EM Problem to Circuit Domain
- Transient (and frequency) domain EM solutions
- PEEC ckt. models consist of capacitances, inductances, resistances, and voltage and current sources
- Partitioning at coupled elements
- Using modified nodal analysis (MNA) formulation
- PEEC gets low frequency and dc solution
Basic Derivation of PEEC Model
Equation for Total Electric Field
KVL: $v = \int \bar{E} \cdot d\bar{l}$
$\bar{E}^{i}(\bar{r},t) = \frac{\bar{J}(\bar{r},t)}{\sigma} + \mu \int_{v'} G(\bar{r},\bar{r}')\, \frac{\partial \bar{J}(\bar{r}',t_d)}{\partial t}\, dv' + \frac{\nabla}{\varepsilon_0} \int_{v'} G(\bar{r},\bar{r}')\, q(\bar{r}',t_d)\, dv'$  (1)
PEEC Circuit Model Element Computation
KVL: Voltage = R I + s Lp I + Q/C
- RHS Term 1: Resistance
- RHS Term 2: Partial inductance
- RHS Term 3: Coefficient of potential
(Lp,P,R,τ)PEEC Equivalent Circuit Model
PEEC Equivalent Circuits for Two Basic Cells
- Example: 3 node discretization of metal stick
- Path along metal conductor is strongly coupled
- Coupled partial inductances and capacitances
[Figure: equivalent circuit with node voltages $v_1$, $v_2$, $v_3$ and inductor currents $i_{L1}$, $i_{L2}$]
Partitioning Into SSy
[Figure: geometry with labels 3, 4, 1, 5 and 2, 6]
EM Interactions between SSys
- PEEC mutual coupling between all EM SSy
- Challenge is coupled branches SSy to SSy
OUTLINE OF EM SSy PARTITIONING
EM Geometry Partitioning into SSys
- Circuit topology is the same for all PEEC cells
- Partial inductance coupling decreases with distance d
- Capacitive coupling decreases with d
- SSy formed based on weak coupling
PEEC Model Direct Coupling
- Break at less coupled parts
- Need to break resistive conduction path
- Galvanically isolated units are easy to decouple
- Trade-off between SSy size and number of iterations
PRE-ESTIMATION OF COUPLING STRENGTH
Coupling Factor Checks for Partitioning
- Do we have to know the circuit details to estimate couplings?
- Good news: WR coupling may be large compared to EM coupling!
WR coupling factors: $\gamma \le 0.25$ is small
- Each iteration reduces the error by a factor of 4
- Convergence in 3 to 5 iterations
EM coupling: $10^{-3}$ may still be large!
- Cannot neglect such EM couplings
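As a quick check of the iteration counts quoted above, assume (as a simplification) that every WR iteration scales the error by exactly γ = 0.25:

```latex
\[
\frac{\|E^{(k)}\|}{\|E^{(0)}\|} \approx \gamma^{k}, \qquad
\gamma = 0.25: \quad
\gamma^{3} = \tfrac{1}{64} \approx 1.6 \times 10^{-2}, \qquad
\gamma^{5} = \tfrac{1}{1024} \approx 9.8 \times 10^{-4}.
\]
```

Under this assumption, three to five iterations bring the error down to roughly the percent to per-mille range.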
ESTIMATE OF COUPLING STRENGTH
[Figure: inductive branch with current $I_1$ and voltage $V_1$]
Coupling Factor Checks for Partitioning
- Inductive coupling: $\gamma = Lp_{12}^2 / (Lp_{11}\, Lp_{22})$
- Can also use distance-size related criteria
- Capacitive couplings: use similar approximation
- WR coupling is weak if $\gamma \le 0.1$
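One way to turn the inductive coupling factor into a partition is to merge every pair of cells whose γ exceeds the threshold. The sketch below uses a union-find grouping; that grouping strategy, the matrix values, and the function names are assumptions for illustration rather than the method of the talk.

```python
def coupling_factor(lp, i, j):
    """gamma_ij = Lp_ij^2 / (Lp_ii * Lp_jj) for a partial inductance matrix lp."""
    return lp[i][j] ** 2 / (lp[i][i] * lp[j][j])


def group_cells(lp, threshold=0.1):
    """Put strongly coupled cells (gamma > threshold) into the same SSy."""
    n = len(lp)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if coupling_factor(lp, i, j) > threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical partial inductances: cells 0 and 1 are close, cell 2 is far away.
lp = [
    [1.0, 0.6, 0.05],
    [0.6, 1.0, 0.05],
    [0.05, 0.05, 1.0],
]
subsystems = group_cells(lp)
```

For this matrix the result is `[[0, 1], [2]]`: cells 0 and 1 have γ = 0.36 and stay together, while cell 2 couples to both with γ = 0.0025 and can be relaxed as its own SSy.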
Inductive SSy WR Decoupling
[Figure: inductive branches ($I_2$, $Lp_{22}$, $V_2$), ($I_6$, $Lp_{66}$, $V_6$) and ($I_9$, $Lp_{99}$, $V_9$), split among SSy 12, SSy 1 and SSy 5]
$V_2 = Lp_{26}\, s I_6 + Lp_{29}\, s I_9 + \cdots$
$V_6 = Lp_{62}\, s I_2 + Lp_{69}\, s I_9 + \cdots$
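Capacitive SSy WR Decoupling
[Figure: capacitive branches with currents $i_{c7}$, $i_{c5}$, $i_{c2}$, self terms $1/p_{77}$, $1/p_{55}$, $1/p_{22}$ and sources $I_7$, $I_5$, $I_2$, split among SSy 1, SSy 12 and SSy 5]
$I_2 = \frac{p_{25}}{p_{22}}\, i_{c5} + \frac{p_{27}}{p_{22}}\, i_{c7} + \cdots$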
ASSEMBLING THE SSy FROM ELEMENTS
[Figure: two SSy]
- Test coupling for all elements between SSy
- Elements on dc paths are directly coupled
SOLUTION OF PARTITIONED SSys
Neutral Delay Differential Equations (NDDE) in Modified Nodal Analysis (MNA) form
$C_0 \dot{x} + G_0 x + \sum_i G_i x(t-\tau_i) + \sum_i C_i \dot{x}(t-\tau_i) - \sum_i B_i u_i(t-\tau_i) = \sum_i C_i^{+} \dot{x}(t-\tau_i) - \sum_i G_i^{+} u_i(t-\tau_i) + \sum_i B_i^{+} u_i(t-\tau_i)$
- Solve the subsystems SSy in usual Spice form
- Each processor has its own Spice circuit solver
- Always use latest waveform results
- Each subsystem SSy has its own time step
- Need multi-rate interpolation among the coupled waveforms
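Because each SSy runs with its own time step, a waveform produced on one grid has to be evaluated on another before it can be used as a coupling input. A minimal sketch of that step, assuming piecewise-linear interpolation and holding the end values outside the stored range (both are illustrative choices, as is the function name):

```python
from bisect import bisect_right


def resample(t_src, v_src, t_dst):
    """Linearly interpolate the waveform (t_src, v_src) at the times in t_dst."""
    out = []
    for t in t_dst:
        if t <= t_src[0]:
            out.append(v_src[0])
        elif t >= t_src[-1]:
            out.append(v_src[-1])
        else:
            j = bisect_right(t_src, t)      # t_src[j-1] <= t < t_src[j]
            w = (t - t_src[j - 1]) / (t_src[j] - t_src[j - 1])
            out.append(v_src[j - 1] + w * (v_src[j] - v_src[j - 1]))
    return out


# Hypothetical coarse-grid waveform from one SSy, read on a finer grid.
coarse_t = [0.0, 1.0, 2.0]
coarse_v = [0.0, 2.0, 4.0]
fine_v = resample(coarse_t, coarse_v, [0.0, 0.5, 1.0, 1.5, 2.0])
```

The same lookup can serve the delayed terms in the NDDE by evaluating the stored waveform at `t - tau` instead of `t`.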
ORDERING AND SCHEDULING FOR SSys
[Figure: geometry with labels 3, 4, 1, 5 and 2, 6]
- Ordering: (Pin1: SSy1), (Pin3: SSy2), (Pin4: SSy3), (Pin5: SSy4), (Pin2, Gnd: SSy5)
- Basic schedule: SSy1, SSy2, SSy4, SSy5, SSy3
MIXED WR-PARALLEL MATRIX SOLVER
- Large dependence on number of available processors
- Large dependence on system size (number of SSy)
- Convergence in 3 to 10 iterations
- Conventional solution: at most 1 processor per SSy
- New solution: assign matrix solver
- Number of processors depends on size of SSy
- SSy compute time is more uniform
VALIDATION PROBLEMS FOR WR SOLUTION
- The first contact is driven by a pulse voltage source with rise time $\tau_r = 50$ ps.
[Figure. Left: transient voltage, Voltage [V] versus Time [ns]. Right: magnitude spectrum, Voltage [V] versus Frequency [GHz].]
PARALLEL MATRIX SOLUTION OF SSy
- Size of each SSy is different for real problems
- Several processors to solve SSy circuits
[Figure: relative compute time versus number of processors (1 to 8) for the curves "Pin" and "Pin and gnd". Caption: Parallel compute time for 1 pin and pin + ground.]
A CONNECTOR TEST PROBLEM
[Figure: connector geometry with dimensions; pin ends labeled 1b to 5b and 1e to 5e]
WAVEFORM COMPARISON WITH WR
[Figure: Voltage [V] versus Time [ns] (0 to 15 ns) for the curves PEEC pin1 inp, (WR)PEEC pin1 inp, PEEC pin1 out, (WR)PEEC pin1 out. Caption: Input and output WF, flat and WR comparison.]
CONNECTOR MODELING RESULTS
Table 1: Global problem.
Inductive cells   Capacitive cells   Nodes
552               752                200
Table 2: Grounded pin+ground plane.
Inductive cells   Capacitive cells   Nodes
264               304                80
Table 3: CPU-time requirements.
Global [s]   Grounded pin+ground plane [s]   Ratio
119.4        21.35                           5.5
LARGE EM SYSTEM BEHAVIOR FOR WR
- Original circuit matrix size: N x N
- Number of subsystems SSy: S
- Number of processors: P
- Number of WR iterations: K
- Circuit solver run time assumed: $O(N^2)$
For S = P = 3, K = 3:
$\text{Time}_{\text{Full}} = N^2$  (2)
$\text{Time}_{\text{WR}} = \frac{K N^2}{S} = N^2$  (3)
SUMMARY AND CONCLUSIONS
WR for circuits
- Introduction of issues for circuit WR
- Status: Ongoing work on improving partitioning
PEEC solver status
- Work in starting phase, several problems solved
General Status
- Many papers have been published, parallel WR
- Continuous progress on new algorithms and implementations
- Commercial interest is in large parallel solvers