Lecture 30: Perspectives

Administrivia
Final on Friday, December 14, 2001, 8 am. Location: 180 Tan Hall.
Topics: everything that was covered in class.
Review session: TBA.
Lab and homework scores will be posted on the web; please check that they are correct and that nothing is missing.
Superb job on the posters!
Feedback on the course is extremely welcome!

Transistor Count
[Figure: transistors (MT, log scale from 0.001 to 10000) versus year, 1970-2010, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium proc, and P6, with labeled points at 200M, 425M, 900M, and 1.8B.]
200M-1.8B transistors on the lead microprocessor
Source: S. Borkar

18nm FinFET
Double-gate structure + raised source/drain
[Figure: FinFET structure with labels Gate, Silicon Fin, Source, Drain, and BOX. The Si fin is the body.]
[Figure: I_d [µA/µm] (0 to 400) versus V_d [V] (-1.5 to 0.0), with curves labeled from -0.25 V to -1.50 V.]
Source: X. Huang, et al., 1999 IEDM, pp. 67-70

Power Density
With Vdd ~1.2V, these devices are quite fast: FO4 delay is <5ps.
If we continue with today's architectures, we could run digital circuits at 30GHz.
But we will end up with 20kW/cm² power density.
Lowering the supply to 0.6V brings us down to 5kW/cm².
Speeds will be a bit lower too: FO4 = 10ps, which lowers the frequencies to ~10GHz [Tang, ISSCC '01] and lowers power.
Assume that a high-performance DG or bulk FET can be designed with 1kW/cm², with FO4 = 10ps [Frank, Proc. IEEE, 3/01].

Power will be a problem
[Figure: power (Watts, log scale from 0.1 to 100000) versus year, 1971-2008, for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, and Pentium proc, with labeled points at 500W, 1.5KW, 5KW, and 18KW.]
Power delivery and dissipation will be prohibitive.
Source: S. Borkar

Power is a Limiting Factor
A 2cm x 2cm die in a high-performance microprocessor, at the assumed 1kW/cm², will end up with 4kW power dissipation.
If power has to be limited to 180W, we can afford to have only 4.5% of these devices with 0.6V supply on the die, given that nothing else dissipates power.

Possible Scenario
Example: 0.5% of devices will be of highest performance.
35% is leakage (assume: 20% drain, 10% gate, 5% drain-to-body).
65% is active power.
With just 0.5% of these devices: CV² = 13W, leakage 7W.
What would the other 99.5% of the devices that populate the 2cm x 2cm die look like?
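
The arithmetic behind the two slides above can be checked with a short sketch. It assumes the 1kW/cm² density, the 180W limit, the 0.5% fast-device share, and the 35% leakage share quoted in the lecture; the function and key names are made up for illustration.

```python
def power_budget(die_side_cm=2.0, density_w_per_cm2=1000.0,
                 limit_w=180.0, fast_fraction=0.005, leakage_share=0.35):
    """Redo the slide's power arithmetic for a square die (illustrative only)."""
    area_cm2 = die_side_cm ** 2
    # Power if the whole die were filled with the fast 0.6V devices.
    full_die_w = area_cm2 * density_w_per_cm2
    # Share of the die that fits under the limit if nothing else dissipates power.
    affordable_fraction = limit_w / full_die_w
    # Power of the small highest-performance portion, split into leakage and active.
    fast_w = fast_fraction * full_die_w
    leakage_w = leakage_share * fast_w
    active_w = fast_w - leakage_w
    return {
        "full_die_w": full_die_w,
        "affordable_fraction": affordable_fraction,
        "fast_w": fast_w,
        "leakage_w": leakage_w,
        "active_w": active_w,
    }


budget = power_budget()
for name, value in budget.items():
    print(f"{name}: {value:.4g}")
```

With the default values this reproduces the 4kW, 4.5%, 13W, and 7W figures from the slides, which leaves most of the 180W budget for the remaining 99.5% of the die.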

Microprocessors: Today and 20nm
[Figure: two die floorplans. Labels: Cache, µp Core 2GHz (today); Cache, µp Core, Dedicated Logic, 7-10 GHz (20nm).]

Microprocessor Design
Core datapath will be running at 7-10GHz.
Requires fast devices: low thresholds with 0.5-0.6V supplies.
Lowest NMOS V_Th ~ -0.1V to get swing in CMOS. Assume a threshold of 0-0.1V.
The devices will be very leaky; a second threshold will be used to control leakage power.
With the second threshold set to give 10x less leakage, 90% of devices off critical paths can be made high-threshold.
Power limits the size of the µp core to 5-10% of the die (today's transistor count, just shrunk) and 30-50% of the total power budget.
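
To get a feel for what the second threshold buys, here is a rough estimate. It rests on assumptions that are not in the slides: every device leaks the same amount at the low threshold, and the 90% figure is read as a share of all devices in the core. The function name is hypothetical.

```python
def relative_leakage(high_vt_fraction=0.9, leakage_reduction=10.0):
    """Leakage of a dual-threshold design relative to an all-low-threshold one.

    Assumes (not from the lecture) that all devices leak equally at the low
    threshold and that high-threshold devices leak `leakage_reduction` times less.
    """
    low_vt_fraction = 1.0 - high_vt_fraction
    return low_vt_fraction + high_vt_fraction / leakage_reduction


print(f"relative leakage: {relative_leakage():.2f}")
```

Under these assumptions the total leakage drops to roughly a fifth of the single-threshold value, and the remaining low-threshold devices on the critical paths account for more than half of it.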

Add Dedicated Datapath
Can execute e.g. DIVX decoder, graphics.
[Figure: one logic block at Vdd compared with two logic blocks at Vdd/2.]

                 One block at Vdd    Two blocks at Vdd/2
Freq             1                   0.5
Vdd              1                   0.5
Throughput       1                   1
Power            1                   0.25
Area             1                   2
Pwr Den          1                   0.125
Leakage Curr.    (not given)         2

Will run at 10x lower frequency, at 0.5-0.7 of the processor V_DD, i.e. 0.25-0.35V.
Thresholds for critical paths: V_Th = 150mV.
Need leakage power management: another threshold or control of V_T.

180W Gives Us:
[Figure: two charts, Power and Area, each divided among µp Core, Memory, and Dedicated datapath.]
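
The Vdd versus Vdd/2 comparison on the Add Dedicated Datapath slide follows from the usual dynamic power model, P proportional to C·V²·f. The sketch below assumes that model, the same switched capacitance per block, no overhead for splitting and merging data, and no leakage term; the halved frequency at half the supply is taken from the slide rather than derived. All names are illustrative.

```python
def parallel_scaling(n_blocks=2, vdd=0.5, freq=0.5):
    """Normalized metrics for n parallel blocks, relative to one block at Vdd = freq = 1.

    Assumes dynamic power ~ C * Vdd^2 * f per block and ignores leakage and
    any routing or multiplexing overhead.
    """
    throughput = n_blocks * freq           # each block does `freq` of the work
    power = n_blocks * vdd ** 2 * freq     # C * V^2 * f, summed over the blocks
    area = float(n_blocks)
    return {
        "throughput": throughput,
        "power": power,
        "area": area,
        "power_density": power / area,
    }


for name, value in parallel_scaling().items():
    print(f"{name}: {value}")
```

Doubling the hardware at half the supply keeps throughput constant while cutting power to a quarter and power density to an eighth, which is why the dedicated datapath can trade area for power.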

Memory
Density is the key requirement.
Will occupy 70-80% of the die.
Low leakage.
Low activity.
Inherently low active power, low power density (at least 10x less than logic).
Need higher V_Th ~ 0.5V and higher supply, 0.8-1V (?).

Systems-on-a-Chip: Today and 20nm
Radio (60GHz (?), CMOS?)
25M transistors, 3MB embedded SRAM
MIPS core @ 100MHz, DSP @ 144MHz
7 PLLs, 12 ADC, DACs, 100 clocks, 1.4W
Broadcom set-top box
2W

Transistor Requirements
Will need different kinds of transistors:
  » Datapaths (speed, leakage)
  » Dedicated DSP (power, leakage)
  » Memory (density is main concern)
  » Analog (?)
Power and leakage determine the size ratios between these blocks.
The number of different transistor types is determined by parameter spread.
Fewer device types could solve the problem, but that needs control of the threshold (4th terminal) with a strong transfer function.

Today's Design Methodologies Will Not Scale Much Further
The Deep Sub-Micron (DSM) Effect (0.25µ)
DSM: Microscopic Problems
  Wiring Load Management
  Noise, Crosstalk
  Reliability, Manufacturability
  Complexity: LRC, ERC
  Accurate Power Prediction
  Accurate Delay Prediction
  etc.
Everything Looks a Little Different
?
1/DSM: Macroscopic Issues
  Time-to-Market
  Millions of Gates
  High-Level Abstractions
  Reuse & IP: Portability
  Predictability
  etc.
...and There's a Lot of Them!

The Productivity Gap
[Figure: logic transistors per chip (K) and productivity (transistors per staff-month), both on log scales, versus year, 1981-2009, with technology nodes from 2.5µ through .35µ to .10µ.]
Complexity growth rate: 58%/yr compound.
Productivity growth rate: 21%/yr compound.
Source: SEMATECH

Implementation Methodologies
Digital Circuit Implementation Approaches
  Custom
  Semi-custom
    Cell-Based
      Standard Cells
      Compiled Cells
      Macro Cells
    Array-Based
      Pre-diffused (Gate Arrays)
      Pre-wired (FPGA)
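
The two compound rates on the Productivity Gap slide (58%/yr for complexity, 21%/yr for productivity) imply a gap that itself grows exponentially. A minimal sketch of that arithmetic follows; the function names are made up for illustration and the only inputs are the two rates from the slide.

```python
import math


def gap_ratio(years, complexity_growth=0.58, productivity_growth=0.21):
    """Factor by which complexity outgrows productivity after `years` years."""
    yearly = (1.0 + complexity_growth) / (1.0 + productivity_growth)
    return yearly ** years


def gap_doubling_time(complexity_growth=0.58, productivity_growth=0.21):
    """Years needed for the complexity/productivity ratio to double."""
    yearly = (1.0 + complexity_growth) / (1.0 + productivity_growth)
    return math.log(2.0) / math.log(yearly)


print(f"gap after 10 years: {gap_ratio(10):.1f}x")
print(f"gap doubles every {gap_doubling_time():.1f} years")
```

At these rates the design effort needed per chip, measured in staff-months, more than doubles every three years unless the methodology changes, which is the motivation for the implementation approaches listed above.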

Custom Design: Layout Editor
Magic Layout Editor (UC Berkeley)

Standard Cell - Example
3-input NAND cell (from Mississippi State Library), characterized for fanout of 4 and for three different technologies

Synthesis
1. Describe your circuit in HDL (VHDL, Verilog).
2. Synthesis programs map it into a standard cell library. Set the constraints (timing, area).
3. Get a gate-level netlist; automatic place and route.
4. Insert clock.
5. Extract the netlist from layout.
6. Does it meet constraints? If not, go back to 1, 2, 3, 4.
This is called design closure: timing closure, power closure.

Gate Array: Sea-of-gates
[Figure: rows of uncommitted cells separated by routing channels, with labels polysilicon, V_DD, GND, metal, and possible contact. An uncommitted cell is shown next to a committed cell (4-input NOR) with inputs In1-In4 and output Out.]

Sea-of-gate Primitive Cells
[Figure: primitive cells built from PMOS and NMOS transistors, one using oxide-isolation and one using gate-isolation.]

Sea-of-gates
[Figure: chip with regions labeled Random Logic and Memory Subsystem.]
LSI Logic LEA300K (0.6 µm CMOS)

Prewired Arrays
Categories of prewired arrays (or field-programmable devices):
Fuse-based (program-once)
Non-volatile EPROM based
RAM based

Programmable Logic Devices
PLA, PROM, PAL

EPLD Block Diagram
[Figure: block diagram with labels Primary inputs and Macrocell.]
Courtesy Altera Corp.

Field-Programmable Gate Arrays: Fuse-based
Standard-cell like floorplan
[Figure: rows of logic modules separated by routing channels, with vertical routes, I/O buffers, and a Program/Test/Diagnostics block.]

Interconnect
Programming interconnect using anti-fuses
[Figure: labels Programmed interconnection, Input/output pin, Antifuse, Cell, Horizontal tracks, Vertical tracks.]

Field-Programmable Gate Arrays: RAM-based
[Figure: array of CLBs with horizontal and vertical routing channels, switching matrix, and interconnect points.]

RAM-based FPGA Basic Cell (CLB)
[Figure: CLB made of combinational logic and storage elements. Combinational logic: two blocks, F and G, each implementing any function of up to 4 variables (inputs A, B/Q1/Q2, C/Q1/Q2, D). Storage elements: two flip-flops with outputs Q1 and Q2, a clock, and clock enable CE.]
Courtesy of Xilinx

RAM-based FPGA
Xilinx XC4025
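
The phrase "any function of up to 4 variables" on the CLB slide can be made concrete with a lookup table: 16 stored bits, addressed by the four inputs, can represent every such function. The sketch below models that idea only; it is not a description of the actual Xilinx circuit, and the function names are hypothetical.

```python
from itertools import product


def make_lut4(func):
    """Build the 16-entry truth table of a 4-input Boolean function.

    Entries are ordered with `a` as the most significant address bit and `d`
    as the least significant one.
    """
    return [int(bool(func(a, b, c, d))) for a, b, c, d in product((0, 1), repeat=4)]


def lut4_eval(table, a, b, c, d):
    """Read one stored bit, using the four inputs as the address."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return table[index]


# "Programming" the cell means loading a different table.
nor4 = make_lut4(lambda a, b, c, d: not (a or b or c or d))
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(nor4)
print(lut4_eval(xor4, 1, 0, 1, 1))
```

Because the logic is just stored bits, reconfiguring the cell means rewriting 16 bits of RAM, which is what distinguishes this approach from the fuse-based arrays earlier in the lecture.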

Addressing the Design Complexity Issue: Architecture Reuse
Reuse comes in generations

Generation    Reuse element     Status
1st           Standard cells    Well established
2nd           IP blocks         Being introduced
3rd           Architecture      Emerging
4th           IC                Early research

Source: Theo Claasen (Philips), DAC '00

Architecture ReUse
Silicon System Platform
  » Flexible architecture for hardware and software
  » Specific (programmable) components
  » Network architecture
  » Software modules
  » Rules and guidelines for design of HW and SW
Has been successful in PCs
  » Dominance of a few players who specify and control architecture
Application-domain specific (difference in constraints)
  » Speed (compute power)
  » Dissipation
  » Costs
  » Real / non-real time data

Platform-Based Design
"Only the consumer gets freedom of choice; designers need freedom from choice" (Orfali, et al, 1996, p.522)
A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer.
New platforms will be defined at the architecture-micro-architecture boundary.
They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations.
Key to such approaches is the representation of communication in the platform model.
Source: R. Newton

Design at a crossroad: System-on-a-Chip
[Figure: SoC block diagram with labels Multi-Spectral Imager; 500 k Gates FPGA + 1 Gbit DRAM, Preprocessing; RAM; 64 SIMD Processor Array + SRAM, Image Conditioning, 100 GOPS; Analog; µC system + 2 Gbit DRAM, Recognition.]
Embedded applications where cost, performance, and energy are the real issues!
DSP and control intensive
Mixed-mode
Combines programmable and application-specific modules
Software plays crucial role

EE 141 Summary
Digital CMOS design is alive and kicking.
Some major challenges down the road caused by Deep Sub-micron:
  » Super GHz design
  » Power consumption!!!!
  » Reliability: making it work
  » Device variations
Some new circuit solutions are bound to emerge.
Who can afford design in the years to come? Some major design methodology change is in the making!