Survey of Low Power Techniques for ROMs

Edwin de Angel
Crystal Semiconductor Corporation
P.O. Box 17847
Austin, TX 78744

Earl E. Swartzlander, Jr.
Department of Electrical and Computer Engineering
University of Texas at Austin
Austin, TX 78712

Abstract

This paper presents a survey of low power techniques for Read Only Memories (ROMs). Significant savings in power dissipation are achieved through the use of techniques at the circuit and architecture level. The ROM circuits have been designed in 0.35 µm CMOS technology and simulated using PowerMill.

Introduction

With the development of submicron technologies and the increase of complexity on VLSI chips, the market for portable applications, digital signal processors and ASIC implementations has focused significant effort on the design of low power systems [1]. ROMs (Read Only Memories) are an important part of many digital systems (e.g., digital filters, digital signal processors, microprocessors, etc.). The high area density of ROMs makes these circuits very attractive for storing fixed information (e.g., coefficients of a digital filter). As new submicron technologies are developed, the fast speeds of these processes allow the implementation of architectures which could not be implemented in the past. Also, the increase in the number of metal layers becomes a main instrument to reduce switched capacitance without penalty in the density of the ROM.

Significant savings in power are achieved through the implementation of several techniques. The focus of this paper is on techniques at the architecture and transistor levels and their global impact on power dissipation. The first section of the paper explains traditional ROM designs and the sources of power dissipation. The second part discusses low power techniques at the architecture level. The next section presents techniques that are applicable at the circuit level. The last section shows results and conclusions.

1 Sources of Power Dissipation

Figure 1 shows the traditional architecture of a ROM.
The decoder selects among the row lines that run through the ROM core, turning on only one row line at a given time. The column multiplexer and driver select which column is being read and drive the data bus. The control logic generates the internal signals of the ROM (i.e., precharge, read, etc.). The ROM core is used to store information through the placement of transistors. There are two main types of ROMs: the NAND array, where the pull down transistors are in series, and the NOR array, where the pull down transistors are in parallel. This paper focuses on ROMs using a NOR array since these structures are faster than NAND arrays and are the most frequently used [2].

[Figure 1: ROM Block Diagram (blocks: Decoder, Control, ROM Core, Column Mux & Driver; signals: Address, Clk, Dataout)]

[Figure 2: ROM Bitlines (labels: Row Line, Bit Line, 12-1 Mux)]

In order to save power, most ROMs precharge during one phase of the clock and evaluate in the other. Table 1 shows the power dissipation in a 2K x 18 ROM designed in 0.6 µm technology at 3.3V and clocked at 10 MHz. As the table shows, the precharge of the bit lines in the ROM core dissipates most of the power. There are two main reasons for this. First, bit lines have large capacitance (drain capacitance of the transistors tied to this line, parallel plate and fringe component to substrate, plus the overlap of the row lines and other metal layers). Second, more than 18 bit lines are switched per access; this is due to the word line selecting more bit lines than
is necessary (see Figure 2). The example presents a multiplexer ratio of 12 to 1. As a result at least 4 more bit lines will switch instead of one. The control logic dissipates power because it contains all the drivers that generate the signals feeding the decoder. The control logic also generates the precharge signal, which is used to precharge the ROM core, enable the output drivers, and enable the decode logic. The power dissipated in the decoder is small since only one row line switches per access.

Table 1: Power Dissipation, ROM 2K x 18

Block       Power   Percentage (%)
Decoder     0.06    2.1
ROM Core    2.24    89
Control     0.18    7.2
Drivers     0.05    1.7

2 Low Power Techniques: Architecture

Since most of the power dissipated is due to switching of the bit lines, a significant number of the following techniques focus on the ROM core.

2.1 Hierarchical Word Line

This concept has been proposed for static random access memories (SRAMs) [3]. The basic idea is to divide the memory into different blocks and run the block word line in one layer (i.e., metal1 or poly) and a global word line in another layer. As a result only the bit cells of the desired block are accessed. The same concept can be applied to ROMs. The ROM can be divided into several blocks and a given block is enabled through the address bits. Although this technique can remove a significant amount of the power dissipated, it does not solve the problem completely. The main reason is that, due to layout considerations, a ratio of at least 4 to 1 is required in the multiplexer. A significant reduction in power is obtained, but more than one bit line per bit could still be switching.

2.2 Selective Precharge

A large capacitance is switched per cycle because every bit line is precharged high during the first part of the cycle and many bit lines are discharged even when these locations are not accessed. Through selective precharge only bit lines which will be accessed are precharged [5].
The hardware overhead of this technique is low since most of this control logic is the same control logic required to control the multiplexers at the bottom of the ROM.

2.3 Minimization of Non-zero Terms

This technique focuses on the reduction of the capacitance in the bit lines and the row lines. This can be achieved by minimizing the number of non-zero terms in the ROM table, which reduces the number of NMOS devices in the ROM core. This technique is very efficient since zero terms do not switch bit lines and reduce capacitance in both bit lines and row lines.

2.3.1 Inverted ROM

If the number of ones is very high, the whole ROM core can be inverted and the final data inverted in the drivers. The efficiency of this type of encoding depends on the original number of non-zero terms. If the number of non-zero terms is close to half the number of bits in the ROM core, then the reduction of non-zero terms will be small or none.

2.3.2 Inverted Row

The reduction of non-zero terms can be performed on a row by row basis. A given row is inverted if more than half of the bits are non-zero terms. Figure 3 shows two original rows and the result after the technique has been applied. It is important to observe that an extra bit per row is required to perform the encoding. Also note that if the whole ROM had been inverted, the reduction of non-zero terms in one row would have been offset by the increase in the other one.

Original:
      1 0 1 1 1 0 1 1
      0 0 0 0 0 1 1 1

One Row Encoding (leading bit is the Encoded Bit):
  1 | 0 1 0 0 0 1 0 0
  0 | 0 0 0 0 0 1 1 1

Figure 3: Inverted Row

2.3.3 Sign Magnitude Representation

Often a ROM is used to store the coefficients of a digital filter. As a result, a significant amount of the non-zero terms are due to the sign extension of the negative coefficients. Sign magnitude representation can be used to remove a significant number of the ones. The main drawback of this type of encoding is that a conversion to two's complement is required at the end of a cycle, which slows down the ROM.
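The inverted row and sign magnitude encodings described above can be illustrated with a minimal Python sketch. The function names are hypothetical and not from the paper, and placing the encode bit and the sign bit first is an assumption made only for illustration.

```python
def row_invert(bits):
    """Return (flag, stored_bits); the row is inverted when more than
    half of its bits are ones. The flag is the extra encode bit."""
    if sum(bits) > len(bits) / 2:
        return 1, [1 - b for b in bits]
    return 0, list(bits)


def row_decode(flag, stored_bits):
    """Recover the original row from the stored bits and the encode bit."""
    return [b ^ flag for b in stored_bits]


def twos_complement_bits(value, width):
    """Bits (MSB first) of a signed integer in two's complement."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def sign_magnitude_bits(value, width):
    """Bits (MSB first): one sign bit followed by the magnitude."""
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    return [sign] + [(magnitude >> i) & 1 for i in reversed(range(width - 1))]


if __name__ == "__main__":
    # The two rows of Figure 3.
    for row in ([1, 0, 1, 1, 1, 0, 1, 1], [0, 0, 0, 0, 0, 1, 1, 1]):
        flag, stored = row_invert(row)
        print(row, "->", flag, stored)

    # Ones needed to store a small negative coefficient in 8 bits.
    print(sum(twos_complement_bits(-3, 8)), sum(sign_magnitude_bits(-3, 8)))
```

For the first row of Figure 3 the sketch stores two ones plus the encode bit instead of six ones. For the coefficient -3 in 8 bits, two's complement stores seven ones because of sign extension, while sign magnitude stores three.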
Still, for applications like mixed-signal systems where speed is not an issue, this type of encoding can be very useful.

2.3.4 Sign Magnitude and Inverted Block

The number of non-zero terms can be reduced further if the sign magnitude representation is implemented along with the inverted row encoding. After the sign magnitude conversion is done, the inverted row encoding can be applied to a subset of the row (e.g., the 5 least significant bits).

2.4 Difference Encoding

Difference encoding can be used to reduce the overall size of the ROM core. For digital filters and other applications the ROM is accessed sequentially. If the data do not change significantly between one address and the next, the ROM core can store the difference between the data instead of the whole value [4]. The main disadvantage is that an adder is required to calculate the original value. A variation of the same concept is to hard wire different constants (i.e., offsets) and store only the difference with respect to the constant.

2.5 Smaller ROMs

Figure 4 shows the coefficients of a 102 tap FIR filter. If these coefficients are stored in ROM, the largest coefficients will determine the size of the ROM required. More than 70% of the coefficients are below 18 bits. Still, the largest coefficient goes up to 24 bits. As
a result the ROM core has wasted space and additional capacitance. A better implementation can be achieved if the large coefficients are stored in a wide ROM with fewer addresses and the small coefficients are stored in a narrow ROM with many addresses. A similar principle can be applied to locations in ROM which are often accessed; locations that are accessed frequently are stored in a small, fast ROM, while the other locations are stored in a larger ROM [6].

[Figure 4: 102 Tap FIR Filter]

3 Low Power Techniques: Circuit Level

Low power techniques at the circuit level can be powerful tools to reduce the power in VLSI systems [7].

3.1 NMOS Precharge

An important technique to reduce the power dissipated in the bit lines is limiting the voltage swing. This can be done through NMOS precharge of the ROM core; NMOS transistors are used to precharge bit lines high. As a result, bit lines are precharged to $V_{dd} - V_t$, where $V_t$ is the threshold voltage. Since the bit lines switch only between $V_{dd} - V_t$ and ground, significant savings can be achieved. A drawback of this technique is degradation of noise margins and the body bias effect (which increases the threshold voltage), requiring careful design of the output drivers.

3.2 Voltage Keeper

Once the number of non-zero terms has been minimized, switching of bit lines is reduced. Still, even if the same location of the ROM is accessed repeatedly, bit lines need to be precharged every time. In order to avoid switching in the data bus or the adder required to convert from sign magnitude to two's complement, a voltage keeper is used to minimize switching. Figure 5 shows a possible implementation of the keeper with the invert logic. The voltage keeper is used to store past history and avoid transitions in the data bus and adder (if sign magnitude is implemented). The Fire signal is enabled after the ROM core has evaluated. The Pass and Invert signals are used if sign magnitude or row invert are implemented.

[Figure 5: Output Stage (signals: Vdd, Precharge, Input, Output, Fire, Pass, Pass_b, Invert, Invert_b)]

3.3 Buffer Sizing

A large set of buffers is required in the control logic to drive the address lines through the decoder, generate the control signals for the column multiplexers, drive the row lines and drive the precharge signals. For a long time, the optimum buffer tapering factor $\beta = e = 2.72$ has been used [8]. Figure 6 presents the model used. In the figure $g$ represents the conductance while $\beta$ represents the taper defined as:

$$\left(\frac{W}{L}\right)_{k+1} = \beta \left(\frac{W}{L}\right)_{k} \qquad (1)$$

where $W$ and $L$ are the width and length of transistors in a given stage. In this case $\beta$ indicates the size of stage $k + 1$ relative to stage $k$. The number of stages required for a given capacitive load is:

$$n = \frac{\ln(C_L / C_i)}{\ln \beta} \qquad (2)$$

[Figure 6: Driving Large Capacitive Loads]

This model ignores the effect of parasitic capacitances at the output of each stage. Haviland [9] includes the parasitic capacitance in the calculations using a split capacitor model (see Figure 7). $C_x$ and $C_y$ are the inherent output capacitance and the incidental load capacitance respectively. Using this model and developing an equation to minimize delay, the optimum taper factor is:

$$\beta \left[ \ln(\beta) - 1 \right] = \frac{C_x}{C_y} \qquad (3)$$

[Figure 7: Improved Model]

This equation shows that the optimum taper depends on the ratio $C_x / C_y$. Still, this equation has been developed to minimize delay. For power dissipation, there are often large capacitive loads which are not in the critical path. Choi [10] derived the tapering factor to minimize the power-delay product using the same model. The optimum can be expressed as:

$$(\beta - 2) \ln(\beta) - (\beta - 1) = 0 \qquad (4)$$

If the parasitic capacitances are neglected, $\beta = 4.25$. Haviland [9] shows that both tapering factors can be related by:

$$\beta_{Power-Delay} = \left( \beta_{Delay} \right)^{\alpha} \qquad (5)$$

where $\alpha = 1.44$. Figure 8 shows a graph comparing the $\beta$ values for different ratios of $C_x / C_y$. A different derivation to minimize power under a delay constraint has been done by Figueras [13].

[Figure 8: Power-Delay Product versus Delay (Taper Factor versus Cx/Cy; curves: Delay, Power-Delay)]

3.4 Reduction of Short Circuit Currents

Careful design of the control logic is required in order to avoid turning on row lines when the precharge circuitry is on. Also, output drivers need to be enabled after the ROM core has evaluated. Delay lines can be used to generate signals with precise timing [12]. A robust design of the delay lines is needed to avoid performance degradation through process variations. A significant reduction of the short circuit dissipation can also be achieved through scaling of the power supply. Accurate expressions to estimate short circuit currents have been derived by Caufape [13].

3.5 Voltage Scaling

Voltage scaling is one of the most powerful tools to reduce the power dissipation. A quadratic improvement can be easily achieved through voltage scaling. Although this technique is very effective in reducing power, the speed of the circuits is degraded as the voltage goes down. A first order derivation [1] shows that the delay of CMOS gates can be expressed as:

$$T_{delay} = \frac{C_L V_{dd}}{I} = \frac{2 C_L V_{dd}}{\mu C_{ox} (W/L) (V_{dd} - V_t)^2} \qquad (6)$$

The speed of ROMs is degraded significantly because the transistor driving the bit lines is close to minimum size.

Results

Tables 2, 3 and 4 show the cumulative effects of applying multiple low power methods. First a conventional 256 x 24 ROM using two's complement was designed. Next sign magnitude was applied to the data plugged into the ROM. The next design implements the row invert encoding in addition to sign magnitude. Table 2 compares the results of the several encodings in a 256 x 24 ROM. The data stored in the ROM was generated through a pseudorandom function in the C language. The ROMs were designed with a mux ratio of 4 to 1 and simulated with PowerMill [14] at 3.3V, 10 MHz in 0.35 µm technology. From the table it can be observed that since the data in the ROM is random, power savings using row invert encoding are greater than using sign magnitude encoding. For digital filters (see Figure 4) and other applications where small negative numbers are required, sign magnitude gives better results.

Table 2: ROM Encoding

Encoding            Power
Two's Complement    0.80
Sign Magnitude      0.78
Row Invert          0.69

Table 3 shows a comparison of the ROM with row invert encoding before and after selective precharge has been implemented. Through selective precharge only 1 out of 4 columns is precharged, resulting in significant savings in power.

Table 3: Selective Precharge

Selective Precharge    Power
Before                 0.69
After                  0.58

Table 4 shows the power dissipation of the ROM when the voltage is scaled to 2.5V. Although significant savings are reached, quadratic savings are not achieved due to an increase in short circuit currents.

Table 4: Voltage Scaling

Voltage    Power
3.3V       0.58
2.5V       0.39
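The figures in Tables 2, 3 and 4 can be cross-checked with a short Python sketch. It assumes that each saving is expressed as a percentage of the two's complement baseline (an interpretation, not stated in the paper) and compares the measured 2.5V result with an ideal quadratic scaling of the supply. The helper names are hypothetical.

```python
# Power figures copied from Tables 2, 3 and 4, in the order the
# techniques were applied.
STEPS = [
    ("Two's complement (baseline)", 0.80),
    ("Sign magnitude", 0.78),
    ("Row invert", 0.69),
    ("Selective precharge", 0.58),
    ("Voltage scaling to 2.5V", 0.39),
]


def savings_percent(steps):
    """Saving of each step over the previous one, as a percentage of the
    baseline power (assumed reference point)."""
    base = steps[0][1]
    return [
        (name, 100.0 * (prev - cur) / base)
        for (_, prev), (name, cur) in zip(steps, steps[1:])
    ]


def quadratic_estimate(power, v_old, v_new):
    """Power expected if dissipation scaled exactly with the supply squared."""
    return power * (v_new / v_old) ** 2


if __name__ == "__main__":
    for name, pct in savings_percent(STEPS):
        print(f"{name}: {pct:.2f}%")
    total = 100.0 * (STEPS[0][1] - STEPS[-1][1]) / STEPS[0][1]
    print(f"Total: {total:.2f}%")
    print(f"Quadratic estimate at 2.5V: {quadratic_estimate(0.58, 3.3, 2.5):.3f}")
```

Under that assumption the step savings come out as 2.50%, 11.25%, 13.75% and 23.75%, with a total of 51.25%. Ideal quadratic scaling from 3.3V to 2.5V would give about 0.333, against the measured 0.39 in Table 4.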
Table 5: Power Savings

Technique              Conditions                             Power Savings (%)
Sign Magnitude         Random Data                            2.5
Row Invert             After Sign Magnitude                   11
Selective Precharge    After Sign Magnitude and Row Invert    14
Voltage Scaling        After Other Techniques                 24
Total                  After all techniques                   51

Table 5 shows the power savings of the different techniques. The power savings shown for selective precharge and voltage scaling are after the other techniques have been applied.

Conclusion

ROM low power techniques at the architectural and the circuit level have been presented. The use of several of these techniques significantly reduces the power dissipated in the ROM. The efficiency of the different techniques depends on the data to be stored in the ROM core, speed requirements and area overhead. High power savings can only be achieved through the use of multiple techniques.

REFERENCES

[1] A. P. Chandrakasan, S. Sheng and R. W. Brodersen, "Low-Power CMOS Digital Design," IEEE Journal of Solid-State Circuits, vol. 27, pp. 473-483, 1992.

[2] D. A. Hodges and H. G. Jackson, Analysis and Design of Digital Integrated Circuits, Second edition, McGraw-Hill Publishing Company, pp. 346-353, 1988.

[3] M. Yoshimoto, K. Anami, H. Shinohara, T. Yoshihara, H. Takagi, S. Nagao, S. Kayano, and T. Nakano, "A Divided Word-Line Structure in the Static RAM and its Application to a 64K Full CMOS RAM," IEEE Journal of Solid-State Circuits, vol. SC-18, pp. 479-485, 1983.

[4] N. Sankarayya and K. Roy, "Algorithms for Low Power FIR Filter Realization Using Differential Coefficients," IEEE 10th International Conference on VLSI Design, Hyderabad, India, pp. 174-178, 1997.

[5] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Second edition, Addison-Wesley, pp. 585-588, 1993.

[6] C. Piguet, "Low-Power Microprocessors and Memories," NATO Seminar on Low Power Design in Deep Submicron Electronics, Lucca, Tuscany, Italy, August 20-30, 1996.

[7] E. de Angel and E. E. Swartzlander Jr., "Survey of Techniques for Low Power VLSI Design," International Conference on Innovative Systems in Silicon, pp. 159-169, 1996.

[8] R. C. Jaeger, "Comments on 'An optimized output stage for MOS integrated circuits'," IEEE Journal of Solid-State Circuits, vol. 10, pp. 185-186, 1975.

[9] G. L. Haviland and A. A. Tuszynski, "CMOS Tapered Buffer," IEEE Journal of Solid-State Circuits, vol. 25, pp. 1005-1008, 1990.

[10] J. Choi and K. Lee, "Design of CMOS Tapered Buffer for Minimum Power-Delay Product," IEEE Journal of Solid-State Circuits, vol. 29, pp. 1142-1145, 1994.

[11] H. J. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design of Buffer Circuits," IEEE Journal of Solid-State Circuits, vol. SC-19, pp. 468-473, 1984.

[12] M. Santoro, "Design and Clocking of VLSI Multipliers," Ph.D. Dissertation, Stanford University, 1990.

[13] J. Figueras, "Power Modeling," NATO Seminar on Low Power Design in Deep Submicron Electronics, Lucca, Tuscany, Italy, August 20-30, 1996.

[14] C. X. Huang, B. Zhang, A-C. Deng, and B. Swirski, "The Design and Implementation of PowerMill," Proceedings 1995 International Symposium on Low Power Design, pp. 105-109, 1994.