Design and Analysis of CMOS Full Adders for Low Power and Low Frequency of Operation for Scavenged-Power Wireless Sensor Networks

Jerry Lam 100323125
December 18, 2007

Abstract

While many VLSI applications require or benefit greatly from low power consumption, scavenged-power wireless sensor networks have far more stringent power requirements, often in the sub-uW range. At these power levels, static and leakage power consumption can form a significant fraction of the total power consumption of digital circuits, so traditional design methodologies must be modified to take this into account. Thus, the logic styles, voltage levels and transistor sizes used must be optimized for the expected frequency of operation and circuit complexity. The results shown in this paper demonstrate that leakage power can dominate over switching power in certain regions of operation, but that both forms of power consumption can be reduced by using low supply voltages. Despite the increased delays that this may cause, good design techniques, including the proper selection of logic families and transistor sizes, can allow sensor node controller circuitry to operate at these levels. Finally, a design flow is suggested for the design of such circuits.

1 Introduction

As VLSI circuits become increasingly complex, the number of transistors in typical designs increases rapidly, greatly increasing power consumption. Although newer processes can offset this slightly with smaller and faster transistors, the optimization of power is increasingly becoming a vital design characteristic. This is particularly true in wireless applications, where the available power sources limit the amount of power and energy available for the circuit to consume. In some applications, traditional power sources such as batteries or power leads cannot be used.
This is typical of many biomedical wireless sensor networks, particularly in the case of implanted sensors, where such sources of power would increase the size of the devices to impractical levels, or would interfere with the operation of the sensor itself. The solution to this problem is to scavenge power from the environment, although this puts severe constraints on the device performance. For instance, [1, 2] describe a variety of energy scavenging techniques, for which typical power supply levels range from sub-uW to several mW (see [3] for an example). RF transceivers have been shown to be able to operate in this power range (see [4] for instance), but only on an average basis. This forces the transceivers to operate only for short periods of time in order to transmit and receive data, while remaining in a low power sleep mode for the majority of the time. Thus, even in a simple sensor network with 2 elements, the two nodes must be synchronized so that they may be switched on and off together. Many schemes for obtaining time synchronization of wireless sensor networks have been proposed (for instance, see [5, 6]), but these algorithms are sufficiently complex to require implementation using digital logic. In order to allow the sensor node to scavenge sufficient power for the next radio transmission/reception, the logic must consume power at far lower levels than that being scavenged, thus forcing total power consumption levels to fall below the uW range. However, the logic used to implement such algorithms tends to be very simple, requiring only basic calculations based on the times of arrival of signals and the decoding of short messages. Thus, the logic can operate at low frequencies (under 1 MHz), allowing the use of very low supply voltages without too much effect from the resulting increased delay, as discussed in [7].

This paper looks at the design and analysis of a CMOS full adder under these design constraints. Since full adders are used in a variety of common digital circuits, such as DSP blocks and counters, and because they are fairly complex combinational logic circuits, they form a good basis for demonstrating the important design concerns that must be taken into account when designing digital circuits for use in wireless sensor networks. Since the exact task of the circuits being demonstrated in this paper is not specified, a general purpose static CMOS full adder is designed, although discussion is made as to when the design choices made are feasible and when other alternatives form better design choices.

2 Background

2.1 Sources of power dissipation

The power consumption of digital logic circuits can be divided into 4 main categories: static power, leakage power, dynamic power and short circuit power [8]. Static power comes from bias current that flows through the logic circuits even when they are not switching, due to the incomplete turn-off of transistors. This is found in logic families such as pseudo-NMOS logic or static differential split-level logic (SDSL) [9]. Leakage power comes from sub-threshold conduction. The sub-threshold current is an exponential function of the drain, gate and source voltages, given by [7] as

$$I_{ds} = K e^{(V_{gs} - V_t)/(n v_T)} \left(1 - e^{-V_{ds}/v_T}\right) \tag{1}$$

where K is a function of W/L, $V_t$ is the threshold voltage, $v_T$ is the thermal voltage and n is the sub-threshold slope factor. The power consumption from static power and leakage power can be calculated by multiplying the current by the supply voltage. Dynamic power and short circuit power both arise when the gate switches states, and are thus related to the switching frequency of the input.
It is described in [8] that dynamic power, which arises from the current needed to charge or discharge the capacitive load C of the next stage, is given by

$$P_{dynamic} = K C f V_{DD}^2 \tag{2}$$

where K represents the activity factor, corresponding to how frequently the gate changes state with respect to the main clock frequency f. Likewise, short circuit power, which arises from the temporary short circuit formed when all transistors in a conduction path are turned on in the middle of a transition, can be given by

$$P_{shortcircuit} = \alpha f (V_{DD} - 2V_T)^3 \tag{3}$$

with $\alpha$ being a proportionality constant. It should be noted that operating at a supply voltage less than $2V_T$ will eliminate the short circuit power, as it will be impossible to turn on all transistors in a conduction path. In conventional digital circuits, dynamic and short circuit power form the dominant sources of power dissipation [8]. In this paper, both forms will be grouped together under the term switching power, as it is difficult (and impractical) to separate the two when simulating a circuit.

2.2 Methods of reducing power consumption

As the reduction of power consumption is increasingly becoming a key goal in the design of digital circuits, much literature is devoted to the techniques available for reducing power consumption [7, 9]. These include system level optimizations, such as shutting off circuits that are not in use or minimizing the complexity of the algorithm being implemented, and lower level design choices, such as the method of implementing the algorithm. Commonly discussed topics concerning the latter include the style of logic implemented, the effects of reduced supply voltages, and the sizing of transistors. This paper examines the effects of these choices on a full adder for use in a wireless sensor network.
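The power components of Section 2.1 can be compared numerically. The following is a minimal Python sketch of Equations (1) to (3), not a model of the simulated circuits: the function names and every device value (K, n, the load capacitance, the activity factor and the constant alpha) are hypothetical and chosen only to show how leakage can dominate at low frequencies while switching power dominates at higher ones.

```python
import math

THERMAL_VOLTAGE = 0.02585  # v_T = kT/q near 300 K, in volts (assumed temperature)


def subthreshold_current(k, v_gs, v_ds, v_t, n, v_therm=THERMAL_VOLTAGE):
    """Equation (1): sub-threshold drain current in amperes."""
    return k * math.exp((v_gs - v_t) / (n * v_therm)) * (1.0 - math.exp(-v_ds / v_therm))


def leakage_power(i_leak, v_dd):
    """Leakage (or static) power: current multiplied by the supply voltage."""
    return i_leak * v_dd


def dynamic_power(activity, c_load, freq, v_dd):
    """Equation (2): P = K * C * f * V_DD^2, with K the activity factor."""
    return activity * c_load * freq * v_dd ** 2


def short_circuit_power(alpha, freq, v_dd, v_t):
    """Equation (3), taken as zero when V_DD is below 2 * V_T."""
    headroom = v_dd - 2.0 * v_t
    return alpha * freq * headroom ** 3 if headroom > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical device and load values, for illustration only.
    K_DEVICE, N_SLOPE, V_T = 1e-6, 1.5, 0.3
    ACTIVITY, C_LOAD, ALPHA = 0.5, 10e-15, 1e-15
    v_dd = 1.2

    # An "off" transistor: V_gs = 0 with the full supply across drain and source.
    i_off = subthreshold_current(K_DEVICE, 0.0, v_dd, V_T, N_SLOPE)
    p_leak = leakage_power(i_off, v_dd)

    for freq in (1e3, 1e4, 1e5, 1e6):
        p_switch = (dynamic_power(ACTIVITY, C_LOAD, freq, v_dd)
                    + short_circuit_power(ALPHA, freq, v_dd, V_T))
        dominant = "leakage" if p_leak > p_switch else "switching"
        print(f"{freq:9.0f} Hz  leak={p_leak:.2e} W  switch={p_switch:.2e} W  -> {dominant}")
```

Because the leakage term does not depend on frequency while the switching terms scale with it, the crossover frequency in such a sketch moves with the assumed constants; only the qualitative behavior should be read from it.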

3 Simulations and Results

3.1 Overview of constraints and operating regions

As discussed previously, the 4 main types of power consumption are all proportional to the supply voltage, or a power of the supply voltage. Thus, decreasing the supply voltage will clearly decrease the power consumption. However, such decreases come at a cost of increased delays [7], as the decreased supply voltage lowers the current available to drive additional loads, thus taking more time and limiting the maximum frequency of operation. As discussed previously, the logic required for many time synchronization schemes is not too complicated, and may be able to operate at lower frequencies, ranging from several kHz to MHz or higher, depending on the complexity of the algorithm involved and the time resolution required. Thus, the increased delay that arises from low supply voltages can be tolerated. In addition, because the circuit does not require very fast rise or fall times, the driving power of the adder can be made very weak, thus allowing the use of minimum size transistors. This will also decrease the dynamic power, as it is proportional to the gate area of the next stage, and is typically the dominant source of power consumption for conventional circuits. The following section will look at the important design principles to be followed when designing an adder for these conditions. The design procedure will be carried through on a conventional static CMOS full adder with minimum size transistors, although discussion will be made as to when this is an appropriate decision, and when it is not.

3.2 Note on accuracy of results

All of the results demonstrated in this paper are obtained from simulations with the Spectre simulator and the IBM 0.13 um RF kit, using standard transistors with a maximum voltage of 1.2 V and a threshold voltage of 0.3 V.
As many of the simulations involve transistors in the sub-threshold region of operation, the accuracy of these results is dependent on the accuracy of the models in this region. Since this region is typically used far less often than the saturation or triode regions, the models may not be completely accurate in this region. This was observed when a different version of the models was used: the results obtained subsequently differed by a large degree, which could be attributable to subtle changes in the models used. Thus, to ensure complete accuracy, the circuits discussed should be manufactured and tested. This was not done due to time and resource limitations.

3.3 Comparisons of Logic Families

An algorithm for a logic circuit is just a behavioral description of the circuit; it does not specify how the circuit should be constructed to implement the function, or even how to represent different binary values. Given the many different issues that digital logic circuits face, a vast number of logic styles have emerged, including conventional static CMOS, pass logic families, differential logic families, and dynamic logic families [9], as well as customized designs [10, 11, 12] and other logic design methodologies, such as subthreshold logic [13] or MOS Current Mode Logic [14]. Due to time and space constraints, not every logic style and adder design has been evaluated, but rather a selection of a few designs. Discussion is made on the important characteristics of each family, and how they perform under low voltage and low frequency of operation, thus showing the important issues that must be dealt with when choosing a logic family. As the comparisons of the logic families cover supply voltages ranging from slightly above $V_T$ to the maximum $V_{DD}$, subthreshold logic is not discussed, although it can provide the ultra-low power needed in a sensor network [13].
3.3.1 Conventional Static CMOS

Conventional static CMOS is one of the most common logic styles used, due to its relative simplicity, lack of static power dissipation, and decent performance. A conventional static CMOS gate is constructed by deriving the logic for the pull-down NMOS stage and its complement for the pull-up PMOS stage. The truth table of a full adder is seen in Table 1. From this truth table, the following equations can be derived for the Sum and Carry outputs.

Table 1: Truth table of a full adder

A  B  C  |  Sum  Carry
0  0  0  |   0     0
0  0  1  |   1     0
0  1  0  |   1     0
0  1  1  |   0     1
1  0  0  |   1     0
1  0  1  |   0     1
1  1  0  |   0     1
1  1  1  |   1     1

Figure 1: Schematic of first possible configuration of conventional static CMOS full adder

$$\mathrm{Carry} = A(B + C) + BC \tag{4}$$

$$\overline{\mathrm{Carry}} = \bar{A}(\bar{B} + \bar{C}) + \bar{B}\bar{C} \tag{5}$$

$$\mathrm{Sum} = A(\bar{B}\bar{C} + BC) + \bar{A}(\bar{B}C + B\bar{C}) \tag{6}$$

$$\overline{\mathrm{Sum}} = \bar{A}(\bar{B}\bar{C} + BC) + A(\bar{B}C + B\bar{C}) \tag{7}$$

Two possible implementations of the circuit exist: one seen in Figure 1 and one seen in Figure 2. The primary difference between the two lies in the position of the branching. Because of the reduced parasitics on this node in the second implementation, it is preferred. The power consumption of each over a range of frequencies can be seen in Figure 3, where the second implementation has 20% lower switching power. The static power for both remains the same. It should be noted that the results in this graph were obtained using an older set of models, and so the precise values cannot be compared to those obtained in later sections of this paper. However, the relative behavior of the circuits should not change too much from one set of models to the next.

3.3.2 Complementary Pass Logic

Complementary Pass Logic, or CPL, is a logic family which uses NMOS transistors as switches to pass certain signals, based on the values of other signals [9]. As it only uses NMOS transistors, a CPL device has a very weak pull-up, as can be seen in Figure 4, where the supply voltage is at 1.2 V. Although the expected maximum voltage is $V_{DD} - V_T$, this is based on the assumption of no current flow when the transistors are in the sub-threshold region of operation. Since the output loads are small, even the small sub-threshold current can continue to charge

Figure 2: Schematic of second possible configuration of conventional static CMOS full adder

Figure 3: Comparisons of the switching power consumption of the two possible configurations of conventional static CMOS full adder with ideal inputs and no load (plot of average switching power in W versus frequency of operation from 1 kHz to 10 MHz, for Configuration 1 and Configuration 2)

the next stage, raising the output voltage above the expected value, assuming sufficient time is given for the charging to occur. Thus, the circuit is functional, even with a supply voltage as low as 0.4 V.

Figure 4: Plot of inverted sum and carry waveforms, showing weak pull-up outputs (sum and carry waveforms of a CPL adder; voltage level in V versus time in s)

The weak pull-up output can pose a problem in circuits requiring a large noise margin, as the weakened logic 1 level becomes more susceptible to additive noise. In addition, the driving strength of the gate is weakened, since the pass transistors reduce the amount of current that is available to charge the next stage. Furthermore, if the voltage drop is too large, then the gate may hold the next stage in a state between the two logic levels, resulting in a short circuit current flow. This can be partially remedied by using a buffer to increase the maximum output value, at the expense of additional transistors and thus additional leakage current. The buffer can also be sized properly to reduce or eliminate the short circuit current resulting from weak outputs. Figure 5 shows the schematic of the CPL adder used, based on [9].

3.3.3 Domino Logic

A common issue with static logic is that output glitches can consume a lot of power, as the output can switch several times for a single set of input changes. To remedy this, dynamic logic families were created to guarantee a maximum number of output transitions per input change [9]. One example of this is dual rail domino logic. The logic features an NMOS stage, similar to that employed in the pull-down stage of a conventional static CMOS logic device. Surrounding this are two transistors connected to a main clock. When the clock is low, the top transistor is active, pre-charging the load.
When the clock is high, the logic is allowed to pull down the output if the inputs are at the correct values. Thus, the output makes no more than 2 transitions per clock cycle, although there is always 1 transition to charge the output. An example of a Domino adder can be seen in Figure 6. Simulations were run and showed that the adder was non-functional at lower frequencies (below 10 MHz). This is because the long data periods were sufficient to allow the output to discharge, even if the pull-down stage is inactive (see Figure 7). While this could be remedied by a larger output capacitance (in the form of wider transistors), this would cause the dynamic power to increase beyond the benefits gained from the glitch reduction.

3.3.4 Customized Logic

The final adder tested is that described in [12], which will be referred to as the 10T adder. Since the two outputs of the adder can share some logic, the number of transistors needed to implement an adder can be reduced to 10. However, in simulations, it was observed that this resulted in weak pull-up and pull-down outputs (see Figure 9), and so the circuit was buffered. The schematic of the circuit tested can be seen in Figure 8. Most notable about this circuit is the fact that it combines elements seen in conventional static CMOS and pass logic families. The circuit only needs 1 version of each input (either its true value or its complement), unlike

Figure 5: Schematic of complementary pass logic CMOS full adder

Figure 6: Schematic of Domino adder

the previous designs, which required both the value and its complement. This greatly reduces susceptibility to timing mismatches, which could cause glitches, and reduces the load on the previous stage.

Figure 7: Output of the Domino adder, showing the carry output (top waveform) reverting back to a logic 1 about 5 us after the clock transition (lower waveform)

3.4 Unloaded Performance

To compare the performance of the full adders, they were first tested with no loads and ideal input pulses (with a 1 ns rise and fall time). The inputs consisted of a series of 3 clocks, each with double the frequency of the last. This ensures that all possible input combinations are tested. In each case, measurements were made with the transistors kept at minimum size. Some of the effects of optimization of transistor sizes are discussed in a later section. In general, the benefits of transistor resizing do not tend to span many orders of magnitude, so the relative performance of each adder should still be similar. This removes the need to optimize each transistor separately for each test case being used (with each scenario requiring a different set of optimizations). The important measurements observed include the leakage power (the power drawn by the adder when no switching is taking place), the switching power (the average power needed to change the state of the device), as well as the rise/fall time and the maximum power. The latter measurement may be important in some applications, as the power supply must be able to supply this peak power if the operation of the device is to be maintained. Depending on the power scavenging mechanism and storage mechanism used, sharp spikes in power consumption may pose a problem. Note that the power-delay product, usually used as a method of evaluating the performance of a circuit, is ignored, since in general only the power dissipation is of concern, with long delays being tolerable to a certain extent.
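The three-clock stimulus described above can be checked with a short script. This is a minimal Python sketch assuming ideal square waves sampled once per half-period of the fastest clock; the function names and the sampling scheme are illustrative and are not taken from the simulation setup.

```python
def clock(period, n_samples):
    """Ideal 50% duty-cycle square wave, starting low, sampled once per step."""
    half = period // 2
    return [(t // half) % 2 for t in range(n_samples)]


def three_clock_stimulus(fastest_period=2):
    """Return the (A, B, C) input tuples over one period of the slowest clock.

    Each clock has double the frequency (half the period) of the previous one.
    """
    if fastest_period < 2 or fastest_period % 2:
        raise ValueError("fastest_period must be an even number of steps")
    periods = [4 * fastest_period, 2 * fastest_period, fastest_period]
    n_samples = periods[0]
    a, b, c = (clock(p, n_samples) for p in periods)
    return list(zip(a, b, c))


if __name__ == "__main__":
    stimulus = three_clock_stimulus()
    for step, inputs in enumerate(stimulus):
        print(step, inputs)
    # All 2^3 input combinations appear within one period of the slowest clock.
    print("distinct input combinations:", len(set(stimulus)))
```

With the default arguments the three waveforms behave like the bits of a binary counter, which is why every input combination is visited exactly once per period of the slowest clock.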
Figure 10 shows the leakage power of the adders at different supply voltages. As can be seen, the conventional static CMOS adder performs the worst, with high power consumption. The CPL adder and the 10T adder have much lower leakage power, although they decrease at different rates. The 10T adder has a significantly lower power consumption, although if properly optimized, the two may be made more comparable. Figure 11 shows the rise/fall time of the adders at different supply voltages. Since each of the outputs has a different rise/fall time associated with it, only the worst one, which limits the performance of the entire circuit, is shown. As can be seen, the conventional static CMOS adder performs the worst, with very slow rise times, particularly at low voltages. Both the CPL adder and the 10T adder have much shorter rise/fall times, being able to operate in the MHz range even at low supply voltages. The exact value is of course subject to change with the addition of loads. Not shown here is the input to output delay, although it was observed that the delay and the rise/fall time track each other fairly consistently. Here, it can be seen that the CPL adder can operate at higher speeds

Figure 8: Schematic of buffered 10T adder

Figure 9: Plot of sum and carry waveforms for the unbuffered 10T adder showing weak pull-up and pull-down outputs

than the 10T adder, due to the fact that the input signal passes through fewer pass stages in the CPL adder. Both adders could be made faster if the need to buffer the outputs was removed.

Figure 10: Plot of leakage power of the unloaded full adders (leakage power in W versus supply voltage from 0.3 V to 1.5 V, for the conventional static CMOS, CPL and 10T adders)

Figure 11: Plot of maximum rise and fall times of each adder (maximum rise/fall time in s versus supply voltage from 0.3 V to 1.5 V)

Figure 12 shows the maximum power consumption of the adders at different supply voltages. As can be seen, the conventional static CMOS adder performs the worst, with peak powers many orders of magnitude greater than the rest. The CPL adder and the 10T adder have very similar results. The switching power consumption over a range of frequencies was also observed, and can be seen in Figures 13, 14, 15, and 16. As can be seen, the static CMOS adder consistently consumes 100 to 1000 times more power than the other two adders. The CPL and the 10T adders perform very similarly, with each adder outperforming the other in different circumstances. As these results show, the optimal design for an adder will vary depending on the operating conditions (supply voltage, input frequency, etc.) and on which constraints are the most important.

3.5 Loaded Performance

To simulate the adders under more realistic conditions, the adders were tested with a load. This was done by constructing a 3-bit ripple carry adder, with inverters buffering the input signals, and with inverters at the output to simulate loads. The input signals were random binary signals, with random clock jitter and rise/fall time variations to model inexact signal arrivals. The result of this is to add glitches, which will affect the performance of the adder, since the output may change several times per input.
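The logical structure of the 3-bit ripple carry adder can be sketched in Python. This is only a functional model: it assumes the standard sum-of-products forms Carry = A(B + C) + BC and Sum = A(B'C' + BC) + A'(B'C + BC') for each full adder, and it does not model the buffers, loads, jitter, glitches or power of the simulated circuit. All function names are illustrative.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder built from the sum-of-products equations."""
    na, nb, nc = 1 - a, 1 - b, 1 - c
    carry = (a & (b | c)) | (b & c)
    total = (a & ((nb & nc) | (b & c))) | (na & ((nb & c) | (b & nc)))
    return total, carry


def ripple_carry_add(x_bits, y_bits, carry_in=0):
    """Add two equal-length bit lists (least significant bit first)."""
    sum_bits = []
    carry = carry_in
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def check_ripple_carry(width=3):
    """Exhaustively compare the adder against integer addition."""
    for x, y, cin in product(range(2 ** width), range(2 ** width), (0, 1)):
        bits, cout = ripple_carry_add(to_bits(x, width), to_bits(y, width), cin)
        if from_bits(bits) + (cout << width) != x + y + cin:
            return False
    return True


if __name__ == "__main__":
    print("3-bit ripple carry adder correct:", check_ripple_carry(3))
```

The carry of each stage feeds the next, so the last sum bit can only settle after the carry has propagated through all earlier stages; in a real circuit this is one source of the intermediate output changes described above.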
The results of the simulations can be seen in Figures 17, 18, and 19. As can be seen, the relative performance

Figure 12: Plot of maximum power of each adder (peak power in W versus supply voltage from 0.3 V to 1.5 V, for the conventional static CMOS, CPL and 10T adders)

Figure 13: Plot of switching power for each adder at a supply voltage of 1.2 V (power in W versus input frequency from 1 kHz to 1 MHz)

Figure 14: Plot of switching power for each adder at a supply voltage of 0.9 V (power in W versus input frequency from 1 kHz to 1 MHz)

Figure 15: Plot of switching power for each adder at a supply voltage of 0.6 V (power in W versus input frequency from 1 kHz to 1 MHz)

Figure 16: Plot of switching power for each adder at a supply voltage of 0.4 V (power in W versus input frequency from 1 kHz to 1 MHz)

in the loaded simulation mirrors that of the unloaded simulations, as the conventional static CMOS adder performs much worse than either the CPL or the 10T adder, which have similar performance. It should be noted that the exact specifics change, depending on the scenario. For instance, the leakage power dominates power consumption in certain cases (particularly at low voltages and at low frequencies), while switching power dominates in others. Thus, when optimizing a circuit for power consumption, the operating conditions need to be taken into account to determine what to minimize.

Figure 17: Plot of results of a static CMOS adder using the loaded simulation at 50 kHz (static power, switching power at 10 kHz and 50 kHz, and maximum power in W versus supply voltage from 0.3 V to 1.3 V)

Figure 18: Plot of results of a CPL adder using the loaded simulation at 50 kHz (static power, switching power at 10 kHz and 50 kHz, and maximum power in W versus supply voltage from 0.3 V to 1.3 V)

3.6 Effect of changing transistor sizes

As can be seen in Table 2, changing the transistor sizes can change both the leakage and the switching power. These results were obtained for the loaded case. As transistors are made longer, the leakage current decreases in inverse proportion to the length, but the switching power increases proportionally. It can be shown using elementary calculus that, in this case, the total power is minimized when the switching power and the leakage power are equal, since if

$$P_{total} = K_1 L + \frac{K_2}{L} \tag{8}$$

then

$$\frac{dP_{total}}{dL} = K_1 - \frac{K_2}{L^2} \tag{9}$$

which is equal to 0 (thus minimizing $P_{total}$) when $L = \sqrt{K_2/K_1}$, thus making $K_1 L$ and $K_2/L$ equal. From the results shown previously, this occurs with an increased transistor length of about 50%. As can be seen in the results, an increase in length by this amount reduces the total power consumption by 4%. Slightly more power can be saved by changing only the length of the PMOS. Thus, optimization using simulation results may be required to yield the best power.

Figure 19: Plot of results of a 10T adder using the loaded simulation at 50 kHz (static power, switching power at 10 kHz and 50 kHz, and maximum power in W versus supply voltage from 0.3 V to 1.3 V)

Table 2: Effect of transistor sizes (in nm) on power consumption (in W) for a static CMOS adder

PMOS W  PMOS L  NMOS W  NMOS L  Leakage   Switching  Total     Max
160     120     160     120     6.35E-07  9.83E-07   1.62E-06  3.40E-04
240     120     240     120     9.31E-07  1.19E-06   2.12E-06  4.88E-04
240     180     240     180     7.80E-07  1.59E-06   2.37E-06  4.80E-04
160     180     160     180     5.45E-07  1.07E-06   1.61E-06  3.08E-04
160     120     160     180     6.19E-07  9.36E-07   1.56E-06  3.06E-04
160     180     160     120     5.39E-07  1.15E-06   1.69E-06  3.90E-04
160     200     160     200     5.53E-07  1.11E-06   1.67E-06  3.19E-04

Making the transistors wider does not reduce overall power at all, since it increases both leakage and switching power. Wider transistors do result in lower delays, however, and so may be necessary in circuits with a low supply voltage where a few critical circuits need to operate at lower delays.

The length of the transistors also affects the performance of the CPL adder. As can be seen in Figure 20, increasing the length can improve the voltage drop of the pull-up stage slightly. The exact reason for this was not ascertained, but it is suspected that the slight changes to the threshold voltage caused by transistor size changes may be responsible.
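The length trade-off of Equations (8) and (9) can be illustrated numerically. The sketch below assumes the idealized model P_total = K_1 L + K_2 / L, with K_1 the switching coefficient and K_2 the leakage coefficient; the coefficient values and function names are hypothetical and are not fitted to Table 2.

```python
import math


def total_power(length, k_switch, k_leak):
    """Idealized total power: switching grows with L, leakage falls as 1/L."""
    return k_switch * length + k_leak / length


def optimal_length(k_switch, k_leak):
    """Analytic minimum of the model, from setting dP/dL = 0."""
    return math.sqrt(k_leak / k_switch)


def grid_minimum(k_switch, k_leak, lengths):
    """Brute-force minimum over candidate lengths, as a cross-check."""
    return min(lengths, key=lambda length: total_power(length, k_switch, k_leak))


if __name__ == "__main__":
    K_SWITCH, K_LEAK = 2.0, 8.0  # hypothetical coefficients, arbitrary units
    l_star = optimal_length(K_SWITCH, K_LEAK)
    candidates = [0.5 + 0.01 * i for i in range(400)]
    print("analytic optimum length:", l_star)
    print("grid-search optimum length:", grid_minimum(K_SWITCH, K_LEAK, candidates))
    print("switching term at optimum:", K_SWITCH * l_star)
    print("leakage term at optimum:", K_LEAK / l_star)
```

In this model the two terms are equal at the optimum by construction. The simulated results need not follow the idealized model exactly, which is consistent with the remark that optimization using simulation results may be required.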
It should also be noted that the increase in size lowers the rise and fall time of the adder slightly. The effect of transistor sizes on power consumption can be seen in Figure 21. This simulation was run using the loaded case at 50 kHz. Here, it can be seen that overall power consumption can be improved by making the transistors longer, as this results in a much lower leakage power. After a certain size, the switching power begins to dominate and the power increases slightly with further increases in transistor size. As can be seen in the figure, the optimal transistor size varies depending on the operating conditions, with a length of 0.250 um being the best at a supply voltage of 0.4 V, but a length of 1 um being the best at a supply voltage of 1.2 V.

3.7 Layout and Extracted Simulations

The basic static CMOS full adder was given a layout, using a style similar to that of a standard cell, albeit with transistors stacked on top of each other. This version used minimum size transistors to minimize the area as much as possible, taking up an area 8 um long and 9 um tall, yielding a transistor density of 472000

Figure 20: Plot of CPL adder output waveforms with varied lengths showing different maximum voltages (inverse sum waveforms for transistor lengths of 120, 480 and 1500 nm; voltage in V versus time in s)

Figure 21: Plot of CPL adder output waveforms with varied lengths showing different total power levels (total power in W at 50 kHz versus transistor length for the loaded CPL adder, at supply voltages of 1.2 V, 0.9 V, 0.6 V and 0.4 V)

transistors per square millimeter. As a result of making the design compact, the bottom 3 metal layers were needed for routing, making it more difficult to use this cell in a standard cell design, which normally uses the second and third layers for routing between cells. The layout can be seen in Figure 22. The design, after passing DRC and LVS tests, was extracted and simulated. The simulation results can be seen in Figure 23. When compared to the results in Figure 17, it can be seen that the extracted results are several orders of magnitude lower than those of the schematic level simulations. No physical explanation can be given, as the added parasitics should yield a slightly higher power consumption. The most likely explanation is that the models used in simulating the extracted-view transistors may not be completely accurate, as the results shown here are of the same magnitude as those seen in Figure 3.

Figure 22: Layout of static CMOS adder with minimum size transistors

3.8 Monte Carlo Simulations

To determine the effects that random process variations have on power consumption, the basic conventional static CMOS adder was tested under loaded conditions, with a supply voltage of 1.2 V and an input frequency of 50 kHz. Under these conditions, both static and dynamic power play a key role. The results can be seen in Figure 24. While these show only the variations in current, the power is proportional to the current by a factor of 1.2 V. As can be seen, the total average current draw, which has a nominal value of 1.43 uA and a standard deviation of 0.27 uA, can vary by more than 30% of its nominal value, with an almost uniform distribution around the

median, forcing the designer to give the entire circuit a large margin for power consumption if high yield is to be obtained. Similar results are observed for the leakage current draw, with a nominal value of 598 uA and a deviation of 230 uA, the maximum current, with a nominal value of 1420 uA and a deviation of 300 uA, and the deviation of the static current from state to state within the same run, with a nominal value of 62 uA and a deviation of 25 uA.

Figure 23: Plot of results of extracted static CMOS adder using the loaded simulation at 50 kHz (static power, switching power at 50 kHz and maximum power in W versus supply voltage from 0.3 V to 1.3 V)

4 Conclusion and Discussion

As observed in the results shown in this paper, there are many important characteristics that need to be considered in the design of digital circuits for low power and low frequency of operation. Most notable are the need to choose a supply voltage based on the frequency of operation, and the need to optimize for both leakage power and switching power. Since the conventional assumption that switching power always dominates power consumption no longer holds, all design issues, such as logic families and transistor sizes, must be taken into account to reduce both forms of power consumption. This must be done in consideration with all other issues, including output capacitance, rise/fall time, area, etc. Thus the design procedure becomes more complicated, and may require the use of automated tools for good optimization, although modifications to existing tools may be necessary, as typical design flows may underestimate leakage power.

4.1 Summary of observations

The comparisons between logic families show that the simple conventional static CMOS style has a high power consumption, even at low voltage levels, and has very high delays at lower voltage levels.
The latter may be remedied slightly by using wider transistors, but this increases power dissipation. A far better method is to use different logic families, although custom-designed circuits may offer better performance. It should be noted that dynamic logic families are unable to operate at these low frequencies, and so may not be appropriate for a low power design.

Since the performance of each circuit is dependent on its operating conditions (supply voltage, input frequency, output load), it is impossible to choose a single best logic family; it must be chosen based on how the circuit is to be used. As is evident from the results shown in this paper, the decision to focus on a conventional static CMOS adder was not appropriate for the scenarios discussed. However, in circumstances where simplicity of design or strong outputs are important, static CMOS may be the better design decision. In addition, certain circuits, such as NAND/NOR gates, are most effectively implemented using conventional static CMOS rather than other families [9]. Thus, the exact nature of the circuit to be constructed is important to take into account when selecting a logic family.

The same can be said of the transistor sizes. Although low driving strength can be tolerated, the tolerance for long rise/fall times depends on the circuit's clock frequency and fan-out. In addition, longer length transistors may be needed to reduce leakage power. Thus, there is no best transistor size for low voltage and low frequency of operation; it depends on how the device is to be used.

The results obtained also show the need to keep leakage power consumption low, as illustrated by the average power consumption of a simple 3-bit adder. As the average power of a static CMOS adder was in the low µW region, and a typical scavenged power supply is only capable of producing power in the µW region [1], the circuit must be kept as simple as possible. The complexity constraint can be relaxed significantly by using low voltage logic and by switching to more efficient designs, but the power budget will always impose significant design constraints, such that all power dissipation types, both leakage and switching, must be considered.

Figure 24: Plot of results from Monte Carlo simulations, showing average supply current (top left), maximum current (top right), average leakage current (bottom left), and standard deviation of the static current in a single run (bottom right)

4.2 Proposed design flow for design of sensor node controller circuitry

Based on the results obtained, the following design flow is proposed for the design of logic for scavenged power wireless sensor nodes.

The design algorithm must be chosen based on what purpose the node will serve. Thus, the required time resolution and accuracy, the number of nodes to be synchronized, and the synchronization algorithm will need to be chosen. Choosing tighter constraints will force the logic to become more complex, and may preclude the use of very low supply voltages, but may be required to reduce the power draw of the RF transmitters, or by the nature of the sensor node. Since these are typically more important in the design of a wireless sensor network, these issues must be given a higher priority than the design of the implementing logic.

The logic should be implemented in an HDL, and then synthesized using a fairly complete standard cell library. The purpose of this is to break down the implementing logic into basic cells such as adders, multiplexers, or simple gates.
This breakdown identifies the circuits which draw the most power, whether through higher circuit complexity or a high frequency of operation. These areas will become the focus of power reduction through custom optimization. Thus, it is important that the library used contain complex modules; if the design were implemented using a simple library of NAND gates and inverters, power optimization would become very difficult. Ideally, the cell library used should be optimized for low power and low frequency of operation, and its blocks should use logic styles that are predicted to be optimal.

A main clock frequency and supply voltage (or multiples thereof) should be chosen, and global power optimization schemes should be picked. Thus, any logic that is not required for 100% of the circuit operation should be shut down when not in use, assuming that doing so would be power efficient. The clock frequency should be chosen such that sufficient time is given for the most complex nodes to rise and fall, but should also meet timing resolution requirements. Since choosing different clock frequencies places different constraints on the logic, the synthesis stage may have to be repeated several times so that the logic can be optimized based on the different timing constraints. The supply voltage will need to be chosen based on the delay requirements and the clock frequency. Ideally, the lowest voltage that allows the circuit to function should be picked. However, other issues, such as the availability of multiple voltage power supplies and noise margin requirements, need to be considered.

The logic of performance-critical cells should then be optimized carefully. This should be done by taking into account the driving strength of the previous stage, the capacitive load of the next stage, and the supply voltage and clock frequency being used.
This optimization will involve the selection of logic styles and transistor sizes to reduce total power consumption while staying within the design constraints. Since this will change the constraints on the next and previous stages, recursive optimization may be required, which should be done using automated tools, should they be available.

Finally, layout should be done to reduce area, while minimizing parasitic capacitance as much as possible. Since biomedical sensor nodes may be required to operate in environments where minimum area is crucial (such as implantable sensors) [2], area may be a high priority design constraint. But since parasitic capacitances may increase switching power, care must be taken not to degrade performance too much through layout design choices.

Throughout the design procedure, simulations of each cell and of the entire system should be performed to ensure that the power of the entire design falls within the required constraints. Since these simulation results may depend on models of limited accuracy, care must be taken to ensure that the simulation results make sense. Actual manufacturing of the chip is the only certain method to ensure the circuit is designed correctly.
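
The supply voltage selection step of this flow can be illustrated numerically. The following Python sketch is not taken from the simulations in this paper: it assumes the textbook first-order model in which total power is leakage power (supply voltage times leakage current) plus switching power (activity factor times load capacitance times supply voltage squared times frequency), and it assumes a cell is usable when its delay fits within one clock period. All names and numbers (the candidate operating points, load capacitance, activity factor, and power budget) are hypothetical placeholders. The sketch picks the lowest candidate supply voltage that meets both the timing and the power constraint, and computes the frequency below which leakage power exceeds switching power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    """One candidate supply voltage with hypothetical characterization data."""
    vdd: float     # supply voltage (V)
    delay: float   # worst-case cell delay at this voltage (s)
    i_leak: float  # leakage current at this voltage (A)


def leakage_power(point):
    """Static power drawn at the given operating point (W)."""
    return point.vdd * point.i_leak


def switching_power(c_load, vdd, freq, activity):
    """First-order dynamic power: activity * C * Vdd^2 * f (W)."""
    return activity * c_load * vdd ** 2 * freq


def crossover_frequency(point, c_load, activity):
    """Frequency (Hz) below which leakage power exceeds switching power."""
    return leakage_power(point) / (activity * c_load * point.vdd ** 2)


def lowest_feasible_vdd(points, c_load, freq, budget, activity):
    """Return the lowest-voltage point meeting timing and power, else None."""
    period = 1.0 / freq
    for p in sorted(points, key=lambda q: q.vdd):
        total = leakage_power(p) + switching_power(c_load, p.vdd, freq, activity)
        # Assumed feasibility rule: delay fits in one period, power fits budget.
        if p.delay <= period and total <= budget:
            return p
    return None


# Hypothetical placeholder values, chosen only to exercise the functions.
CANDIDATES = [
    OperatingPoint(vdd=0.3, delay=50e-6, i_leak=2e-9),
    OperatingPoint(vdd=0.6, delay=2e-6, i_leak=10e-9),
    OperatingPoint(vdd=1.2, delay=0.1e-6, i_leak=50e-9),
]
C_LOAD = 100e-15   # load capacitance (F)
ACTIVITY = 0.5     # switching activity factor
FREQ = 50e3        # clock frequency (Hz)
BUDGET = 100e-9    # power budget (W)

chosen = lowest_feasible_vdd(CANDIDATES, C_LOAD, FREQ, BUDGET, ACTIVITY)
if chosen is None:
    print("No candidate supply voltage meets the constraints")
else:
    f_cross = crossover_frequency(chosen, C_LOAD, ACTIVITY)
    print(f"Chosen Vdd: {chosen.vdd} V")
    print(f"Leakage dominates below {f_cross:.0f} Hz")
```

With these placeholder values, the lowest candidate voltage is rejected because its delay exceeds the clock period, and at the chosen operating point the clock frequency lies below the crossover frequency, so leakage rather than switching power dominates. In a real flow, the operating points would come from cell characterization at each supply voltage rather than from assumed figures.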

References

[1] J. A. Paradiso and T. Starner. Energy scavenging for mobile and wireless electronics. IEEE Pervasive Computing, 4(1), 2005.
[2] K. A. Townsend, J. W. Haslett, T. K. K. Tsang, M. N. El-Gamal, and K. Iniewski. Recent advances and future trends in low power wireless systems for medical applications. Proceedings from the 5th International Workshop on System-on-Chip for Real-Time Applications, 2005.
[3] Kuan-Yu Lin, T. K. K. Tsang, M. Sawan, and M. N. El-Gamal. Radio-triggered solar and RF power scavenging and management for ultra low power wireless medical applications. Proceedings from the IEEE International Symposium on Circuits and Systems, 2006.
[4] Peter H. R. Popplewell, Victor Karam, Atif Shamim, John Rogers, and Calvin Plett. An injection-locked 5.2 GHz SoC transceiver with on-chip antenna for self-powered RFID and medical sensor applications. IEEE Symposium on VLSI Circuits, pages 88-89, June 2007.
[5] Liming He and Geng-Sheng Kuo. A novel time synchronization scheme in wireless sensor networks. IEEE 63rd Vehicular Technology Conference, 2:568-572, 2006.
[6] F. Zhang and G. Y. Deng. Probabilistic time synchronization in wireless sensor networks. Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing, 2:980-984, 2005.
[7] Anantha P. Chandrakasan and Robert W. Brodersen. Low Power Digital CMOS Design. Kluwer Academic Publishers, 1995.
[8] Noureddine Chabini and Wayne Wolf. Synchronous sequential digital designs using retiming and supply voltage scaling. IEEE Transactions on VLSI, 13(10):1113-1126, October 2005.
[9] Dimitrios Soudris, Christian Piguet, and Costas Goutis, editors. Designing CMOS Circuits for Low Power. Kluwer Academic Publishers, 2002.
[10] Hung Tien Bui, Yuke Wang, and Yingtao Jiang. Design and analysis of low-power 10-transistor full adders using novel XOR-XNOR gates. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 49(1):25-30, January 2002.
[11] Yingtao Jiang, A. Al-Sheraidah, Yuke Wang, E. She, and Jin-Gyun Chung. A novel multiplexer-based low-power adder. IEEE Transactions on Circuits and Systems II: Express Briefs, 51(7):345-348, July 2004.
[12] Jin-Fa Lin, Ming-Hwa Sheu, and Yin-Tsung Hwang. Low-power and low complexity full adder design for wireless base band application. Proceedings from the International Conference on Communications, Circuits and Systems, 4:2337-2341, June 2006.
[13] H. Soeleman, K. Roy, and B. Paul. Robust ultra-low power sub-threshold DTMOS logic. Proceedings of the International Symposium on Low Power Electronics and Design, pages 377-380, September 2000.
[14] S. Badel and Y. Leblebici. Breaking the power-delay tradeoff: Design of low-power high-speed MOS current-mode logic circuits operating with reduced supply voltage. IEEE International Symposium on Circuits and Systems, pages 1871-1874, May 2007.