ALPS: An Automatic Layouter for Pass-Transistor Cell Synthesis

Yasuhiko Sasaki
Central Research Laboratory, Hitachi, Ltd., Kokubunji, Tokyo, 185, Japan

Kunihito Rikino
Hitachi Device Engineering, Kokubunji, Tokyo, 185, Japan

Kazuo Yano
Central Research Laboratory, Hitachi, Ltd., Kokubunji, Tokyo, 185, Japan

Abstract- Layout synthesis for pass-transistor cells (PTCs) differs from that for CMOS cells because a PTC uses transistors of various sizes and has unequal numbers of pmos and nmos transistors. These properties make it difficult to apply the commonly used linear transistor placement to PTC layout. Moreover, mixed placement with CMOS cells restricts the layout freedom of PTCs. We therefore propose a sandwiched selector structure, which enables a multi-row transistor layout, and a pass-transistor graph search, which gives an efficient search algorithm for the diffusion layer sharing problem. Pass-transistor cells generated by ALPS (automatic layouter for pass-transistor cell synthesis) are confirmed to have almost the same area density as manually designed cells.

I. INTRODUCTION

In microprocessor design, low power and high speed are becoming more and more important: the recent widespread use of portable equipment has accelerated the demand for low power consumption, while high-speed operation remains an important factor. Pass-transistor logic (especially single-rail nmos pass-transistor logic) is expected to meet both of these requirements [1,2,3]. Under these circumstances, logic synthesis for pass-transistor logic has been researched intensively, and it has recently been reported that mixed use with CMOS logic is very effective for random logic circuits [4]. To utilize pass-transistor logic in various process technologies, however, a cell library must be provided in advance. Figure 1 shows typical pass-transistor cells (PTCs) in a library, where the pass-transistor networks take various shapes.
When diffusion-shared-type and gate-shared-type cells are included in the library, the number of cells exceeds one hundred even if the pass-transistor depth is limited to three. Providing so many cells for each process technology is not an easy task.

In CMOS, layout synthesis for leaf cells has been researched for many years because CMOS logic is the mainstream in logic circuits [5,6,7,8]. Especially for combinational circuits with the dual relation between pmos and nmos, layout synthesis using linear (one-row) transistor placement works well, so CMOS logic can be applied to various process technologies without much difficulty.

For pass-transistor logic, on the other hand, layouts have been designed manually in most cases because the logic was used mainly in arithmetic macros for many years. Thus there was no need to synthesize cell layouts for pass-transistor logic. Outside arithmetic macros, pass-transistor logic was used in a gate-array architecture a few years ago. A base cell layout specific to pass-transistor logic was proposed for this architecture, and its multi-row transistor placement resulted in a high area density [9]. However, a gate array needs only one base cell layout, so there was no need to prepare many cell layouts, and this work did not lead to PTC layout synthesis.

PTC synthesis became a requirement once standard-cell-based pass-transistor logic synthesis matured. If the conventional approach used in CMOS layout synthesis were also applicable to PTCs, it would be easy to use pass-transistor logic in various processes. However, CMOS layout synthesis is not suitable for PTCs for several reasons. A new layout synthesis method for PTCs is therefore necessary if pass-transistor logic is to be widely used.
Fig. 1. Typical pass-transistor cells (PTCs). [Figure: circuit diagrams of ten cells, labeled A to J.]

II. CHARACTERISTICS OF PTC LAYOUT

There are several difficulties in PTC layout. The first is that a PTC contains transistors of various sizes, as shown in Fig. 2. If linear transistor placement is used for the pmos and nmos transistors of a PTC, the cell inevitably contains much dead area. The second difficulty is the imbalance in the number of pmos and nmos transistors. In general, there are more nmos transistors than pmos transistors, which results in dead area in the pmos region. The third problem arises when pass-transistor logic is used together with CMOS logic. A pass-transistor cell may abut a CMOS cell, so the power lines (VDD and GND) must have the same position and width in both cells. In most CMOS cells, the VDD line runs along the upper part of the cell and the GND line along the lower part, so a PTC must follow the same arrangement. Consequently the location of pass-transistors in the cell is restricted. For these reasons, it has been very difficult to generate PTC layouts with high area density.

In this paper, we propose a sandwiched selector structure to enable a multi-row transistor layout. We also propose an original method for transistor clustering and placement that uses the pass-transistor graph to solve the diffusion layer sharing problem specific to PTCs. The layout system for PTCs based on these techniques is ALPS (automatic layouter for pass-transistor cell synthesis).

Fig. 2. Circuit of a pass-transistor cell with various sizes of transistors. [Figure: input inverter, pass-transistors (medium), feedback transistor, and output buffer (large).]

III. ALPS SYSTEM OVERVIEW

The synthesis flow in ALPS is shown in Fig. 3. It is divided into five sub-processes: transistor clustering, cluster placement, well-boundary shifting, routing, and compaction. The proposed techniques concern transistor clustering, cluster placement, and well-boundary shifting. For routing, maze running is used, and for compaction, the constraint-graph-based method is used. These methods are conventional and well known, so they are not described here. As for the implementation, we developed ALPS in a hybrid way, using two object-oriented languages: Perl 5 with Tk, and C++.

Fig. 3. PTC layout flow used by ALPS. [Figure: flow from the circuit netlist through transistor clustering, cluster placement, well-boundary shifting, routing, and compaction to the cell layout, with cluster auxiliary information and process technology as additional labeled items.]

The transistor clustering gathers transistors in two steps. In the first step (first-level clustering), the transistors in each selector and in the output buffer are gathered based on auxiliary cluster information. This information is given directly by the designer together with the circuit netlist, and it specifies which transistors belong to a selector and which belong to the output buffer, so this step is straightforward. In the second step (second-level clustering), pass-transistors that can share a common diffusion layer are searched for and gathered; that is, several first-level clusters are combined into a second-level cluster. The pass-transistor graph is used to solve this diffusion-layer-sharing problem, as explained in Section V.

The cluster placement defines the location of the transistors at two levels: intra-cluster and inter-cluster. Within a cluster, the sandwiched selector structure is used, as explained in Section IV. Between clusters, the transistors are placed according to the result of the second-level clustering and their position in the signal flow.

The well-boundary shifting is applied only when the sandwiched selector structure is difficult to use because of restrictions imposed by the CMOS cell model.

IV. SANDWICHED SELECTOR STRUCTURE

There are three parts in a pass-transistor cell: the output buffer, the pass-transistor selector, and the input inverter.
As shown in Fig. 2, the output buffer has large transistors, the pass-transistor selector has medium-size transistors, and the input inverter has small transistors. Hence there are three sizes of transistors in a pass-transistor cell. If the commonly used linear transistor placement is applied to PTC layout, much dead area results around the medium and small transistors, as shown in Fig. 4. Combining these medium and small transistors and placing them well would reduce the dead area. Therefore, instead of placing a pass-transistor and an input inverter independently, the pass-transistor is inserted between the pmos and nmos transistors of the input inverter, as shown in Fig. 5. We call this the sandwiched selector structure.

Fig. 4. Pass-transistor cell layout with the linear transistor placement. [Figure: VDD line, GND line, output buffer (large), feedback transistor, input inverters, pass-transistors (medium), and the resulting dead area.]

This structure is well suited to PTC layout for the following reasons. Because the source and drain terminals of the pass-transistors are connected to internal or external signal nodes, it is desirable not to locate pass-transistors near the power lines, in order to prevent short circuits when wires are pulled out from the terminals. Placing the pmos and nmos transistors of the input inverter near the power lines is reasonable because their source terminals are connected to the power lines. Moreover, the gate of one of the paired pass-transistors and the gates of the input inverter's pmos and nmos transistors belong to the same signal net, so the three gate terminals can be connected with one poly-silicon line. This allows three transistors to be placed in one column and also reduces the wire length.

The sandwiched selector structure is the initial structure, so the sandwich-like relation in the X coordinate between the pass-transistor and the input inverter transistors will change after compaction. However, the relation among these three transistors in the Y coordinate is kept the same throughout the cell generation in order to take advantage of the multi-row transistor placement.
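The column rule described above can be illustrated with a small data-structure sketch. It is only an illustration under stated assumptions: the `Transistor` record, its field names, and the `sandwiched_column` helper are hypothetical and are not taken from ALPS. The sketch checks the one condition the text states, namely that the inverter pmos, one pass-transistor, and the inverter nmos share the same gate net, and returns them in top-to-bottom order (pmos near VDD, pass-transistor in the middle, nmos near GND).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Transistor:
    """Hypothetical transistor record used only for this illustration."""
    name: str
    kind: str   # "pmos" or "nmos"
    role: str   # "inverter" or "pass"
    gate: str   # name of the net connected to the gate


def sandwiched_column(
    pmos: Transistor, pass_tr: Transistor, nmos: Transistor
) -> Optional[Tuple[Transistor, Transistor, Transistor]]:
    """Return the column (top to bottom) or None if the rule is not met.

    The three gates must be on the same net so that a single
    poly-silicon line can connect them.
    """
    if (pmos.kind, pmos.role) != ("pmos", "inverter"):
        return None
    if (nmos.kind, nmos.role) != ("nmos", "inverter"):
        return None
    if pass_tr.role != "pass":
        return None
    if not (pmos.gate == pass_tr.gate == nmos.gate):
        return None
    # pmos sits next to the VDD line, nmos next to the GND line,
    # and the pass-transistor is kept away from both power lines.
    return (pmos, pass_tr, nmos)


if __name__ == "__main__":
    column = sandwiched_column(
        Transistor("MP", "pmos", "inverter", "IN0"),
        Transistor("MT", "nmos", "pass", "IN0"),
        Transistor("MN", "nmos", "inverter", "IN0"),
    )
    print([t.name for t in column])
```

The other pass-transistor of the pair is driven by a different net, so it fails the gate check and would have to be placed in a neighboring column.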
The connection using one poly-silicon line, not necessarily a straight one, is also kept throughout the generation.

In several cases, it is impossible to use the sandwiched selector structure, for example when the minimum transistor size is relatively large or when the power lines are much wider than normal. The difficulty comes from the imbalance in the number of pmos and nmos transistors. In general, the well boundary is located at the center of a CMOS cell, which makes it difficult to place two transistors in one column in the nmos region. In this case, shifting the well boundary upward gives more space to the nmos region (as shown in Fig. 6) and makes it possible to use the sandwiched selector structure.

Fig. 5. Sandwiched selector structure. [Figure: input inverter pmos, pass-transistor, and input inverter nmos, with the well boundary located at the center of the cell.]

Fig. 6. Shifting of the well boundary. [Figure: VDD line, GND line, and shifted well boundary.]

V. TRANSISTOR CLUSTERING AND PLACEMENT

Transistor clustering consists of two steps, as described in Section III: first-level clustering and second-level clustering. The first-level clustering only gathers the transistors of each selector and each output buffer into corresponding clusters according to the auxiliary information. The second-level clustering combines several first-level clusters to take advantage of diffusion layer sharing. The combined clusters are placed according to representative values that indicate their position in the signal flow. Because the first-level clustering is straightforward, this section describes the second-level clustering and the placement.

In the CMOS cell model, the cell height is already defined, so decreasing the area means reducing the cell width. CMOS layout synthesis tries to maximize diffusion layer sharing because sharing reduces the number of diffusion layer isolations and hence the cell width.
Maximizing the sharing is also effective for PTC synthesis, but the situation differs from the CMOS one in several respects. In PTC layout, diffusion layer sharing mainly occurs between pass-transistors and between same-channel-type transistors of the input inverters. Assuming the sandwiched selector structure, the cell width is dominated by the diffusion layer sharing of the pass-transistors. This is because a pass-transistor selector needs two nmos transistors while an input inverter needs only one pmos transistor and one nmos transistor. Thus the sum of the widths of the diffusion layer islands of the pass-transistors defines the cell width. As a result, it is enough to concentrate on the sharing problem for pass-transistors alone to minimize the cell width.

The pass-transistor graph is used for this purpose. It is an undirected graph in which each edge corresponds to either a single pass-transistor or a pair of pass-transistors in a selector, as shown in Fig. 7. These graphs correspond to the circuit in Fig. 2. After the pass-transistor graph is constructed for the circuit, a cover with the minimum number of Eulerian paths is searched for. Pass-transistors on the same path can share one diffusion island, so fewer paths mean fewer islands.

Pairing the two pass-transistors of a selector into one edge has two advantages. First, the true pass-transistor and the false pass-transistor are placed next to each other, which reduces the wire length between the pass-transistors and the input inverter. Second, the number of nodes and edges in the pass-transistor graph is reduced, so the search space for finding the minimum number of Eulerian paths is narrowed. The disadvantage is that pairing may increase the number of Eulerian paths in the pass-transistor graph and therefore the number of diffusion islands. We examined the circuits in a library, counted the number of Eulerian paths in each of the two graphs, and found little difference in most cases. In the case of Fig. 7, for example, the number of Eulerian paths is two in both graphs (a) and (b), but the search is much easier in (b) than in (a). Thus the constraint of pairing two pass-transistors is used.

Fig. 7. Pass-transistor graph: (a) without the constraint of pairing of pass-transistors and (b) with the constraint. [Figure: two graphs, each covered by two Eulerian paths.]

Next, the first-level clusters whose pass-transistors lie on the same Eulerian path are combined into a second-level cluster. Figure 8 shows an example after the clustering.

Transistor placement is performed based on the signal flow in the circuit, because this is expected to reduce the total wire length. To quantify each cluster's position in the signal flow, the depth from the output is calculated for each transistor. The depth is defined as follows. First, for each of the three terminals of the transistor, the number of transistors the signal passes through from that terminal to the output is calculated. In this step, terminals connected to a power line or to external signals are excluded because they have little influence on the internal wire length. Second, the values of the remaining terminals are averaged to give the depth of the transistor. The average depth of all transistors in a cluster is then taken as the representative value of the cluster, as shown in Fig. 8. Clusters are placed in order of their depth representatives, from the smallest value on one side to the largest on the other.

VI. EXPERIMENTS

We applied ALPS to the typical PTCs shown in Fig. 1 and compared the cell widths with those of manually designed cells. The process used in ALPS is a 0.5µm technology and that of the manual design is a 0.35µm technology, so cell widths are expressed in units of the metal pitch of each technology to make them comparable. The results are shown in Table 1.
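As a supplement to the clustering and placement steps of Section V, the following is a minimal sketch of the two quantities involved. It is not the ALPS implementation: the text does not state how ALPS performs its search. The first function relies on a standard graph-theory result, assumed here to apply, that a connected component with k odd-degree nodes needs max(1, k/2) edge-disjoint paths to cover all its edges. The second part computes the depth representative as defined in the text, with excluded terminals marked as `None`. The function names, the example graph, and the example depth values are all hypothetical and are not taken from Fig. 7 or Fig. 8.

```python
from collections import defaultdict


def min_path_count(edges):
    """Minimum number of edge-disjoint paths that cover every edge.

    `edges` is a list of (node_a, node_b) pairs; parallel edges are allowed.
    Each connected component contributes max(1, odd // 2), where `odd`
    is the number of odd-degree nodes in that component.
    """
    degree = defaultdict(int)
    neighbors = defaultdict(set)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    total = 0
    for start in list(degree):
        if start in seen:
            continue
        # Depth-first search over one connected component.
        seen.add(start)
        stack = [start]
        odd = 0
        while stack:
            node = stack.pop()
            odd += degree[node] % 2
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        total += max(1, odd // 2)
    return total


def transistor_depth(terminal_depths):
    """Average the depths of the terminals that are kept (None = excluded)."""
    kept = [d for d in terminal_depths if d is not None]
    if not kept:
        raise ValueError("every terminal was excluded")
    return sum(kept) / len(kept)


def cluster_order(clusters):
    """Return cluster names sorted by depth representative, plus the values."""
    representative = {
        name: sum(transistor_depth(t) for t in transistors) / len(transistors)
        for name, transistors in clusters.items()
    }
    return sorted(representative, key=representative.get), representative


if __name__ == "__main__":
    # Hypothetical pass-transistor graph: four odd-degree nodes, so two paths.
    example_edges = [("N", "N0"), ("N", "N1"), ("N0", "IN3"),
                     ("N0", "IN4"), ("N1", "IN4"), ("N1", "IN5")]
    print(min_path_count(example_edges))

    # Hypothetical per-terminal depths for two clusters.
    example_clusters = {"buffer": [(0, 1, None)],
                        "selector": [(1, 2, None), (1, 2, 2)]}
    print(cluster_order(example_clusters)[0])
```

Counting paths this way only gives the lower bound on the number of diffusion islands; building the actual paths, and honoring the pairing constraint, would need an explicit path construction that is not shown here.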
The cell width of PTCs with conventional linear transistor placement is very large because of the dead area. The width of the cells generated by ALPS is much smaller and is almost the same as that of manually designed cells. For cells G and I, the results of ALPS are worse than those of the manual design. The reason is as follows. In the manual design, shifting adjacent pass-transistor diffusion islands alternately in the y-direction gave better compaction for PTCs with more than four selectors, and this reduced the cell width. With this shifting, the limiting design rule changes from the diffusion space rule to the poly-diffusion space rule. The current version of ALPS does not have this pass-transistor shifting algorithm; introducing it will improve the layout efficiency.

TABLE 1. Comparison of the cell width among manual design, conventional PTC, and PTC generated by ALPS. Unit is metal pitch.
cell type: A B C D E F G H I J
rows: manual, PTC (conventional), PTC (ALPS)
5 7 5 8 1 8 17 17 17 13 13 15 7 16 18 3 18 0 37 1 10 17 10

To analyze the area efficiency, the number of transistors per pitch was compared among CMOS cells, conventional PTCs with linear transistor placement, and PTCs generated by ALPS. The results are shown in Table 2. For CMOS cells, the number varies from 1.33 to 1.60. For PTCs with linear transistor placement, it varies from 0.84 to 1.00, and for the PTCs generated by ALPS, it varies from 1.36 to 1.50. The ratios to the conventional PTCs are also shown in Table 2. The area efficiency of the PTCs generated by ALPS is 40 to 80% higher than that of the conventional PTCs.

Comparing the PTCs generated by ALPS with the CMOS cells, we notice that their area efficiency is almost the same. This means that the area needed for one transistor is about the same whether CMOS logic or pass-transistor logic is used. Therefore, when a function is to be integrated on silicon, the area needed is almost proportional to the number of transistors. For example, if the function is NAND logic, the area is smaller in CMOS logic than in pass-transistor logic because CMOS logic needs four transistors while pass-transistor logic needs seven. On the other hand, if the function is EOR logic, the area is smaller in pass-transistor logic than in CMOS logic because CMOS logic needs twelve transistors while pass-transistor logic needs seven.

Fig. 8. Clusters and the depth representatives from the output. [Figure: clusters annotated with the depth of each net, the depth of each transistor, and the depth representative of each cluster.]

Figure 9, which corresponds to the circuit in Fig. 2, shows the layout generated by ALPS. In the layout, transistors are placed in a multi-row style and one diffusion layer island is shared among four pass-transistors. The true and false pass-transistors are placed next to each other, and the wires between the pass-transistors and the input inverter are short.

In terms of design time, manual design took three to four months to lay out about 70 PTCs, including the verification time. With ALPS, the same number of cells took two to three weeks. Because the preparation time of a cell library is shortened in this way, pass-transistor logic becomes usable in various process technologies.

VII. SUMMARY

A layout synthesis method for pass-transistor cells (PTCs) is proposed. It uses the sandwiched selector structure and pass-transistor-graph-based clustering and placement. ALPS, which is based on the proposed techniques, generated high-density PTCs comparable to manually designed cells. The cells also showed about 40 to 80% more transistors per metal pitch than PTCs with linear transistor placement. In terms of design time, ALPS reduced it to about one fifth of that of manual design.

ACKNOWLEDGEMENT

We thank H. Inayoshi, K. Sasaki, and K. Uchiyama for their steady encouragement. We also thank T. Hattori, R. Shibata, and Y. Shimizu for their valuable advice on the layout method.

REFERENCES

[1] K. Yano, Y. Sasaki, K. Rikino, K. Seki, "Top-down pass-transistor logic design," IEEE J. Solid State Circuits, Vol. 31, p. 79, Apr. 1996.
[2] Y. Sasaki, K. Yano, S. Yamashita, H. Chikata, K. Rikino, K. Uchiyama, K. Seki, "Multi-level pass-transistor logic for low power ULSIs," Dig. 1995 Symposium on Low-Power Electronics, p. 14, 1995.
[3] K. Taki, "A survey for recent pass-transistor logic research and development," Dig. DA Symposium 97 in Japan, p. 147, 1997.
[4] S. Yamashita, K. Yano, Y. Sasaki, Y. Akita, H. Chikata, K. Rikino, K. Seki, "Pass-transistor/CMOS collaborated logic; The best of both worlds," Dig. 1997 Symposium on VLSI Circuits, p. 31, 1997.
[5] T. Uehara, W. M. Vancleemput, "Optimal layout of CMOS functional arrays," IEEE Transactions on Computers, Vol. C-30, No. 5, p. 305, May 1981.
[6] C. C. Chen, S. L. Chow, "The layout synthesizer: An automatic netlist-to-layout system," IEEE Design Automation Conference, p. 3, 1989.
[7] C. L. Ong, J. T. Li, C. Y. Lo, "GENAC: An automatic cell synthesis tool," IEEE Design Automation Conference, p. 39, 1989.
[8] Y. C. Hsieh, C. Y. Hwang, Y. L. Lin, Y. C. Hsu, "LiB: A CMOS cell compiler," IEEE Transactions on Computer-Aided Design, Vol. 10, No. 8, p. 994, August 1991.
[9] Y. Sasaki, M. Hiraki, K. Yano, M. Miyamoto, T. Matsuura, K. Seki, "Pass-transistor-based gate-array architecture," Dig. 1995 Symposium on VLSI Circuits, p. 13, 1995.

TABLE 2. Comparison of the number of transistors per pitch.

CMOS
cell type    # of Tr in a pitch
NAND         1.33
NOR          1.33
3NAND        1.50
3NOR         1.50
4NAND        1.60
4NOR         1.60
3AND         1.33
3OR          1.33
EOR          1.33
ENOR         1.33

PTC
cell type    conventional       ALPS               ratio to the conv.
             (# of Tr/pitch)    (# of Tr/pitch)
A            1.00               1.40               1.40
B            0.92               1.38               1.50
C            0.88               1.36               1.55
D            0.88               1.36               1.55
E            0.88               1.36               1.55
F            0.86               1.46               1.70
G            0.85               1.44               1.70
H            0.84               1.50               1.79
I            0.84               1.48               1.76
J            0.88               1.50               1.70

Fig. 9. Layout of the circuit in Fig. 2 generated by ALPS.
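The ratio column of the table above and the "40 to 80% higher" figure quoted in Section VI can be rechecked with a few lines of arithmetic. The sketch below is only a consistency check on the published numbers, not part of ALPS. The names `TABLE2`, `check_table`, and `gain_range` are hypothetical, and the conventional value for cell B is taken as 0.92, an assumption chosen because it is the two-decimal value that agrees with the reported ratio of 1.50. A tolerance of 0.01 is used because the table entries are themselves rounded.

```python
# Transistors per metal pitch, transcribed from the PTC part of the table.
# Cell B's conventional value (0.92) is an assumption; see the text above.
TABLE2 = {
    # cell: (conventional, ALPS, reported ratio)
    "A": (1.00, 1.40, 1.40),
    "B": (0.92, 1.38, 1.50),
    "C": (0.88, 1.36, 1.55),
    "D": (0.88, 1.36, 1.55),
    "E": (0.88, 1.36, 1.55),
    "F": (0.86, 1.46, 1.70),
    "G": (0.85, 1.44, 1.70),
    "H": (0.84, 1.50, 1.79),
    "I": (0.84, 1.48, 1.76),
    "J": (0.88, 1.50, 1.70),
}


def check_table(table, tol=0.01):
    """Return the cells whose recomputed ratio is off by more than `tol`."""
    return [
        cell
        for cell, (conventional, alps, reported) in table.items()
        if abs(alps / conventional - reported) > tol
    ]


def gain_range(table):
    """Return (min, max) area-efficiency gain over conventional PTCs in percent."""
    gains = [
        100.0 * (alps / conventional - 1.0)
        for conventional, alps, _ in table.values()
    ]
    return min(gains), max(gains)


if __name__ == "__main__":
    print("inconsistent cells:", check_table(TABLE2))
    low, high = gain_range(TABLE2)
    print(f"gain: {low:.0f}% to {high:.0f}%")
```

With these inputs no cell is flagged, and the gain spans roughly 40% (cell A) to 79% (cell H), in line with the range stated in the text.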