CHAPTER 4 MIXED-SIGNAL DESIGN OF NEUROHARDWARE

4.1 SIGNIFICANCE OF MIXED-SIGNAL DESIGN

Digital realization of neurohardware was discussed in Chapter 3, which dealt with a cancer cell diagnosis system and a layer-multiplexed architecture that optimizes the area. To realize a portable, real-time application, the neurohardware should be adaptive, robust, fast and power efficient (Schemmel et al. 2002). Digital circuitry satisfies the first two requirements, whereas the other two are met by analog designs. Hence a mixed-signal design approach is attempted and analyzed in this chapter. Digital weights enable adaptive, reliable and long-term storage. Analog design simplifies the circuit complexity to an appreciable extent, which increases speed (Gowda et al. 1993) and reduces power consumption. Analog circuits, however, are sensitive to inaccuracies in device models that lead to gain errors or offsets (Satyanarayana et al. 1992). To some extent, this is overcome in an ANN, which has numerous weighted inputs and multiple redundant modules. Moreover, the parallelism in an ANN reduces the effect of uncorrelated noise, and the differential mode of operation in the neuron rejects the correlated noise (Coggins et al. 1995). Above all, a suitable learning algorithm is to be selected, since the weight perturbations further minimize the error.

Three important contributions are presented in this chapter. Instead of using a conventional Spice tool, the Simulink tool is used to model the MOSFETs and hence the neuron, which forms the first contribution. Using this model, a voltage-mode pulse-coupled neuron with Integrate-and-Fire (I&F) operation is designed in the first architecture. A novel Successive Approximation (SA) algorithm that alters all the weights simultaneously is developed for the second architecture. It follows a simple logic that converges fast and speeds up the learning process, which forms the second contribution. The third contribution is solving the parity functions with a smaller number of hidden nodes than what is proposed in the literature (Wilamowski et al. 2003). Any N-bit parity function needs at least N neurons in the hidden layer; this is reduced to N/2 when a Fully Connected Neural Network (FCNN) is implemented. Though an FFNN is widely accepted to be an FCNN, a clear distinction is made (Wilamowski et al. 2003) in this chapter: an FCNN has input neurons that are connected directly to the output neurons, in addition to the hidden neurons. Realizing the XOR function, the parity function and character recognition validates the neuron design. Using the same MOSFET model, the second architecture is built on the prominent current-mode neuron. Thus the suitability of the small-signal model for different design principles is ensured. The potential of the SA algorithm and the architectures is demonstrated by solving the XOR logic function, and parity generation and checking. Architectural and functional descriptions, followed by the implementation results, are presented.

4.2 MOSFET MODELING AND WEIGHT LEARNING

The small-signal Simulink model of the MOSFET and the procedure for weight learning are presented here. Very few reported works combine Simulink and VLSI design (Reyneri et al. 2000, Nassar et al. 2006), which paves the way for hardware/software co-design (Reyneri 2003). It is innovatively attempted in this research work by developing a small-signal model of the MOSFETs. Though many other simulation tools such as PSpice and WinSpice provide the MOSFET as a component, the approach tried here is unique in the sense that the model allows the designer to precisely alter the parameters, which
needs mentioning. Similarly, a simple but efficient Successive Approximation algorithm used in the second architecture draws attention.

4.2.1 MOSFET Modeling

The neuron model uses n-channel (NMOS) and p-channel (PMOS) FETs. The transistors function in the linear and saturated modes of operation depending upon the applied gate-source voltage Vgs, and develop an output Vds by conducting a drain current Id. The small-signal equivalent circuit with its Simulink model is shown in Figure 4.1. Conversion of BJT translinear circuits to subthreshold MOS translinear circuits is possible because of the similarities in their equations. It may be intended, since subthreshold MOS circuits operate at low power (Koosh 2001).

Figure 4.1 Small-signal Equivalent Circuit of MOSFET

4.2.2 Weight Learning

The Pulse-Coupled Neuron (PCN) architecture adopts the most popular BP algorithm, which needs to know the first derivative of the neuron's Activation Function (AF). In analog circuits, the device variations and mismatches affect the neuron AF and its derivative, which makes the calculation cumbersome (Schemmel et al. 2001). As a remedy, the parallel-perturbation Successive Approximation (SA) algorithm is chosen for the second analog architecture. The SA algorithm has the advantage of parallelism along with a reduction in the number of iterations; the reduction achieved is proportional to the number of bits, thus achieving speed through its unique mechanism. Finer selection of the weights and a minimum error are brought about by the perturbation and unperturbation mechanism. Figure 4.2 shows the flowchart of the successive approximation algorithm.

Figure 4.2 Flowchart of the SA Algorithm
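The flowchart logic can be put in code form; the following is a minimal Python sketch, in which the function names, the parallel weight update and the step-halving schedule are illustrative assumptions rather than the exact thesis implementation.

```python
def sa_learn(inputs, weights, target, forward, delta=0.5, max_iter=32):
    """Illustrative sketch of the SA weight search of Figure 4.2.

    All weights are perturbed in parallel by a step that is halved on
    every iteration (successive approximation); 'forward' computes the
    network output for the current weights. The halving schedule and
    the names here are assumptions made for illustration only.
    """
    step = delta
    for _ in range(max_iter):
        out = forward(inputs, weights)
        if out == target:                 # output matches target: stop
            return weights
        if out < target:                  # output too small: perturb weights up
            weights = [w + step for w in weights]
        else:                             # output too large: unperturb (step down)
            weights = [w - step for w in weights]
        step /= 2                         # binary-search-like refinement
    return weights
```

Because the step is halved each pass, the search narrows in the manner of a successive-approximation ADC, which is the source of the iteration-count reduction claimed in the text.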
The input and the target are decided, and the weights are initialized. The output is found for the input and the random weights, and is compared with the target; if found equal, the weights are accepted as the final weights and the computation is stopped. Otherwise, if the output is less than the target, the weights are perturbed, else they are unperturbed. The procedure is repeated until the output matches the target.

4.3 DESIGN OF SYNAPSES AND NEURONS

To explore mixed-signal design, the first architecture, based on the PCN, is realized and discussed in Section 4.3.1. It does not need separate multiplier, adder or threshold modules; instead, a compact structure executes the neuron operations. Moreover, the neuron's synaptic coupling strength determines the synchronization effect, which is useful in pattern recognition and character recognition. The second architecture, with a transconductance multiplier, is discussed in Section 4.3.2. It comprises a transconductance multiplier as the synapse and a threshold function realization as the neuron. The linearity of the synapse is improved by increasing the weight values.

4.3.1 CMOS Architecture of Pulse-Coupled Neuron

Biological neural networks, which communicate through pulses, use the timing of the pulses to transmit information and perform computation. Here, communication is done using frequency-modulated pulse streams (Ota and Wilamowski 1996). The Pulse-Coupled Neuron (PCN) design adopted in the first approach of this chapter follows the natural biological process that utilizes Frequency Modulation (FM). The neuron circuit shown in Figure 4.3 is an electronic analogy of a biological neuron (Ota and Wilamowski 1999).
The neuron has two input nodes, one at each capacitor, to control the excitatory and inhibitory signals. M1 and M2 realize resistors, and M3, M4 and M5 are the active MOS transistors. The neuron excites when the potential due to the positive input exceeds the potential due to the negative input; hence the natural Integrate-and-Fire phenomenon of the biological neuron is exhibited. The Simulink model of a PCN and its symbol are shown in Figure 4.4. To realize a complete network, the neurons are to be connected together with differing synaptic weights.

Figure 4.3 PCN with Active Resistors

Figure 4.4 Simulink Model and Symbol of a PCN
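The Integrate-and-Fire behaviour of the PCN can be illustrated at the behavioural level: the capacitor integrates the net (excitatory minus inhibitory) input current and a pulse is emitted when the threshold is crossed. The following minimal Python sketch is an abstraction of this process; the component values, threshold and names are illustrative assumptions, not the transistor-level circuit of Figure 4.3.

```python
def pcn_fire_times(i_exc, i_inh, c=1e-9, v_th=1.0, t_step=1e-6, t_end=1e-3):
    """Behavioural Integrate-and-Fire sketch of a pulse-coupled neuron.

    The capacitor voltage integrates the net input current; when it
    crosses the threshold, the neuron fires a pulse and resets.
    All parameter values are illustrative assumptions.
    """
    v, t, spikes = 0.0, 0.0, []
    while t < t_end:
        v += (i_exc - i_inh) / c * t_step   # integrate net input current
        if v >= v_th:                       # threshold crossed: fire
            spikes.append(t)
            v = 0.0                         # reset after the pulse
        t += t_step
    return spikes
```

A larger net excitatory current yields a higher pulse rate, which is the frequency-modulation principle the PCN relies on; equal excitation and inhibition produce no pulses.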
4.3.2 Architecture of Current-Mode Neuron

In the second approach of the mixed-signal realization, the synapse module and the current-mode neuron module (Koosh and Goodman 2002) are designed. The unique SA algorithm followed in this thesis not only overcomes the matching errors between the current sources and synapses but also improves the linearity. The Binary Weighted Current Source (BWCS) stores the weight in a compact architecture that acts as a Digital-to-Analog Converter (DAC).

(i) Synapse module

The synapse module shown in Figure 4.5 is a five-bit four-quadrant multiplier, which comprises a BWCS. The synapse accepts a differential input voltage (V+ − V−) and generates a differential output current (I+ − I−). The sign bit, represented by b5, sets the bias, and b0 – b4 represent the weights. The inputs I0 – I4 are applied through the DAC in the synapse (Coggins et al. 1995).

Figure 4.5 The Synapse Module

The transistors are biased in weak inversion to generate the transfer function of the synapse as represented by Equation (4.1):

I+ − I− = Is tanh(k(V+ − V−)/2)    if b5 = 1
I+ − I− = −Is tanh(k(V+ − V−)/2)   if b5 = 0        (4.1)

where k is a factor that depends on the subthreshold slope and the thermal voltage. The output current of the synapse when operated in the linear region is

I+ − I− = k Is (V+ − V−) / 2                        (4.2)

In the above-threshold region the synapse output is not sigmoidal; rather, it is widely linear. Therefore,

I+ − I− = 2 sqrt(K Is) (V+ − V−)                    (4.3)

where K is the factor in the saturation current:

ID = K (VGS − VT)^2                                 (4.4)

Depending on the value of the LSB current (I0), the synapses may operate either below or above threshold. The learning will overcome the offsets that arise due to the different modes of operation. The differential input voltage (V+ − V−) is plotted against the differential output current (I+ − I−) for a varying digital weight-set in Figure 4.6. From the linearity response of the synapse, it is observed that when the current moves from below threshold (small weight values) to above threshold (large weight values), the synapse linearity improves. The higher the weight value, the wider the linearity range (Koosh and Goodman 2002), and the synapse nonlinearity
decides the network nonlinearity. The BWCS contains one resistor or current source for each bit of the DAC and includes binary-weighted output transistors that deliver analog output currents corresponding to the binary signals.

Figure 4.6 Synapse Differential Input Voltage vs Output Current

(ii) Neuron Module

The output of the synapse is a differential current, and hence the outputs are summed easily when the nodes are connected together. In this design, the bias current is designed to be in the nanoampere (nA) range to reduce power consumption. Such a low synaptic current may not be able to drive the next layer; the neurons must improve the range and hence should provide a high resistance to develop a high voltage at the neuron output. The neuron circuit performs the summation of the differential input currents from the synapses. After summation, the current is converted into a differential voltage, and hence the neuron acts like a current-to-voltage converter. In the design shown in Figure 4.7, Current Mirrors 1 (CM1) and 2 (CM2) divide the input current, which is summed at the drain of CM4. This in turn mirrors the sum, which is split equally by CM3. The difference between this current and the input is formed at nodes 1 and 2. This differential current is mirrored into the current-to-voltage conversion stage, which executes the sigmoidal nonlinearity. This output voltage drives the synapses of the consecutive layers. In contrast to conventional neurons, the sigmoidal nonlinearity is exhibited only in the synapse circuit of the next layer, instead of taking place in the neuron. This restricts the inputs of the first layer to be within the linear range of the synapses. The drain current of a transistor above threshold in the triode region is

ID = 2K (VGS − VT − VDS/2) VDS                      (4.5)

ID ≈ 2K (VGS − VT) VDS   for small VDS             (4.6)

where K = µ Cox W / (2L).

Figure 4.7 Design of a Neuron

Let K1 and K2 be the prefactors of the cascode transistors in current sources 1 and 2, respectively. When K1 = K2,
Vo+ − Vo− = ΔI / (2 K1 (Vgain − VT))                (4.7)

This yields an expression for the effective resistance of the neuron, assuming a small input current ΔI; Vgain controls the effective gain of the neuron:

RN = 1 / (2K (Vgain − VT))                          (4.8)

4.4 IMPLEMENTATIONS OF ARCHITECTURES

Two-bit XOR, parity and character recognition networks are implemented using the PCN in the first approach. XOR, and parity generation and checking, are realized using the current-mode neuron in the second approach.

4.4.1 Implementations using PCN

(i) XOR Function with Unipolar Neurons

The design of the two-bit (A, B) XOR using the PCN (Wilamowski et al. 2003) is done without learning the weight values. Instead, the following expressions are used to realize the function:

A + B − 0.5 > 0                                     (4.9)
A + B − 1.5 > 0                                     (4.10)

Equation (4.9) is satisfied if any one input is high, that is, +1. Equation (4.10) is satisfied when both inputs are high. The following observations are made from the above equations: all weights from the input to the hidden layer are +1; the biasing weights are −0.5 and −1.5, respectively; the weights for the output neuron are +1 and −1, and its bias is −0.5.
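The two threshold expressions above can be checked numerically. The following is a minimal Python sketch of the network with hard-threshold unipolar neurons; the function names are illustrative.

```python
def step(x):
    # Hard-threshold (unipolar) activation: 1 if the net input is positive
    return 1 if x > 0 else 0

def xor_pcn(a, b):
    """XOR network of Figure 4.8 evaluated with threshold neurons.

    Hidden weights are +1 with biases -0.5 and -1.5; the output neuron
    combines the hidden outputs with weights +1, -1 and bias -0.5,
    exactly as stated in the text.
    """
    h1 = step(a + b - 0.5)      # fires when at least one input is high
    h2 = step(a + b - 1.5)      # fires when both inputs are high
    return step(h1 - h2 - 0.5)  # high for exactly one active input
```

Evaluating the four input combinations reproduces the XOR truth table, confirming that no learned weights are required for this function.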
Figure 4.8 shows the XOR using the PCN, where NHi and NOj indicate the hidden and output layer neurons, respectively.

Figure 4.8 XOR using PCN (A, B: inputs; NH1, NH2: hidden neurons; NO: output neuron)

(ii) Three-bit Parity Function

Parity-N functions are symmetrical, as the output is generated based on the number of excited inputs; the position of an input does not influence it. Realization of a parity-N function using XOR gates needs (N−1) XOR gates, implemented in several layers, and hence introduces significant delay. On the other hand, the direct implementation followed here minimizes both the hardware and the delay. To solve N-bit parity functions, the hidden layer needs a minimum of N neurons (Kim and Shanblatt 1995b). As N increases, the network size grows and becomes complex for hardware realization. So, fully connected networks are implemented, which need only N/2 neurons in the hidden layer (Minnick 1961). The hardware realization is further simplified, since the majority of the weights equal +1, which obviates weight multiplication.
The following equations help to solve the three-bit parity function of inputs (A, B, C):

A + B + C − 0.5 > 0                                 (4.11)
A + B + C − 1.5 > 0                                 (4.12)
A + B + C − 2.5 > 0                                 (4.13)

Equation (4.11) is satisfied if at least one input is high. Equation (4.12) is met when any two inputs are high. Equation (4.13) is satisfied if all inputs are high. The following parameters are derived from the above equations: all weights from the input to the hidden layer are +1; the biasing weights are −0.5, −1.5 and −2.5, respectively; the weights for the output neuron are +1, −1 and +1, and its bias is −0.5. Figure 4.9 shows the architecture of the three-bit parity function, comprising three hidden neurons and an output neuron.

Figure 4.9 Three-bit Parity Function (A, B, C: inputs; NH1–NH3: hidden neurons; NO: output neuron)
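The three hidden-neuron thresholds and the alternating output weights described above can be checked numerically; a minimal Python sketch with hard-threshold neurons (names illustrative):

```python
def step(x):
    # Hard-threshold (unipolar) activation
    return 1 if x > 0 else 0

def parity3(a, b, c):
    """Three-bit parity network of Figure 4.9.

    Hidden neurons threshold the input sum at 0.5, 1.5 and 2.5; the
    output neuron combines them with weights +1, -1, +1 and bias -0.5,
    so the output is high when an odd number of inputs is high.
    """
    s = a + b + c
    h1 = step(s - 0.5)   # at least one input high
    h2 = step(s - 1.5)   # at least two inputs high
    h3 = step(s - 2.5)   # all three inputs high
    return step(h1 - h2 + h3 - 0.5)
```

All eight input combinations yield the odd-parity bit, which is the behaviour the direct implementation realizes without any XOR gate cascade.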
(iii) Four-bit Parity Function

The architecture to solve four-bit parity functions, as reported by Wilamowski et al. (2003), is 4-7-1. It is reduced to 4-2-1, as shown in Figure 4.10, when a fully connected network is simulated. Although this increases both the interconnections and the number of synapses, it does not affect the present design, since the analytical manipulation of the equations fixes the majority of the weight values to +1, which obviates a multiplier.

Figure 4.10 Four-bit Parity Function

(iv) Character Recognition

Character recognition is a trivial task for humans, but for computers it is extremely difficult, mainly because of the many sources of variability. In terms of recognition and feature extraction, the PCNN can be very effective (Wilamowski et al. 1996).

4.4.2 Implementations using Current-mode Neuron

The XOR network and the four-bit parity generation and checking network are realized in this phase. Since the second architecture is built on the XOR gate only, the details of the same are presented. The general architecture of the XOR function using the analog modules is shown in Figure 4.11. It comprises the synapse and neuron modules; the synapse in turn contains the BWCS module.

Figure 4.11 Architecture of XOR Function

4.5 RESULTS AND DISCUSSIONS

4.5.1 Results using PCN Architectures

(i) XOR Function using PCN

Figure 4.12 shows the model of the XOR using the PCN. It comprises two inputs, In1 and In2, and one output, Out. In total, three neurons form the network. The bias values 0.5, 1.5 and 0.5 are applied through constant blocks. The XOR results are presented with the two input waveforms and one output waveform in Figure 4.13. Pulse generators with duty cycles of 50% and 75% are applied as Input 1 and Input 2, respectively, and the XOR output trace indicates the result.
(ii) Three-bit Even Parity Function

The Simulink model of the three-bit parity function is presented in Figure 4.14. It comprises three hidden neurons and one output neuron. All four neurons are connected to constant blocks through the bias values 0.5, 1.5, 2.5 and 0.5, respectively. Figure 4.15 shows the input and output waveforms. The three inputs have varying periods and phase delays to include all combinations of the three-bit parity. The output for some of these combinations is indicated by the Parity3 output trace.

(iii) Four-bit Even Parity Function

The fully connected network that solves the four-bit parity function with only two hidden layer neurons is shown in Figure 4.16. Though the interconnections are increased, the circuit complexity is not, as justified already. The bias values for the two hidden neurons and the one output neuron are 0.5, 1.5 and 0.5, respectively. The simulated results are shown in Figure 4.17. Four different pulse generators apply the inputs Input 1 – Input 4, and the 4-bit Parity Output trace indicates the generated output.

Figure 4.12 XOR Using PCN
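The 4-2-1 fully connected parity network can be verified numerically. The sketch below uses a standard FCNN construction in which the inputs connect directly to the output with weight +1 and the two hidden neurons pull the output down with weight −2; these particular thresholds (1.5 and 3.5) and the −2 weight are assumptions made for illustration, not values quoted from the thesis.

```python
def step(x):
    # Hard-threshold (unipolar) activation
    return 1 if x > 0 else 0

def parity4_fcnn(a, b, c, d):
    """Sketch of a 4-2-1 fully connected four-bit parity network.

    Direct input-to-output connections carry weight +1; each hidden
    neuron, once its threshold on the input sum is crossed, subtracts 2
    from the output sum. Thresholds assumed for illustration.
    """
    s = a + b + c + d
    h1 = step(s - 1.5)                      # two or more inputs high
    h2 = step(s - 3.5)                      # all four inputs high
    return step(s - 2 * h1 - 2 * h2 - 0.5)  # odd-parity output
```

The direct input-to-output connections are what allow only two hidden neurons, illustrating the N/2 hidden-layer reduction claimed for fully connected networks.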
Figure 4.13 Results of XOR Function

Figure 4.14 Three-bit Parity Function
Figure 4.15 Results of Three-bit Parity Function

Figure 4.16 Four-bit Parity Function
Figure 4.17 Results of Four-bit Parity Function

(iv) Character Recognition

The CMOS architecture of the PCN is used to configure the character recognition problem. The alphabets A to Z are represented in a 7 × 5 matrix, and the output stage is modeled with 26 display devices. The applied input stream triggers the corresponding display. The command for the character J is shown below; Display 10 is excited, as shown in Figure 4.18.

s = [10; alphabet(:,10)]';

The command returns a 1 × 36 row vector: the character index followed by the 35-element bitmap of J.
Figure 4.18 Architecture of Character Recognition

4.5.2 Results using Current-mode Neuron Architectures

The CMOS realizations of the synapse module and the neuron module are shown in Figure 4.19 and Figure 4.20, respectively. The Simulink model of the XOR and its internal structure are shown in Figure 4.21 and Figure 4.22, respectively. The simulation results of the XOR function are shown in Figure 4.23. The internal modules, namely the parity generator and the parity checker, are shown in Figure 4.24 and Figure 4.25, respectively. The complete architecture and the simulation results are shown in Figure 4.26 and Figure 4.27, respectively.
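The weak-inversion synapse transfer characteristic of Section 4.3.2 can be modelled numerically. Below is a minimal Python sketch of the tanh characteristic with the sign controlled by b5; the slope factor value (k = 20 per volt) and the names are illustrative assumptions.

```python
import math

def synapse_current(v_plus, v_minus, i_s, b5, k=20.0):
    """Differential synapse output in weak inversion.

    I+ - I- = +/- Is * tanh(k (V+ - V-) / 2), with the sign set by the
    bit b5. k lumps the subthreshold slope and the thermal voltage;
    its value here is an illustrative assumption.
    """
    i = i_s * math.tanh(k * (v_plus - v_minus) / 2.0)
    return i if b5 else -i
```

For a small differential input the output reduces to the linear form I ≈ k·Is·ΔV/2, while for large inputs it saturates at ±Is, which is the sigmoidal behaviour the next-layer synapses contribute to the network.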
Figure 4.19 Implementation of Synapse Module

Figure 4.20 Simulink Model of Neuron Module
Figure 4.21 Simulink Model of XOR

Figure 4.22 Internal Structure of XOR Circuit
Figure 4.23 Simulation Results of XOR Function

Figure 4.24 Structure of Parity Generator
Figure 4.25 Structure of Parity Checker

Figure 4.26 Architecture of Parity Generator and Checker
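The generator and checker of Figures 4.24 and 4.25 are cascades of XOR modules, which reduce functionally to a running XOR over the bits; a minimal Python sketch of that behaviour (names illustrative):

```python
def parity_generator(bits):
    """Parity generator: XOR of the data bits (cf. Figure 4.24).

    The cascade of XOR modules in the figure reduces to a running XOR.
    """
    p = 0
    for b in bits:
        p ^= b          # accumulate parity bit by bit
    return p

def parity_checker(bits, parity_bit):
    """Parity checker (cf. Figure 4.25): XOR of data bits and parity.

    Returns 0 when the received word is consistent, 1 when a
    single-bit error is detected.
    """
    return parity_generator(bits) ^ parity_bit
```

Passing the generated parity bit alongside the unmodified data bits through the checker yields 0, while flipping any single bit yields 1, which is the behaviour exercised in Figure 4.27.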
Figure 4.27 Simulation Results of Parity Generator and Checker

4.6 CONCLUSION

As an innovative approach, the small-signal Simulink modeling of the MOSFET has been executed. It opens up a new area of research for the development of hardware/software co-design by linking the Simulink tool with software codes. Its potential is verified by implementing two different architectures. The first architecture realizes an Integrate-and-Fire neuron with only five transistors and two capacitors, since it does not need a multiplier or a threshold function block. It operates on the Frequency Modulation (FM) principle, the natural process in the biological neuron, and accumulates the potential. Analytical computation of the weights obviates the need for a learning algorithm. The performance of the PCNN on different problems such as the XOR and N-bit parity functions is found to be satisfactory. For the parity function, the feedforward network with one hidden layer requires N neurons in the hidden layer; this is reduced to N/2 in a fully connected network. As a proof of concept, a three-bit parity using unipolar neurons and a four-bit parity using a fully connected network are realized. The architectures are claimed to be suitable for hardware implementation, since the majority of the weight values are +1, which obviates the need for a multiplier. The designed PCN has a synaptic coupling capability, where the coupling strength determines the synchronization effect. This is useful in the character recognition problem, and the 26 alphabets are recognized by the architecture. It exhibits a satisfactory performance, accepting 10% noisy patterns.

In the second architecture, mixed-signal modules with analog synapses and neurons that have digital weight storage are implemented. Successive Approximation learning, which involves no complex calculations, is followed to improve the speed. The XOR, and parity generation and checking problems exhibit appreciable performance. Thus the efficacy of the Simulink-modeled MOS architectures is verified with the voltage-mode PCN and the current-mode transconductance synapse and neuron. In the next chapter, pulse-density neurohardwares that exhibit the merits of mixed-signal realization are explored.