Habilitation Thesis. Neuromorphic VLSI selective attention systems: from single chip solutions to multi-chip systems


Habilitation Thesis
Neuromorphic VLSI selective attention systems: from single chip solutions to multi-chip systems
Giacomo Indiveri
A habilitation thesis submitted to the SWISS FEDERAL INSTITUTE OF TECHNOLOGY (ETH ZURICH)
September 2005


Contents

I Neuromorphic VLSI and Selective Attention
1 Introduction: Selective attention systems; Saliency-based Model of Selective Attention; Neuromorphic Engineering
Basic Neuromorphic Circuits: The subthreshold domain; The MOS field-effect transistor; The differential pair; The current normalizer; Resistive Networks; Design principles

II Winner-take-all Networks
3 Winner-take-all network models: Neural network models; Non-linear Programming Formulation
Current mode Winner-Take-All circuits: The original current-mode WTA circuit; The hysteretic WTA circuit; Local Excitatory Feedback; Diode-source degeneration; Lateral coupling; Lateral excitation; Local inhibition; Applications

III Single-chip Attention Systems
5 Neuromorphic vision sensors as single chip selective attention systems: A one-dimensional tracking chip; System Architecture; Adaptive Photoreceptor Circuit; Spatial Derivative Circuit; Edge-Polarity Detector Circuit; Hysteretic WTA Network; Spatial Position Encoding Circuit; Stand Alone Visual Tracking Device; Active Tracking System; Roving Robots; Extensions of 1-D tracking sensors; A 2-D tracking sensor; The differentiating adaptive photoreceptor; The 2-D hysteretic winner-take-all circuit; Peripheral I/O circuits; Experimental results

IV Multi-chip Attention Systems
6 Multi-chip models of selective attention systems: The Address-Event Representation; The Address-Event I/O Interface; Address-Event Neuromorphic Sensors; A 1-D AER selective attention chip; System Overview; The Excitatory and Inhibitory Synapses; The Hysteretic Winner-Take-All Network; The Output Inhibitory Integrate-and-Fire Neuron; Testing the 1-D selective attention chip; A 2-D AER selective attention chip; Experimental Results; Selective attention applications; An active AER selective attention system; The Transient imager chip; The Motor Control Algorithm; System response in absence of camera movements; System response in presence of camera movements; System response to natural stimuli
Silicon neural models of winner-take-all networks: A Low-Power Integrate-and-Fire Neuron Circuit; Circuit operation; Power dissipation characteristics; Networks of Integrate and Fire neurons; A competitive ring-of-neurons network

V Outlook and Conclusions
8 Outlook: Emulating Neural Circuits; Commercial Application Scenarios; Automotive Applications; Toys and Sensory Gadgets; Autonomous Mobile Systems; Space Exploration
Conclusions

List of Figures

1.1 Schematic diagram of a saliency-based model of selective attention (adapted from Itti, Koch and Niebur (1998)).
Subthreshold and above threshold current response of a MOS transistor, as a function of Gate-to-Source voltage difference.
(a) Circuit diagram of the differential pair. The differential output current I_1 - I_2 is controlled by the differential input voltage V_1 - V_2 and scaled by a constant factor set by the bias voltage V_b. (b) Experimental data obtained from a differential transconductance amplifier with a bias voltage set to V_b = 0.6V.
(a) Circuit diagram of the transconductance amplifier. The output current I_out = I_1 - I_2 is a function proportional to a hyperbolic tangent of the differential input V_1 - V_2. (b) Schematic symbol used to represent the transconductance amplifier circuit.
Two-input current normalizer circuit.
Current diffusor circuit. The current I_3, proportional to (I_2 - I_1), diffuses from the source to the drain of M
Similarities between (a) current-mode diffusor network, and (b) resistive network.
Network of N excitatory neurons (empty circles) projecting to one common inhibitory neuron (filled circle), which provides feedback inhibition. Small filled circles indicate inhibitory synapses and small empty circles indicate excitatory synapses. x_1...x_N are external inputs; y_e1...y_en are the outputs of the excitatory neurons; y_i is the output of the inhibitory neuron; w_e1...w_en are the excitatory synaptic weights of the external inputs; w_l1...w_ln are the excitatory weights onto the global inhibitory neuron; and w_i1...w_in are the inhibitory weights from the inhibitory neuron onto the excitatory neurons.
Simulations of a WTA network comprising 100 linear-threshold units ordered along one spatial dimension. The input (solid line) is composed of 3 Gaussians. The outputs are shown for two cases: w_ej = 1, w_ij = 1 and w_lj = j (dashed line); w_ej = 1, w_ij = 1 and w_lj = j (dotted line).
Numerical simulation of the same WTA network shown in Fig. 3.2, now with weight values w_ej = 1, w_ij = 1 and w_lj = (a) is the input distribution of increasing amplitude. (b) Network responses to the three inputs shown in (a).
Two cells of a current mode WTA circuit.
Responses of the two-cell WTA circuit shown in Fig (a) Voltage output (V_d1 and V_d2) versus the differential input voltage. (b) Current output (I_out1 and I_out2). The bias voltage V_b = 0.7V. The small difference in the maximum output currents is due to device mismatch effects in the read-out transistors of the two cells.
Hysteretic WTA cell, with local excitatory feedback, lateral excitatory coupling, lateral inhibitory coupling and diode-source degeneration.
Response of the hWTA circuit (outer hysteresis plot) superimposed to the response of the classical WTA circuit (inner central plot). The output of the classical WTA circuit was shifted vertically by a few nanoamperes for sake of clarity.
Diode-source degenerated WTA network output and classical WTA network output.
Simplified WTA circuit, used to analyze the excitatory diffusor network.
Effect of lateral excitatory coupling on the hWTA network. (a) Output currents I_all (see Fig. 4.3) measured at each cell of the network for four increasing values of V_ex. The inset shows a fit of the data from cells 2 to 20 with an exponential function. (b) Output currents I_all measured for three increasing values of I_in. Each data set is normalized to the maximum measured current.

4.8 Scanned output currents of hWTA network state (top solid-line), of hWTA output (bottom solid-line) and of classical WTA output (bottom dotted line). (a) Input currents are applied to cell 1 (V_gs,1 = 1.1V), cell 12 (V_gs,12 = 1.0V) and cell 13 (V_gs,13 = 1.0V), lateral excitation is turned off (V_ex = 0V) and inhibition is global (V_inh = 5V). Both the basic WTA network and the hWTA network select cell 1 as the winner. (b) Input signals and network bias settings are the same as in (a), but lateral excitation is turned on (V_ex = 1.825V). The basic WTA network keeps on selecting the strongest absolute input as the winner (cell 1), but the hWTA network selects the region with two neighboring cells on, because it has a stronger mean activation. (c) Input currents are applied to cells 5, 12 and 16 (V_gs,5 = 1.2V, V_gs,12 = 1.1V, V_gs,16 = 1.0V), lateral excitation is turned off and inhibition is global (V_ex = 0V, V_inh = 5V). Both the basic WTA network and the hWTA network select cell 5 as the winner. (d) Input signals and network bias settings are the same as in (c), but inhibition is local (V_inh = 3.35V). If inhibition is not global, the hWTA network allows multiple winners to be selected, as long as they are spatially distant (cell 16 is selected as local winner, even though cell 12 receives a stronger input current).
Response of the hWTA network to a single cell input (cell 13, with V_gs,13 = 1.1V) for a fixed value of V_ex = 1.825V. (a) Current output for 4 different values of V_inh. (b) Relative difference between output of the network with global inhibition (V_inh = 5V) and output of the network with 3 different values of V_inh.
Block diagram of single-chip tracking system. Spatial edges are detected at the first computational stages by adaptive photoreceptors connected to transconductance amplifiers. The edge with strongest contrast is selected by a winner-take-all network and its position is encoded with a single continuous analog voltage by a position-to-voltage circuit (see Section 5.1.6).
Portion of layout of the 1.2µm chip containing 7 processing columns. The size of each computational stage is evidenced on the right.
(a) Response of the array of adaptive photoreceptors to a black bar on a white background (upper trace) and output traces of the edge-polarity detector circuit (lower traces); (b) Output characteristic of the position-to-voltage circuit. The figure's inset contains snapshots of many output traces of the WTA network superimposed, as a stimulus was moving from left to right. The data points in the main figure represent the output of the circuit corresponding to the pixel position of the winner in the inset data.
(a) Response of the array of photoreceptors, with a very slow adaptation rate, to a dark bar on a white background moving from right to left with an on-chip speed of 31mm/s. The DC value of the response has been subtracted. (b) Response of array of photoreceptors with a fast adaptation rate to the same bar moving at the same speed (left pointing triangles) and at a slightly slower speed (upward pointing triangles).
Circuit diagram of the current polarity detector. Positive I_diff currents are conveyed to the n-type current mirror M4,M5. Negative I_diff currents are conveyed to M6 through the p-type current mirror M1,M6. Depending on the values of the control voltage signals V_CTRL and V_REF, the output current I_edg represents a copy of only one of the two polarities of I_diff, or of both polarities of I_diff (see text for details).
Response of the WTA network to the ON-edge of a bar moving from left to right at an on-chip speed of 31mm/s. The top trace represents the currents I_sum of the WTA array while the bottom trace represents the voltage outputs of the array of adaptive photoreceptors.
Schematic diagram of position-to-voltage circuit. Example of three neighboring cells connected together.
Picture of the stand-alone tracker board. The neuromorphic sensor is on the chip beneath the lens. On the left part of the board there is an array of potentiometers used to bias the chip's control voltages. On the top there is an LED display, comprising three display bar lines with their corresponding drivers. The scale in the left part of the figures is in millimeters.
(a) Output of the system in response to a finger moving back and forth in front of the chip; (b) Output of the system in response to a pen moving at approximately 8000 pixels/s on a stationary light background. Note the different time scales on the abscissae.
Picture of tracker chip mounted on a DC motor. The output of the chip is sent to a dual-rail power amplifier which drives directly the motor.
(a) Setup of the active tracking system as seen from above. The angle θ represents the angular displacement produced by the DC motor, x represents the target's position in the visual space, y represents the distance of the target's projection on the retina from its center. The angular velocity dθ/dt is proportional to y. (b) Chip data measured as the system was engaged in tracking a swinging bar. The bar's position (circles) was measured using a separate (fixed) tracking board, while its velocity (solid line) was computed off-line from the discretized position data. The crosses represent the output of the active sensor used to drive the system's DC motor.
Tracker chip mounted on a LEGO robot performing a target exploration task. Using very little CPU power, this robot is able to simultaneously explore (make random body/head movements), attend (orient the sensor toward high-contrast moving edges) and pursue (drive towards the target).

5.13 (a) Koala robot with neuromorphic sensor mounted on its front. (b) Positions of Koala following a line, sampled at intervals of 0.25 seconds for a period of 37.5 seconds, in which the robot completed 4 loops. The features (white squares) were obtained by tracking a dark cross drawn on the white top of Koala.
(a) Koala robot with neuromorphic sensor mounted on its front and a white sheet of paper with crosses attached on its top, seen from above. (b) Positions of Koala following a white line on a light-blue carpet floor, sampled at intervals of one second over a period of approximately 3 minutes. The features (white squares) were obtained by tracking the bars appearing on the top part of Koala (see text for explanation).
Two-dimensional tracker chip architecture.
Differentiating adaptive photoreceptor circuit.
Hysteretic WTA circuit with spatial coupling.
Two-input pass-transistor demultiplexer. The voltage on V_c is routed either to V_P2V (if V_sel is high) or to V_ENC (if V_sel is low).
Output of the analog P2V circuits in response to a target moving from the right top corner to the bottom central part of the sensor's field of view. The bottom trace (V_x) reports the x position of the target. The top trace (V_y), offset in the plot by 5V for sake of clarity, reports the y position of the target. The inset shows V_y versus V_x.
Output of the analog P2V circuits in response to a target moving from the bottom left corner to the top right one, on to the top left, to the bottom right, and back to the bottom left corner.
Output of least significant bit (bottom trace) and second-least significant bit (top trace, displaced by 6V) of the X address in response to a target moving from right to left.
Histogram of the addresses measured from the sensor's address encoders in response to a target moving on a circular trajectory.
Schematic diagram of an AER chip to chip communication example. As soon as a sending node on the source chip generates an event its address is written on the Address-Event Bus. The destination chip decodes the address-events as they arrive and routes them to the corresponding receiving nodes.
Image captured from a silicon designed by Jörg Kramer, (at the Institute of Neuroinformatics, Zurich), while the subject was moving.
Biologically equivalent architecture of selective attention model. Input spike trains arrive from the bottom onto excitatory synapses. The populations of cells in the middle part of the figure are modeled by a hysteretic WTA network with local lateral connectivity. Inhibitory neurons, in the top part of the figure, locally inhibit the populations of excitatory cells by projecting their activity to the inhibitory synapses in the bottom part of the figure.
(a) Excitatory synapse circuit. Input spikes are applied to M1, and transistor M4 outputs the integrated excitatory current I_ex. (b) Inhibitory synapse circuit. Spikes from the local output neurons are integrated into an inhibitory current I_inh.
(a) Response of an excitatory synapse to single spikes, for different values of the synaptic strength V_w (with V_e = 4.60V). (b) Normalized response to single spikes for different time constant settings V_e (with V_w = 1.150V). (c) Response of an excitatory synapse to a 50Hz spike train for increasing values of V_w (0.6V, 0.625V, 0.65V and 0.7V from bottom to top trace respectively). (d) Response of excitatory synapse to spike trains of increasing rate for V_w = 0.65V and V_e = 4.6V (12Hz, 25Hz, 50Hz and 100Hz from bottom to top trace respectively).
Schematic diagram of the WTA network. Examples of three neighboring cells connected together.
Net WTA input current I_net values at each pixel location for a static control input. Pixels 5 through 13 have input currents slightly lower than pixel 21. All other pixels receive weaker input stimuli. (a) In the absence of lateral coupling (V_ex = 0V) the network selects pixel 21 as the winner. (b) In the presence of lateral coupling (V_ex = 1.5V) the network smooths spatially the input distribution and selects pixel 9 as the winner.
Circuit diagram of the local inhibitory integrate-and-fire neuron.
Integrate-and-fire neuron characteristics. (a) Membrane voltage for two different DC injection current values (set by the control voltage V_inj). (b) Membrane voltage for two different refractory period settings. (c) Firing rates of the neuron as a function of current-injection control voltage V_inj plotted on a linear scale. (d) Firing rates of the neuron as a function of V_inj plotted on a log scale (the injection current increases exponentially with V_inj).
Scanned net input currents to the WTA network I_net (top traces) and inhibitory currents I_inh (bottom traces) measured, by means of an off-chip current sense-amplifier, at every pixel location. (a) Response of the system to the onset of the stimulation, with a display persistence setting of 3s. (b) Response of the system after a few seconds of stimulation, with a display persistence setting of 250ms.

6.11 (a) Raster plots of neuron 10 in response to the control stimulus (see text for explanation). (b) Raster plots of neuron 22. (c) Peri-stimulus time histogram of neurons 10 (solid line) and of neuron 22 (dashed line). (d) Inter-spike interval distribution of neurons 10 (front bars) and 22 (rear bars).
Test image with salient features. (a) Original color figure. (b) Corresponding saliency map. (c) Input spike frequencies obtained from the injective mapping described in the text (upper trace) and distribution of the output neuron's spike counts recorded over a period of 3 seconds (lower histogram). (d) Position of the attended pixel recorded over time.
Mapping of the 1D data of Fig. 6.12(d) onto the re-sampled 2D saliency map data of Fig. 6.12(b). Shifts along the horizontal axis are due to the selective attention chip's response. Shifts along the vertical axis are introduced artificially via the injective mapping described in the text.
Block diagram of a basic cell of the 8 × 8 selective attention architecture.
Synaptic circuits. (a) Input excitatory synapse. Address events are converted into pulses by the circuit in the dashed box. Pulses are integrated into the excitatory current I_ex by the p-type current-mirror integrator. The integrator's gain and time constant are modulated by the control voltages V_w and V_τe; (b) Inhibitory synapse. On-chip pulses (V_ior) are integrated into the inhibitory current I_ior by the n-type current-mirror integrator. The time constant and gain of this integrator are modulated by the voltages V_q and V_τi.
Hysteretic WTA cell. Input currents are sourced into node V_in and 3 copies of the output current are sent to the two P2V circuits and to the I&F neuron.
Local output integrate and fire neuron. When the membrane voltage V_mem increases above V_thr the output voltage V_out is driven to V_dd and an address event is generated. The transistors in the dashed box are part of the output AER circuitry.
(a) Output of the P2V circuits of the selective attention architecture measured over a period of 300ms, in response to a test stimulus exciting four corners of the input array at a rate of 30Hz and a central cell at a rate of 50Hz; (b) Histogram of the chip's output address-events, captured over a period of 13.42s in response to the same input stimulus.
Event histograms of addresses generated by the workstation sent to the chip (a) and output addresses generated by the selective attention chip (b), (c), and (d). All chip parameters are kept constant throughout the plots except for the bias parameter V_τi. The histogram in (b) was obtained with V_τi = 227mV, the one in (c) with V_τi = 207mV, and the one in (d) with V_τi = 193mV.
Output address events of the selective attention chip biased with V_τi = 207mV. The 2D address space of the chip's architecture is mapped into the plot's 1D ordinate vector by labeling each address successively, row by row.
Image representations of saliency maps. (a) Saliency map corresponding to the input stimulus used for the experiment of Fig. 6.18; (b) Saliency map used for the experiment of Fig. 6.19; (c) Fictitious example resembling a realistic saliency map.
(a) Block diagram of the sensory-motor selective attention model. The figure shows the basic computational blocks used, as well as the corresponding biological analogues and their function. (b) Schematic diagram of the active vision setup: The neuromorphic imager, mounted on a pan-tilt unit, transmits its output to the selective attention chip. The latter sends the results of its computations to a host computer which uses this data to drive the pan-tilt unit's motors.
Selective attention active vision system. The selective attention chip processes sensory data coming from an AER imaging sensor and transmits its output to a workstation that drives the pan-tilt unit on which the sensor is mounted. A standard CCD camera is mounted next to the AER sensor to visualize the sensor's field of view.
Block diagram of irradiance transient detector with event-based communication interface.
Image captured from the CCD camera mounted next to the transient imager. The outer frame shown in the image corresponds to the field of view of the transient imager, whereas the inner frame is drawn to evidence the transient imager's central region. The cross to the bottom right of the image center represents the location of the focus of attention currently computed by the selective attention chip.
(a) Histogram of events generated by the transient imager pixels in response to two diffused flashing LEDs. The LED stimulating the region around pixel (5,9) has higher contrast than the other LED. (b) Histogram of events generated by the selective attention chip in response to the events generated by the transient imager chip.
Raster plot of the activity of the neurons of both transient imager chip (dots) and selective attention chip (circles) in response to the flashing LEDs. To plot the data from both chips using an address space with the same resolution, we sub-sampled the addresses of the transient imager chip. The LEDs flashed approximately at 0.25s, 1.25s and 2.25s.

6.28 Sequence of images showing the selection of a salient stimulus prior to and after a saccadic eye movement. (a) The system is attending the top LED, already centered on the central part of the imaging array. (b) The system selects the bottom LED, outside the central region of the imager. (c) The system performed a saccade toward the bottom LED, and is currently attending it.
Raster plot of the activity of the neurons of the transient imager chip (dots) and of the selective attention chip (circles) in response to two flashing LEDs. The focus of attention shifts from a central region of the imaging array to a peripheral one (see circles at 2s < t < 6s). Consequently, the system makes a camera movement, at the time indicated by the vertical arrow, and re-centers the attended location.
Output of the P2V circuits of the selective attention chip (see Fig. 6.14), representing the scanpath of the focus of attention, switching back and forth between the fluttering fingers of both of the experimenter's hands. The scanpath data is superimposed onto a snapshot taken from the CCD camera during the experiment.
Saccadic eye movements in response to moving fingers. (a) CCD camera snapshot taken before the saccadic eye movement (the focus of attention has just switched from one hand to the other). (b) CCD camera snapshot taken just after the saccadic eye movement (the focus of attention and the salient stimulus are now in the center of the imaging array).
Circuit diagram of the I&F neuron.
(a) Measured data (circles) representing an action potential generated for a constant input current I_inj with spike-frequency adaptation and refractory period mechanisms activated. The data is fitted with the analytical model of eq. (7.5) (solid line). (b) Circuit's f-I curves (firing rate versus input current I_inj) for different refractory period settings.
(a) Mean power dissipation of the neuron as a function of V_sf for an average output firing rate of about 100Hz and typical operating condition bias settings (see text for details).
(a) Raster plots showing the activity of an AER array of 32 I&F neurons in response to a constant input current, for four decreasing values of the refractory period (clockwise from the top left quadrant). (b) Mean response of all neurons in the array to increasing values of a global input current, for the same refractory period settings. The error bars represent the responses' standard deviation throughout the array.
Architecture of the integrate-and-fire ring of neurons chip. Empty circles represent excitatory neurons. The filled circle represents the global inhibitory neuron. The gray line symbolizes inhibitory connections, from the inhibitory neuron to all excitatory neurons. Black arrows denote excitatory connections.
(a) Raster plot of input spike trains (small dots) superimposed onto the output spike trains (empty circles), with global inhibitory feedback turned off (the inhibitory-to-excitatory synaptic weights are set to zero). (b) Histograms of input spike distribution (top trace), output spike distribution of competitive network with global inhibition but no lateral excitation (middle trace) and output spike distribution of competitive network with global inhibition and with lateral excitation (bottom trace).
(a) Arrangement of input signals used to stimulate a set of neurons of the network. Each box represents a Poisson distributed spike train source. (b) Raster plots representing input spikes (small dots), output spikes (empty circles), and coincident (within 1ms time window) output spikes (filled circles) for the three network configurations: Without global inhibition (top raster plot), with global inhibition (middle raster plot) and with global inhibition and local excitation (bottom raster plot).
Pairwise cross correlations averaged over neuron pairs 9-10, 9-11 and The data of the top trace were computed from the response of the network in the absence of global inhibition. The middle trace corresponds to the case with global inhibition and the bottom trace corresponds to the case with both global inhibition and local excitation turned on.

List of Tables

5.1 Characteristics of the visual tracking sensor
Parameters used to fit the data of Fig. 7.2(a)


Part I Neuromorphic VLSI and Selective Attention

Chapter 1 Introduction

Biological organisms perform complex selection operations continuously and effortlessly. These operations allow them to quickly determine, for example, the motor actions to take in response to combinations of external stimuli and internal states; or to pay attention to subsets of sensory inputs, suppressing non-salient ones; or to plan complex action sequences, serially choosing elementary behaviors among different alternatives. In essence, these selection operations allow biological organisms to survive. One of the main computational expedients used by nature to perform these selection operations is implemented by Winner-Take-All (WTA) networks. These are networks of competing elements (cells, neurons, populations of neurons or neural circuits) that sequentially select the elements receiving the strongest input signals and suppress the remaining ones.

In this thesis we will argue that neuromorphic circuits are an optimal medium for constructing WTA networks and for implementing efficient hardware models of selective attention systems. To validate our argument, we will describe properties of neuromorphic circuits, and analyze in detail the characteristics of current-mode WTA circuits; we will then show examples of single-chip vision systems that use WTA networks to select and track the position of salient features, and of multi-chip systems that implement more elaborate models of selective attention mechanisms, and that are not restricted to just the visual sensory modality. Some of these examples will show how the biological inspiration and the neuromorphic technology used can lead to the design of devices with high potential for commercial exploitation. Other examples will show how the synthetic approach followed, and the constraints imposed by the analog VLSI circuits, can aid basic research, e.g. by limiting the space of possible models and providing possible explanations of why biological organisms implement selective attention mechanisms with specific architectures.

1.1 Selective attention systems

Processing detailed sensory information is a computationally demanding task for both biological and artificial systems. If the amount of information provided by the sensors exceeds the parallel processing capabilities of the system, as is usually the case with both biological and artificial vision systems, an effective strategy is to select subregions of the input and process them, shifting from one subregion to another, in a serial fashion [25, 90]. In biology this strategy, commonly referred to as selective attention, is used by a wide variety of systems, from insects [5, 99] to humans [18, 62]. In primates selective attention plays a major role in determining where to center the high-resolution central foveal region of the retina [85], by biasing the planning and production of saccadic eye movements [2, 43]. In general though, the visual regions attended by the focus of attention do not always correspond to the regions being analyzed by the fovea. Recent findings even suggest that attention can be used to keep track of multiple targets of interest simultaneously, if the visual task requires a low attentional cost [14, 18]. Psychophysical evidence indicates that visual attention mechanisms have two main types of dynamics: a transient, rapid, bottom-up, task-independent one, and a slower, sustained one, which acts under voluntary control [93].

In this thesis we will focus on implementations of bottom-up models of selective attention. We will show how it is possible to implement these models using VLSI technology and analog neuromorphic circuits, such as winner-take-all networks and silicon integrate-and-fire neurons.

Saliency-based Model of Selective Attention

Several computational models of selective attention have been proposed [2, 90, 96, 98, 114].
Some of these models are based on the concept of dynamic routing [98], by which salient regions are selected by dynamic modification of network parameters (such as neural connection patterns) under both top-down and bottom-up influences. Some other models, based on similar ideas, promote the concept of selective tuning [114]. In these models, attention optimizes the selection procedure by selectively tuning the properties of a top-down hierarchy of winner-take-all processes embedded within the visual processing pyramid. The types of models we seek to implement in hardware are those based on the concept of the saliency map, originally put forth by Koch and Ullman [65]. These biologically plausible models account for many of the behaviors observed in neurophysiological and psychophysical experiments and have led to several software implementations applied to machine vision and robotic tasks [1, 12, 59, 113]. They are especially appealing to us because they lend themselves nicely to hardware implementations.

Figure 1.1: Schematic diagram of a saliency-based model of selective attention (adapted from Itti, Koch and Niebur (1998)). Block labels, from output to input: attended location; inhibition of return; WTA network; saliency map; feature combination; feature maps; center-surround differences and normalization; orientations, intensity, colors; linear filtering; input image.

A diagram describing the main processing stages of this type of model is shown in Fig. 1.1. A set of topographic feature maps is extracted from the visual input. All feature maps are normalized and combined into a master saliency map, which topographically codes for local saliency over the entire visual scene. Different spatial locations then compete for largest saliency, based on how much they stand out from their surroundings. A winner-take-all (WTA) circuit selects this most salient location as the focus of attention. The WTA circuit is endowed with internal dynamics, which generate the shifts in attention based on a mechanism named inhibition of return (IOR) (a key feature of many selective attention systems) [36].
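The processing stages described above can be sketched in software. The following is a minimal illustration under stated assumptions, not the circuit or the model implementation used in this thesis: the function names (`normalize`, `saliency_map`, `attend`) and the `ior_radius` parameter are hypothetical, simple peak normalization and averaging stand in for the center-surround and normalization stages, the WTA stage is reduced to an arg-max, and inhibition of return is modeled as a hard reset of the saliency around each winner rather than as a dynamic process.

```python
def normalize(feature_map):
    """Scale a 2-D map so that its peak is 1 (all-zero maps are copied unchanged)."""
    peak = max(max(row) for row in feature_map)
    if peak <= 0:
        return [row[:] for row in feature_map]
    return [[value / peak for value in row] for row in feature_map]


def saliency_map(feature_maps):
    """Average the normalized feature maps into one master saliency map."""
    normed = [normalize(m) for m in feature_maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    return [[sum(m[i][j] for m in normed) / len(normed) for j in range(cols)]
            for i in range(rows)]


def attend(saliency, n_shifts, ior_radius=1):
    """Return the scanpath: the sequence of attended (row, col) locations."""
    s = [row[:] for row in saliency]  # work on a copy of the saliency map
    rows, cols = len(s), len(s[0])
    scanpath = []
    for _ in range(n_shifts):
        # Winner-take-all (assumed here to be a plain arg-max over the map).
        winner = max(((i, j) for i in range(rows) for j in range(cols)),
                     key=lambda p: s[p[0]][p[1]])
        scanpath.append(winner)
        # Inhibition of return: suppress the winner and its neighbourhood
        # so that the next competition is won by a different location.
        r, c = winner
        for i in range(max(0, r - ior_radius), min(rows, r + ior_radius + 1)):
            for j in range(max(0, c - ior_radius), min(cols, c + ior_radius + 1)):
                s[i][j] = 0.0
    return scanpath


# Two toy 4x4 feature maps (illustrative values only).
intensity = [[0, 0, 0, 0], [0, 4, 0, 0], [0, 0, 0, 0], [1, 0, 0, 2]]
color = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [3, 0, 0, 0]]
print(attend(saliency_map([intensity, color]), n_shifts=3))
# prints [(3, 0), (1, 1), (3, 3)]
```

In this toy example location (3, 0) wins first because both feature maps respond there; once it is inhibited, attention shifts to the next most salient locations in turn.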
As saliency-based selective attention models are highly modular, multi-chip neuromorphic systems that implement them can scale up to arbitrarily complex selective attention systems.

1.2 Neuromorphic Engineering

Neural network theories, used as an additional methodology for solving pattern recognition and constraint minimization problems, have emerged in recent years as a practical technology and represent a well established research field. Neural

network algorithms, the type of non-linearities present in the transfer functions of their computational elements and the architectures that implement them are often loosely inspired by biological systems. An emerging technology which tries to establish even closer links to biology, capitalizing on the advantages of interdisciplinary research, is neuromorphic engineering. Specifically, neuromorphic engineering applies the computational principles discovered in biological organisms to those tasks that biological systems perform easily, but which have proved difficult to do using traditional engineering techniques. For example, biological neural systems for sensory perception and motor control are compact, energy efficient and robust to noise both in the input data and in the internal state variables. They typically have a relatively simple organization, consisting of arrays of similar processing elements that interact in non-linear ways mainly with nearest neighbors. Neuromorphic systems, rather than implementing abstract neural networks remotely related to these types of systems, are hardware devices containing analog circuits that attempt to model in detail (down to the device-physics level) the properties of these systems and the physical processes embedded in them that underlie neural computation [31]. The closest medium, widely accessible to the research community, that allows researchers to implement detailed hardware models of neural systems is silicon. Using analog, continuous-time circuits implemented with a standard CMOS VLSI technology it is possible to build low-cost, compact implementations of such models. The greatest successes of neuromorphic analog VLSI (aVLSI) to date have been in the emulation of peripheral sensory transduction: silicon retinas and silicon cochleas have been successfully implemented and used in a wide variety of applications [10, 22, 34, 67, 76, 79].
In these analog devices, as in their biological counterparts, it is the structure of the architecture, the morphology of the system, that determines their functionality. This constraint adds to those that arise because neuromorphic systems have to minimize power consumption, maximize robustness to noise and perform reliably while interacting in real time with the environment. It is by trying to satisfy these very constraints that researchers hope to obtain more insight into the workings of biological neural systems. One could suggest using software simulations to validate models of biological neural systems. However, traditional digital technology cannot deliver systems that are as real-time, compact, cheap and low-power as neuromorphic ones, and the computational load of digital simulators must also be taken into account. Detailed simulations of neural processes are among the most computationally intensive, and realistic simulations of large populations of neurons remain prohibitive, despite the continuous improvements of digital technology. Furthermore, to obtain realistic simulations, one should also model in software the dynamics of the system with which the neural model interacts, the noise present in the environment, and the constraints that might arise from minimizing power consumption. The systems of equations arising from these additional constraints would increase the computational load of the digital system even more. Neuromorphic engineering is thus mainly concerned with hardware correlates of biological systems.
Yet, the nature of the research carried out by neuromorphic engineers is twofold: on one side there is the desire to learn more about the computational properties of the brain by tackling the same problems that nature and evolution solved in the course of 600 million years; on the other there is the desire to design and develop efficient neuromorphic engineered systems that can be used to solve real-world problems and that can eventually lead to successful industrial applications.

Chapter 2 Basic Neuromorphic Circuits

In this Section we introduce some basic concepts of analog circuit design necessary for understanding the circuits and systems presented in the subsequent parts of the thesis. A more thorough description of analog VLSI circuits and principles can be found in the textbook that we recently published [75].

2.1 The subthreshold domain

Perhaps the most elementary computational element of a biological neural structure is the neural cell's membrane. The nerve membrane electrically separates the neuron's interior from the extracellular fluid. It is a very stable structure that behaves as a perfect insulator. Current flow through the membrane is mediated by special ion channels (conductances) which can behave as passive or active devices. In the passive case, ion channels selectively allow ions to flow through the membrane by the process of diffusion. In electronics, it is possible to implement the same physical process by using MOS field-effect transistor devices operated in the subthreshold region (also referred to as weak inversion) [75, 81, 118].

2.2 The MOS field-effect transistor

One of the most common devices used in today's integrated circuit technology is the Metal-Oxide-Silicon Field Effect Transistor (MOSFET)¹. The currents in this device comprise either positively-charged holes or negatively-charged electrons. MOSFETs are typically used as digital elements (either fully open or closed). Only a small percentage of VLSI devices use them in the analog domain, and there are even fewer cases in which MOSFETs are used in the subthreshold domain. As neuromorphic circuits are among those few examples, here we concentrate on the current-voltage characteristics of MOSFETs in the subthreshold domain. MOS transistors operate in the subthreshold region when their gate-to-source voltage is below the transistor threshold voltage.
This mode of operation has been largely ignored by the analog and digital circuit design community, mainly because the currents that flow between the source and drain terminals of the device under these conditions are extremely low (typically of the order of nanoamperes). In subthreshold, the drain current of the transistor depends exponentially on the gate-to-source voltage (see Fig. 2.1). Specifically, for an n-type MOS transistor, the subthreshold current is given by:

$$I_{out} = \frac{W}{L} I_0 \, e^{(1-\kappa)\frac{V_{BS}}{U_T}} \, e^{\kappa\frac{V_{GS}}{U_T}} \left(1 - e^{-\frac{V_{DS}}{U_T}} + \frac{V_{DS}}{V_0}\right) \tag{2.1}$$

where $W$ and $L$ are the width and length of the transistor, $I_0$ is the zero bias current, $\kappa$ is the subthreshold slope coefficient, $U_T$ is the thermal voltage, $V_0$ is the Early voltage and $V_{GS}$, $V_{DS}$ and $V_{BS}$ are the gate-to-source, drain-to-source and bulk-to-source voltages respectively. Typical values for devices with $W = L = 4\,\mu$m fabricated with standard $2\,\mu$m technology are: $I_0$ = [...] A, $\kappa = 0.65$, $V_0 = 15.0$ V. If the transistor operates in the saturation region (i.e. if $V_{DS} > 4U_T$) and if $V_0 \gg V_{DS}$, the above equation can be simplified to yield:

$$I_{out} = \frac{W}{L} I_0 \, e^{\frac{\kappa V_G - V_S}{U_T}} \tag{2.2}$$

where $V_G$ and $V_S$ are the gate and source voltages measured with respect to the bulk.

¹ The field-effect transistor structure was first described in a series of patents by J. Lilienfeld that were granted in the early 1930s. The MOSFET is the field-effect transistor type that is almost exclusively used today. Historically, other field-effect transistor types were invented, including the junction field-effect transistor (JFET) and the metal-semiconductor field-effect transistor (MESFET).
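The two current expressions above can be checked numerically with a short sketch. It assumes a thermal voltage of about 25 mV and uses a placeholder value for the zero bias current `i_0`, since the typical value is not legible in the text; both are assumptions of this sketch, and the function names are hypothetical.

```python
import math

U_T = 0.025  # thermal voltage near room temperature, in volts (assumed)


def subthreshold_current(v_gs, v_ds, v_bs=0.0, w_over_l=1.0,
                         i_0=1e-18, kappa=0.65, v_0=15.0):
    """Full subthreshold expression (Eq. 2.1).

    i_0 is an illustrative placeholder, not a value taken from the text.
    """
    return (w_over_l * i_0
            * math.exp((1.0 - kappa) * v_bs / U_T)
            * math.exp(kappa * v_gs / U_T)
            * (1.0 - math.exp(-v_ds / U_T) + v_ds / v_0))


def saturation_current(v_g, v_s=0.0, w_over_l=1.0, i_0=1e-18, kappa=0.65):
    """Simplified saturation expression (Eq. 2.2), voltages referred to bulk."""
    return w_over_l * i_0 * math.exp((kappa * v_g - v_s) / U_T)


# Gate at 0.6 V and source at 0.1 V above the bulk, drain 1 V above the source.
full = subthreshold_current(v_gs=0.5, v_ds=1.0, v_bs=-0.1)
approx = saturation_current(v_g=0.6, v_s=0.1)
print(full / approx)  # close to 1 + V_DS / V_0 once exp(-V_DS / U_T) is negligible
```

With these parameters a gate voltage increase of $U_T \ln(10)/\kappa$, roughly 90 mV, raises the saturation current by one decade, which is the exponential behavior the text refers to.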

Figure 2.1: Subthreshold and above-threshold current response of a MOS transistor as a function of the gate-to-source voltage difference (axes: $I_{ds}$ in A versus $V_{gs}$ in V).

The diffusion of electrons through the transistor channel is mediated by the gate-to-source voltage difference. As the input/output characteristic of a subthreshold transistor is an exponential function, circuits containing these devices can implement the base functions required to model biological processes: logarithms and exponentials.

2.3 The differential pair

One of the most common tricks used both by biological and engineered devices for computing measurements that are insensitive to absolute reference values and robust to noise is to use difference signals. The differential pair is a compact circuit comprising only three transistors that is widely used in many neuromorphic systems (see Fig. 2.2). It has the desirable property of accepting a differential voltage as input and providing as output a differential current with extremely useful characteristics: if the bias transistor is operated in the subthreshold domain and if we assume that all the transistors are in saturation (so that Eq. 2.2 holds), the transfer function of the circuit is:

$$I_1 - I_2 = I_b \tanh\left(\frac{\kappa (V_1 - V_2)}{2 U_T}\right) \tag{2.3}$$

The beauty of this transfer function lies in the properties of the hyperbolic tangent present in it: it passes through the origin with unity slope, it behaves in a linear fashion for small differential inputs and it saturates smoothly for large differential inputs. To provide the differential term $I_1 - I_2$ as output on a single terminal, one simply needs to connect a current mirror of complementary type to the differential pair output terminals (e.g. a current mirror of p-type MOS transistors in the case of Fig. 2.2). The circuit thus obtained is the glorious differential transconductance amplifier [75, 81] (see Fig. 2.3).
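The hyperbolic tangent in Eq. 2.3 follows from Eq. 2.2 once the two branch currents are constrained to sum to the bias current. The sketch below computes the branch currents from that constraint and compares their difference with the closed form. It assumes a thermal voltage of about 25 mV and $\kappa = 0.65$; the function name is hypothetical.

```python
import math

U_T = 0.025  # thermal voltage near room temperature, in volts (assumed)


def differential_pair(v_1, v_2, i_b, kappa=0.65):
    """Branch currents of a subthreshold differential pair in saturation.

    Each branch carries I_0 * exp((kappa * V_k - V_s) / U_T) (Eq. 2.2) and the
    two currents sum to I_b, so the common source voltage V_s cancels out.
    """
    d = kappa * (v_1 - v_2) / U_T
    i_1 = i_b / (1.0 + math.exp(-d))
    i_2 = i_b / (1.0 + math.exp(d))
    return i_1, i_2


i_b = 1e-9  # 1 nA bias current, an arbitrary example value
for dv in (-0.2, -0.02, 0.0, 0.02, 0.2):
    i_1, i_2 = differential_pair(0.8 + dv, 0.8, i_b)
    closed_form = i_b * math.tanh(kappa_dv := 0.65 * dv / (2 * U_T))
    print(dv, i_1 - i_2, closed_form)
```

For small differential inputs the two printed columns grow linearly with `dv`, and for large inputs they saturate at plus or minus the bias current.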

Figure 2.2: (a) Circuit diagram of the differential pair. The differential output current $I_1 - I_2$ is controlled by the differential input voltage $V_1 - V_2$ and scaled by a constant factor set by the bias voltage $V_b$. (b) Experimental data obtained from a differential transconductance amplifier with a bias voltage set to $V_b = 0.6$ V (axes: $(I_1 - I_2)$ in nA versus $(V_1 - V_2)$ in mV).

Figure 2.3: (a) Circuit diagram of the transconductance amplifier. The output current $I_{out} = I_1 - I_2$ is proportional to a hyperbolic tangent of the differential input $V_1 - V_2$. (b) Schematic symbol used to represent the transconductance amplifier circuit.

2.4 The current normalizer

During the last 40 years, the vast majority of analog circuits have used voltages to represent and process relevant signals. Recently, however, current-mode signal processing circuits, in which signals and state variables are represented by currents rather than voltages [111], have shown advantages over their voltage-mode counterparts. These advantages include higher bandwidth, higher dynamic range, and better suitability for low supply voltages. A current-mode circuit that will form the basis of the more complex circuits described throughout this thesis is the current

normalizer (see Fig. 2.4). This circuit, based on the Gilbert normalizer, receives analog continuous-time input currents and provides normalized output currents. It is a modular circuit that can be extended to an arbitrary number of cells by simply connecting additional current mirrors to the common node $V_c$. If the input currents are subthreshold, the circuit is characterized by the equations

$$I_{in_i} = I_0 \, e^{\kappa \frac{V_{d_i}}{U_T}}, \qquad I_{out_i} = I_0 \, e^{\kappa \frac{V_{d_i}}{U_T} - \frac{V_c}{U_T}} \tag{2.4}$$

where $i$ is the index of the $i$-th cell of the circuit, $U_T$ is the thermal voltage and $\kappa$ is the subthreshold slope coefficient. By applying Kirchhoff's current law to the common node $V_c$ we obtain

$$\sum_{i=1}^{N} I_{out_i} = I_b \tag{2.5}$$

where $I_b$ is a constant current set by the control voltage $V_b$. We use this constraint to solve Eq. 2.4 for $V_c$, and to derive the dependence of the output current on the input currents:

$$I_{out_i} = I_b \frac{I_{in_i}}{\sum_j I_{in_j}}. \tag{2.6}$$

The output current of each cell $I_{out_i}$ is directly proportional to its input current (with proportionality constant $I_b$), but divided by the sum of all the input currents $\sum_j I_{in_j}$.

Figure 2.4: Two-input current normalizer circuit.

2.5 Resistive Networks

Conventional methods of implementing resistors in VLSI technology include using complex circuits such as the transconductance amplifier of Section 2.3. These methods have the disadvantage of emulating linear resistors for only a very limited

range of voltages, and of resistance values. If we consider currents, and not voltages, to represent input and output signals of MOSFETs, then we can implement resistive networks using single transistors instead of resistors. In this configuration, the transistor is linear for a wide range of current values. Furthermore, if the transistor is operated in the subthreshold regime, then the resistance (or conductance) can be varied by changing its gate voltage.

Figure 2.5: Current diffusor circuit. The current $I_3$, proportional to $(I_2 - I_1)$, diffuses from the source to the drain of $M_3$.

A conventional conductance, $G$, is defined by the relationship $I_{ab} = G (V_a - V_b)$, where $I_{ab}$ is the current flowing from terminal $a$ to terminal $b$, and $V_a$, $V_b$ are the voltages at the corresponding terminals. If the two terminals $a$ and $b$ are the source and the drain of a subthreshold nFET, the current $I_{ab}$ can be expressed by the usual transistor relationship:

$$I_{ab} = I_0 \, e^{\kappa \frac{V_g}{U_T} - \frac{V_a}{U_T}} - I_0 \, e^{\kappa \frac{V_g}{U_T} - \frac{V_b}{U_T}} \tag{2.7}$$

where $V_g$ is the transistor's gate voltage. If we define the pseudo-voltage [119] $V^* = V_0 \, e^{-V/U_T}$ (where $V_0$ is an arbitrary scaling voltage), and the pseudo-conductance $G^* = \frac{I_0}{V_0} e^{\kappa V_g / U_T}$, then we can write

$$I_{ab} = G^* (V_a^* - V_b^*) \tag{2.8}$$

where the value of the pseudo-conductance $G^*$ depends exponentially on the transistor's gate voltage $V_g$. Using Eq. (2.8) we can map any resistive network into an equivalent transistor network: each resistor $R_i$ of the resistive network can be replaced by a single transistor $M_i$, provided that all the transistors share the same substrate (that is, they are all either nFETs or pFETs). If the gate voltages $V_{g_i}$ of all the transistors are equal, then the transistor network is linear with respect to current [118]. This linear behavior holds for the entire range of weak inversion, which may be as much as 6 orders of magnitude in transistor current.
Because all $V_{g_i}$ must be the same, the values of the individual conductances can only be adjusted by changing the $W/L$ ratio (which modulates $I_0$) of each transistor. An alternative interpretation of the mapping between resistive and transistor networks uses the concept of a current diffusor [8], illustrated in Figure 2.5. The currents $I_{in1}$ and $I_{in2}$ are inputs to the circuit. Assuming that the three nFETs are identical (that is, their $I_0$ and $\kappa$ parameters are equal), and solving the circuit equations, we obtain:

$$I_3 = I_0 \, e^{\kappa \frac{V_3}{U_T}} \left( \frac{I_2}{I_0 \, e^{\kappa V_2 / U_T}} - \frac{I_1}{I_0 \, e^{\kappa V_1 / U_T}} \right). \tag{2.9}$$

If $V_1 = V_2 = V_{ref}$, this relationship can be simplified to yield

$$I_3 = e^{\frac{\kappa}{U_T}(V_3 - V_{ref})} (I_2 - I_1). \tag{2.10}$$

The diffusion current $I_3$ through $M_3$ is proportional to $(I_2 - I_1)$. The proportionality factor can be modulated by either $V_{ref}$ or $V_3$. The current-mode diffusor network (Fig. 2.6(a)) is composed of multiple instances of the current diffusor circuit of Fig. 2.5. In this network, current injected at a node $j$ diffuses laterally and decays with distance [81]. Consequently the network acts as a spatial low-pass filter, and because the network is linear, the effects of currents injected at different nodes superimpose.
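The spatial low-pass behavior of the diffusor network can be illustrated numerically. The sketch below solves the steady-state node equation of the network, $I_{out_j} - I_{in_j} = g\,(I_{out_{j+1}} - 2 I_{out_j} + I_{out_{j-1}})$ with $g = e^{\kappa (V_G - V_R)/U_T}$ (Eq. 2.14 of this section), as a tridiagonal linear system. The zero-flux treatment of the two end nodes, the thermal voltage of 25 mV and the function names are assumptions of this sketch.

```python
import math

U_T = 0.025  # thermal voltage near room temperature, in volts (assumed)


def diffusor_gain(v_g, v_r, kappa=0.65):
    """Lateral-to-vertical current gain exp(kappa * (V_G - V_R) / U_T)."""
    return math.exp(kappa * (v_g - v_r) / U_T)


def solve_diffusor_network(i_in, g):
    """Steady-state output currents of a 1-D diffusor network.

    Solves (1 + g * n_neighbors) * I_out[j] - g * (sum of neighbor outputs)
    = I_in[j] with the Thomas algorithm. End nodes have a single neighbor
    (zero-flux boundaries, an assumption of this sketch).
    """
    n = len(i_in)
    diag = [1.0 + g * ((j > 0) + (j < n - 1)) for j in range(n)]
    off = -g
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = i_in[0] / diag[0]
    for j in range(1, n):  # forward elimination
        denom = diag[j] - off * c[j - 1]
        c[j] = off / denom
        d[j] = (i_in[j] - off * d[j - 1]) / denom
    i_out = [0.0] * n
    i_out[-1] = d[-1]
    for j in range(n - 2, -1, -1):  # back substitution
        i_out[j] = d[j] - c[j] * i_out[j + 1]
    return i_out


# Unit current injected at the central node of a 21-node network.
impulse = [0.0] * 21
impulse[10] = 1.0
response = solve_diffusor_network(impulse, diffusor_gain(v_g=0.55, v_r=0.50))
print([round(v, 3) for v in response])
```

The response peaks at the injection site and decays with distance on both sides. Raising $V_G$ relative to $V_R$ increases the gain and spreads the current over more nodes, while the total output current stays equal to the total input current.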

Figure 2.6: Similarities between (a) the current-mode diffusor network and (b) the resistive network.

The diffusor network (Fig. 2.6(a)) has the same network response as the resistive network in Fig. 2.6(b). This equivalence can be demonstrated by comparing the transfer functions of the two circuits. Applying Kirchhoff's current law at node $V_j$ of Fig. 2.6(a):

$$I_{out_j} - (I_j - I_{j-1}) = I_{in_j}. \tag{2.11}$$

Using Eq. (2.10), we can express $I_j$ and $I_{j-1}$ in terms of the output currents:

$$I_{j-1} = e^{\frac{\kappa}{U_T}(V_G - V_R)} (I_{out_j} - I_{out_{j-1}}) \tag{2.12}$$

$$I_j = e^{\frac{\kappa}{U_T}(V_G - V_R)} (I_{out_{j+1}} - I_{out_j}). \tag{2.13}$$

Substituting these two relationships into the node equation above yields

$$I_{out_j} - I_{in_j} = e^{\frac{\kappa}{U_T}(V_G - V_R)} (I_{out_{j+1}} - 2 I_{out_j} + I_{out_{j-1}}). \tag{2.14}$$

Similarly, we can apply Kirchhoff's current law at node $V_j$ of Fig. 2.6(b):

$$I_{out_j} - (I_{j-1} - I_j) = I_{in_j}. \tag{2.15}$$

Because $I_j = G (V_j - V_{j+1})$ and $I_{out_j} = V_j / R$, $I_j$ can be expressed as a function of $I_{out_j}$ and of $I_{out_{j+1}}$:

$$I_j = RG \, (I_{out_j} - I_{out_{j+1}}). \tag{2.16}$$

Combining this equation with the node equation of the resistive network yields

$$I_{out_j} - I_{in_j} = RG \, (I_{out_{j+1}} - 2 I_{out_j} + I_{out_{j-1}}). \tag{2.17}$$

The term $(I_{out_{j+1}} - 2 I_{out_j} + I_{out_{j-1}})$ in this equation is the discrete approximation of the $\frac{d^2}{dx^2}$ operator. Both circuits of Fig. 2.6 approximate the diffusion equation that characterizes the properties of a continuous resistive sheet [81]:

$$\lambda^2 \frac{d^2}{dx^2} V_{out}(x) = V_{out}(x) - V_{in}(x) \tag{2.18}$$

where $\lambda$ is the diffusion length. In the discrete resistive network of Fig. 2.6(b) the diffusion length is $\lambda = \sqrt{RG}$, while in the diffusor network of Fig. 2.6(a) the diffusion length is $\lambda = e^{\frac{\kappa}{2 U_T}(V_G - V_R)}$.

2.6 Design principles

Many additional subthreshold circuit building blocks can be designed using single transistors, differential pairs and current mirrors, and by exploiting the physics of silicon. Some examples are described in the now classical textbook Analog VLSI and Neural Systems [81] and in the more recent book Analog VLSI: Circuits and Principles [75]. What should be stressed, however, is the importance of the design principles used by neuromorphic engineers: complex systems can be built by locally interconnecting elementary computational elements and exploiting the non-linear, recursive characteristics of the bio-inspired architectures thus designed. The physical constraints imposed by the hardware medium help designers keep these non-linear systems from diverging even in the (many) cases in which positive-feedback loops are present. Furthermore, the advantages offered by VLSI technology allow designers to faithfully reproduce the properties of high parallelism, redundancy and collective computation present in biological systems. In the next chapters we will apply these design principles to models of selective attention systems implemented both on single-chip systems (at a high level of abstraction) and on multi-chip systems (at a lower level of abstraction).
But first we analyze from a theoretical perspective WTA networks and show how to map the equations arising from our analysis into subthreshold circuits.

Part II Winner-take-all Networks

Chapter 3 Winner-take-all network models

A winner-take-all (WTA) circuit is a network of competing cells (neural, software, or hardware) that reports only the response of the cell that has the strongest activation while suppressing the responses of all other cells. These circuits are typically used to implement and model competitive mechanisms among populations of neurons. For example, they are used to select specific regions of an input space [124]. Many WTA networks have been implemented both in software [38, 61, 94, 120] and in hardware [17, 24, 40, 51, 70, 73, 105, 107]. In this Section we analyze a class of WTA networks that emulate biological networks, consisting of a cluster of excitatory neurons that innervate a global feedback inhibitory neuron. These networks have been implemented in aVLSI and applied to a wide variety of tasks, including selective attention [13, 50, 121], auditory localization [72], visual stereopsis [78], smooth pursuit/tracking [33, 45], and detection of heading direction [55, 91].

3.1 Neural network models

We shall focus on a particularly simple yet powerful model that describes a population of $N$ homogeneous excitatory units that excite a single global inhibitory unit, which feeds back to inhibit all the excitatory units (Fig. 3.1). For the sake of simplicity, we neglect the dynamics of the system and examine only the steady-state solutions. Dynamic properties of these networks and of other physiological models of competitive mechanisms are described in detail in Ben-Yishai et al. [3], Grossberg [38], Kaski and Kohonen [61], Yuille and Geiger [124].

Figure 3.1: Network of $N$ excitatory neurons (empty circles) projecting to one common inhibitory neuron (filled circle), which provides feedback inhibition.
Small filled circles indicate inhibitory synapses and small empty circles indicate excitatory synapses. $x_1 \ldots x_N$ are external inputs; $y_{e1} \ldots y_{eN}$ are the outputs of the excitatory neurons; $y_i$ is the output of the inhibitory neuron; $w_{e1} \ldots w_{eN}$ are the excitatory synaptic weights of the external inputs; $w_{l1} \ldots w_{lN}$ are the excitatory weights onto the global inhibitory neuron; and $w_{i1} \ldots w_{iN}$ are the inhibitory weights from the inhibitory neuron onto the excitatory neurons.

Consider a network (Fig. 3.1) in which the external input to the $j$-th excitatory neuron is $x_j$, the response of the $j$-th excitatory neuron is $y_{ej}$, and the response of the inhibitory neuron is $y_i$; and in which the weights of the synapses from the external inputs to the excitatory neurons, from the inhibitory neuron to the excitatory neurons, and from the excitatory neurons to the inhibitory neuron are $w_{ej}$, $w_{ij}$ and $w_{lj}$ respectively. We can describe this network as

$$y_{ej} = f(w_{ej} x_j - w_{ij} y_i), \qquad y_i = f\Big(\sum_{j=1}^{N} w_{lj} y_{ej}\Big) \tag{3.1}$$

where $f(\cdot)$ denotes the transfer function of both excitatory and inhibitory neurons. This system of coupled equations describes the recurrent interactions between the excitatory neurons and the inhibitory neuron. We explore the behavior of the system by considering three special cases:

1. The case in which all neurons have a linear transfer function ($f(x) = x$).
2. The case in which the neurons are linear-threshold ($f(x) = \max(0, x)$), and all external inputs are identical.
3. The case in which the neurons are linear-threshold, and one external input is much larger than all others.

More general cases using non-linear transfer functions are difficult to solve analytically; however, they can be studied using numerical simulations.

Linear Units

If the neurons are fully linear ($f(x) = x$) we can solve the system analytically:

$$y_{ej} = w_{ej} x_j - w_{ij} y_i, \qquad y_i = \sum_j w_{lj} (w_{ej} x_j - w_{ij} y_i) \tag{3.2}$$

which implies that

$$y_{ej} = w_{ej} x_j - w_{ij} \frac{\sum_k w_{lk} w_{ek} x_k}{1 + \sum_k w_{lk} w_{ik}}, \qquad y_i = \frac{\sum_k w_{lk} w_{ek} x_k}{1 + \sum_k w_{lk} w_{ik}}. \tag{3.3}$$

In the simplified case, we assume that all the weights of each kind are the same, $w_{ej} = w_e$, $w_{ij} = w_i$, $w_{lj} = w_0 \;\; \forall j$, and so

$$y_{ej} = w_e x_j - \frac{w_e \sum_k x_k}{\frac{1}{w_i w_0} + N}. \tag{3.4}$$

The output of each neuron is proportional to its input, but has a normalizing term subtracted.
Equation (3.4) shows that the response $y_{ej}$ of a linear excitatory neuron can have both positive and negative values, depending on the inputs $x_k$, on its connection weights $w_e$, $w_i$, $w_0$ and on the total number of excitatory neurons $N$.
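The linear solution can be checked with a few lines of code. The sketch below evaluates the steady state for uniform weights and shows that weakly driven units end up with negative outputs, as stated above. The function name and the example inputs are hypothetical.

```python
def linear_wta(x, w_e, w_i, w_0):
    """Steady state of the fully linear network with uniform weights.

    y_i follows from Eq. 3.3 with w_ej = w_e, w_ij = w_i, w_lj = w_0, and
    each y_ej is then obtained from the first relation of Eq. 3.2.
    """
    n = len(x)
    y_i = w_0 * w_e * sum(x) / (1.0 + n * w_0 * w_i)
    y_e = [w_e * x_j - w_i * y_i for x_j in x]
    return y_e, y_i


x = [1.0, 2.0, 6.0]
y_e, y_i = linear_wta(x, w_e=1.0, w_i=1.0, w_0=1.0)
print(y_e, y_i)  # [-1.25, -0.25, 3.75] 2.25
```

In this example the subtracted term is the same for all units (2.25), so the two weaker inputs are pushed below zero while the strongest one stays positive. The linear-threshold units discussed next remove these negative responses.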

Linear Threshold Units with Uniform External Inputs

The half-wave rectification function ($f(x) = \max(0, x)$) is biologically more realistic than the linear function of the previous case. Neurons with this transfer function have only non-negative responses. In this case, the system of equations (Eqs. 3.1) becomes a system of non-linear coupled equations, and it is no longer possible to obtain a general closed form solution. However, if all external inputs are identical ($x_j = x_0 \;\; \forall j$), we can reduce the system to

$$y_{ej} = \max(0, \, w_{ej} x_0 - w_{ij} y_i), \qquad y_i = \max\Big(0, \, \sum_j w_{lj} y_{ej}\Big) \tag{3.5}$$

and if we make the working hypothesis that $(w_{ej} x_0 - w_{ij} y_i) > 0 \;\; \forall j$, then we obtain the linear system:

$$y_{ej} = w_{ej} x_0 - w_{ij} y_i, \qquad y_i = \sum_j w_{lj} (w_{ej} x_0 - w_{ij} y_i) \tag{3.6}$$

which yields

$$y_{ej} = x_0 \, \frac{w_{ej} \big(1 + \sum_k w_{lk} w_{ik}\big) - w_{ij} \sum_k w_{ek} w_{lk}}{1 + \sum_k w_{lk} w_{ik}}, \qquad y_i = x_0 \, \frac{\sum_j w_{ej} w_{lj}}{1 + \sum_j w_{lj} w_{ij}}. \tag{3.7}$$

If the synapses from external inputs and those from the inhibitory neuron have equal strength ($w_{ej} = w_{ij} = w_0 \;\; \forall j$), then

$$y_{ej} = \frac{x_0}{\frac{1}{w_0} + \sum_k w_{lk}}, \qquad y_i = \frac{x_0 \sum_j w_{lj}}{\frac{1}{w_0} + \sum_k w_{lk}}. \tag{3.8}$$

The hypothesis used to obtain Eq. (3.6) is satisfied for all values of $x_0 > 0$, $w_0 > 0$, and $w_{lj} > 0 \;\; \forall j$. In summary, if all inputs are equal, then all excitatory linear threshold units have identical outputs, equal to the input divided by a normalizing term that grows with the weights $w_{lj}$ and decreases as $w_0$ increases.

Linear Threshold Units with One Input Much Greater than All Others

Now consider the case in which one input (say the external input to unit $j_0$, $x_{j_0}$) is much greater than all other external inputs ($x_{j_0} \gg x_j \;\; \forall j \neq j_0$) and the synaptic weights are as described above.
Again, we assume a priori that the weighted external excitatory input to unit $j_0$ exceeds the inhibitory input to the same unit ($w_{ej_0} x_{j_0} - w_{ij_0} y_i > 0$) and that the weighted external inputs to all other excitatory units do not ($w_{ej} x_j - w_{ij} y_i < 0 \;\; \forall j \neq j_0$). Under these assumptions, Eq. 3.5 can be rewritten as

$$y_{ej_0} = w_{ej_0} x_{j_0} - w_{ij_0} y_i, \qquad y_{ej} = 0 \;\; \forall j \neq j_0, \qquad y_i = w_{lj_0} (w_{ej_0} x_{j_0} - w_{ij_0} y_i) \tag{3.9}$$

which can be simplified to yield

$$y_{ej_0} = \frac{w_{ej_0} x_{j_0}}{1 + w_{lj_0} w_{ij_0}}, \qquad y_{ej} = 0 \;\; \forall j \neq j_0, \qquad y_i = \frac{w_{ej_0} w_{lj_0} x_{j_0}}{1 + w_{lj_0} w_{ij_0}}. \tag{3.10}$$

This solution satisfies the assumption that $w_{ej_0} x_{j_0} > w_{ij_0} y_i$ for all values of $w_{ej_0}$, $x_{j_0}$, and $w_{ij_0}$. It also satisfies the a priori assumption that $w_{ej} x_j < w_{ij} y_i$ as long as the external input $x_{j_0}$ is sufficiently large with respect to all other inputs $x_j$. In summary, if one external input is much greater than the other inputs, then all excitatory linear threshold units, except the one receiving the strongest input, are suppressed. The output of the winning unit is a normalized version of the input, and the normalizing factor increases with the connection weights $w_{ij_0}$ and $w_{lj_0}$, and is inversely proportional to $w_{ej_0}$.
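Both linear-threshold cases can be verified numerically without simulating the network dynamics. The sketch below finds the steady-state inhibitory response by bisection on the scalar condition $y_i = \sum_j w_{lj} \max(0, w_{ej} x_j - w_{ij} y_i)$, whose right-hand side decreases as $y_i$ grows when all weights are non-negative. Uniform scalar weights, the bisection approach and the function name are assumptions of this sketch, not the method used in the thesis.

```python
def steady_state(x, w_e=1.0, w_i=1.0, w_l=1.0, iterations=200):
    """Steady state of the linear-threshold network with uniform weights.

    Assumes non-negative weights, so that the residual below is an
    increasing function of y_i and bisection finds its unique root.
    """
    def excitatory(y_i):
        return [max(0.0, w_e * x_j - w_i * y_i) for x_j in x]

    def residual(y_i):
        return y_i - w_l * sum(excitatory(y_i))

    lo = 0.0
    hi = w_l * w_e * sum(max(0.0, x_j) for x_j in x)  # upper bound on y_i
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    y_i = 0.5 * (lo + hi)
    return excitatory(y_i), y_i


# One input much larger than the others: only that unit stays active.
y_e, y_i = steady_state([0.1, 0.2, 5.0, 0.3])
print(y_e, y_i)

# Identical inputs: all units share the same normalized output.
y_e_uniform, y_i_uniform = steady_state([1.0, 1.0, 1.0, 1.0])
print(y_e_uniform, y_i_uniform)
```

With unit weights the first case gives a winner output close to $5/(1 + 1) = 2.5$, as predicted by Eq. 3.10, and the second case gives outputs close to $1/(1 + 4) = 0.2$, as predicted by Eq. 3.8.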

Figure 3.2: Simulations of a WTA network comprising 100 linear-threshold units ordered along one spatial dimension (axes: unit activity versus unit position). The input (solid line) is composed of 3 Gaussians. The outputs are shown for two cases: $w_{ej} = 1$, $w_{ij} = 1$ and $w_{lj}$ = [...] $\forall j$ (dashed line); $w_{ej} = 1$, $w_{ij} = 1$ and $w_{lj}$ = [...] $\forall j$ (dotted line).

Figure 3.3: Numerical simulation of the same WTA network shown in Fig. 3.2, now with weight values $w_{ej} = 1$, $w_{ij} = 1$ and $w_{lj}$ = [...]. (a) Input distributions of increasing amplitude (axes: input activity versus unit position). (b) Network responses to the three inputs shown in (a) (axes: output activity versus unit position).

Numerical Simulations

It is not possible to obtain a closed form solution for networks with linear threshold units and an arbitrary input distribution, or for networks with arbitrary transfer functions. However, numerical simulations are useful for providing insight into the general computational properties of the network. For example, the simulations shown in Figs. 3.2 and 3.3 explore the response of a network with $f(x) = \max(0, x)$ and $N = 100$ to a more complicated input distribution, consisting of three Gaussians centered at unit positions 20, 50, and 80, and having maximum values of 0.75, 0.5, and 0.35 respectively (see solid line of Fig. 3.2). The simulations of Fig. 3.2 show the effect of modifying the excitatory to inhibitory weights $w_{lj}$ (with all other weights set to one). When $w_{lj}$ = [...] $\forall j$, the output is a thresholded version of the input, consisting of 3 peaks of activity. However, when $w_{lj}$ is increased to [...] $\forall j$, only the strongest input peak is reflected in the output. In the simulations of Fig. 3.3(a), the excitatory to inhibitory weights $w_{lj}$ are set to an intermediate value of [...] $\forall j$, and the network responds to the two strongest peaks in the input. The form of the response is invariant to the input strength (or alternatively, the strength of the $w_{ej}$ weights) as shown in Fig. 3.3(b).

3.2 Non-linear Programming Formulation

The competitive mechanism that emerges from the neural architecture of Fig. 3.1 can also be described mathematically. The following set of non-linear equations selects the largest number among $N$ real numbers by multiplying the neuron output signals $y_{ej}$ by binary-valued constants ($\alpha_j$, either 0 or 1):

$$\min\Big(\sum_{j=1}^{N} \alpha_j y_{ej}\Big) \quad \text{with constraint:} \quad \sum_{j=1}^{N} \alpha_j = 1 \quad (\alpha_j \in \{0, 1\}). \tag{3.11}$$

Systems of non-linear equations with discrete constraints are difficult to solve. We can simplify the system if we extend the domain of $\alpha_j$ to the continuous interval $[0, 1]$ and include an additional constraint that forces the continuous values of $\alpha_j$ to tend either toward zero or one:

$$\min\Big(\sum_j \alpha_j y_{ej}\Big) \quad \text{with constraints:} \quad \sum_j \alpha_j = 1, \quad \sum_j \alpha_j \ln \alpha_j = 0. \tag{3.12}$$

These types of systems can be solved using the method of Lagrange multipliers [6]. Solving this constrained problem is equivalent to finding $\min_{\alpha_j}(L)$ and $\max_{\lambda_1, \lambda_2}(L)$, where $\lambda_1$ and $\lambda_2$ are parameters called Lagrange multipliers, and $L$ is the cost function

$$L = \sum_{j=1}^{N} \alpha_j y_{ej} + \lambda_1 \Big(\sum_{j=1}^{N} \alpha_j - 1\Big) + \lambda_2 \sum_{j=1}^{N} \alpha_j \ln \alpha_j. \tag{3.13}$$

If we set $\lambda_2$ to a constant, then we can find $\min_{\alpha_j}(L)$ and $\max_{\lambda_1}(L)$ by solving:

$$\frac{\partial L}{\partial \alpha_j} = y_{ej} + \lambda_1 + \lambda_2 (\ln \alpha_j + 1) = 0, \qquad \frac{\partial L}{\partial \lambda_1} = \sum_j \alpha_j - 1 = 0 \tag{3.14}$$

which implies

$$\alpha_j = e^{-\frac{y_{ej} + \lambda_1 + \lambda_2}{\lambda_2}} = e^{-\frac{\lambda_1 + \lambda_2}{\lambda_2}} \, e^{-y_{ej}/\lambda_2} \tag{3.15}$$

and

$$\alpha_j = \frac{e^{-y_{ej}/\lambda_2}}{\sum_k e^{-y_{ek}/\lambda_2}}. \tag{3.16}$$

This equation approaches the solution of the original discrete selection problem when $\lambda_2$ is sufficiently small. This system of constrained non-linear equations can be implemented using MOSFETs in the subthreshold domain. The circuit that solves this system of equations is the current-mode WTA circuit described in the next Section (see Fig. 4.1). If we assume that the MOSFETs of the circuit of Fig. 4.1 are in saturation, so that Eq. (2.2) holds, we can write:

$$I_{out_j} = I_0 \, e^{\kappa \frac{V_{d_j}}{U_T} - \frac{V_c}{U_T}}$$

If we then apply Kirchhoff's current law at the common node $V_c$ ($I_b = \sum_j I_{out_j}$), and we observe that the circuit's output currents $I_{out_j}$ can be expressed as a fraction $\alpha_j$ of the total bias current:

$$I_{out_j} = \alpha_j I_b = \alpha_j \sum_{k=1}^{N} I_{out_k},$$

then we can prove the equivalence between the circuit's response and the system of equations (Eqs. 3.15):

$$I_{out_j} = e^{-y_{ej}/\lambda_2}, \qquad I_b = e^{\frac{\lambda_1 + \lambda_2}{\lambda_2}}$$

with

$$y_{ej} = -\lambda_2 \Big(\kappa \frac{V_{d_j}}{U_T} - \frac{V_c}{U_T} + \ln(I_0)\Big), \qquad \lambda_1 = \lambda_2 \big(\ln(I_b) - 1\big).$$
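The correspondence between the Lagrangian solution and the circuit can be illustrated numerically. The sketch below evaluates the normalized weights $\alpha_j \propto e^{-y_{ej}/\lambda_2}$ of Eq. 3.16 and compares them with the current fractions $I_{out_j}/I_b$ obtained from the saturation expression once $V_c$ is eliminated through Kirchhoff's current law. It uses the sign convention $y_{ej} = -\lambda_2 \kappa V_{d_j}/U_T$ plus a common offset, so that for positive $\lambda_2$ the smallest $y_{ej}$ belongs to the cell with the largest $V_{d_j}$. The thermal voltage, the example voltages and the function names are assumptions of this sketch.

```python
import math

U_T = 0.025  # thermal voltage near room temperature, in volts (assumed)


def soft_selection(y, lam2):
    """alpha_j = exp(-y_j / lam2) / sum_k exp(-y_k / lam2) (Eq. 3.16).

    The minimum of y is subtracted first for numerical stability. This
    leaves the ratios unchanged.
    """
    y_min = min(y)
    weights = [math.exp(-(v - y_min) / lam2) for v in y]
    total = sum(weights)
    return [w / total for w in weights]


def wta_output_currents(v_d, i_b, kappa=0.65):
    """Output currents I_0 * exp((kappa * V_dj - V_c) / U_T) with V_c fixed
    by the constraint that the currents sum to I_b."""
    v_max = max(v_d)
    weights = [math.exp(kappa * (v - v_max) / U_T) for v in v_d]
    total = sum(weights)
    return [i_b * w / total for w in weights]


v_d = [0.50, 0.52, 0.55]  # example node voltages, in volts
i_b = 1e-9
lam2 = 1.0
y = [-lam2 * 0.65 * v / U_T for v in v_d]  # common offset terms omitted

alphas = soft_selection(y, lam2)
fractions = [i / i_b for i in wta_output_currents(v_d, i_b)]
print(alphas)
print(fractions)

# For a fixed set of y values, a small lam2 approaches the discrete selection.
print(soft_selection([3.0, 1.0, 2.0], lam2=0.01))
```

The first two printed lists coincide, and the last one is concentrated almost entirely on the entry with the smallest value, which is the behavior expected of the discrete problem.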

Chapter 4 Current mode Winner-Take-All circuits

CMOS implementations of WTA networks are an important class of circuits widely used in neural networks and pattern-recognition systems. They implement architectures that select one node, out of many, through a competition mechanism that depends on the amplitude of the architecture's input signals. Several types of WTA circuits have been proposed in the literature [17, 24, 32, 42, 70, 73, 84, 92, 105, 123]. Each WTA circuit was designed with specific optimization constraints in mind. For example, the circuits proposed in [92] and in [105] are optimal for high-speed, high-precision applications, whereas the circuits of [84] and [32] are optimal for pulse-coded neural networks. The WTA circuit proposed by Lazzaro et al. [73] optimizes power consumption and silicon area usage. It is ideal for applications that do not require high precision or high speed computation, such as sensory perception tasks [26, 44, 72]. This circuit, proposed more than ten years ago, still remains one of the most compact and elegant designs of analog current-mode WTA circuits. It is asynchronous; it responds in real time; and it processes all its input currents in parallel, using only two transistors per node if the output signal is a voltage, or four transistors if the output signal is a current (see Fig. 4.3(b)). Recently, some extensions to the basic design described in [73] have been proposed [27, 48, 107]. They endow the WTA circuit with local excitatory feedback [107] and with distributed hysteresis [27, 48]. Local excitatory feedback enhances the resolution and speed performance of the circuit, providing a hysteretic mechanism that resists the selection of other potential winners unless they are stronger than the selected one by a set hysteretic current. Distributed hysteresis allows the winning input to shift between adjacent locations while maintaining its winning status, without having to reset the network.
These enhanced types of WTA networks are able to select and lock onto the input with the strongest amplitude, and to track it as it shifts smoothly from one pixel to its neighbor [47, 51, 88]. In this Section we first analyze the original WTA circuit proposed in [73], and then describe a new version of the current-mode WTA circuit that contains local excitatory feedback and lateral excitatory coupling (to implement distributed hysteresis), and that also implements lateral inhibitory coupling and diode-source degeneration. The interactions between the non-linearities of the WTA network and the lateral coupling networks produce center-surround spatial response properties that differ from the ones obtained using conventional spatial diffusion networks [10, 119]. To make an accurate comparison between the performance of the new WTA network and the performance of the classical WTA network described in [73], we implemented both circuits on the same chip, using transistors of the same size, common bias pads and the same input sources. In the next two sections we describe the circuits, present experimental data from both networks, derive analytically the hysteretic WTA network's lateral coupling properties as a function of the circuit parameters, point out the differences from conventional diffusor networks, and show the response properties of the circuit when both lateral excitatory and lateral inhibitory couplings are enabled.

4.1 The original current-mode WTA circuit

The circuit of Fig. 4.1 is a continuous-time, analog circuit that implements a WTA network. It was originally designed by Lazzaro et al. [73] and is extensively used in a wide variety of applications. The circuit is extremely compact and elegant: it processes all the (continuous-time) input signals in parallel, using only two transistors per input cell, and one global transistor that is common to all cells.
Collective computation and global connectivity are obtained using a single node common to all cells. An example of a WTA circuit containing only 2 cells is shown in Fig. 4.1. Each cell comprises a current-controlled conveyor and is connected to a global node $V_c$. The WTA network is modular and can be extended to $N$ cells by connecting additional cells to the node $V_c$. Input currents are applied to the network through current sources, which are implemented for example using subthreshold pFETs. The output signals are encoded both by the $I_{out,1}$ and $I_{out,2}$ currents, and by the $V_{d,1}$ and $V_{d,2}$ voltages. The voltage $V_b$ sets the bias current $I_b$. Transistors $M_1$ and $M_2$ discharge nodes $V_d$ and so implement

inhibitory feedback. Transistors $M_3$ and $M_4$ implement an excitatory feedforward path by charging node $V_c$. The overall circuit selects the largest input current $I_{in,j}$ because cell $j$ provides $I_{out,j} \approx I_b$, and so suppresses all other output voltages and currents ($V_{d,i \neq j} \approx 0$, $I_{out,i \neq j} \approx 0$). Cell $j$ wins the competition because its voltage $V_{d,j}$ determines $V_c$ through the exponential characteristics of the transistor that sinks the output current $I_{out,j}$ (for example, $M_3$ or $M_4$). We will analyze the behavior of the circuit in the steady-state case using the methods that we applied for the network model: by providing constant input signals and measuring the outputs after the circuit has settled. We consider three cases: both inputs equal; one input much larger than the other; and two inputs that differ by a very small amount (small-signal regime).

Figure 4.1: Two cells of a current-mode WTA circuit.

Both Inputs Equal

If the two input currents are equal ($I_{in,1} = I_{in,2} = I_m$) then the currents flowing into transistors $M_1$ and $M_2$ of Fig. 4.1 are also equal. In this case, because the gates of $M_1$ and $M_2$ are tied to the same common node $V_c$, the drain voltages of $M_1$ and $M_2$ must take the same value ($V_{d,1} = V_{d,2} = V_m$). As a result, the output transistors $M_3$ and $M_4$ will have the same gate-to-source voltage difference ($V_{gs3} = V_{gs4} = V_m - V_c$). If both output transistors are in saturation then the output currents must be identical. Moreover, Kirchhoff's current law requires that, at the common node $V_c$, $I_{out,1} = I_{out,2} = I_b/2$ (Eq. 2.5).

One Input Much Greater than The Other

From eq.
(2.1) we can observe that the subthreshold current flowing through a transistor can be divided into a forward component, $I_f$, and a reverse component, $I_r$:

$$I_{out} = I_f - I_r = I_0 e^{\frac{\kappa V_g - V_s}{U_T}} - I_0 e^{\frac{\kappa V_g - V_d}{U_T}} \tag{4.1}$$

When the transistor's source voltage $V_s$ is approximately equal to its drain voltage $V_d$, $I_r$ becomes comparable to $I_f$. With this property in mind, we can consider the case in which $I_{in,1} \gg I_{in,2}$. In this case, the drain voltage of $M_1$ ($V_{d,1}$) will be greater than the drain voltage of $M_2$ ($V_{d,2}$). If the transistor $M_1$ is in saturation ($V_{d,1} > 4U_T$), the dominant component of its drain current will be in the forward direction and its gate voltage $V_c$ will increase such that $I_{d,1} = I_{f,1} = I_0 e^{\kappa V_c/U_T} = I_{in,1}$. Although the two input currents $I_{in,1}$ and $I_{in,2}$ are different, the forward components of the drain currents of $M_1$ and $M_2$ are

equal ($I_{f,1} = I_{f,2}$) because the two transistors have a common gate voltage $V_c$, and both their sources are tied to ground. The drain current $I_{d,2}$ of transistor $M_2$ can only be equal to the input current $I_{in,2}$ under the following conditions: $I_{f,2} - I_{r,2} = I_{in,2}$, which implies that $I_{r,2} = I_{f,2} - I_{in,2}$, so $I_{r,2} = I_{in,1} - I_{in,2} \gg 0$. The reverse component of $I_{d,2}$ becomes significant only if $V_{d,2}$ decreases enough for $M_2$ to operate in its ohmic region ($V_{d,2} \leq 4U_T$). In this case, the output transistor $M_4$ is effectively switched off, and $I_{out,2} = 0$. Consequently, $M_3$ sources all the bias current ($I_{out,1} = I_b$), with $V_{d,1}$ satisfying the equation $I_0 e^{(\kappa V_{d,1} - V_c)/U_T} = I_b$.

The experimental data of Fig. 4.2 show the output voltages ($V_{d,1}$ and $V_{d,2}$) and output currents ($I_{out,1}$ and $I_{out,2}$) of the circuit, in response to the differential input voltage $\Delta V$, which encodes the ratio of the input currents. In this experiment, the input currents were provided by pFETs operating in the subthreshold regime: the gate voltage $V_{in,1}$ of the pFET sourcing current into the first cell was set to 4.3 V, while the gate voltage $V_{in,2}$ of the pFET sourcing current into the second cell was set to $V_{in,2} = V_{in,1} + \Delta V$. The two traces in each plot show the responses of the two cells as $\Delta V$ was swept from -8 mV to +8 mV. When $\Delta V$ is zero (the input currents are identical), the output signals of both cells are also identical. When $\Delta V$ is large (one input current dominates), a single cell is selected. If $\Delta V$ is small, the above description is not adequate. Instead, we can compute the output signals of the cells using small-signal analysis [71].

Two Inputs Differ by a Small Amount

To analyze the circuit in this regime, we must consider the Early effect of the transistor operating in the saturation region (Eq. 2.1):

$$I_{ds} = I_{sat}\left(1 + \frac{V_{ds}}{V_e}\right) \tag{4.2}$$

where $V_e$ is the Early voltage. Assume that the two input currents $I_{in,1}$ and $I_{in,2}$ are initially equal.
In this case, the transistors $M_1$ and $M_2$ operate in the saturation region: the output voltages $V_{d,1}$ and $V_{d,2}$ will settle to a common value, and the output currents $I_{out,1}$ and $I_{out,2}$ will both be equal to $I_b/2$. If we now increase the input current $I_{in,1}$ by a small amount $\delta I$ and apply Eq. (2.1) to transistor $M_1$ of Fig. 4.1, then its drain voltage $V_{d,1}$ will increase by

$$\delta V = \frac{\delta I}{I_{sat}} V_e. \tag{4.3}$$

As $V_{d,1}$ is also the gate voltage of transistor $M_3$, the output current $I_{out,1}$ will be amplified by an amount proportional to $e^{\delta V}$. The constraint of Eq. (2.5) requires that $I_{out,2}$ decrease by the same amount in steady state. This reduction means the gate voltage $V_{d,2}$ of $M_4$ must decrease by $\delta V$. The gain of the competition mechanism ($\delta V / \delta I$) in the small-signal regime is directly proportional to the Early voltage $V_e$ and inversely proportional to $I_{sat}$. The Early voltage depends on the geometry of the transistors and is fixed at design time. On the other hand, $I_{sat}$ depends on $V_c$, which changes with the amplitude of the input currents.

4.2 The hysteretic WTA circuit

The circuit diagram of one cell of the hysteretic WTA (hwta) network is shown in Fig. 4.3. The data shown in this Section were taken from a hwta network implemented in a 2 µm CMOS technology, as linear arrays of 25 cells. The cell size of the hwta network is 60 µm × 100 µm. The current source in Fig. 4.3 that generates the bias current $I_b$ can be implemented using a single n-type MOS transistor. If the transistor operates in weak inversion, it is in saturation as long as $V_c \geq 4U_T$ (i.e. $V_c \geq 100$ mV), its output current being $I_b = I_0 e^{\kappa V_b/U_T}$. The term $U_T$ represents the thermal voltage, $I_0$ the zero bias current, and $\kappa$ the subthreshold

slope coefficient [75]. In practical applications $I_b$ can be set by providing an external bias current into a single diode-connected transistor that has its gate connected to all the network's bias transistors (thus implementing a series of current mirrors). Similarly, the input current source that generates $I_{in}$ can be implemented using a p-type transistor operating in the subthreshold regime. Although the WTA circuit can operate both in the weak and strong inversion regimes, it is typically operated in the weak inversion (subthreshold) regime. In this regime the circuit is particularly sensitive to device mismatch and noise. In the existing implementation, when operated in subthreshold, the circuit selects a single winner if one input is greater than the others and the input currents differ from each other by at least 10%. The low currents provided by the subthreshold input transistors and by the bias transistor (typically ranging from fractions of pico-amperes to hundreds of nano-amperes) also limit the circuit's dynamic response properties. As with the original WTA circuit, the network's time constant is dominated by the maximum input current and ranges from fractions of milliseconds up to fractions of seconds. The detailed, quantitative analysis of the WTA's dynamic response properties discussed in [73] is valid also for the circuit proposed here.

Figure 4.2: Responses of the two-cell WTA circuit shown in Fig. 4.1. (a) Voltage output ($V_{d,1}$ and $V_{d,2}$, in V) versus the differential input voltage (in mV). (b) Current output ($I_{out,1}$ and $I_{out,2}$, in nA). The bias voltage $V_b = 0.7$ V. The small difference in the maximum output currents is due to device mismatch effects in the read-out transistors of the two cells.
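As a rough numerical illustration of the subthreshold bias expression $I_b = I_0 e^{\kappa V_b/U_T}$ used above, the following sketch evaluates the bias current for a few gate voltages and computes the change in $V_b$ that scales $I_b$ by a factor of ten. The values chosen for `I_0`, `KAPPA` and `U_T` are placeholder assumptions for illustration only; they are not extracted from the fabricated devices.

```python
import math

U_T = 0.025    # thermal voltage in volts (assumed)
KAPPA = 0.7    # subthreshold slope coefficient (assumed)
I_0 = 1e-15    # zero-bias current in amperes (hypothetical placeholder)


def bias_current(v_b):
    """Subthreshold bias current I_b = I_0 * exp(kappa * V_b / U_T)."""
    return I_0 * math.exp(KAPPA * v_b / U_T)


def volts_per_decade():
    """Gate-voltage step that multiplies the bias current by ten."""
    return U_T * math.log(10) / KAPPA


for v_b in (0.60, 0.67, 0.70):
    print(f"V_b = {v_b:.2f} V -> I_b = {bias_current(v_b):.3e} A")
print(f"one decade of current per {volts_per_decade() * 1e3:.1f} mV of V_b")
```

The exponential dependence is the reason why bias voltages in this regime have to be set with millivolt precision, and why mirroring an externally supplied current is the more practical way of setting $I_b$.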
Like the original WTA circuit, this circuit is ideal for tasks that do not rely on high precision and do not require time constants shorter than a few milliseconds. Fortunately, most applications involving perception and processing of sensory signals fall into this category. The main differences between the original WTA design and the one described here are implemented by transistors M5

through M9, as shown in Fig. 4.3. Specifically, transistor M5, together with M3 of Fig. 4.3, implements local excitatory feedback. Transistor M6 implements diode-source degeneration, and transistors M8 and M9 implement inhibitory and excitatory lateral coupling respectively.

Figure 4.3: Hysteretic WTA cell, with local excitatory feedback, lateral excitatory coupling, lateral inhibitory coupling and diode-source degeneration.

4.2.1 Local Excitatory Feedback

The main effect of local excitatory feedback is to introduce a hysteretic behavior into the WTA network. Once a cell is selected as the winner, a current proportional to the network's bias current $I_b$ is sourced back into the cell's input node through the current mirror formed by M3 and M5 (see Fig. 4.3). If the bias current $I_b$ is a subthreshold current, the proportionality factor of the local excitatory feedback current is modulated exponentially by the voltage difference ($V_{dd} - V_{gain}$). Hysteresis is evident because, after a cell has been selected as the winner, its input current has to decrease by an additional amount equal to the local excitatory feedback current before it loses its winning status. Figure 4.4 shows the output of a cell of the hwta network, superimposed on the output of the corresponding cell belonging to the classical WTA network, in response to the same input signals. For both types of WTA networks, input currents were applied only to two neighboring cells, while all other cells received no input. The common-mode input current of the stimulated cells was set by biasing the input p-type transistors with a constant voltage $V_{in} = 4.2$ V. The bias current of both WTA networks was generated using a bias voltage $V_b = 0.67$ V. The local excitatory feedback loop of the hwta circuit was fully activated ($V_{gain} = V_{dd}$).
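The hysteretic behavior described above can be illustrated with a purely behavioral sketch. It is not a circuit simulation: the competition is idealized as a comparison of net input currents, and the current winner receives an extra feedback current `i_hyst` that stands in for the local excitatory feedback. The function name, the current values (in nA) and the sweep step are all hypothetical.

```python
def sweep_two_cell_wta(i_in1_values, i_in2, i_hyst, winner=None):
    """Return the winner (1 or 2) at each step as cell 1's input is swept.

    The cell that currently wins gets i_hyst added to its input, so the
    competitor must exceed it by more than i_hyst to take over.
    """
    winners = []
    for i_in1 in i_in1_values:
        net1 = i_in1 + (i_hyst if winner == 1 else 0)
        net2 = i_in2 + (i_hyst if winner == 2 else 0)
        winner = 1 if net1 > net2 else 2
        winners.append(winner)
    return winners


# Hypothetical sweep: cell 2 is held at 100 nA, feedback current is 10 nA.
up = list(range(80, 121, 2))        # cell 1 input rising, in nA
down = list(reversed(up))           # cell 1 input falling, in nA

w_up = sweep_two_cell_wta(up, i_in2=100, i_hyst=10)
w_down = sweep_two_cell_wta(down, i_in2=100, i_hyst=10, winner=w_up[-1])

switch_up = next(i for i, w in zip(up, w_up) if w == 1)
switch_down = next(i for i, w in zip(down, w_down) if w == 2)
print("cell 1 takes over at", switch_up, "nA on the way up")
print("cell 1 gives up at", switch_down, "nA on the way down")
```

In this idealization the two switching points are separated by roughly twice the feedback current, which mirrors the observation that the width of the hysteresis loop is set by the feedback current.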
The width of the hysteresis curve can be modulated by changing either the WTA network's bias current $I_b$, or the control voltage $V_{gain}$. The stability properties of the hwta network are the same as those of conventional winner-take-all circuits with positive feedback, and have been analyzed in detail in [107]. Similarly, the dynamic response properties of the hwta network are the same as those of the classical current-mode WTA network described in [73] and depend mainly on the values of $I_b$ and of the total current entering the input nodes of the WTA cells.

4.2.2 Diode-source degeneration

Source degeneration, also referred to as emitter degeneration for bipolar transistors, is a classical technique in analog design [37]. It consists of converting the current flowing through a transistor into a voltage, by dropping it across a resistor or a diode, and feeding this voltage back to the source of the transistor, to increase its gate voltage accordingly. At the WTA network level, source degeneration of the input transistor has the effect of increasing the circuit's winner selectivity gain. This is evident in Fig. 4.5, where the output of the diode-source degenerated network is superimposed on the output of the classical WTA network, in response to the same input signals. This figure shows the output of four cells (two neighboring cells per type of WTA network) as they change their state from winning to losing and vice versa. Small differences in the

amplitude of the winning signals are due to mismatches of the readout transistors (M4 of Fig. 4.3). The data were taken using the same input stimulus arrangement described for Fig. 4.4. The bias current of both WTA networks was generated using a bias voltage $V_b = 0.7$ V. Local excitatory feedback (and the hysteretic behavior associated with it) was disabled by setting the control voltage $V_{gain}$ to 3 V. By adding just one transistor and connecting its gate to the diode-source degeneration transistor of each WTA cell it is possible to read out a copy of the cell's net input current $I_{all}$ (see M7 of Fig. 4.3). As $I_{all}$ represents the sum of all of the currents converging into the WTA cell (namely, the input current $I_{in}$, the current being spread to or from the left and right nearest neighbors, and the local excitatory feedback current coming from the top p-type current mirror), it is a useful measure for visualizing the state of the WTA network.

Figure 4.4: Response of the hwta circuit (outer hysteresis plot) superimposed on the response of the classical WTA circuit (inner central plot); output current (nA) versus differential input voltage (mV). The output of the classical WTA circuit was shifted vertically by a few nano-amperes for the sake of clarity.

Figure 4.5: Diode-source degenerated WTA network output and classical WTA network output; output current (nA) versus differential input voltage (mV).

4.3 Lateral coupling

Lateral coupling is implemented in the hwta network proposed here by means of diffusor (or "pseudo-conductance") networks [10, 119]. Diffusor networks are extensively used in silicon retinas and other types of neuromorphic circuits. In

the circuit proposed in this article the current diffusors are implemented by transistors M8 and M9 of Fig. 4.3, operated in the subthreshold regime. Specifically, transistor M9 implements lateral excitatory coupling and transistor M8 lateral inhibitory coupling. Functionally, the inhibitory diffusor network can be used to spatially decouple the WTA cells, while the excitatory diffusor network can be used to smooth the input signals, combined with the local excitatory feedback current of the winning cell (see Section 4.2.1).

Figure 4.6: Simplified WTA circuit, used to analyze the excitatory diffusor network.

4.3.1 Lateral excitation

To study analytically the principle of operation of the excitatory diffusor network let us neglect, for the time being, the inhibitory diffusor network (i.e. let us set the inhibition to be global with $V_{inh} = 5$ V). Furthermore let us neglect, for the sake of simplicity, transistors M5, M6, and M7 of Fig. 4.3 and apply a constant subthreshold input current $I_{in}$ only to the first node of the network. In this case the hwta network reduces to the circuit shown in Fig. 4.6. As pointed out in the figure, the (subthreshold) currents flowing through the diffusors can be separated into forward and reverse components: $I_{d,i} = I_{f,i} - I_{r,i}$, where

$$I_{r,i} = I_0 e^{\kappa \frac{V_{ex}}{U_T} - \frac{V_i}{U_T}} \tag{4.4}$$

$$I_{f,i} = I_0 e^{\kappa \frac{V_{ex}}{U_T} - \frac{V_{i+1}}{U_T}} \tag{4.5}$$

From these equations the following relationship holds:

$$I_{f,i} = I_{r,i+1} \tag{4.6}$$

By writing Kirchhoff's current law at each node $i$ we have:

$$I_{a,i} = (I_{f,i-1} - I_{r,i-1}) - (I_{f,i} - I_{r,i}) \tag{4.7}$$

which, using eq. (4.6), turns into:

$$I_{a,i} = 2I_{r,i} - I_{r,i-1} - I_{r,i+1} \tag{4.8}$$

but, if $I_{a,i}$ is a subthreshold current, we can also write:

$$I_{a,i} = I_0 e^{\kappa \frac{V_c}{U_T}} \left(1 - e^{-\frac{V_i}{U_T}}\right) \tag{4.9}$$

and, by expressing $V_i$ in terms of $I_{r,i}$ (using eq.
(4.4))

$$I_{a,i} = I_0 e^{\kappa \frac{V_c}{U_T}} - e^{\kappa \left(\frac{V_c}{U_T} - \frac{V_{ex}}{U_T}\right)} I_{r,i} \tag{4.10}$$

which yields

$$I_{r,i} = \lambda I_0 e^{\kappa \frac{V_c}{U_T}} - \lambda I_{a,i} \tag{4.11}$$

where $\lambda = e^{-\kappa \left(\frac{V_c}{U_T} - \frac{V_{ex}}{U_T}\right)}$. Substituting eq. (4.11) into eq. (4.8) we obtain the discrete approximation of a Laplacian:

$$I_{a,i} = \lambda (I_{a,i-1} - 2I_{a,i} + I_{a,i+1}) \tag{4.12}$$

It follows that

$$I_{a,i} = \frac{\lambda}{1 + 2\lambda} I_{a,i-1} + \frac{\lambda}{1 + 2\lambda} I_{a,i+1} \tag{4.13}$$

By using this equation recursively we can write

$$I_{a,i} = \frac{\lambda}{1 + 2\lambda} I_{a,i-1} + \frac{\lambda^2}{(1 + 2\lambda)^2} (I_{a,i} + I_{a,i+2}) \tag{4.14}$$

If $\lambda \ll 1$, eq. (4.14) reduces to

$$I_{a,i} \approx \lambda I_{a,i-1} \tag{4.15}$$

If we want to estimate the current $I_{a,n}$ flowing to ground through the $n$-th transistor of the network, we can use eq. (4.15) recursively until we reach the first cell of the network (node 0):

$$I_{a,n} = I_{a,0} \lambda^n \tag{4.16}$$

but, as $I_{a,0} \approx I_{in}$ (if $\lambda \ll 1$), we can write:

$$I_{a,n} = I_{in} e^{-n\kappa \left(\frac{V_c}{U_T} - \frac{V_{ex}}{U_T}\right)}. \tag{4.17}$$

The term $\lambda$ is defined as the network's space constant. The space constant (and with it, the network's spatial coupling) is modulated exponentially by the term $(V_c - V_{ex})$. While $V_{ex}$ is a directly accessible circuit parameter, independent of other circuit parameters, the voltage $V_c$ depends logarithmically on the input current. Specifically, for the circuit of Fig. 4.6:

$$I_{a,0} = I_0 e^{\kappa \frac{V_c}{U_T}} \approx I_{in} \tag{4.18}$$

With this relationship in mind, we can rewrite $\lambda$ as a function of $V_{ex}$ and $I_{in}$, and eq. (4.16) reduces to:

$$I_{a,n} = I_{in} \left(\frac{I_0 e^{\kappa \frac{V_{ex}}{U_T}}}{I_{in}}\right)^n \tag{4.19}$$

According to this finding, an increase in $V_{ex}$ will increase (exponentially) the amount of spreading and allow more current to flow through the diffusors. Conversely, increases in the amplitude of $I_{in}$ will narrow the spreading width of the network and diminish the amount of current flowing through the diffusors. In this respect this excitatory diffusor network differs from the diffusor networks previously proposed [10], which have the undesirable property of increasing lateral spreading with increasing amplitude of input signals.
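The two trends stated for eq. (4.19) can be checked numerically with a minimal sketch. The device constants `I_0`, `KAPPA` and `U_T` are assumed placeholder values, so the absolute voltages do not correspond to the bias settings used on the chip; only the trends are meaningful. The helper `exact_decay_ratio` is an addition for illustration: it assumes a purely geometric profile $I_{a,i} = r\,I_{a,i-1}$ on a long chain and solves eq. (4.13) for $r$, which shows how close the approximation $r \approx \lambda$ of eq. (4.15) is.

```python
import math

U_T = 0.025    # thermal voltage in volts (assumed)
KAPPA = 0.7    # subthreshold slope coefficient (assumed)
I_0 = 1e-15    # zero-bias current in amperes (hypothetical placeholder)


def space_constant(v_ex, i_in):
    """lambda = I_0 * exp(kappa * V_ex / U_T) / I_in, from eqs. (4.18)-(4.19)."""
    return I_0 * math.exp(KAPPA * v_ex / U_T) / i_in


def diffused_current(n, v_ex, i_in):
    """Current at node n according to eq. (4.19); valid only for lambda << 1."""
    return i_in * space_constant(v_ex, i_in) ** n


def exact_decay_ratio(lam):
    """Decaying root r of r = lam/(1+2*lam) * (1 + r**2), i.e. eq. (4.13)
    under the assumption of a geometric profile I_{a,i} = r * I_{a,i-1}."""
    return (1 + 2 * lam - math.sqrt(1 + 4 * lam)) / (2 * lam)


for v_ex in (0.40, 0.45, 0.50):
    lam = space_constant(v_ex, 10e-9)
    print(f"V_ex = {v_ex:.2f} V: lambda = {lam:.4f}, "
          f"exact ratio = {exact_decay_ratio(lam):.4f}")
```

Raising `v_ex` increases $\lambda$ exponentially, while raising `i_in` lowers it, so the normalized profile $I_{a,n}/I_{in}$ narrows for stronger inputs.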
In typical applications of diffusor networks, increasing the range over which spatial averaging takes place can be an effective strategy if the signal-to-noise ratio of the input signals is not too high. On the other hand, if input signals are strong, smoothing over large regions not only might not be useful, but could even be counterproductive. The experimental data of Fig. 4.7 confirm the theoretical predictions of eq. (4.19). In Fig. 4.7(a) we stimulated the first cell of the hwta network with a constant current and measured its response for different values of $V_{ex}$. As in the theoretical analysis, lateral inhibition is set to be global ($V_{inh} = 5$ V); the effects of the diode-source degeneration transistors can be neglected, as the currents flowing through the diode-connected transistors (M6 of Fig. 4.3) are of the order of a few nano-amperes. The discontinuity present in the response profile between the first cell of the network and the second is due to the non-linear nature of the WTA competitive mechanism. From the second cell on, the measured current decays exponentially with distance, as predicted by eq. (4.19) (see inset of Fig. 4.7(a)). In Fig. 4.7(b) we stimulated the first cell of the network with currents of increasing amplitude (for a fixed value of $V_{ex}$), measured the network's response and plotted the data on a normalized scale. As predicted by eq. (4.19), and as shown in Fig. 4.7(b), lateral spreading decreases with increasing amplitude of the input current.
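An exponential fit of the kind mentioned for the inset of Fig. 4.7(a) can be sketched as a straight-line least-squares fit to the logarithm of the currents. The data below are synthetic, generated from an ideal geometric decay with a made-up ratio and prefactor; they are not the measured values, and the function name is hypothetical.

```python
import math


def fit_decay_ratio(positions, currents):
    """Fit ln(I_n) = ln(prefactor) + n * ln(ratio) by ordinary least squares.

    Returns (ratio, prefactor), where ratio plays the role of the
    space constant lambda in I_n = prefactor * ratio**n.
    """
    ys = [math.log(c) for c in currents]
    n = len(positions)
    mean_x = sum(positions) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(slope), math.exp(intercept)


# Synthetic stand-in for cells 2 to 20 (not measured data).
positions = list(range(2, 21))
synthetic = [5e-9 * 0.6 ** n for n in positions]

ratio, prefactor = fit_decay_ratio(positions, synthetic)
print(f"fitted decay ratio per cell: {ratio:.3f}")
```

Fitting in the logarithmic domain weights all cells equally in relative terms, which is convenient when the currents span several orders of magnitude; with noisy measurements the smallest currents would then dominate the error, so a weighted fit may be preferable.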

Figure 4.7: Effect of lateral excitatory coupling on the hwta network. (a) Output currents $I_{all}$ (see Fig. 4.3), in nA, measured at each cell (pixel position) of the network for four increasing values of $V_{ex}$. The inset shows a fit of the data from cells 2 to 20 with an exponential function. (b) Output currents $I_{all}$ measured for three increasing values of $I_{in}$. Each data set is normalized to the maximum measured current.

Figure 4.8: Scanned output currents (nA, versus pixel position) of hwta network state (top solid line), of hwta output (bottom solid line) and of classical WTA output (bottom dotted line). (a) Input currents are applied to cell 1 ($V_{gs,1} = 1.1$ V), cell 12 ($V_{gs,12} = 1.0$ V) and cell 13 ($V_{gs,13} = 1.0$ V), lateral excitation is turned off ($V_{ex} = 0$ V) and inhibition is global ($V_{inh} = 5$ V). Both the basic WTA network and the hwta network select cell 1 as the winner. (b) Input signals and network bias settings are the same as in (a), but lateral excitation is turned on ($V_{ex} = 1.825$ V). The basic WTA network keeps on selecting the strongest absolute input as the winner (cell 1), but the hwta network selects the region with two neighboring cells on, because it has a stronger mean activation. (c) Input currents are applied to cells 5, 12 and 16 ($V_{gs,5} = 1.2$ V, $V_{gs,12} = 1.1$ V, $V_{gs,16} = 1.0$ V), lateral excitation is turned off and inhibition is global ($V_{ex} = 0$ V, $V_{inh} = 5$ V). Both the basic WTA network and the hwta network select cell 5 as the winner. (d) Input signals and network bias settings are the same as in (c), but inhibition is local ($V_{inh} = 3.35$ V). If inhibition is not global, the hwta network allows multiple winners to be selected, as long as they are spatially distant (cell 16 is selected as local winner, even though cell 12 receives a stronger input current).

4.3.2 Local inhibition

The local inhibitory diffusor network is equivalent in all respects to the local excitatory network. It can be shown, using the same methodology used to analyze the excitatory diffusor network, that the inhibitory diffusor network's space constant depends exponentially on $I_{in}$ and on $V_{inh}$. We can see intuitively how $V_{inh}$ allows us to modulate the spatial extent over which the WTA cells compete. In one extreme case inhibition is global (i.e.
$V_{inh} = 5$ V), and the WTA network allows only one winner to be active at a time. In the other extreme case, the cells of the WTA network are completely decoupled from each other ($V_{inh} = 0$), and all cells are allowed to be simultaneously selected as winners. For intermediate values of $V_{inh}$ the network can be biased to allow multiple winners to be active simultaneously, as long as they are sufficiently distant from each other. Combining the effects of both excitatory and inhibitory networks, we can bias the hwta network to exhibit different functional behaviors. Figure 4.8 shows a comparison between the behavior of the classical WTA network and the behavior of the hwta network for different input distributions and different settings of $V_{ex}$ and $V_{inh}$. Fig. 4.9 shows perhaps one of the most interesting response profiles that can be obtained by combining lateral excitation and local inhibition in this WTA network: the center-surround response profile was obtained by stimulating the central cell of

the network with a constant input current, for a fixed value of $V_{ex}$ and different values of $V_{inh}$. Lateral excitation and local inhibition can be combined to yield center-surround response profiles. This is evident in Fig. 4.9, where we measured the response of the network with lateral excitation enabled ($V_{ex} = 1.825$ V), after stimulating its central cell with a constant input current, for different values of $V_{inh}$.

Figure 4.9: Response of the hwta network to a single cell input (cell 13, with $V_{gs,13} = 1.1$ V) for a fixed value of $V_{ex} = 1.825$ V. (a) Current output (nA, versus pixel position) for 4 different values of $V_{inh}$ (5.00 V, 3.00 V, 2.95 V and 2.90 V). (b) Relative difference between the output of the network with global inhibition ($V_{inh} = 5$ V) and the output of the network with 3 different values of $V_{inh}$ (3.00 V, 2.95 V and 2.90 V).

4.4 Applications

Besides being a practical, compact, low-power circuit for generic applications that require a winner-take-all type of computation, the hwta circuit is particularly useful in all those applications that involve the processing of sensory signals and the selection of one or more inputs (e.g. for determining motor actions in a sensory-motor system). The center-surround response profile of the network shown in Fig. 4.9 is a rough approximation of a difference of two Gaussians (DOG), which in turn closely approximates a Laplacian of a Gaussian $\nabla^2 G$. It has been argued that this type of operator is ideal for detecting intensity changes in sensory stimuli [80] and closely resembles the response profile of many types of neurons, ranging from simple cells in the visual cortex of mammals [60] to cells in the somatosensory cortex of rats [86], to neurons in the midbrain of barn owls [63].
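The claim that a difference of two Gaussians closely approximates a Laplacian of a Gaussian can be illustrated in one dimension with the sketch below. The width ratio of 1.6 between surround and center, and the width 1.25 used for the Laplacian of a Gaussian, are assumptions chosen for this illustration; they are not parameters of the hwta network, whose measured profile is only a rough approximation of a DOG.

```python
import math


def gaussian(x, sigma):
    """Unit-area 1-D Gaussian."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))


def dog(x, sigma_c=1.0, sigma_s=1.6):
    """Difference of Gaussians: narrow center minus wide surround."""
    return gaussian(x, sigma_c) - gaussian(x, sigma_s)


def neg_log(x, sigma=1.25):
    """Negative second derivative of a 1-D Gaussian (center made positive)."""
    return (1 - (x / sigma) ** 2) / sigma ** 2 * gaussian(x, sigma)


def correlation(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


xs = [i / 10 for i in range(-80, 81)]
dog_profile = [dog(x) for x in xs]
log_profile = [neg_log(x) for x in xs]
similarity = correlation(dog_profile, log_profile)
print(f"correlation between DOG and -LoG profiles: {similarity:.4f}")
```

Both profiles have a positive center flanked by a negative surround and integrate to approximately zero, which is what makes them respond to local changes in intensity rather than to uniform input.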
Examples of applications that exploit the local excitatory feedback and distributed hysteresis circuits are presented in the next Chapter.

Part III

Single-chip Attention Systems

Chapter 5

Neuromorphic vision sensors as single chip selective attention systems

Neuromorphic vision sensors are typically analog VLSI devices that implement hardware models of biological visual systems and that can be used for machine vision tasks [10, 79]. It is only recently that these hardware models have become elaborate enough for use in a variety of engineering applications [64]. These types of devices and systems offer an attractive, low-cost alternative to special-purpose DSPs for machine vision tasks. They can be used either for reducing the computational load on the digital system in which they are embedded or, ideally, for carrying out all of the necessary computation without the need for any additional hardware. They process images directly at the focal plane level. Typically each pixel contains local circuitry that performs, in real time, different types of spatio-temporal computations on the continuous analog brightness signal. In contrast, CCD cameras or conventional CMOS imagers merely measure the brightness at the pixel level, possibly adjusting their gain to the average brightness level of the whole scene. In neuromorphic vision chips, photoreceptors, memory elements and computational nodes share the same physical space on the silicon surface. The specific computational function of a neuromorphic sensor is determined by the structure of its architecture and by the way its pixels are interconnected. Since each pixel processes information based on locally sensed signals and on data arriving from its neighbors, the type of computation being performed is fully parallel and distributed. Another important feature is the asynchronous operation of neuromorphic sensors, which is preferable to clocked operation for sensory processing, given the continuous nature of sensory signals. Clocked systems introduce temporal aliasing artifacts that can significantly compromise the time-dependent computations performed in real-time sensory processing systems.
Several neuromorphic sensors based on models of visual attention have been presented [13, 45, 89, 121]. These systems typically contain photo-sensing elements and processing elements on the same focal plane, apply the competitive selection process to visual stimuli sensed and processed by the focal plane processor itself, and perform visual tracking operations. Tracking features of interest as they move in the environment is a computationally demanding task for machine vision systems. The control loop of active vision systems, comprising motors that steer the visual sensor, relies on the speed of the specific computation carried out. The stability of the system depends on the latency of the sensory-motor control loop itself. Neuromorphic tracking sensors can reduce this latency and improve the performance of the active vision system. Here we describe a tracking architecture that reduces the computational cost of the processing stages interfaced to it by carrying out an extensive amount of computation at the focal plane itself, and transmitting only the result of this computation, rather than extensive amounts of data representing the raw input image. Although the approach followed here is very similar to the one followed in previously published work, the tracking architecture we implemented differs from previously proposed ones in two key features: it selects high-contrast edges independently of the absolute brightness of the scene (as opposed to simply selecting the scene's brightest region [13, 33, 88]); and it uses a hysteretic WTA network, with positive feedback and lateral coupling, to lock onto and smoothly track the selected targets (unlike the WTA networks used in other tracking devices [13, 46, 88]). These features allow systems that use the architecture proposed here to reliably track natural stimuli in a wide variety of illumination conditions.
5.1 A one-dimensional tracking chip

The tracking architecture proposed here is structured in a hierarchical way and can be implemented on a single-chip device. As the architecture is one-dimensional, we can design thin, long processing columns so as to optimize the area used and increase the number of pixels on the device. Two chips of approximately 2 mm × 2 mm were fabricated using standard 2 µm and 1.2 µm CMOS technologies, respectively. The processing columns of each chip are 60λ wide, where λ is the

scalable CMOS design rule parameter, corresponding to 1 µm for the 2 µm process and to 0.6 µm for the 1.2 µm process. As the circuits are analog and some circuit elements (such as capacitors) do not scale with λ, the layouts of the two chips are slightly different (although the schematic diagrams are identical). The 2 µm chip has a pixel pitch of 60 µm and contains 25 processing columns, while the 1.2 µm chip has a pixel pitch of 36 µm and contains 40 processing columns.

Figure 5.1: Block diagram of the single-chip tracking system (stages shown: adaptive photoreceptors, winner-take-all, centroid). Spatial edges are detected at the first computational stages by adaptive photoreceptors connected to transconductance amplifiers. The edge with strongest contrast is selected by a winner-take-all network and its position is encoded with a single continuous analog voltage by a position-to-voltage circuit (see Section 5.1.6).

5.1.1 System Architecture

Image brightness data is processed in parallel through five main computational stages. A block diagram of the device's architecture is depicted in Fig. 5.1. The first stage is an array of adaptive photoreceptors [23] that map image intensity logarithmically into their output voltages. The second stage is composed of an array of simple transconductance amplifiers, operated in the subthreshold regime, which receive input voltages from neighboring photoreceptors [81]. The amplitude of their output currents encodes the contrast of edges, and the sign encodes their polarity. At the third computational stage the polarity of each edge is gated so that the sensor selectively responds either to ON edges (dark-to-bright transitions), or to OFF edges (bright-to-dark transitions), or to both.
The fourth stage uses a hysteretic WTA (hWTA) network (see Section 4.2), which selects and locks onto the feature with the strongest spatial contrast moving at the speed that best matches the photoreceptor's velocity tuning. Finally, the last stage is a position-to-voltage circuit [28], which allows the system to encode the spatial position of the WTA network's output with a single analog value. The 1.2µm chip layout of these circuits is shown in Fig. 5.2. Fig. 5.3 summarizes the general response properties of the 2µm chip by showing the outputs of the different computational stages described above. The top trace of Fig. 5.3(a) shows the responses of the array of adaptive photoreceptors to a black bar on a white background, imaged onto the chip's surface using a standard CS-mount 4mm lens with an f-number of 1.2. The two lower traces of the figure are the response of the edge-polarity detector circuits, representing the spatial derivative of the input stimulus. Fig. 5.3(b) shows the response of the position-to-voltage circuit to 11 different winning pixel positions. The figure's inset displays 11 snapshots of the WTA response to the 11 corresponding spatial positions of the input stimulus.

5.1.2 Adaptive Photoreceptor Circuit

This photoreceptor circuit, originally designed by Tobi Delbrück [23], has been used extensively in many neuromorphic sensors. The response of the circuit is invariant to absolute light intensity (changing logarithmically with image brightness). The adaptive photoreceptor exhibits the characteristics of a temporal bandpass filter, with adjustable high and low cut-off frequencies. Fig. 5.4 shows the response of the array of photoreceptors to a moving bar, for two different adaptation settings. In Fig. 5.4(a) the adaptation rate was low, with adaptation time constants on the order of hundreds of milliseconds. In Fig. 5.4(b) the adaptation rate was very high, such that the photoreceptors adapt quickly to brightness transients. Because of its adaptation property, the photoreceptor biased in this way has a response that depends on both contrast and speed.

Figure 5.2: Portion of layout of the 1.2µm chip containing 7 processing columns. The size of each computational stage is indicated on the right.

5.1.3 Spatial Derivative Circuit

The spatial derivative is implemented using simple transconductance amplifiers operated in the subthreshold regime. The amplifiers receive input voltages from neighboring photoreceptors and provide a bidirectional output current that is proportional to the hyperbolic tangent of their differential input [81]. The output current saturates smoothly as the differential voltage increases (in absolute value) beyond mV. The possibility of electronically smoothing the input image (at the adaptive-photoreceptor stage) allows the user to keep the spatial derivative circuit in its linear range, for a stimulus with fixed spatial frequencies. Furthermore, the presence of multiple stimuli with contrast high enough to saturate the transconductance amplifiers' currents does not compromise the sensor's tracking performance, as the WTA network is able to lock onto the selected feature (see Section 5.1.5).

5.1.4 Edge-Polarity Detector Circuit

The polarity of edges in the visual scene is encoded by the sign of the transconductance amplifiers' currents. Each of these currents is fed into a circuit of the type shown in Fig. 5.5. The amplifier in the left part of Fig. 5.5, together with transistors M1 through M6, implements a current conveyor [112]. This circuit is used to separate the positive component of the input current I_diff from the negative one, and to decouple the spatial derivative stage from the current-polarity selection stage. Negative input currents are conveyed to transistor M6, while positive ones are flipped through the current mirror M4,M5 and conveyed to M8. Transistors M6 and M8 source their currents to the polarity selection circuit (transistors M9-M12) [46]. The output current of the polarity selection circuit, I_edg, represents OFF edges (the positive component of I_diff), ON edges (the negative component of I_diff) or either type of edge (the absolute value of I_diff), depending on the settings of the control voltages V_CTRL and V_REF. The voltage V_BIAS on the positive node of the amplifier is a constant used to bring the circuit into its correct operating point and (in typical operating conditions) assumes values ranging from 1V to 2.5V. The output currents I_edg of all edge-polarity detector circuits are sourced, in parallel, to the elements of the next processing stage: the hysteretic winner-take-all network.

Figure 5.3: (a) Response of the array of adaptive photoreceptors to a black bar on a white background (upper trace) and output traces of the edge-polarity detector circuit (lower traces); (b) Output characteristic of the position-to-voltage circuit. The figure's inset contains snapshots of many output traces of the WTA network superimposed, as a stimulus was moving from left to right. The data points in the main figure represent the output of the circuit corresponding to the pixel position of the winner in the inset data.

5.1.5 Hysteretic WTA Network

This circuit is the one described in Section 4.2. The hysteretic WTA network implemented on these chips contains an additional cell connected to an external bias. This additional cell can be used to set a threshold for the spatio-temporal contrast of edges present in the scene: if the input from the external bias is higher than all other inputs, the WTA signals the absence of high-contrast edges in the visual scene.

Figure 5.4: (a) Response of the array of photoreceptors, with a very slow adaptation rate, to a dark bar on a white background moving from right to left with an on-chip speed of 31mm/s. The DC value of the response has been subtracted. (b) Response of the array of photoreceptors with a fast adaptation rate to the same bar moving at the same speed (left pointing triangles) and at a slightly slower speed (upward pointing triangles).

Introducing hysteresis in the WTA network might cause problems in dynamic environments in which the winning pixel position must be updated continuously (e.g. in tracking applications). One solution would be to reset the WTA network manually any time it needs to be updated [107]. A more elegant solution is to use lateral coupling between cells, as described in Section 4.3. Cells adjacent to the winning pixel are then facilitated in the winner computation process, whereas cells in the periphery are inhibited. This solution relies on the assumption that the features being selected move continuously in space, and ensures that once the WTA network has selected a target and is engaged in visual tracking, it locks onto it and is not distracted by other stimuli in the periphery. Fig. 5.6 shows an example of the response of the WTA network on the 2µm tracking chip to a moving high-contrast bar. The top trace of the figure represents the net input current to the WTA network, and shows the effect of spatial smoothing of the sum of input currents with the hysteretic current from the winner's positive feedback loop. It is clear from this figure that the active winning cell is the one corresponding to pixel 26.
The bottom trace shows the instantaneous response of the adaptive photoreceptor array. The input stimulus was the same one used for the previous figures: a 1cm-wide black bar on a white background positioned approximately 17cm away from the focal plane and imaged onto the chip through a 4mm lens, moving from left to right with an on-chip speed of 31mm/s.

Figure 5.5: Circuit diagram of the current polarity detector. Positive I_diff currents are conveyed to the n-type current mirror M4,M5. Negative I_diff currents are conveyed to M6 through the p-type current mirror M1,M6. Depending on the values of the control voltage signals V_CTRL and V_REF, the output current I_edg represents a copy of only one of the two polarities of I_diff, or of both polarities of I_diff (see text for details).

Figure 5.6: Response of the WTA network to the ON-edge of a bar moving from left to right at an on-chip speed of 31mm/s. The top trace represents the currents I_sum of the WTA array while the bottom trace represents the voltage outputs of the array of adaptive photoreceptors.

5.1.6 Spatial Position Encoding Circuit

This circuit consists of a series of voltage followers, sharing a common global current mirror, which receive inputs from a linear resistive network [28] (see Fig. 5.7). The currents I_out generated by the WTA network at the previous stage are used as bias currents for the followers. As only one I_out,i is non-null at any given time, all followers are switched off except for the one connected to the winning WTA cell. The output of the spatial position encoding circuit, V_out, thus represents the position of the winning cell in the array.
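The five stages of Section 5.1 can be summarized in a minimal behavioral sketch. This is an idealized, static software analogue and not a model of the actual circuits: temporal adaptation is omitted, and the function names, the sign convention of the spatial derivative, and all numerical parameters (`gain`, `v_scale`, `i_hyst`, `coupling`, `i_thr`, the 1V to 4V output range) are assumptions chosen only for illustration.

```python
import math


def photoreceptor(intensity, gain=1.0):
    """Stage 1: logarithmic compression of intensity (temporal adaptation omitted)."""
    return [gain * math.log(i) for i in intensity]


def spatial_derivative(v, i_bias=1.0, v_scale=0.05):
    """Stage 2: tanh transfer of the difference between neighboring outputs.

    Sign convention (assumed): positive when the left pixel is brighter.
    """
    return [i_bias * math.tanh((v[k] - v[k + 1]) / v_scale)
            for k in range(len(v) - 1)]


def edge_polarity(i_diff, mode="both"):
    """Stage 3: keep positive part ('off'), negative part ('on') or magnitude ('both')."""
    if mode == "off":
        return [max(i, 0.0) for i in i_diff]
    if mode == "on":
        return [max(-i, 0.0) for i in i_diff]
    return [abs(i) for i in i_diff]


def hysteretic_wta(i_edg, prev_winner=None, i_hyst=0.2, coupling=0.5, i_thr=0.05):
    """Stage 4: pick the largest net input; return None if the threshold cell wins."""
    net = list(i_edg)
    if prev_winner is not None:
        for k in range(len(net)):
            distance = abs(k - prev_winner)
            if distance == 0:
                net[k] += i_hyst              # positive feedback to the winner
            elif distance == 1:
                net[k] += coupling * i_hyst   # facilitation of nearest neighbors
    winner = max(range(len(net)), key=net.__getitem__)
    return winner if net[winner] > i_thr else None


def position_to_voltage(winner, n_cells, v_left=1.0, v_right=4.0):
    """Stage 5: linear map from winner index to a single analog voltage."""
    return v_left + (v_right - v_left) * winner / (n_cells - 1)


# Dark bar (pixels 10..14) on a bright background, 25 photoreceptors.
scene = [100.0] * 25
for k in range(10, 15):
    scene[k] = 10.0

i_diff = spatial_derivative(photoreceptor(scene))
winner_on = hysteretic_wta(edge_polarity(i_diff, "on"))
winner_off = hysteretic_wta(edge_polarity(i_diff, "off"))
# With both polarities enabled the two edges are equally strong; the
# hysteretic term keeps the previously selected (ON) edge as the winner.
locked = hysteretic_wta(edge_polarity(i_diff, "both"), prev_winner=winner_on)
print(winner_on, winner_off, locked, position_to_voltage(locked, len(i_diff)))
```

In this sketch the hysteretic and neighbor terms play the role of the positive feedback and lateral coupling described in Section 5.1.5: once an edge is selected, an equally strong edge elsewhere in the array cannot take over.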

Figure 5.7: Schematic diagram of position-to-voltage circuit. Example of three neighboring cells connected together.

Figure 5.8: Picture of the stand-alone tracker board. The neuromorphic sensor is on the chip beneath the lens. On the left part of the board there is an array of potentiometers used to bias the chip's control voltages. On the top there is an LED display, comprising three display bar lines with their corresponding drivers. The scale in the left part of the figure is in millimeters.

5.2 Stand Alone Visual Tracking Device

We attached a 4mm lens to the 2µm chip and mounted it on a board with external potentiometers, used to set its bias voltages. The board also has a one-dimensional LED display with its driver (see Fig. 5.8). The LED display provides visual feedback on the position of the feature selected by the chip. The power supply to the whole board is provided by a 9V battery (attached to the back of the board) and a voltage regulator IC. The system is able to detect and report in real time the position of realistic types of stimuli moving within its field of view. It performs reliably in a wide variety of illumination conditions, ranging from dim artificial room illumination to bright sunlight, thanks to the adaptive properties of the photoreceptors at the input stage. For this application the bias settings of the photoreceptor stage are those of fast adaptation rates, as described in Section 5.1.2. Lateral coupling between neighboring cells was turned off at the photoreceptor stage but turned on at the WTA level (V_ex of Fig. 6.6 was set to 1.2V). Smoothing at the WTA level was useful to reduce the offsets introduced by the spatial derivative and edge-polarity detector circuits.
The hysteretic current of the WTA network (summed back into the input nodes through the positive-feedback path) was set to be a small fraction of the maximum possible feed-forward input current (controlled by the bias voltage of the spatial-derivative transconductance amplifier). All other bias parameters on the chip were not critical and were set to reasonable subthreshold voltages (i.e. between 0.5V and 0.8V for n-type transistors and between 4.4V and 4.1V for p-type transistors). The system biased in this way adapts out the background of a stationary scene and selects high contrast moving targets present in its field of view, tracking them as they move smoothly in space. Fig. 5.9(a) shows the output of the chip in response to a finger moving back and forth in front of the lens in a laboratory environment with cluttered background. Fig. 5.9(b) shows the output of the chip in response to a black pen moving at a speed of almost 8000 pixels/s on a uniform background. As mentioned in Section 5.1, each pixel of the 2µm chip is 60µm wide, and thus the velocity of the target on the focal plane corresponds to approximately 0.5m/s. The output of the chip is continuous in time, but discrete in space: the discrete jumps present in Fig. 5.9 represent the shifting of the winning position from one pixel to the next.

Figure 5.9: (a) Output of the system in response to a finger moving back and forth in front of the chip; (b) Output of the system in response to a pen moving at approximately 8000 pixels/s (stimulus velocity: 7955 pixels/s) on a stationary light background. Note the different time scales on the abscissae.

5.3 Active Tracking System

We implemented a fully analog active tracking system by mounting a board with the 1.2µm tracker chip and a 4mm lens onto a DC motor (see Fig. 5.10). The bias settings of the chip were the same as those used in Section 5.2, except for the value of the hysteretic current in the positive-feedback path of the WTA network, which was set to be greater than the feed-forward current I_edg. Specifically, the WTA bias voltage V_b was set to a value slightly higher than the bias voltage of the spatial-derivative transconductance amplifier, and the source voltage of the p-type transistor of the positive-feedback current mirror (V_gain in Fig. 6.6) was set to 5V. In this way the WTA network locks onto the selected target and allows only the nearest-neighbor units to win if the selected stimulus moves (see also Fig. 5.6 in Section 5.1.5). The position-to-voltage circuits were biased to encode the position of the winner with voltages ranging from 1 to 4 Volts. The analog output of the chip was rescaled and amplified (via an ST L272 power amplifier), such that the selection of features in the right part of the visual field produces positive voltages and the selection of features in the left part of the visual field produces negative voltages. The output voltage, with an amplitude directly proportional to the distance of the target's position from the center of the retina, is used to drive the DC motor. The sensory-motor loop so designed implements a negative feedback system which attempts to zero the motion of the target on the retina: if a target appears in the periphery of the visual scene, the sensor will drive the DC motor so as to orient the sensor's gaze toward the target. As the projection of the target on the retina approaches the center of the pixel array, the output of the system (i.e. the motor's power supply) decreases towards zero, bringing the motor to a stop.

Figure 5.10: Picture of tracker chip mounted on a DC motor. The output of the chip is sent to a dual-rail power amplifier which directly drives the motor.
In terms of equations we can write, to a first order approximation:

$$\begin{cases} y(t) = F\,x(t) - \theta(t) \\ \dot{\theta}(t) = A\,y(t) \end{cases} \qquad (5.1)$$

where $x(t)$ represents the position of the target in the visual space, $y(t)$ represents its corresponding projection on the retina, $\theta$ the rotation angle produced by the DC motor around its axis and $F$ the optical magnifying factor (see Fig. 5.11(a)). The term $\dot{\theta}(t)$ corresponds to the motor's angular velocity, and $A$ to the open-loop gain of the feedback system. Solving for $\dot{y}(t)$ we obtain:

$$\dot{y}(t) = F\,\dot{x}(t) - A\,y(t) \qquad (5.2)$$
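Eq. 5.2 describes a first-order linear system, so its behavior for a simple input can be checked numerically. The sketch below integrates Eq. 5.1 with a forward-Euler scheme for a target moving at constant velocity; the function name and the values of `F`, `A`, `target_velocity`, `dt` and `t_end` are illustrative assumptions and are not taken from the measured system.

```python
def simulate_pursuit(F, A, target_velocity, dt=1e-3, t_end=2.0):
    """Forward-Euler integration of Eq. 5.1 for a constant-velocity target.

    y(t)         = F * x(t) - theta(t)   (retinal position of the target)
    d(theta)/dt  = A * y(t)              (motor velocity driven by the chip)
    Returns the retinal position y at the last simulated step.
    """
    x = 0.0       # target position in visual space
    theta = 0.0   # motor angle
    y = 0.0
    for _ in range(int(t_end / dt)):
        y = F * x - theta
        theta += A * y * dt
        x += target_velocity * dt
    return y


# Hypothetical parameters: unit magnification, loop gain 20 1/s.
F, A, v = 1.0, 20.0, 0.5
y_final = simulate_pursuit(F, A, v)
print(y_final, F * v / A)
```

With these assumed values the loop time constant is 1/A = 50 ms, so after 2 s the retinal position has settled to the value F·v/A obtained by setting the left-hand side of Eq. 5.2 to zero.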

If the system is successful in zeroing the motion of the target on the retina (ẏ(t) = 0) we should measure a retinal slip y(t) directly proportional to the velocity of the target in the visual space. Fig. 5.11(b) shows traces obtained from the system while it was engaged in tracking a swinging target. The target stimulus was a black bar on a white background, similar to the one used to characterize the adaptive photoreceptor circuit in Section 5.1.2. The position of the target in visual space was measured optically by the stand-alone tracker board described in Section 5.2. The target's velocity was computed off-line by differentiating the discretized position signal (hence the jitter in the figure). As shown, the measured response matches, to a first order approximation, the theoretical prediction. The task performed by the system described here is that of smooth pursuit [102]. This model does not take into account the velocity of the target, but only its position. More elaborate models of smooth pursuit tracking have been proposed [33, 46], but none using fewer components (namely a neuromorphic CMOS sensor, a DC motor, a power amplifier and a dual power supply). The system presented here can be considered the minimal, lowest cost and most compact solution to 1D visual tracking of natural stimuli.

5.4 Roving Robots

An application domain that is well suited for the visual tracking chip is that of vehicle guidance and autonomous navigation. These types of tasks require compact and power-efficient computing devices which should be robust to noise, tolerant to adverse conditions induced by the motion of the system (e.g. to jitter and camera calibration problems) and possibly able to adapt to the highly variable properties of the world. To test our tracking sensor within this framework, we successfully interfaced it to several types of robotic platforms, ranging from Koala (K-Team, Switzerland) rovers to LEGO toys (see Fig. 5.12).
In these applications the computationally expensive part of the processing (involving visual preprocessing and target selection) is done in real-time by the neuromorphic sensor. Using simple control algorithms in conjunction with these types of sensors, roving robots are able to reliably track lines randomly laid out on the floor, for a wide variety of conditions (e.g. floors with different texture, cables of different colors and sizes, extreme illumination conditions, etc.). Quantitative measurements were carried out using the Koala (K-Team, Lausanne) mobile robot and measuring the performance of the overall system in a line-following task. The Koala robot measures 32cm in length, 31cm in width and is 11cm high. It has an on-board Motorola processor, 12 digital I/O ports and 6 analog inputs (with 10bit A/D converters), 1 MByte of RAM, and two to three hours of autonomous operation from its battery. The tracking sensor was mounted onto a wire-wrap board together with a 4mm lens with an f-number of 1.2, and it was attached to the front of Koala with the lens tilted towards the ground at an angle of approximately 60°, so as to image onto the retinal plane the features present on the floor approximately 10cm ahead (see Fig. 5.13(a) and Fig. 5.14(a)). The bias settings of the chip were the same ones used in the analog active tracking system described in Section 5.3. For this specific application example we made use of the additional node of the WTA network, with its input current set by an external potentiometer. This allowed us to set a threshold value against which we could compare the contrast of edges present in the visual scene. In the absence of lines to follow, the WTA network selects the external input and the sensor outputs a unique voltage different from the set of voltages generated by visual stimuli. The output voltage of the tracking chip is directly applied to one of the analog input ports of the robot and digitized.
To implement the line-following task Koala uses a very simple control algorithm which reads the tracking chip's output V_out and backs up in a random direction if no edge is found. If on the other hand the tracker chip detects an edge and outputs a valid voltage, the algorithm shifts and re-scales V_out so that the variable encoding the edge position, pos, is zero when the target is in the center of the chip's visual field; it sets the forward component of the velocity, fwd, to a value weighted by a Gaussian function of pos (fwd is maximum when pos=0 and decays as the magnitude of pos increases); it sets the rotational component of the velocity, rot, to a value proportional to pos; and finally it executes motor commands, sending fwd and rot directly to the robot's motors. Scaling the forward component of the velocity fwd by a Gaussian function of the line's eccentricity allows the robot to slow down in curves. If the line goes out of the field of view of the sensor (e.g. in the presence of steep curves), the algorithm forces the robot to stop and back up until it again finds a line to follow. The line-tracking algorithm makes very little use of the on-board CPU's processing power (leaving it free for other CPU-time demanding processes). The computationally expensive part of the processing (involving visual preprocessing and target selection) is done in real-time by the neuromorphic sensor. Using this simple control algorithm in conjunction with these types of sensors, the robot is able to reliably track lines randomly laid out on the floor, for a wide variety of conditions (e.g. floors with different texture, cables of different colors and sizes, extreme illumination conditions, etc.) [58]. Depending on the bias settings of the edge-polarity detector circuit, the line-following robot will always make left turns at road forks (e.g. if the circuit is selective to OFF edges and the line is darker than the background) or always make right turns.
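The control rule described above can be written down in a few lines. The following is a minimal sketch and not the code that ran on Koala: the valid voltage range (`v_min`, `v_max`), the Gaussian width `sigma`, the gains `fwd_max` and `k_rot`, and the backing-up speed are hypothetical values, and any voltage outside the valid range is assumed to be the "no edge" output of the chip.

```python
import math
import random


def line_following_step(v_out, v_min=1.0, v_max=4.0,
                        fwd_max=1.0, sigma=0.4, k_rot=1.0, back_speed=0.3):
    """One control step: map the chip output voltage to (fwd, rot) commands."""
    if not (v_min <= v_out <= v_max):
        # No valid edge: back up while turning in a random direction.
        return -back_speed, random.choice((-1.0, 1.0)) * k_rot
    # Shift and re-scale so that pos = 0 at the center of the visual field
    # and pos = -1 / +1 at its two ends.
    center = 0.5 * (v_min + v_max)
    half_range = 0.5 * (v_max - v_min)
    pos = (v_out - center) / half_range
    # Forward speed weighted by a Gaussian of pos: slow down in curves.
    fwd = fwd_max * math.exp(-pos ** 2 / (2.0 * sigma ** 2))
    # Rotation proportional to the eccentricity of the line.
    rot = k_rot * pos
    return fwd, rot


for v in (2.5, 3.25, 4.0, 0.2):
    print(v, line_following_step(v))
```

The rule is stateless: each step depends only on the current chip output, which is consistent with the small amount of on-board CPU time that the text attributes to the line-tracking algorithm.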
The bias settings can be changed at run-time by the robot using one of its digital I/O ports. Fig. 5.13 shows the robot in the process of tracking a line. The line (a high contrast bar laid onto the floor) is approximately 323cm long and forms a closed loop of elliptic shape, with a major axis of roughly 110cm and a minor axis of 90cm. The robot followed the line with an average speed of 5 loops/min (corresponding roughly to 27cm/s). To measure the robot's performance quantitatively, we stored a sequence of images (sampled at a rate of 4 frames/s) and applied them as input to the Kanade-Lucas-Tomasi Feature Tracker [106]. The data was taken in dim natural light conditions (typical of a cloudy rainy day in Zurich, Switzerland). Fig. 5.13(b) shows the features tracked by the algorithm for a sequence of 150 frames (in which the robot completed 4 loops). The features selected by the algorithm correspond to a (moving) black cross drawn on the robot's white top. Closely grouped features indicate the re-visitation of nearby positions over time. Features are denser in the steep parts of the curve because of the slower speed values that the robot uses, as determined by its control algorithm.

Figure 5.11: (a) Setup of the active tracking system as seen from above. The angle θ represents the angular displacement produced by the DC motor, x represents the target's position in the visual space, y represents the distance of the target's projection on the retina from its center. The angular velocity dθ/dt is proportional to y. (b) Chip data measured as the system was engaged in tracking a swinging bar. The bar's position (circles) was measured using a separate (fixed) tracking board, while its velocity (solid line) was computed off-line from the discretized position data. The crosses represent the output of the active sensor used to drive the system's DC motor.

Figure 5.12: Tracker chip mounted on a LEGO robot performing a target exploration task. Using very little CPU power, this robot is able to simultaneously explore (make random body/head movements), attend (orient the sensor toward high-contrast moving edges) and pursue (drive towards the target).

Fig. 5.14 shows an experiment similar to the one described in Fig. 5.13, but run in a different, less controlled environment. The robot was following a line of white paper adhesive tape laid on a light blue carpet, forming a figure-8 in an area of approximately meters. The illumination conditions were of bright natural sunlight (typical of sunny summer days in Telluride, Colorado). The robot was partially covered with a sheet of paper containing bars and crosses (see Fig. 5.14(a)). The Kanade-Lucas-Tomasi tracking algorithm selects different corners of the crosses as the robot changes its orientation. Fig. 5.14(b) shows the output of the tracking algorithm for a sequence of 200 images, sampled at intervals of approximately 1s, in which the robot makes two full loops around the figure-8. As in Fig. 5.13(b), white squares are denser in the steeper parts of the curve because the robot slows down at those points. The robot is able to follow the line reliably in both directions, always passing the intersection of the figure-8, for a wide selection of (maximum) speeds. At high speeds the robot occasionally loses the line (in the steep parts of the curve), comes to a stop, backs up and starts following the line again until it reaches the shallow parts of the curve, where it speeds up again to the maximum speed.
5.5 Extensions of 1-D tracking sensors

As the visual processing circuits operate in a fully parallel way, and the hysteretic WTA circuit relies on a global competition mechanism that requires one single node for the whole array, tracking architectures of the type described above can easily be extended to two dimensions [13, 33, 57].

Figure 5.13: (a) Koala robot with neuromorphic sensor mounted on its front. (b) Positions of Koala following a line, sampled at intervals of 0.25 seconds for a period of 37.5 seconds, in which the robot completed 4 loops. The features (white squares) were obtained by tracking a dark cross drawn on the white top of Koala.

Figure 5.14: (a) Koala robot with neuromorphic sensor mounted on its front and a white sheet of paper with crosses attached on its top, seen from above. (b) Positions of Koala following a white line on a light-blue carpet floor, sampled at intervals of one second over a period of approximately 3 minutes. The features (white squares) were obtained by tracking the bars appearing on the top part of Koala (see text for explanation).

Figure 5.15: Two-dimensional tracker chip architecture.

5.6 A 2-D tracking sensor

The 2-D tracking sensor we present here is an extension of the 1-D devices described in the previous sections. It comprises a core array of pixels arranged on a hexagonal grid, and peripheral analog and digital input/output (I/O) circuits (see Fig. 5.15). Each pixel contains a photosensing stage, a hysteretic WTA circuit, and interfacing I/O circuits. The photosensing stage used in this sensor differs slightly from the one used in the 1D sensors, in that the adaptive photoreceptor circuits respond to contrast transients (rather than to absolute contrast). At the output stage, the chip comprises digital output circuits, next to the analog P2V circuits, to encode the position of the winner. The chip also has on-chip scanners and address decoders to report the DC response of the adaptive photoreceptor array serially (e.g. for displaying images on monitors) or in a random-access mode (e.g. for reading out sub-regions of the image). The input address decoders can be directly connected to the chip's digital outputs (encoding the position of the winning pixel) for selectively reading the photoreceptor output of just that pixel and displaying only the part of the image that is of interest. Regions of interest can be selectively accessed by addressing small windows around the winning pixel's address.

5.6.1 The differentiating adaptive photoreceptor

The photoreceptor circuit with its readout circuitry is shown in Fig. 5.16. The photoreceptor consists of a photodiode D in series with a transistor M_fb in source-follower configuration and a negative feedback loop from the source to the gate of M_fb [23].
The feedback loop consists of a high-gain inverting amplifier in common-source configuration (M_n, M_p) [23] and a thresholding and rectifying temporal differentiator stage (M_on, M_off, C) [69]. A sufficiently large positive irradiance change activates a transient current I_on onto capacitor C, which is converted into a voltage V_dt by the diode-connected transistor M_dt. The photoreceptor voltage V_out can be read out by the address decoder as V_prd, if the address decoder select lines V_dx and V_dy are high. The voltage V_out can also be read out by the on-chip scanner circuit, via V_prs, to display the sensor output on monitors. The photosensing sub-circuit, developed by Kramer, has been analyzed in detail in [67]. The voltage V_dt is used to provide input to the locally connected WTA cell.

Figure 5.16: Differentiating adaptive photoreceptor circuit.

5.6.2 The 2-D hysteretic winner-take-all circuit

The basic cell of the 2D hysteretic WTA network is shown in Fig. 5.17. It is the 2D extension of the circuit described in Section 4.2. The output current I_on of the photoreceptor stage of Fig. 5.16 is mirrored by M_in into node V_ex. If the input current to the considered pixel is the strongest, the cell wins and transistors M_cx and M_cy source an output current proportional to the circuit's bias current, set by V_wtab, bringing the output voltages V_cx and V_cy high. Voltages V_cx of all pixels belonging to a common column are tied together, and voltages V_cy of all pixels belonging to a common row are tied together. A copy of the WTA bias current, attenuated exponentially by the bias voltage V_gain, is fed back into the input node via M_wfb. Transistors M_ht, M_hb, and M_hr diffuse the currents coming from M_in and M_wfb to the V_ex nodes of the three (top, bottom, and right) neighboring cells. The bias voltage V_h is used to tune the diffusion space constant and to control the amount of lateral excitatory coupling. Conversely, transistors M_lt, M_lb, and M_lr implement the inhibitory coupling among neighboring cells. The bias voltage V_l is used to control the spatial extent of lateral inhibition. If V_l is set to V_dd, inhibition is global, and only one pixel in the whole array can win. The current flowing through M_net represents the net current that the WTA cell is receiving, corresponding to the sum of the input current from the photoreceptor circuit, the positive-feedback current and the diffused excitatory currents. The voltage V_net, logarithmically proportional to this net current, can be scanned out to image the overall network activity and view the relative effects of positive feedback current modulation (V_gain), and excitatory and inhibitory coupling modulations (V_h and V_l respectively).

Figure 5.17: Hysteretic WTA circuit with spatial coupling.

5.6.3 Peripheral I/O circuits

This device has analog position-to-voltage (P2V) circuits and digital position encoding circuits for reading out the output of the WTA network. Furthermore, there is an on-chip scanner circuit [83] for displaying on monitors the outputs of all photoreceptors and/or the state of the WTA network activity (see V_net described above), and there are input address decoders for accessing the analog output voltage of individual photoreceptors.

WTA output: The voltages V_cx and V_cy of Fig. 5.17 are routed to the periphery of the architecture core, and fed into a two-input pass-transistor demultiplexer (see Fig. 5.18). Depending on the value of V_sel (see figure), V_cx and V_cy are routed either to the analog P2V circuits, or to the position (address) encoders. In this way only one of the two (analog or digital) modes can be used at one time, but wiring and possible sources of cross-talk noise are minimized.

Figure 5.18: Two-input pass-transistor demultiplexer. The voltage on V_c is routed either to V_P2V (if V_sel is high) or to V_ENC (if V_sel is low).

Scanner circuits: The scanner reads the output voltages V_prs and V_net of the array in the sequence used for standard electronic cameras.
Each output voltage of each pixel is buffered via a source follower consisting of an input transistor (M_sp for V_prs and M_netsf for V_net) and a current source that is common to each column and signal. A vertical shift register sequentially addresses the rows with the binary voltage signal V_scan via switching transistors (M_snp for V_prs and M_snw for V_net), such that each source follower is driven by the signal of a single pixel at a time. The output voltages of the column source followers are transferred to a common output line for each signal via complementary pass transistors that are sequentially opened, column by column, by a horizontal shift register. The clocks of the two shift registers are synchronized, such that the output voltages of the entire array are sequentially read out, row by row. The voltages on the common lines are buffered to be sensed off chip.

Address decoders: When properly driven, the chip's input address decoders activate the select lines V_dx and V_dy of the addressed pixel (see Fig. 5.16) and route the voltage V_prd of that pixel to a unity-gain follower of an analog output pad.

Experimental results

In Fig. 5.19 we show experimental results obtained by enabling the analog P2V circuits (by setting V_sel of Fig. 5.18 high) and measuring their output voltages V_x and V_y, which encode the x and y position of the winning pixel. The WTA network was biased to have local excitation (V_h of Fig. 5.17 was set to 0.8V) and global inhibition (V_l was set to V_dd). The measurement shows the sensor's response to a target appearing in the upper right corner of the sensor's field of view and quickly moving downward and to the right. Before the target appeared, the sensor's output was sitting at V_x ≈ 0V and V_y ≈ 0V. This is because the bottom-left pixel (0,0) receives an additional input current, set by an external bias voltage V_thr, that sets a global threshold: if no visual stimulus is strong enough to overcome this threshold, the output is always zero.
As soon as the target appeared in the sensor's field of view, the WTA network switched winner, and the P2V circuits modified V_x and V_y accordingly. The response time of the WTA and P2V circuits combined, at the onset of the stimulation, is about 200µs. The switching time, required to report a change of winner from one pixel to its nearest neighbor, is around 15µs.
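The global-threshold behavior described above can be illustrated with a small behavioral sketch. It is not a model of the circuit: it only assumes that pixel (0,0) receives an extra input (here called `threshold`, standing in for the current set by V_thr) and that the WTA reports the address of the largest total input. The function name and the toy input values are hypothetical.

```python
def wta_with_threshold(inputs, threshold):
    """Return the (x, y) address of the winning pixel.

    `inputs` is a list of rows (inputs[y][x]) of non-negative input
    strengths. Pixel (0, 0) gets an extra `threshold` input, so it wins,
    and the reported position is (0, 0), unless some stimulus is stronger.
    """
    best_xy = (0, 0)
    best_value = inputs[0][0] + threshold  # biased reference pixel
    for y, row in enumerate(inputs):
        for x, value in enumerate(row):
            if (x, y) != (0, 0) and value > best_value:
                best_xy, best_value = (x, y), value
    return best_xy


# No salient stimulus: the biased pixel wins and the output stays at the origin.
quiet = [[0.1] * 3 for _ in range(3)]
print(wta_with_threshold(quiet, threshold=1.0))

# A target stronger than the threshold takes over the selection.
scene = [row[:] for row in quiet]
scene[2][1] = 2.0  # bright target at x=1, y=2
print(wta_with_threshold(scene, threshold=1.0))
```

Raising `threshold` above the target strength returns the output to (0, 0), which mirrors the role the text assigns to V_thr.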

Figure 5.19: Output of the analog P2V circuits in response to a target moving from the top right corner to the bottom central part of the sensor's field of view. The bottom trace (V_x) reports the x position of the target. The top trace (V_y), offset in the plot by 5V for the sake of clarity, reports the y position of the target. The inset shows V_y versus V_x.

Figure 5.20: Output of the analog P2V circuits in response to a target moving from the bottom left corner to the top right one, on to the top left, to the bottom right, and back to the bottom left corner.

In Fig. 5.20 we show the response of the sensor to a target appearing in the bottom left corner of the field of view, slowly moving to the top right corner and then completing a figure-eight pattern. Note the different time scales in Figs. 5.19 and 5.20 (milliseconds versus seconds). In both experiments the target was the light spot of a laser pointer shone on a flat surface 30cm from the chip's focal plane. Images were focused onto the focal plane using an 8mm lens with an f number of 1.2. The sensor's response does not depend on the background onto which the target is overlaid, nor does it change with absolute background illumination.

By switching the state of the demultiplexer connected to the WTA outputs we disabled the analog P2V circuits and enabled the asynchronous address encoders. Figure 5.21 shows the response of two address lines (the least significant and second-least significant bits of the X address) to the same stimulus of Fig moving from right to left. The non-uniform pulse widths are due to the asynchronous response of the circuit to the variable speed of the stimulus. In a second experiment, we placed the sensor in front of a CRT monitor, showed a white box performing a circular motion on a black background, and sampled the chip's address encoder outputs every 25ms over a period of 40s. In this period the

target made 16 full revolutions. The histogram of the sampled addresses is shown in Fig. 5.22. As the global threshold was set relatively high, address (1,1) was selected most often (193 samples, off-scale in the figure). The response time of the sensor to the sudden appearance of a target is 1.2µs when the digital outputs are enabled, and can be as long as 6µs when the analog outputs are enabled. Power consumption also depends on the output mode selected (see Table 5.1).

Figure 5.21: Output of the least significant bit (bottom trace) and second-least significant bit (top trace, displaced by 6V) of the X address in response to a target moving from right to left.

Figure 5.22: Histogram of the addresses measured from the sensor's address encoders in response to a target moving on a circular trajectory.

In this device images are sensed and processed fully in parallel. The pixel reporting the strongest positive illuminance transient (e.g. induced by a high-contrast moving target) is selected by the WTA network. Its position can be read out using

either analog P2V circuits or digital address encoders. The sustained response of each photoreceptor and the net input current to each WTA cell can be read out serially, using on-chip scanners, and displayed on monitors. Additionally, photoreceptor voltages can be individually sensed, using input address decoders. The WTA analog outputs can be used to drive motors and actuators, for example on small autonomous robots. The WTA digital outputs can be used to drive the input address decoders and read the photoreceptor output of only the winning pixel. This mechanism could be exploited (e.g. using a microcontroller) to selectively read out just the regions of the image around the position of the target, rather than reading out all the raw image data.

Table 5.1: Characteristics of the visual tracking sensor.

Fabrication technology                          0.8 µm CMOS 2P 2M
Resolution
Fill factor                                     1.2%
Pixel size                                      84.8 µm × 62.6 µm
Die size                                        3.22 mm × 2.56 mm
Power supply voltage                            single 5 V
Power consumption, scanned output               18.6 mW
Power consumption, digital output (scanners off) 1.1 mW
Power consumption, analog output (scanners off)  600 µW
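The selective read-out idea mentioned above (using the winner's address to fetch only the pixels around the target) can be sketched as follows. This is an illustrative host-side routine, not code from the thesis: `read_pixel` is a hypothetical callback standing in for driving the input address decoders and sampling the analog output pad, and the 16 × 16 array size is chosen arbitrarily for the demonstration.

```python
def window_around_winner(winner, width, height, radius=1):
    """List the (x, y) addresses of a square window centred on the winner,
    clipped to the bounds of a width x height array."""
    wx, wy = winner
    xs = range(max(0, wx - radius), min(width, wx + radius + 1))
    ys = range(max(0, wy - radius), min(height, wy + radius + 1))
    return [(x, y) for y in ys for x in xs]


def read_region(read_pixel, winner, width, height, radius=1):
    """Read only the pixels around the winner via the supplied callback."""
    return {
        (x, y): read_pixel(x, y)
        for (x, y) in window_around_winner(winner, width, height, radius)
    }


# Stand-in for the chip: a fake 16 x 16 image (hypothetical size and values).
fake_image = [[10 * y + x for x in range(16)] for y in range(16)]
patch = read_region(lambda x, y: fake_image[y][x], winner=(5, 7),
                    width=16, height=16)
print(len(patch), "pixels read instead of", 16 * 16)
```

With a radius of one pixel the host reads at most nine values per update, so the read-out cost is independent of the array size.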

Part IV

Multi-chip Attention Systems

Chapter 6

Multi-chip models of selective attention systems

Single-chip neuromorphic systems of the type described in the previous sections have great advantages, such as small size, low fabrication cost and low power consumption, together with extraordinary computational capabilities. However, to design systems with greater computational power and higher flexibility one needs to resort to multi-chip systems. Neuromorphic multi-chip systems generally contain one or more sensory devices, such as silicon retinas, silicon cochleas or vision sensors, interfaced to one or more chips containing networks of spiking neuron circuits. These chips can process the sensory signals (e.g. detecting salient regions of the sensory space [52], learning correlations [16], etc.) and eventually transmit the processed signals to actuators, thus implementing complete neuromorphic sensory-motor systems. Specifically, using multi-chip systems it is possible to implement more elaborate models of selective attention, of the type described in Chapter 1 (see also Fig. 1.1).

6.1 The Address-Event Representation

Consistent with the neuromorphic engineering approach, the strategy used by neuromorphic devices to communicate analog signals across chip boundaries is inspired by the nervous system. Analog signals are converted into streams of stereotyped non-clocked digital pulses (spikes) and encoded using pulse-frequency modulation (spike rates). These digital pulses are transmitted using an asynchronous communication protocol based on the Address-Event Representation (AER) [9, 21, 74].

The Address-Event I/O Interface

In AER, each analog element on a sending device is assigned an address. When a spiking element generates a pulse, its address is encoded and instantaneously put on a digital bus, using asynchronous logic (see Fig. 6.1).
In this asynchronous representation time represents itself, and analog signals are encoded by the inter-spike intervals between the addresses of their sending nodes. Address-events are the digital pulses written on the bus. By converting analog signals into a digital representation, we can take advantage of the considerable understanding and development of high-speed digital communications, emulating the parallel, but slow, connectivity of neurons using axons with fast, but serial, connectivity through digital busses. We basically trade off space (the number of pins and wires that would be required to transmit spikes from each individual neuron on a chip) against time, exploiting the fact that our neuromorphic circuits have typical time constants of the order of milliseconds and digital busses have bandwidths of the order of MHz. To manage collisions (cases in which two or more neurons attempt to access the AER bus simultaneously) we use on-chip digital, asynchronous arbitration circuits. As the channel only sends the addresses of active units, the system's bandwidth is devoted to those units that are spiking. Redundancy reduction in the signal (e.g. spatial and temporal adaptation) before the channel can dramatically reduce the bandwidth needed for a given population of cells. An important consequence of using a digital chip-interconnect scheme is the relative ease with which these chips can be interfaced to existing digital hardware. From the simulation of input spike trains to quickly re-configuring a network's connectivity via address routers, the flexibility of software can be used to produce a more powerful modeling tool. From the engineering perspective, the translation of our analog signals into a stream of asynchronous spikes not only facilitates communication, it also opens up new possibilities for the efficient implementation of both computation and memory in the spike domain.
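The encoding principle described above (addresses on a shared bus, with timing carried by the events themselves) can be sketched in software. This is a simplified illustration, not the chip's protocol: handshaking is omitted, and sorting the events stands in for the on-chip arbiter, so simultaneous spikes are simply serialized in address order. The function names are hypothetical.

```python
def encode_aer(spike_trains):
    """Merge per-neuron spike-time lists into one time-ordered stream of
    (time, address) events, as seen on a shared address-event bus."""
    events = [(t, address)
              for address, times in enumerate(spike_trains)
              for t in times]
    return sorted(events)  # ties are broken by address (stand-in for arbitration)


def decode_rates(events, n_addresses):
    """Estimate each sender's mean rate (Hz) from the inter-spike intervals
    of its own address-events."""
    times = [[] for _ in range(n_addresses)]
    for t, address in events:
        times[address].append(t)
    rates = []
    for ts in times:
        if len(ts) < 2 or ts[-1] == ts[0]:
            rates.append(0.0)  # too few events to define an interval
        else:
            rates.append((len(ts) - 1) / (ts[-1] - ts[0]))
    return rates


# Two senders: address 0 spikes at 10 Hz, address 1 at 50 Hz (times in seconds).
trains = [
    [0.0, 0.1, 0.2, 0.3],
    [0.05, 0.07, 0.09],
]
bus = encode_aer(trains)
print([address for _, address in bus])
print(decode_rates(bus, n_addresses=2))
```

Only active senders occupy the bus: a silent neuron contributes no events, which is the bandwidth argument made in the text.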
In the case of single-sender/single-receiver communication, a simple handshaking mechanism ensures that all events generated at the sender side arrive at the receiver. The address of the sending element is conveyed as a parallel word of sufficient length, while the handshaking control signals require only two lines. Systems containing more than two AER chips (e.g.

with AER sensors at the input stages, AER networks of neurons for doing the computation and AER read-out modules to drive possible actuators) are constructed by implementing special-purpose off-chip arbitration schemes [20, 21].

Figure 6.1: Schematic diagram of an AER chip-to-chip communication example. As soon as a sending node on the source chip generates an event, its address is written on the Address-Event Bus. The destination chip decodes the address-events as they arrive and routes them to the corresponding receiving nodes.

Address-Event Neuromorphic Sensors

The two most successful types of neuromorphic sensors developed in previous years are silicon cochleas [34, 104] and silicon retinas [10, 67, 79]. The former implement detailed models of the human cochlea, producing outputs that could be useful for artificial speech recognizers, or for hearing aids. The silicon retinas, on the other hand, implement models of the retina's early processing stages and typically produce images that represent local changes in contrast (see Fig. 6.2 for an example of a silicon retina image). Until recently these sensory devices transmitted their information off-chip using conventional techniques, such as multiplexers or scanners. With the advent of the Address-Event Representation we now also have AER silicon retinas and cochleas that produce streams of address-events representing the activity of each individual pixel. With these AER sensors the bandwidth used for signal transmission is allocated only to those pixels that are active (as opposed, for example, to scanning techniques, which allocate the same bandwidth to all the pixels, independent of their activity).
The address-events (spikes) generated by these sensors can then be processed by synapses and networks of spiking neurons implemented on one or more receiving AER chips.

6.2 A 1-D AER selective attention chip

Here we present a 1-D AER chip that contains circuits useful for emulating, in real time, saliency-based selective attention systems of the type described in Chapter 1. Several VLSI systems for implementing visual selective attention mechanisms have been presented in the past [13, 45, 89, 121]. These systems (like the ones described in Chapter 5) contain photo-sensing elements and processing elements on the same focal plane, and typically apply the competitive selection process to visual stimuli sensed and processed by the focal plane processor itself. Unlike these systems, the device proposed here is able to receive input signals from any type of AER device. Therefore input signals need not arrive only from visual sensors, but could represent a wide variety of sensory stimuli obtained from different sources. The selective attention chip proposed is also one of the first of its kind able not

Figure 6.2: Image captured from a 168 × 132 silicon retina designed by Jörg Kramer (at the Institute of Neuroinformatics, Zurich), while the subject was moving.

only to receive AER signals, but also to transmit the result of its computation using the Address-Event Representation. With both input and output AER interfacing circuits the chip can be thought of as a VLSI cortical module able to receive and transmit spike trains. In general, decoupling the sensing stage from the processing stage and using the Address-Event Representation to transmit and receive signals has several advantages: a multi-chip AER attention system could use multiple sensors to construct a saliency map; visual input sensors could be relatively high-resolution silicon retinas and would not have the small fill factors that single-chip 2-D attention systems are troubled with; top-down modulating signals could be fused with the bottom-up generated saliency map to bias the selection process; multiple instances of the same selective attention chip could be used to construct hierarchical selective attention architectures; and sensors could be distributed across different peripheral regions of the neuromorphic system, as is the case for real biological systems.

System Overview

The 1-D selective attention chip contains a one-dimensional architecture of 32 locally coupled elements that compete globally for saliency. Global competition is achieved using a hysteretic WTA network. Each element comprises, next to the hysteretic WTA cell, synaptic circuits and integrate-and-fire neurons. The synapses receive off-chip address-events and integrate them into analog current signals that are sourced into the WTA network. The integrate-and-fire neurons are used to transmit address-events off chip, and to implement the dynamics of the selective attention model.
It has been argued that neural circuits with these types of connectivity patterns are valuable models of cortical processing and can account for many response properties of cortical neurons [39, 103]. The analog circuits of the WTA network implement a simplified abstract model of these types of neural networks, in which each element of the WTA network can be regarded as a local population of excitatory neurons interconnected with lateral nearest-neighbor connections. From this point of view the architecture of the selective attention chip is equivalent to the neural network diagram depicted in Fig. 6.3. The input excitatory synapses shown in the bottom part of the figure receive spike trains from external devices and provide an excitatory current to the local populations of neurons. These populations compete with each other by means of recurrent interactions with a global inhibitory cell (not shown in the figure) and reach a steady state in which typically all populations except the one receiving the strongest net excitation are silent. Each local population projects to one of the output inhibitory neurons shown in the top row of Fig. 6.3. For typical operating conditions, only the inhibitory neuron connected to the winning population of cells will be active at any given time. The output neuron projects its spikes both to AER interfacing circuits, for transmitting the result of the computation to further processing stages, and to local on-chip inhibitory synapses (equivalent to those shown in the lower part of Fig. 6.3). The resulting inhibitory current is subtracted from its corresponding input excitatory current. This negative feedback loop implements the so-called inhibition of return (IOR) mechanism [36, 110]: after selecting a salient stimulus, the WTA network stimulates the output neuron connected to the winning population of cells. The spikes that the output neuron generates are integrated by the corresponding inhibitory synapse.
As the inhibitory current increases in amplitude, the effect of the input excitatory current is diminished and

eventually the WTA network switches stable state, selecting a different cell as the winner. Note how the integrate-and-fire neurons, necessary for the Address-Event I/O interface, allowed us to implement the IOR mechanism by simply including an additional inhibitory synaptic circuit. This solution is quite elegant and compact in comparison with previously proposed alternatives [87]. Depending on the dynamics of the IOR mechanism, the WTA network will continuously switch the selection of the winner between the strongest input and the second-strongest, or between the strongest and more inputs of successively decreasing strength, thus generating focus-of-attention scan-paths, analogous to eye movement scan-paths [122]. The dynamics of the IOR mechanism depend on the time constants of the excitatory and inhibitory synapses, on their relative synaptic strengths, on the input stimuli and on the frequency of the output inhibitory neuron.

Figure 6.3: Biologically equivalent architecture of the selective attention model. Input spike trains arrive from the bottom onto excitatory synapses. The populations of cells in the middle part of the figure are modeled by a hysteretic WTA network with local lateral connectivity. Inhibitory neurons, in the top part of the figure, locally inhibit the populations of excitatory cells by projecting their activity to the inhibitory synapses in the bottom part of the figure.

The Excitatory and Inhibitory Synapses

At each cell location, the input pulses received by the chip reach a current-mirror integrator [9], which models an excitatory synapse. This circuit, shown in Fig. 6.4(a), uses only 4 transistors and one capacitor. The input pulse is applied to transistor M1, which acts as a digital switch. Transistor M2 is biased by the analog voltage V_w to set the weight of the synaptic strength.
Similarly, the voltage V_e on the source of transistor M3 can be used to set the time constant of the synapse. With each input pulse, a fixed amount of charge is stored on the capacitor and the amplitude of the output current I_ex is increased. If no input is applied (i.e. no current is allowed to flow through M3), the output current I_ex decays with a 1/t profile. Similarly, the inhibitory synapse integrates the spikes generated by the output neurons. As with the excitatory synapse, we implemented the inhibitory synapse using a current-mirror integrator circuit (see Fig. 6.4(b)). The principle of operation of this circuit is very similar to the one described for the excitatory synapse, with the difference that the output current of the circuit is of opposite polarity. Every time the local output neuron projecting to this synapse generates a spike, its output voltage V_out rises to the positive power supply rail (see Section 6.2.4), allowing the transistor M1 to charge the capacitor connected to it. The amount of charge passed can be controlled by V_q (which thus determines the strength of the synaptic weight). As for the circuit of Fig. 6.4(a), the voltage at the source node of the diode-connected transistor (V_i on the source of M2 in Fig. 6.4(b)) controls the time constant and the gain of the synapse. Transistor M4 is a decoupling (cascode) element, used to decrease second-order effects. Specifically, it is used to minimize the effect of the Miller capacitance of transistor M3. The voltage V_ca on the capacitor is a measure of the spiking history of the neuron projecting to the inhibitory synapse. It determines (with an exponential relationship) the amplitude of the output current I_inh. Inhibitory currents I_inh are subtracted from excitatory currents I_ex coming from the input synapses to provide the net input current I_in to the WTA network (see Fig. 6.6). We characterized the excitatory synapse of Fig. 6.4(a) by applying single pulses (see Fig.
6.5(a,b)) and by applying sequences of pulses (spikes) at constant rates (see Fig. 6.5(c,d)). Figure 6.5(a) shows the response of the excitatory synapse to a single spike for different values of V_w. Similarly, Fig. 6.5(b) shows the response of the excitatory synapse for different values of V_e. Changes in V_e modify both the gain and the time constant of the synapse. To better visualize the effects of V_e on the time evolution of the circuit's response, we normalized the different traces, neglecting the circuit's gain variations. Figure 6.5(c) shows the response of the excitatory synapse to a constant 50Hz spike train for different synaptic strength values. As shown, the circuit integrates the spikes up to a point in which the output current

Figure 6.4: (a) Excitatory synapse circuit. Input spikes are applied to M1, and transistor M4 outputs the integrated excitatory current I_ex. (b) Inhibitory synapse circuit. Spikes from the local output neurons are integrated into an inhibitory current I_inh.

Figure 6.5: (a) Response of an excitatory synapse to single spikes, for different values of the synaptic strength V_w (with V_e = 4.60V). (b) Normalized response to single spikes for different time constant settings V_e (with V_w = 1.150V). (c) Response of an excitatory synapse to a 50Hz spike train for increasing values of V_w (0.6V, 0.625V, 0.65V and 0.7V from bottom to top trace respectively). (d) Response of an excitatory synapse to spike trains of increasing rate for V_w = 0.65V and V_e = 4.6V (12Hz, 25Hz, 50Hz and 100Hz from bottom to top trace respectively). Panels (c) and (d) plot the excitatory synaptic current (nA) against time (s).

reaches a mean steady-state analog value, the amplitude of which depends on the frequency of the input spike train, on the synaptic strength value V_w and on V_e. Figure 6.5(d) shows the response of the circuit to spike train sequences of four different rates for a fixed synaptic strength value.

Figure 6.6: Schematic diagram of the WTA network. Examples of three neighboring cells connected together.

The Hysteretic Winner-Take-All Network

The basic cell of the hysteretic WTA network is based on the circuits described in Section 4.2. Figure 6.6 shows the circuit schematic of three WTA cells connected together. Input is applied to each node of the network through the currents I_in, corresponding to the sum of the excitatory synaptic current I_ex (see Fig. 6.4(a)) and the inhibitory synaptic current I_inh (see Fig. 6.4(b)), which is of opposite polarity. If (and only if) the cell considered is the winning one, the p-type current mirror in the top part of the circuit produces, at the same time, the output current I_inj and a hysteretic current, summed through a positive feedback loop (indicated by a dashed arrow) back into the input node. The hysteretic current is a copy of the WTA bias current (set by V_b on the bias transistors in the lower part of the circuit). The amplitude of the output current I_inj is independent of the input current I_in and can be modulated by the control voltage V_inj at the source of the output transistor. The n-type current mirrors in the lower half of the figure are used both to produce an output current I_net and to enhance the response of the WTA network (by producing source degeneration of the input transistor).
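The effect of the hysteretic current can be illustrated with a purely behavioral sketch, under the assumption that the winner is simply the cell with the largest effective input and that the current winner's input is augmented by a constant `i_hyst` (standing in for the copy of the bias current). The function name and the example currents are hypothetical and do not come from measurements.

```python
def hysteretic_wta(frames, i_hyst):
    """Track the winner over successive input frames (lists of currents, nA).

    The current winner's input is augmented by `i_hyst`, so a challenger
    must exceed the winner's input plus `i_hyst` to take over.
    """
    winner = None
    history = []
    for frame in frames:
        effective = list(frame)
        if winner is not None:
            effective[winner] += i_hyst  # positive feedback into the winner
        winner = max(range(len(effective)), key=effective.__getitem__)
        history.append(winner)
    return history


# Cell 1 grows stronger over time while cell 0 stays at 300 nA.
frames = [[300.0, 200.0], [300.0, 320.0], [300.0, 650.0]]
print(hysteretic_wta(frames, i_hyst=300.0))  # winner is held until the margin is exceeded
print(hysteretic_wta(frames, i_hyst=0.0))    # without hysteresis the winner flips early
```

In this toy run the hysteretic network keeps cell 0 selected in the second frame even though cell 1 is slightly stronger, and only switches once cell 1 exceeds cell 0 by more than `i_hyst`.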
The source degeneration technique consists of converting the current flowing through the input transistor into a voltage, by dropping it across a diode, and feeding this voltage back to the source of the input transistor, to increase its gate voltage. At the network level, source degeneration of the input transistor has the effect of increasing the circuit's winner selectivity. As the current I_net represents the sum of all of the currents converging into the same cell (namely, the input current I_in, the current being spread to or from the left and right nearest neighbors and the hysteretic current coming from the top p-type current mirror), it is a useful measure for visualizing the state of the WTA network. Each WTA cell is connected to its immediate neighbors through pass transistors controlled by V_ex (in the upper half of the figure) and by V_inh (in the lower part of the figure). In the two extreme cases the inhibitory pass transistors either completely decouple the network, allowing each individual cell to be a winner (V_inh = 0V), or they globally connect all the cells, forcing the network to choose only one winner (V_inh = V_dd). In intermediate cases, modulation of V_inh determines the spatial extent of the local regions over which competition takes place, thus allowing the network to select multiple winners. In Fig. 6.7 we show examples of scanned I_net measurements, having applied constant input currents to the WTA nodes. We generated a spatial input stimulus such that pixel 21 received an input current of approximately 350nA, pixels 9 through 13 received input currents of approximately 300nA and the remaining pixels received currents ranging from 150nA to 250nA. Small variations across neighboring pixels are mainly due to device mismatch effects. Figure 6.7(a) shows the state of the network in the case in which no lateral excitatory coupling is present (V_ex = 0V).
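The stimulus just described can be reproduced numerically to see how lateral coupling can change the selection. The sketch below is only a rough illustration: it replaces the pass-transistor diffusion network with a single pass of nearest-neighbor averaging (an assumption, not the circuit's actual kernel), sets all background pixels to 200nA, ignores device mismatch, and uses list indices as stand-ins for pixel positions.

```python
def smooth(currents, coupling):
    """One pass of nearest-neighbour averaging.

    `coupling` in [0, 1) is the fraction of each pixel's output taken from
    the mean of its two neighbours (edge pixels reuse their own value).
    """
    n = len(currents)
    out = []
    for i in range(n):
        left = currents[max(i - 1, 0)]
        right = currents[min(i + 1, n - 1)]
        out.append((1.0 - coupling) * currents[i] + coupling * 0.5 * (left + right))
    return out


def winner(currents):
    """Index of the largest current (first one in case of ties)."""
    return max(range(len(currents)), key=currents.__getitem__)


# 32 pixels: uniform 200 nA background, a 300 nA plateau on pixels 9..13,
# and an isolated 350 nA peak on pixel 21.
inputs = [200.0] * 32
for i in range(9, 14):
    inputs[i] = 300.0
inputs[21] = 350.0

print(winner(inputs))               # isolated peak wins without coupling
print(winner(smooth(inputs, 0.5)))  # a plateau pixel wins with coupling
```

With coupling, the isolated peak is averaged down by its weaker neighbors while the interior of the plateau is preserved, so the selection moves to the region with high average activity. Which plateau pixel wins on a real chip depends on mismatch and on the true diffusion profile, neither of which is modeled here.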
The network selects pixel 21 as the winner and supplies the hysteretic current to it (which has an amplitude of approximately 300nA, with the current bias settings). Figure 6.7(b) shows the status of the network in the case in which lateral excitatory coupling is applied (V_ex = 1.5V). Lateral coupling effectively smooths the input currents spatially, thus decreasing the net input current to pixel 21. In this condition the WTA network selects pixel 9 as the winner (and sums the hysteretic current to it). This example points out the two main characteristic features of the spatially coupled hysteretic WTA network: spatial smoothing and positive feedback. Spatial smoothing is implemented by modulating the gate voltage V_ex of the excitatory pass transistors. It can

be used to reduce the effect of noise and offsets in the input transistors, and to favor the selection of spatial regions with high average activity (as opposed to strongly activated isolated pixels). By adding a copy of the WTA bias current to the winning cell's input node, the positive feedback loop effectively produces a hysteretic behavior. Hysteresis is used to enforce the selection of the winner and is induced by summing a constant hysteretic current to the input of the winning pixel, through the positive feedback loop implemented with the p-type current mirrors in the top part of Fig. 6.6.

Figure 6.7: Net WTA input current I_net values (in nA) at each pixel location for a static control input. Pixels 5 through 13 have input currents slightly lower than pixel 21. All other pixels receive weaker input stimuli. (a) In the absence of lateral coupling (V_ex = 0V) the network selects pixel 21 as the winner. (b) In the presence of lateral coupling (V_ex = 1.5V) the network spatially smooths the input distribution and selects pixel 9 as the winner.

The Output Inhibitory Integrate-and-Fire Neuron

The output inhibitory neurons implemented on this chip are circuits of the type shown in Fig. 6.8. They are non-leaky integrate-and-fire neurons based on circuits proposed by Mead [81] and by van Schaik [116]. Input is applied to this circuit by injecting a constant DC current I_inj, sourced from the WTA network (see Fig. 6.6), into

the membrane capacitance C_m.

Figure 6.8: Circuit diagram of the local inhibitory integrate-and-fire neuron.

A comparator circuit compares the membrane voltage V_mem (which increases linearly with time if the injection current is applied) with a fixed threshold voltage V_thr. As long as V_mem is below V_thr, the output of the comparator is low and the neuron's output voltage V_out sits at 0V. As V_mem increases above threshold, though, the comparator output voltage rises to the positive power supply rail and, via the two inverters, also brings V_out to the rail. A positive feedback loop, implemented with the capacitive divider C_fb and C_m, ensures that as soon as the membrane voltage V_mem reaches V_thr, it is increased by an amount proportional to $V_{dd}\,C_{fb}/(C_m + C_{fb})$ [81]. In this way we avoid the problems that could arise with small fluctuations of V_mem around V_thr. When V_out is high, the reset transistor at the bottom-left of Fig. 6.8 is switched on and the capacitor C_m is discharged at a rate controlled by V_pw, which effectively sets the output pulse width (the width of the spike). The membrane voltage thus decreases linearly with time and as soon as it falls below V_thr the comparator brings its output voltage to zero. As a consequence the first inverter sets its output high and switches on the n-type transistor of the second inverter, allowing the capacitor C_r to be discharged at a rate controlled by V_rfr. This bias voltage controls the length of the neuron's refractory period: the current flowing into the node V_mem is discharged to ground and the membrane voltage does not increase, for as long as the voltage on C_r (V_out) is high enough. Figs.
6.9(a) and (b) show traces of V_mem for different amplitudes of the input injection current I_inj and for different settings of the refractory period control voltage V_rfr. The threshold voltage V_thr was set at 2V and the bias voltage V_pw was set at 0.5V, such that the width of a spike was approximately 1ms. Figs. 6.9(c) and (d) show how the firing rate of the neuron depends on the injection current amplitude. These plots are typically referred to as FI-curves. We can control the saturation properties of the FI-curves by changing the length of the neuron's refractory period. The error bars show how reliable the neuron is when stimulated with the same injection current. We changed the injection current amplitude by modulating the control voltage V_inj (see Fig. 6.6). As the injection current changes exponentially with the control voltage V_inj, the firing rate of the neuron follows the same relationship. To verify that the firing rate is linear with the injection current we can view the same data using a log scale on the ordinate axis (Fig. 6.9(d)).

The 1-D AER selective attention chip can receive signals analogous to spike trains at its input interface, which can represent sensory information in their temporal structure. In the first instance, to stimulate and test the selective attention model with well-controlled input signals, we interfaced the chip to a workstation. In a more general scenario, the chip can be interfaced to analog VLSI neuromorphic sensors that use the same AER interfacing circuitry to construct more elaborate multi-chip systems [56].
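The shape of the FI-curves discussed above can be approximated with an idealized model of the non-leaky integrate-and-fire neuron. The sketch assumes that one firing period is the sum of the time needed to charge C_m linearly from 0V to V_thr, the spike width, and the refractory period; it ignores the positive-feedback jump and any circuit non-idealities. The 2V threshold and 1ms pulse width are taken from the text, while the 1pF membrane capacitance and the 4ms refractory period are hypothetical values chosen only for illustration.

```python
def firing_rate(i_inj, c_mem=1e-12, v_thr=2.0, t_pulse=1e-3, t_refr=0.0):
    """Firing rate (Hz) of an idealized non-leaky integrate-and-fire neuron.

    i_inj   : constant injection current (A)
    c_mem   : membrane capacitance (F), hypothetical value
    v_thr   : threshold voltage (V)
    t_pulse : spike width (s)
    t_refr  : refractory period (s)
    """
    if i_inj <= 0.0:
        return 0.0
    t_integrate = c_mem * v_thr / i_inj  # linear charge-up time to threshold
    return 1.0 / (t_integrate + t_pulse + t_refr)


for i_inj in (1e-12, 1e-10, 1e-8, 1e-6):
    short = firing_rate(i_inj, t_refr=0.0)
    long_ = firing_rate(i_inj, t_refr=4e-3)
    print(f"{i_inj:.0e} A: {short:8.2f} Hz (no refractory), "
          f"{long_:8.2f} Hz (4 ms refractory)")
```

For small currents the rate grows almost linearly with the current, and for large currents it saturates at 1/(t_pulse + t_refr), which is how a longer refractory period lowers the plateau of the FI-curve.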

Figure 6.9: Integrate-and-fire neuron characteristics. (a) Membrane voltage for two different DC injection current values (set by the control voltage V_inj). (b) Membrane voltage for two different refractory period settings. (c) Firing rates of the neuron as a function of the current-injection control voltage V_inj, plotted on a linear scale. (d) Firing rates of the neuron as a function of V_inj, plotted on a log scale (the injection current increases exponentially with V_inj).

Testing the 1-D selective attention chip

To provide inputs to all the synapses of the chip we developed a program on a workstation that continuously addressed all the pixels in a serial fashion, exciting them at the times specified by a look-up table. The rate used to sequentially address the pixels was fast compared to the typical firing rates of input signals (chosen in a range between 10Hz and 80Hz). The input synapses, which have time constants on the order of milliseconds, thus appeared to be receiving spikes in parallel. The I/O card, in conjunction with the software we used, was able to cycle through all 32 pixels of the network at a rate of 500Hz.

Control inputs

In our control experiment we stimulated input synapses 1 through 9, 11 through 21 and 23 through 32 with spike trains at a constant rate of 10Hz. Synapses at pixels 10 and 22 were stimulated with spike trains at rates of 50Hz and 80Hz respectively. In Fig we show oscilloscope traces representing the scanned net input currents to the WTA network I_net of all 32 pixels, in the top traces, and the inhibitory currents I_inh (see Section 6.2.2) of all 32 inhibitory synapses, in the bottom traces. To illustrate the circuit's dynamics, we increased the persistence of the oscilloscope's display. Figure 6.10(a) shows the response of the system to the onset of the stimulation.
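The serial stimulation scheme described at the start of this section can be sketched as follows. This is a hypothetical reconstruction, not the program actually used: the function name `build_schedule` is invented, and a per-pixel integer accumulator stands in for the look-up table of spike times. The 32 pixels, the 500Hz scan rate and the 10Hz, 50Hz and 80Hz input rates are taken from the text.

```python
def build_schedule(rates_hz, scan_rate_hz=500, duration_s=1):
    """Return (time_s, pixel) input events for a serial scan of all pixels.

    rates_hz: integer spike rate per pixel, each at most scan_rate_hz.
    """
    acc = [0] * len(rates_hz)              # per-pixel rate accumulators
    events = []
    for k in range(scan_rate_hz * duration_s):
        t = k / scan_rate_hz               # time of this pass over the array
        for pixel, rate in enumerate(rates_hz):
            acc[pixel] += rate
            if acc[pixel] >= scan_rate_hz:     # a spike is due for this pixel
                acc[pixel] -= scan_rate_hz
                events.append((t, pixel + 1))  # pixels are numbered from 1
    return events

# Control stimulus: 10 Hz everywhere, 50 Hz at pixel 10, 80 Hz at pixel 22.
rates = [10] * 32
rates[10 - 1] = 50
rates[22 - 1] = 80

if __name__ == "__main__":
    events = build_schedule(rates)
    print(len(events), "events in one second; first five:", events[:5])
```

Because every pixel is visited once per 2ms pass, spikes addressed to different pixels within the same pass are at most 2ms apart, which is comparable to the millisecond synaptic time constants mentioned above.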
As the input spike trains start to arrive, and the excitatory synapses integrate them, the net input current of each pixel increases from zero (corresponding approximately to the level at the central axis of the display) to a mean maximum steady-state value (see also Fig. 6.5). The net input currents at pixels 10 and 22 increase more rapidly than those of all other pixels, in accordance with the rate of the spike trains arriving at their input synapses (top trace). At the onset of the stimulation, all output inhibitory neurons are silent and the inhibitory synapses receiving spikes from the output neurons do not generate any increase in the amplitude of the inhibitory synaptic currents I_inh (bottom trace). Ideally all synapses receiving the same inputs should integrate the spike trains to the same mean steady-state value. As shown in the figure, this is not the case; the traces in Fig show the amount of variability present in the synaptic currents due to device mismatches created in the chip's fabrication process. The offsets introduced by device mismatches are different from chip to chip, but do not change over time. Figure 6.10(b)

shows the response of the system after a few seconds of stimulation. As expected, all excitatory synapses reached a mean steady-state value, but the network keeps on switching from selecting pixel 10 as the winner, to selecting pixel 22 as the winner and back again. Specifically, Fig. 6.10(b) shows the situation in which pixel 22 has just been de-selected and pixel 10 selected. At pixel position 22 the inhibitory synaptic current is in the process of decreasing back to zero (bottom trace), while at the tenth pixel position the WTA hysteretic current has just been added to the net input (top trace), the output neuron has been activated and the current of the inhibitory synapse is increasing with every spike of the output neuron (bottom trace).

Figure 6.10: Scanned net input currents to the WTA network I_net (top traces) and inhibitory currents I_inh (bottom traces), measured by means of an off-chip current sense-amplifier at every pixel location. (a) Response of the system to the onset of the stimulation, with a display persistence setting of 3s. (b) Response of the system after a few seconds of stimulation, with a display persistence setting of 250ms.

In a second experiment we measured the membrane potential of single neurons, rather than using time-multiplexing to

scan all of the pixels' outputs. The input stimulus was the same one used in the first experiment: all pixels were excited with 10Hz spike trains except for pixels 10 and 22, which were receiving spikes at rates of 50Hz and 80Hz respectively. Given that our system operates in real time, we were able to make multiple recordings without having to wait for the long simulation/computation times that are typical of software algorithms modeling networks of spiking neurons. To measure statistical properties of the system and to observe the variability of the system's response to the same stimulus, we repeated the experiment 100 times. In the first 50 trials we measured the membrane voltage of output neuron 10 and in the remaining 50 trials we measured the membrane voltage of output neuron 22. The input stimulus was applied for 3 seconds per trial and there was a delay of 30 seconds between trials to allow the system to return to its initial resting state. Figure 6.11 shows raster plots describing the responses of these two neurons to the two consecutive sessions of 50 trials each (Fig. 6.11(a) represents the activity of neuron 10 and Fig. 6.11(b) represents the activity of neuron 22). In the first 500ms to 800ms of stimulation, the input synapses have not reached their mean steady-state value (see also Fig. 6.5). As the excitatory synaptic currents reach their steady-state value, the WTA network selects either pixel 10 or pixel 22 as the winner, and excites the corresponding output neuron, continuously switching between the two. As mentioned previously, the offsets introduced in the circuits are constant over time and the single circuit elements of the system, such as the synapses and the neurons, are highly reliable (see error bars of Fig. 6.9(c)). Yet the traces of Fig show a significant amount of inter-trial variability, for the same input stimulus and the same output neuron.
Small variations in the input stimulus software, executed in a multiplexing environment, and thermal noise in the circuit elements are not sufficient to explain such a large amount of variability. This variability is therefore due to network effects. A possible (and probable) explanation for this phenomenon lies in the recurrent nature of the competition mechanism that takes place in the WTA network. Computer simulations of completely deterministic models have already demonstrated that recurrent networks of reliable (software) spiking neurons can produce highly irregular firing patterns [41, 101, 115, 117]. Carrying out a detailed analysis of the dynamics of the VLSI system would prove to be extremely difficult and would go beyond the scope of this paper. But this experiment is a real-time demonstration that even reliable VLSI spiking neurons, such as the integrate-and-fire neurons used here, can produce highly irregular spike trains if embedded in a WTA network containing recurrent excitatory and inhibitory pathways.

Despite the highly irregular firing patterns produced by the chip's output neurons, the overall response of the system is consistent with its input: on average, as expected from the input stimulus distribution, the network selects pixel 22 more often than pixel 10. This can also be seen in the peri-stimulus time histogram in Fig. 6.11(c). Figure 6.11(d) shows the inter-spike interval histograms for the two neurons. Both histograms show the same type of bimodal distribution. This distribution can be approximated by a superposition of two Gaussians, one centered around 7ms and the other around 12ms. The bimodal distribution arises from the fact that the output neuron is either being driven by a constant injection current I_inj (in the case in which that pixel is the winner) or it is not receiving any input (in the case in which the WTA network selects a different pixel as the winner).
If the pixel considered is the winner, the inter-spike interval (which is inversely proportional to the amplitude of the injection current I_inj) is constant and approximately equal to 7ms. This explains why the Gaussian centered around 7ms has a small standard deviation. On the other hand, if the neuron is not receiving any input, the inter-spike interval is not constant and depends on the time the network takes to switch from the other winning pixel back to the one considered. This time is not just a function of one parameter (as is the case for I_inj), but depends on many factors, ranging from the values of the bias voltages in the circuits, to the frequencies of the input spike trains, to the frequency of the output neuron. Therefore the Gaussian centered around 12ms has a larger standard deviation.

The control experiments were useful to verify the expected behavior of the circuits at the system level. To test the behavior of the system in the more general context of selective visual attention, we used data obtained from real-world images, processed on the workstation and sent to the chip using the same method described in Section.

Saliency-Map Inputs

We processed static color images with an algorithm which generates a saliency map, i.e. a feature map which topographically codes for local conspicuousness over the entire scene [59]. Specifically, given the digitized input image, the algorithm computes a set of multi-scale feature maps, responding to orientation, color and intensity contrast and, after appropriately normalizing them, combines them in a bottom-up fashion. Figure 6.12(b) shows an example of a saliency map generated from the image shown in Fig. 6.12(a).
The algorithm has no a priori knowledge of what is salient and what is not, so the fact that the four neurons illustrated in the image appear to be salient (and the "Neural Systems" title appears not to be salient at all) is simply due to the color, spatial scale and contrast properties of those local regions in the image.

As the VLSI architecture used in this work is one-dimensional and the saliency map is a 2D data array, we had to choose an appropriate operator to map the 2D saliency map space onto a 1D input vector. We applied the max() operator to the pixels of each column of the saliency map, and mapped the location of the selected pixel to the 1D input vector. In case of multiple maxima with the same value, we selected the pixel associated with the first occurrence of the maximum, while scanning the column from the top of the image to the bottom. This type of mapping is injective: any value of the 1D input vector is associated with only one pixel of the saliency map. The top trace of Fig. 6.12(c) shows the values of the 1D vector obtained by applying the mapping described above to the saliency map of Fig. 6.12(b). Each component of the vector corresponds to the brightest pixel of the column in the saliency map with the matching

index, and determines the frequency of the input spike train (e.g. the excitatory synapse at pixel 1 receives approximately 19 spikes per second, the one at pixel 2 receives approximately 12 spikes per second, the one at pixel 9 approximately 85 spikes per second, etc.).

Figure 6.11: (a) Raster plots of neuron 10 in response to the control stimulus (see text for explanation). (b) Raster plots of neuron 22. (c) Peri-stimulus time histogram of neuron 10 (solid line) and of neuron 22 (dashed line). (d) Inter-spike interval distribution of neurons 10 (front bars) and 22 (rear bars).

We applied this stimulus to the chip, using the method described in Section 6.3.1, for a period of 3 seconds and recorded spike trains from the chip's output neurons. The histogram in the lower part of Fig. 6.12(c) represents the activity of these output neurons. As shown, on average the system attends to pixels 9 and 10 most of the time, shifting its focus of attention to regions centered around pixels 19 and 28 quite frequently and to other regions of the image less frequently. To show the dynamical aspects of the focus of attention we plotted, in Fig. 6.12(d), the address of the pixel attended, over time. As the mapping performed from the 2D data of Fig. 6.12(b) to the 1D vector of Fig. 6.12(c) is injective, we can re-plot the 1D data of Fig. 6.12(d) onto the 2D saliency map. Figure 6.13 shows such a plot: the white dots superimposed on


the saliency map represent the locations attended by the chip, and the white solid lines join successive attended locations. Figure 6.13 resembles a visual scan-path similar to those recorded from human subjects [97, 122]; yet, the figure represents the movements of the focus of attention, which do not necessarily match on a one-to-one basis the saccadic eye movements measured in scan-paths. The focus of attention in our VLSI system tends to shift more frequently between locations which are spatially close to each other. This property, which appears to be characteristic also of human subjects [96], has been explicitly engineered into our system by implementing excitatory lateral connections between winner-take-all cells (see Section 6.2.3). Furthermore, the average time spent by the system at each location is approximately 50ms. This measurement, also in accordance with data measured from psychophysical experiments performed on human subjects, is an emergent property of the system and has not been explicitly engineered.

Figure 6.12: Test image with salient features. (a) Original color figure. (b) Corresponding saliency map. (c) Input spike frequencies obtained from the injective mapping described in the text (upper trace) and distribution of the output neurons' spike counts recorded over a period of 3 seconds (lower histogram). (d) Position of the attended pixel recorded over time.

Mapping the 2D saliency map onto a 1D vector and the 1D output of the chip back onto the 2D saliency map introduces some artifacts which do not allow us to make a fair comparison between the scan-paths obtained from the chip and scan-paths recorded from human observers. For example, the region around pixel values (260; 475) of Fig is never selected by the system precisely due to the injective mapping described above.
The example of focus-of-attention scan-paths shown in Fig is only illustrative. Although the 1D selective attention chip can be used in several application domains [51], 2D saliency maps containing both horizontal and vertical salient features are best processed using 2D selective attention chips [49, 56].
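The column-wise max() mapping used in this section, and the mapping of a winning 1D index back onto the 2D saliency map, can be sketched with NumPy as follows. The function names are illustrative and not taken from the original software; the sketch assumes the saliency map has already been re-sampled to one column per chip pixel.

```python
import numpy as np

def saliency_to_1d(saliency):
    """Collapse a 2D saliency map to a 1D vector with a column-wise max."""
    saliency = np.asarray(saliency)
    # np.argmax returns the first maximum along the axis, i.e. the topmost
    # pixel of the column when several pixels share the maximum value.
    rows = np.argmax(saliency, axis=0)
    cols = np.arange(saliency.shape[1])
    return saliency[rows, cols], rows

def winner_to_2d(winner_col, rows):
    """Map a winning 1D index back to (row, column) in the saliency map."""
    return int(rows[winner_col]), int(winner_col)

if __name__ == "__main__":
    demo = np.array([[1, 5, 2],
                     [3, 5, 0],
                     [3, 1, 2]])
    values, rows = saliency_to_1d(demo)
    print(values, rows, winner_to_2d(0, rows))
```

Keeping the row index of each column maximum is what allows the chip's 1D output to be re-plotted on the 2D map; it also makes explicit why pixels that are not a column maximum can never be attended.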


Figure 6.13: Mapping of the 1D data of Fig. 6.12(d) onto the re-sampled 2D saliency map data of Fig. 6.12(b). Shifts along the horizontal axis are due to the selective attention chip's response. Shifts along the vertical axis are introduced artificially via the injective mapping described in the text.

6.3 A 2-D AER selective attention chip

We extended the model presented in the previous section to two dimensions and implemented a 2-D selective attention chip. The chip was fabricated using a standard 2µm CMOS technology. Its size is approximately 2mm × 2mm and it contains an array of 8 × 8 cells. The chip's architecture, easily expandable to arrays of arbitrary size, is laid out on a square grid, with input and output AER interfacing circuits. In a system containing AER sensors interfaced to the selective attention chip, address events would reach, at the input stage of each cell of the 8 × 8 array, excitatory synaptic circuits that convert the digital voltage pulse streams into analog input currents.

Figure 6.14: Block diagram of a basic cell of the 8 × 8 selective attention architecture.

Figure 6.14 shows the block diagram of one of the architecture's cells. The input current integrated by the excitatory synapse (see I_ex in Fig. 6.14) is sourced into the hysteretic WTA network. The output current of each WTA cell is used to activate both an integrate-and-fire (I&F) neuron and two position-to-voltage (P2V) circuits [28]. The two P2V circuits encode the x and y coordinates of the winning WTA cell with two analog voltages, while the I&F neurons generate pulses that are used by the AER interfacing circuits to encode the address of the winning WTA cell. The neuron's spikes are also integrated by the local inhibitory synapse connected to it, to generate a current I_ior that is subtracted from the current I_ex (see Fig. 6.14). Figure 6.15 shows the circuit diagram of both excitatory and inhibitory synapses.
As for the 1-D selective attention chip, the synaptic circuits use compact, non-linear current-mirror integrators to integrate their input spikes. The transistors in the dashed box of Fig. 6.15(a) implement the AER input interfacing circuits, and can operate correctly over a wide range of input pulse widths, ranging from a few hundred nanoseconds to milliseconds. The gain and time constants of the two current-mirror integrators are set by two pairs of control voltages (V_w and V_τe for the excitatory synapse and V_q and V_τi for the inhibitory synapse). The net current (I_ex - I_ior) is sourced into the input node of the hysteretic WTA cell (node V_in in Fig. 6.16). Each cell is connected to its four nearest neighbors, both with lateral excitatory connections and lateral inhibitory connections (see Fig. 6.16). The inhibitory connections are modulated by the bias voltage V_inh, and control the spatial extent over which competition takes place. If lateral inhibition is maximally turned on (V_inh = V_dd), all WTA cells of the architecture are connected together and only one winner can be selected at a time (global inhibition). If V_inh is low, the WTA network allows multiple winners to be selected, as long as they are sufficiently distant from each other (local inhibition). Similarly, lateral excitatory connections, modulated by the bias voltage V_ex, control the amount of lateral facilitatory coupling between cells. If lateral coupling is enabled, the system tends to select new winners in the immediate neighborhood of the currently selected cell.

When a WTA cell is selected as a winner, its output transistors source DC currents into the two P2V row and column circuits. The winning WTA cell also sources a DC current I_inj into the input node V_mem of the local inhibitory neuron connected to it (see Fig. 6.17).
The amplitude of the injection current I_inj is independent of the input current (I_ex - I_ior), but depends on the bias voltage V_wta and on the control voltage V_inj. This current, integrated onto the neuron's capacitor C_m of Fig. 6.17, allows the neuron's membrane voltage V_mem to increase linearly with time. As soon as V_mem reaches the threshold voltage V_thr, the neuron generates an action potential: the comparator and the inverters of Fig drive V_out to the positive power supply rail. This activates the AER row and column request signals (R_x and R_y), which produce

an address event. The output AER circuit's acknowledge signals (A_x and A_y) reset the pulse by allowing the neuron's membrane capacitance to discharge at a rate controlled by V_pw.

Figure 6.15: Synaptic circuits. (a) Input excitatory synapse. Address events are converted into pulses by the circuit in the dashed box. Pulses are integrated into the excitatory current I_ex by the p-type current-mirror integrator. The integrator's gain and time constant are modulated by the control voltages V_w and V_τe. (b) Inhibitory synapse. On-chip pulses (V_ior) are integrated into the inhibitory current I_ior by the n-type current-mirror integrator. The time constant and gain of this integrator are modulated by the voltages V_q and V_τi.

Figure 6.16: Hysteretic WTA cell. Input currents are sourced into node V_in and 3 copies of the output current are sent to the two P2V circuits and to the I&F neuron.

Also in this case, in addition to transmitting their address events off chip, the output neurons, together with the local inhibitory synapse connected to them, implement the inhibition of return (IOR) mechanism. The spikes generated by the winning cell's output neuron are integrated by its corresponding inhibitory synapse, and gradually increase the cell's inhibitory post-synaptic current I_ior. As the neuron keeps on firing, the net input current to that cell (I_ex - I_ior) decreases until a different cell is eventually selected as the winner. When the previous winning cell is de-selected its corresponding local

output neuron stops firing and its inhibitory synapse recovers, decreasing the inhibitory current I_ior back to zero.

Figure 6.17: Local output integrate-and-fire neuron. When the membrane voltage V_mem increases above V_thr the output voltage V_out is driven to V_dd and an address event is generated. The transistors in the dashed box are part of the output AER circuitry.

Experimental Results

To characterize the behavior of the selective attention chip with well controlled input signals we interfaced it to a workstation, via a National Lab-PC+ I/O card, and stimulated it using the AER communication protocol. With this setup we were able to stimulate all 64 pixels of the network with voltage pulses (i.e. address-events) at a maximal rate of 500Hz. As the input synapses were set to have time constants of the order of milliseconds, each cell appeared to receive input spikes virtually in parallel. The handshaking between the chip and the PC was carried out at run time by the hardware present in the National I/O card. The chip's input stimuli consisted of patterns of address-events generated by the workstation at uniform rates of different frequencies. We performed two sets of experiments, to demonstrate the chip's response properties using both the analog P2V outputs and the digital AER output.

Analog P2V outputs

In the first set of experiments, we used a test stimulus that excited cells (2,2), (2,7), (7,2) and (7,7) of the selective attention chip with 30Hz pulses, and cell (5,5) with 50Hz pulses. Figure 6.18(a) shows the analog output of the P2V circuits in response to 300ms of stimulation with the input saliency map described above. The system initially selects the central cell (5,5). But, as the IOR mechanism forces the WTA network to switch the selection of the winner, the system cycles through all other excited cells as well.
The P2V circuits are actively driven when the WTA network is selecting a winner (i.e. when the output p-type transistors of Fig. 6.16(a) are sourcing current into the nodes P2VX and P2VY). At the times in which no cell is winning (i.e. when all cells are inhibited), there is no active device driving the P2V circuits, and their outputs tend to drift toward zero. This is evident in Fig. 6.18(a), for example, at the position corresponding to cell (7,2) in the lower right corner of the figure. When the network selects it as its eighth target, the horizontal P2V circuit outputs approximately 4.4V and the vertical one outputs approximately 1.3V. When the IOR mechanism forces the network to de-select the winner, the outputs of the P2V circuits slowly drift toward zero. As soon as inhibition decreases, the network selects the cell (7,7) as the new (ninth) winner, the position-to-voltage circuits are actively driven again, and their output quickly changes from approximately 3.6V and 1.2V to 4.2V and 3.5V (for the horizontal and vertical circuits respectively).

Digital AER outputs

To verify that the AER outputs are consistent with the analog P2V outputs, we stimulated the chip with the same pattern used for collecting the data of Fig. 6.18(a). We measured the address-events generated by the selective attention chip in response to this input stimulus using a logic analyzer, and plotted in Fig. 6.18(b) the histogram of such events. As shown, the chip's output address-events reflect, on average, the input stimulus, and are consistent with the analog outputs of the P2V circuits.

Figure 6.18: (a) Output of the P2V circuits of the selective attention architecture measured over a period of 300ms, in response to a test stimulus exciting four corners of the input array at a rate of 30Hz and a central cell at a rate of 50Hz. (b) Histogram of the chip's output address-events, captured over a period of 13.42s in response to the same input stimulus.

The data of both Fig. 6.18(a) and (b) demonstrate how the IOR mechanism forces the network to switch the selection of the winner from one input to a different one, cycling through all sufficiently strong inputs. To demonstrate also how different IOR dynamics settings (modified for example by changing the bias voltage V_τi of Fig. 6.15(b)) affect the system's behavior, we performed a second experiment with a different input stimulus. The stimulation pattern used in this experiment excited cells (2,2), (5,5) and (7,2) with pulses at a uniform frequency of 50Hz, cell (7,7) with 100Hz pulses and cell (2,7) with 150Hz pulses (see Fig. 6.19(a) for a histogram of the input address-events). Figures 6.19(b), (c) and (d) show histograms of the chip's response for three different values of the bias voltage V_τi. The data of Fig. 6.19(b) was obtained by setting the time constant of the inhibitory synapse to a relatively high value (V_τi = 227mV). In this case once a cell is inhibited (after being selected as the winner), its input is suppressed for an extended period of time and the WTA network is forced to select all other (non-suppressed) inputs. Conversely, the data of Fig. 6.19(d) was obtained by setting the synapse time constant to a relatively low value (V_τi = 193mV). In this case the WTA network switches from selecting the cell receiving the strongest input to the cell receiving the second-strongest input, and back.
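The qualitative effect of the inhibitory time constant on the switching behavior can be reproduced with a small behavioral model of a WTA stage with hysteresis and inhibition of return. This is a toy sketch under stated assumptions, not a model of the circuit: the function name, the gain, the hysteresis value and the time constants in seconds are invented for illustration, and the relation between V_τi and the actual time constant is not modeled. Only the input rates (three cells at 50Hz, one at 100Hz, one at 150Hz) follow the stimulus described above.

```python
import numpy as np

# Input rates for cells (2,2), (5,5), (7,2), (7,7) and (2,7).
I_EX = [50.0, 50.0, 50.0, 100.0, 150.0]

def simulate_ior(i_ex, tau, gain=1000.0, hyst=10.0, dt=1e-3, steps=5000):
    """Count how many time steps each cell spends as the WTA winner."""
    i_ex = np.asarray(i_ex, dtype=float)
    i_ior = np.zeros_like(i_ex)            # inhibitory (IOR) currents
    winner = int(np.argmax(i_ex))
    counts = np.zeros(len(i_ex), dtype=int)
    for _ in range(steps):
        counts[winner] += 1
        drive = np.zeros_like(i_ex)
        drive[winner] = gain               # only the winner's neuron fires
        # Leaky integration: charged by the winner's spikes, decays with tau.
        i_ior += dt * (drive - i_ior / tau)
        net = i_ex - i_ior                 # net input I_ex - I_ior
        net[winner] += hyst                # hysteretic current of the winner
        winner = int(np.argmax(net))
    return counts

if __name__ == "__main__":
    print("long time constant :", simulate_ior(I_EX, tau=1.0))
    print("short time constant:", simulate_ior(I_EX, tau=0.08))
```

In this model a long time constant lets the inhibitory currents of the two strongest cells build up until the weaker inputs are also visited, whereas with a short time constant the inhibition is bounded by gain × tau and the selection only alternates between the two strongest cells.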
As the selected cells are not suppressed for sufficiently long periods of time, the remaining inputs never win the WTA competition. The histogram in Fig. 6.19(c) shows the data obtained for the intermediate case of V_τi = 207mV.

The same data used to compute the address-event histograms of Fig can be displayed using a different representation, to show the dynamics of the WTA competition stage. In Fig we plotted the address-events measured for the intermediate case of Fig. 6.19(c) over time. The addresses of the 8 × 8 cells are labeled successively row by row, such that labels 0 through 7 correspond to the addresses of the cells in the first row, labels 8 through 15 correspond to the addresses of cells in the second row, and so on. Consistent with the histogram of Fig. 6.19(c), this plot shows how the system selects the cell (2,7) (labeled as 15 in Fig. 6.20) most frequently, switching occasionally to cell (5,5) (labeled as 37), and more often to cells (7,2) and (7,7) (labeled as 50 and 55).

As mentioned in Section 6.2.1, the details of the switching dynamics can be controlled by appropriately setting the bias voltages of the excitatory and inhibitory synaptic circuits (see (V_w, V_τe) and (V_q, V_τi) in Fig. 6.15) and the neuron's firing rate (controlled by V_inj of Fig. 6.16(b)). These bias voltages, together with the other ones controlling the hysteretic WTA network's behavior (namely V_wta, V_ex, and V_inh of Fig. 6.16(a)), give the system enough flexibility for the same chip to be used in different types of selective attention tasks.

6.4 Selective attention applications

The test stimuli used in the experiments of Section were simple examples designed to demonstrate the expected behavior of the selective attention chip. They do not resemble realistic saliency maps (see Fig. 6.12 and Fig. 6.21(a,b)). In practical applications saliency maps would more likely resemble the one shown in Fig.
6.21(c), or the one shown in Fig. More elaborate saliency maps could be processed by 2D selective attention networks of greater size. The 8 × 8 architecture proposed in this paper can scale up to networks of arbitrary size: the performance of the hysteretic WTA

circuits, which operate collectively in a massively parallel way, is not affected by the network's size. Similarly, given that in the selective attention system there is always one or a few winners at a time, the performance of the AER circuitry does not degrade with size (performance is affected only in architectures in which too many cells are trying to access the AER bus simultaneously). As demonstrated in Section 6.2, these types of selective attention chips can operate reliably also on elaborate saliency maps, generated from high-resolution digitized images.

Figure 6.19: Event histograms of the addresses generated by the workstation and sent to the chip (a), and of the output addresses generated by the selective attention chip (b), (c) and (d). All chip parameters are kept constant throughout the plots except for the bias parameter V_τi. The histogram in (b) was obtained with V_τi = 227mV, the one in (c) with V_τi = 207mV, and the one in (d) with V_τi = 193mV.

In practical applications, the images could come for example from a camera connected to the workstation, and the selective attention chip could be used to allocate CPU (image processing) resources in real time only to the first n most salient regions of the image, or to scan the whole image in an intelligent way, ordering the scanning process by region saliency. Depending on the chip's bias settings, the system could also be tuned to visit each region only once, switching from one region to the other slowly, or to revisit each region over and over again, switching from one region to the other quickly. Systems of this type would already benefit from the real-time response properties of the selective attention chip.
But the most effective way of exploiting the computational properties of this chip would be to use it in conjunction with neuromorphic sensors that employ the AER communication protocol, such as silicon retinas or silicon cochleas [7, 34, 68]. These types of systems could be used as research tools for testing, in real time and with real stimuli, different hypotheses on biological selective attention mechanisms [25, 90, 96]. Or they could be used as low-cost alternatives to implement visual/auditory tracking or monitoring systems. For example, rather than using several fixed high-resolution (high-cost) cameras to monitor an environment, one could use a single, motorized, high-resolution camera driven by a selective attention system, comprising an AER silicon retina with a wide field-of-view lens interfaced

to the selective attention chip. In the next section we describe a first attempt at making a system of this type.

Figure 6.20: Output address events of the selective attention chip biased with V_τi = 207 mV. The 2D address space of the chip's architecture is mapped into the plot's 1D ordinate vector by labeling each address successively, row by row. [Axes: neuron address against time (ms).]

Figure 6.21: Image representations of saliency maps. (a) Saliency map corresponding to the input stimulus used for the experiment of Fig. 6.18; (b) Saliency map used for the experiment of Fig. 6.19; (c) Fictitious example resembling a realistic saliency map.

6.5 An active AER selective attention system

We constructed an active vision system using an AER image sensor mounted on a motorized pan-tilt unit, and the 2-D AER selective attention chip interfaced to a workstation. The selective attention chip receives input from an AER imaging sensor [67], and transmits the address of the winning pixel to the workstation, which is used to drive the pan-tilt unit on which the sensor is mounted. A standard CCD camera is mounted next to the sensor, to visualize the sensor's field of view. The AER sensor responds to contrast transients and its address events report the position of moving objects. The selective attention chip selects the locations with the highest-contrast moving objects and cycles through them, while the workstation drives the pan-tilt unit, centering the selected locations on the sensor's imaging array. A block diagram of the selective attention sensory-motor system and the correspondence between the system's computational blocks and their biological counterparts is shown in Fig. 6.22(a). A schematic diagram of the system's setup illustrating how the individual components are connected together is shown in Fig. 6.22(b).
At the input stage we use a neuromorphic imager that is sensitive to temporal changes in illumination (transients) and extract motion or flicker as features. Since our system in its current state extracts only one feature map, the saliency map is identical to the extracted feature map. In this case no feature combination stage is necessary. The transient imager chip transmits its output data directly to the selective attention chip. Based on its inputs, the selective attention chip computes the location of the focus of attention and sends address events encoding this location to the host computer. In addition to managing the communication with the selective attention chip, using the AER communication protocol, the host computer is used for data logging and, more importantly, for driving the


motors of a commercial pan-tilt unit (see footnote 1) on which the transient imager is mounted (see Fig. 6.23). The pan-tilt unit is used to orient the imager chip such that the location of the focus of attention lies in its central region.

Figure 6.22: (a) Block diagram of the sensory-motor selective attention model. The figure shows the basic computational blocks used, as well as the corresponding biological analogues and their function. (b) Schematic diagram of the active vision setup: the neuromorphic imager, mounted on a pan-tilt unit, transmits its output to the selective attention chip. The latter sends the results of its computations to a host computer, which uses this data to drive the pan-tilt unit's motors. [Block labels in (a), listed as biology / model / function: retina, LGN, V1 / transient imager / image input and feature map calculation; pulvinar, primary visual cortex, superior colliculus / selective attention chip / saliency map processing and focus of attention computation; superior colliculus / software algorithm / motor control for eye movements; eye muscles / pan-tilt unit / eye movements. Components shown in (b): real-world scene, lens, transient imager, CCD camera, pan-tilt unit, selective attention chip, host.]

Figure 6.23: Selective attention active vision system. The selective attention chip processes sensory data coming from an AER imaging sensor and transmits its output to a workstation that drives the pan-tilt unit on which the sensor is mounted. A standard CCD camera is mounted next to the AER sensor to visualize the sensor's field of view.

The system proposed here uses a single-sender/single-receiver point-to-point AER protocol. The sender chip is a transient imager that contains a two-dimensional array of adaptive photoreceptors with an AER arbiter circuit that serially processes the requests from the different pixels in the order of their activation, latches their addresses onto the AER bus in the same order, and sends acknowledge pulses to the corresponding pixels [9].
As soon as a new address is ready on the bus, the handshaking cycle with the receiver chip is initiated, in the course of which the address of the sending pixel is transmitted. The transient imager transmits its address events to the selective attention chip using a topographic mapping.

Footnote 1: fabricated by Directed Perception, Inc.

As the sender has 16 × 16 pixels and the receiver only 8 × 8, we map the addresses of 2 × 2 neighboring pixels on the sender to the same pixel on the receiver. This mapping was accomplished by simply discarding the least significant bit of the sender address, for each dimension.

Figure 6.24: Block diagram of irradiance transient detector with event-based communication interface. [Blocks: transient detector, ON interface, OFF interface. Signals: ON request, ON acknowledge, OFF request, OFF acknowledge.]

6.5.1 The Transient imager chip

The transient imager is a 16 × 16 pixel array of irradiance transient detectors that is used to generate the events that drive the system. Each pixel responds with binary pulses in real time to a local change of a brightness distribution projected through a lens onto its surface. These pulses are used as the request signals to the AER communication interface. Figure 6.24 shows a block diagram of the pixel circuitry. The transient detector comprises an adaptive photoreceptor [23] with a rectifying temporal differentiator [67] in the feedback loop. Positive irradiance transients, corresponding to dark-to-bright or ON transitions, and negative irradiance transients, corresponding to bright-to-dark or OFF transitions, appear at different output terminals. The ON and OFF responses are separately amplified with tunable gains, each generating a request pulse to the on-chip arbiter if it exceeds a chosen threshold. By appropriately setting the threshold and the respective gain factors, the circuit can be made to respond only to ON transients, only to OFF transients, or to both types of transients. Each acknowledge pulse from the arbiter triggers a reset pulse at the requesting terminal, whose duration determines a refractory period for the succeeding request from the same terminal.
Depending on the chosen refractory period and the magnitude and duration of the irradiance transient, the pixel responds with a single spike or a burst of spikes. In the present application, a short refractory period of 140 µs was chosen to obtain bursts, and only the OFF response was used to stimulate the selective attention chip. The pixels are arranged on a square grid. The position of a pixel along a row is encoded with a 4-bit column address and its position along a column with a 4-bit row address. An additional address bit is used to distinguish between ON and OFF transients.

6.5.2 The Motor Control Algorithm

The control algorithm that the host computer executes is responsible for driving the motors of the pan-tilt unit in such a way as to center the location picked by the selective attention chip within the central region of the transient imager chip. This algorithm represents a first attempt at modeling the bottom-up, stimulus-driven neural mechanism that generates saccadic eye movements which center the fovea with respect to the location of the focus of attention.

To evaluate quantitatively the response properties of the system and test the motor control algorithm, we mounted a standard CCD camera next to the transient imager chip and captured images on the host computer (see also Fig. 6.22(b)). This allowed us to see in real time the images projected onto the focal plane of the transient imager chip, as shown in Fig. 6.25. We calibrated the system so that the image projected onto the transient imager array corresponds to the central part of the image captured by the CCD camera, shown as the outer square in the center of Fig. 6.25. The inner square drawn in the center of Fig. 6.25 represents the part of the scene being projected on the central 4 by 4 region of the transient imager array. The location selected by the selective attention chip is represented by a small cross, superimposed onto the CCD image.
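The address bookkeeping behind this setup can be sketched in a few lines. This is a minimal illustration, not the code used in the experiments: all function names are hypothetical, indices are assumed to be 0-based, the row-by-row labeling is assumed to be row-major, and the position of the ON/OFF bit within the address word is an assumption (the text only states that one additional bit is used).

```python
from typing import Tuple

IMAGER_SIZE = 16  # 4-bit row and 4-bit column addresses
CHIP_SIZE = 8     # the selective attention chip has 8 x 8 pixels


def pack_imager_address(row: int, col: int, off: bool) -> int:
    """Pack row, column and polarity into 9 bits (bit layout is an assumption)."""
    return (int(off) << 8) | (row << 4) | col


def unpack_imager_address(address: int) -> Tuple[int, int, bool]:
    """Inverse of pack_imager_address: return (row, col, off)."""
    return (address >> 4) & 0xF, address & 0xF, bool((address >> 8) & 1)


def imager_to_chip(row: int, col: int) -> Tuple[int, int]:
    """Map 2 x 2 neighboring imager pixels to one chip pixel by dropping the LSB."""
    return row >> 1, col >> 1


def chip_label(row: int, col: int) -> int:
    """Label the chip pixels successively, row by row, from 1 through 64."""
    return row * CHIP_SIZE + col + 1


def in_central_region(row: int, col: int, size: int = 4) -> bool:
    """True if an imager pixel lies in the central size x size region of the array."""
    low = (IMAGER_SIZE - size) // 2
    return low <= row < low + size and low <= col < low + size
```

Under these assumptions, the central 4 by 4 region of the imager covers rows and columns 6 through 9, which map onto the central 2 by 2 pixels of the selective attention chip.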
The control algorithm produces motor commands that depend on the current position of the selected location and its recent history: if the cross lies within the inner square of the image, no camera movements are triggered (the camera is already foveating the salient feature). If the cross shifts to a location outside the inner frame, the algorithm records the address of the location and increases a counter associated with that address. As soon as the counter for a particular address reaches a threshold n (i.e. when the cross revisits the same location n times), the algorithm generates a camera movement that centers the selected location within the central region of the transient imager array (the camera saccades to the persistent salient stimulus). In this way camera movements are generated only if a salient location is visited more than once. The revisiting constraint ensures that the system does not saccade to all locations picked by the selective attention chip, but


orients its gaze only toward persistent salient stimuli. In the examples shown in Section 5.6.4, n was set to 5. The value of n was chosen to reproduce the characteristics of biological selective attention systems, as reported in the neuroscience literature: while the focus of attention shifts 15 to 20 times per second, saccadic eye movements are made only 3 to 5 times per second [96].

Figure 6.25: Image captured from the CCD camera mounted next to the transient imager. The outer frame shown in the image corresponds to the field of view of the transient imager, whereas the inner frame is drawn to highlight the transient imager's central region. The cross to the bottom right of the image center represents the location of the focus of attention currently computed by the selective attention chip.

Figure 6.26: (a) Histogram of events generated by the transient imager pixels in response to two diffused flashing LEDs. The LED stimulating the region around pixel (5,9) has higher contrast than the other LED. (b) Histogram of events generated by the selective attention chip in response to the events generated by the transient imager chip.

Another important function implemented by the motor control algorithm is that of saccadic suppression. During a camera movement the images projected on the focal plane of the transient imager array generate a large number of address events. These events are not relevant for the analysis of the scene once the camera stops moving. In biology this problem is solved by suppressing all inputs arriving from the retinas during saccadic eye movements (indeed, we are effectively blind during a saccade). In the current version of our system, the addresses generated by the transient imager chip are hardwired into the selective attention chip (see Fig. 6.22(b)). There is no way of suppressing these events at the source.
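The counting rule of the motor control algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the author's implementation: the class and method names are hypothetical, addresses are taken to be 0-based (row, column) pairs in the 8 × 8 address space of the selective attention chip, and the foveal set is assumed to be its central 2 × 2 pixels. The reset method reflects the way the algorithm clears all address counters after each camera movement.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

Address = Tuple[int, int]


class SaccadeController:
    """Counts visits to non-foveal addresses; requests a saccade after n visits."""

    def __init__(self, fovea: Set[Address], threshold: int = 5) -> None:
        self.fovea = fovea          # addresses that count as already centered
        self.threshold = threshold  # the revisit threshold n (5 in the text)
        self.counters: Dict[Address, int] = defaultdict(int)

    def on_attention_event(self, address: Address) -> Optional[Address]:
        """Process one selected location; return a saccade target or None."""
        if address in self.fovea:
            return None  # the salient feature is already foveated
        self.counters[address] += 1
        if self.counters[address] >= self.threshold:
            return address  # persistent stimulus: center the camera on it
        return None

    def movement_finished(self) -> None:
        """Clear the recent history of selected positions after a camera movement."""
        self.counters.clear()


# Assumed foveal set: central 2 x 2 pixels of the 8 x 8 chip (0-based indices).
controller = SaccadeController(fovea={(3, 3), (3, 4), (4, 3), (4, 4)}, threshold=5)
targets = [controller.on_attention_event((1, 6)) for _ in range(5)]
controller.movement_finished()
```

In this sketch only the fifth visit to the same peripheral address returns a target; the caller would then drive the pan-tilt unit and call `movement_finished()` so that events produced during the movement leave no trace in the counters.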
During a camera movement the selective attention chip receives and processes all spurious events from the imager, and the addresses generated by the selective attention chip are transmitted to the host computer. The control algorithm ignores the effect of these events by resetting all address counters to zero after each camera movement. In this way, the recent history of all selected positions is canceled and normal operation of the control algorithm can be resumed.

6.5.3 System response in absence of camera movements

Initially, we tested the system with the motors of the pan-tilt unit turned off. The input images consisted of a laboratory scene with two flashing LEDs in the foreground. The two LEDs were blinking in phase, with a frequency of 1 Hz and a

duty-cycle of 50%. As the transient imager responds only to local changes in illumination, the blinking LEDs proved to be a reliable and well-controlled stimulus. The static background did not contribute to the generation of address events. We placed a diffusion glass in front of the transient imager's lens, to diffuse the projection of the two LEDs on the imager's focal plane. In this way we were able to stimulate several pixels of the imaging array with each LED. Fig. 6.26(a) shows the histogram of the address events generated by the transient imager array in response to the flashing LEDs, captured over a period of 2 s. The two regions with the highest occurrence of events (around pixels (5,9) and (11,11)) correspond to the locations of the LEDs. Fig. 6.26(b) shows the histogram of address events generated by the selective attention chip. As shown, on average, the selective attention chip visited pixels (3,5), (3,4) and (6,6), (6,5) most often.

Figure 6.27: Raster plot of the activity of the neurons of both transient imager chip (dots) and selective attention chip (circles) in response to the flashing LEDs. To plot the data from both chips using an address space with the same resolution, we sub-sampled the addresses of the transient imager chip. The LEDs flashed approximately at 0.25 s, 1.25 s and 2.25 s. [Axes: neuron address against time (s).]

While the event histogram shows that the selective attention chip acts on average like a threshold filter, picking only inputs with a high mean frequency, it does not show the more interesting aspect of the computation carried out by the chip: its dynamics. To show the dynamical aspect of the selective attention chip's response, we show a raster plot in Fig. 6.27. This plot shows the activity of the transient imager and of the selective attention chip neurons over time, in response to the flashing LEDs.
The 8 by 8 neurons of the selective attention chip are labeled successively, row by row (1 through 64), and the events that they generated are plotted with circles. To show the events of the transient imager pixels on the same scale, we sub-sampled their addresses, taking into consideration only their three most significant bits (in the same way we implemented the mapping of addresses from the transient imager pixels to the selective attention ones, as described in Section 6.1.1). The high density of events around time instants 0.5 s, 1.5 s and 2.5 s is due to the flashing of the LEDs. Within a single flash, the focus of attention shifts approximately four times, moving from one region of high saliency to another. The proportion between events generated by the two chips is consistent with the data of Fig. 6.26. By looking at the selective attention chip data of Fig. 6.27 one can extrapolate the focus of attention's scanpaths. Note how these scanpaths tend to repeat themselves over time. This characteristic will be even more evident in Section 6.5.5, when we analyze the response of the system to natural stimuli.

6.5.4 System response in presence of camera movements

To allow the system to make camera movements we activated the motors of the pan-tilt unit on which the imager was mounted. The input stimulus consisted again of two flashing LEDs, but this time not in phase. Furthermore, we removed the diffusion filter from the transient imager's lens, so that the two LEDs stimulated only a few pixels of the imaging array. As described in Section 6.5.2, the selective attention chip was driving the pan-tilt unit to orient the imager towards the attended location. Figure 6.28 shows a sequence of images captured by the CCD camera mounted on the pan-tilt unit, while the system was engaged in selecting and tracking the LEDs. Initially only the top LED was flashing, and the system selected it and oriented the central region of the imager to that location (see Fig. 6.28(a)).
As we turned on the bottom LED, the system changed the focus of attention location (see Fig. 6.28(b)) and made a camera movement centering the attended stimulus on the central region of the imager (see Fig. 6.28(c)). The raster plot of Fig. 6.29 shows in detail the sequence of events that led to the camera movement. The arrangement of the neuron addresses on the figure axis is the same as in Fig. 6.27. Initially the selective attention chip was attending the region of transient imager pixels that project to its 35th pixel. As the second LED flashed, the imager pixels also excited the 20th selective attention chip pixel. After approximately 1 s, the WTA network of the selective attention chip switched and selected the second LED as the winner. After having attended to that location for approximately 2.5 s, the system made


an abrupt camera movement (saccade), and centered the attended stimulus on the imaging array.

Figure 6.28: Sequence of images showing the selection of a salient stimulus prior to and after a saccadic eye movement. (a) The system is attending the top LED, already centered on the central part of the imaging array. (b) The system selects the bottom LED, outside the central region of the imager. (c) The system performed a saccade toward the bottom LED, and is currently attending it.

Figure 6.29: Raster plot of the activity of the neurons of the transient imager chip (dots) and of the selective attention chip (circles) in response to two flashing LEDs. The focus of attention shifts from a central region of the imaging array to a peripheral one (see circles at 2 s ≤ t < 6 s). Consequently, the system makes a camera movement, at the time indicated by the vertical arrow, and re-centers the attended location. [Axes: neuron address against time (s).]

6.5.5 System response to natural stimuli

In this section we show how the system is able to select and attend natural stimuli that were not explicitly engineered to optimally drive the imaging array. As we did in Sections 6.5.3 and 6.5.4, we initially tested the system in the absence of camera movements and subsequently tested it with the motor output activated. Figure 6.30 shows the location of the focus of attention, as measured by the P2V circuits of the selective attention chip (see Fig. 6.14), in response to the fluttering fingers of the experimenter, over a period of 500 ms. The x-component and y-component of the focus of attention are plotted against each other, and superimposed onto an image taken by the CCD camera during the experiment. Although the resolution of the selective attention chip is 8 × 8 pixels, the data of Fig. 6.30 seem to belong to a much higher resolution architecture. This is due to the fact that the output of the P2V circuits is analog and is affected by noise [49].
These analog output signals might not be appropriate for precise quantitative measurements, but could be used to drive, via buffers or power amplifiers, motors and actuators to implement (negative feedback) sensory-motor loops [49]. Figure 6.31 shows the response of the system to the same stimulus as Fig. 6.30, with the motors engaged. Fig. 6.31(a) shows the beginning of the experiment: the motors had just been activated, the imager was still in its initial position and the selective attention chip chose a pixel in the top left region of the transient imager array as the focus of attention. After the selective attention chip transmitted the same pixel address to the host computer for a set number of times, specified


by the motor control algorithm (see Section 6.5.2), the control algorithm generated a camera movement and centered the focus of attention with respect to the transient imager array (see Fig. 6.31(b)). If the salient stimuli were persistent (e.g. if the fingers kept on moving) and remained in the field of view of the imager, the system continuously shifted its gaze from one salient stimulus to the other. This behavior has proven to be extremely reliable and robust. The system's response is largely invariant to illumination conditions, stimulus speed and (static) background conditions.

Figure 6.30: Output of the P2V circuits of the selective attention chip (see Fig. 6.14), representing the scanpath of the focus of attention, switching back and forth between the fluttering fingers of both of the experimenter's hands. The scanpath data is superimposed onto a snapshot taken from the CCD camera during the experiment.

Figure 6.31: Saccadic eye movements in response to moving fingers. (a) CCD camera snapshot taken before the saccadic eye movement (the focus of attention has just switched from one hand to the other). (b) CCD camera snapshot taken just after the saccadic eye movement (the focus of attention and the salient stimulus are now in the center of the imaging array).

Winner-Take-All Networks with Lateral Excitation

Analog Integrated Circuits and Signal Processing, 13, 185–193 (1997) © 1997 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. Winner-Take-All Networks with Lateral Excitation GIACOMO