Hardware Implementation of a PCA Learning Network by an Asynchronous PDM Digital Circuit


Hardware Implementation of a PCA Learning Network by an Asynchronous PDM Digital Circuit

Yuzo Hirai and Kuninori Nishizawa
Institute of Information Sciences and Electronics, University of Tsukuba
Doctoral Program in Engineering, University of Tsukuba
1-1-1 Ten-nodai, Tsukuba, Ibaraki 305-8573 Japan
E-mail: hirai@is.tsukuba.ac.jp, kuninori@viplab.is.tsukuba.ac.jp

Abstract

We have fabricated a PCA (Principal Component Analysis) learning network in an FPGA (Field Programmable Gate Array) by using an asynchronous PDM (Pulse Density Modulation) digital circuit. The generalized Hebbian algorithm is expressed as a set of ordinary differential equations, and the circuits solve them in a fully parallel and continuous manner. The performance of the circuits was tested with a network that has two microphone inputs and two speaker outputs. When a sound source was moved right and left in front of the microphones, the first principal weight vector continuously tracked the sound direction in real time.

1 Introduction

A primary objective of the hardware implementation of neural networks is to realize real-time operation of neural functions in VLSI chips. Since the mid eighties, many VLSI neural chips and systems have been reported in the literature, e.g. [1, 2]. We also developed a hardware system that consisted of 1,008 neurons fully interconnected via more than one million 7-bit physical synapses [3]. An asynchronous digital circuit was used to implement the neuron circuits. The behavior of each neuron faithfully obeys a nonlinear first-order differential equation, so that the system solves more than one thousand simultaneous differential equations in a fully parallel and continuous manner. The magnitude of a neuron output is encoded by pulse density, as in real neurons. Synaptic weights can be downloaded from a host computer after learning has been completed there. The processing speed is ten thousand times faster than that of a state-of-the-art workstation.

Most of the research on on-chip learning has focused on supervised learning, e.g. [4]. There have been only a few unsupervised or self-organizing learning networks, including the analog chips for independent component analysis reported in [5]. In this paper an asynchronous PDM digital circuit is applied to a principal component analysis (PCA) learning network. PCA is a standard technique widely used for dimension reduction in statistical pattern recognition. The task of PCA is to find the set of eigenvectors whose eigenvalues are the largest among those obtained from the correlation matrix of the input data [6]. Oja [7] devised an on-line Hebbian learning algorithm that can find the first principal component of the input without recourse to the correlation matrix. Sanger [8] proposed the generalized Hebbian algorithm (GHA), which can find the first m principal components at the m output neurons of a single-layer feedforward linear network. Kung and Diamantaras [9] also invented a PCA learning network, but their network uses an anti-Hebbian learning rule as well as a Hebbian rule [6]. In this paper GHA is chosen as the target because the structure of the learning algorithm, especially its local implementability, is better suited for hardware implementation than the other candidates. The discrete form of the original GHA is converted directly to a continuous form, since our circuits operate continuously in time.
This does not mean that we use the ordinary differential equations that appear in the stability analysis of the stochastic difference equations, because those equations contain the correlation matrix of the multivariate signals. In order to check whether our circuit can find principal components in real time, a small learning network with two inputs and two outputs was fabricated in an FPGA. It was observed that the circuit could continuously track the correct directions of the principal weight vectors in real time.

2 Generalized Hebbian Algorithm

Here we consider a single-layer feedforward network with M inputs and N outputs. The output of the ith neuron at discrete time T is given by

V_i(T) = \sum_{j=1}^{M} w_{ij}(T)\,\xi_j(T),   (1)

where w_{ij}(T) is the synaptic weight from the jth input to the ith output neuron and \xi_j(T) is the jth input at time T. According to GHA [8] the synaptic weight w_{ij}(T) is updated by a small change given by

\Delta w_{ij}(T) = \eta \left[ V_i(T)\xi_j(T) - V_i(T)\sum_{k=1}^{i} w_{kj}(T)V_k(T) \right],   (2)

where \eta is the parameter which determines the learning rate. The prominent feature of GHA is its local implementability. By rewriting Eq.(2) the following iterative equations are obtained:

\Delta w_{ij}(T) = \eta V_i(T)\mu_{ij}(T),   (3)
\mu_{ij}(T) = \mu_{i-1,j}(T) - V_i(T)w_{ij}(T),   (4)

where \mu_{0j}(T) = \xi_j(T). As seen in the above equations, \mu_{ij}(T) can be considered as a modified input for a Hebbian rule, and it can be calculated using the learning signal \mu_{i-1,j}(T) supplied by its neighbor and a local negative term V_i(T)w_{ij}(T). Convergence properties of this algorithm were rigorously investigated in [10].

3 Implementation in an Asynchronous PDM Digital Circuit

We employed an asynchronous PDM digital circuit to implement the GHA because (1) with asynchronous circuits it is not necessary to supply a common clock to the entire circuit, so a large system can be designed easily, (2) with pulse streams, faithful analog data transmission over a single signal line can be achieved, as in actual neurons, which relaxes the wiring problem, and (3) with digital circuits, standard CMOS technologies such as FPGAs can be used and their rapid development can be incorporated into the design of the system. All of these merits have already been verified in the development of our 1,000-neuron system [3].

3.1 Continuous Learning Algorithm

Since in the PDM architecture the output of each neuron is expressed as a pulse stream, integration is necessary to recover the analog value from the pulse stream. We invented a PDM digital circuit which can perform leaky integration and showed that it can solve first-order differential equations in a continuous manner. In order to apply GHA to the circuit, we converted the learning equations Eq.(1) to Eq.(4) into the following differential equations:

\frac{dw_{ij}(t)}{dt} = \eta_{ij} V_i(t)\mu_{ij}(t),   (5)
\tau_\mu \frac{d\mu_{ij}(t)}{dt} = -\mu_{ij}(t) + \left( \mu_{i-1,j}(t) - V_i(t)w_{ij}(t) \right),   (6)
\tau_V \frac{dV_i(t)}{dt} = -V_i(t) + \sum_{j=1}^{N} w_{ij}(t)\xi_j(t),   (7)

where \mu_{0j}(t) = \xi_j(t), and \tau_\mu and \tau_V are the time constants of a learning signal and a neuron output, respectively. The time constant \tau_\mu must be shorter than \tau_V because, as seen in Eq.(6), the learning signal \mu_{i-1,j}(t) is used to update w_{ij}(t), and it should propagate fast enough to ensure the correct interaction between V_i(t) and \mu_{ij}(t) in Eq.(5). Although the number of principal components that the network can find is limited by this propagation delay, the time constants can be adjusted as described below. This limitation is not explicit in the discrete GHA.
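As a rough, self-contained illustration of the local form in Eqs.(3) and (4), and of what the continuous circuit of Eqs.(5) to (7) computes on average, the following Python/NumPy sketch applies the update to synthetic two-dimensional data. It is not taken from the paper; the mixing matrix, learning rate, and variable names are our own assumptions.

import numpy as np

rng = np.random.default_rng(0)

M, N, steps, eta = 2, 2, 20000, 0.01
A = np.array([[2.0, 0.3], [0.3, 0.5]])      # toy mixing matrix (assumed), sets the input statistics
W = rng.normal(scale=0.1, size=(N, M))      # W[i, j] plays the role of w_ij

for _ in range(steps):
    xi = A @ rng.normal(size=M)             # zero-mean input vector xi_j(T)
    V = W @ xi                              # Eq.(1): V_i = sum_j w_ij xi_j
    mu = xi.copy()                          # mu_0j = xi_j
    for i in range(N):
        mu = mu - V[i] * W[i]               # Eq.(4): mu_ij = mu_(i-1)j - V_i w_ij
        W[i] = W[i] + eta * V[i] * mu       # Eq.(3): delta w_ij = eta V_i mu_ij

C = A @ A.T                                 # correlation matrix of the inputs
eigvals, eigvecs = np.linalg.eigh(C)
print("learned rows of W (unit length, up to sign):")
print(W / np.linalg.norm(W, axis=1, keepdims=True))
print("eigenvectors of C, largest eigenvalue first:")
print(eigvecs[:, ::-1].T)

After enough samples the rows of W align, up to sign, with the leading eigenvectors of the input correlation matrix, which is the behavior the hardware realizes continuously in time.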

Figure 1: Structure of the network. Details are described in the text.

Figure 2: Structure of the synapse circuit. The symbols designate a rate multiplier and an exclusive-OR circuit.

Figure 3: Structure of the circuit which updates a synaptic weight. The notations are the same as in Figure 2.

Figure 4: Structure of the cell body circuit. The notations are the same as in Figure 2. The control circuit resolves the conflict between simultaneous up- and down-inputs to the counter.

3.2 Structure of the Network

The structure of the PCA learning network is schematically illustrated in Figure 1. The network consists of three kinds of circuits: a synapse circuit, a dendrite circuit, and a cell body circuit. The input denoted by \xi_j is fed to each neuron via the synaptic weight denoted by w_{ij}. The learning signal \mu_{i-1,j} that is calculated in the synapse w_{i-1,j} is sent to the next synapse w_{ij} to form the learning signal \mu_{ij}, as described in Eq.(6). The linear output V_i of each neuron is fed back to all of its synapses to calculate a learning signal in each synapse.

3.2.1 Synapse Circuit

Each synapse circuit consists of two parts. One part multiplies the input by the synaptic weight. This multiplication is carried out by transforming the input pulse frequency into a frequency that is proportional to the synaptic weight, as shown in Figure 2. The transformation is carried out by a 9-bit rate multiplier whose output frequency is given by

f_{\mathrm{output}} = \frac{\mathrm{rate\ value}}{2^9}\, f_{\mathrm{input}}.   (8)

Since the rate value is smaller than 2^9, the magnitude of the synaptic weight is smaller than 1. Since GHA uses a linear network, four-quadrant multiplication is necessary. Multiplication between the absolute value of an input and that of a synaptic weight is carried out by the rate multiplier. The result is fed to an excitatory dendrite circuit when the signs of the input and the weight are the same, or to an inhibitory one when the signs differ.
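To make Eq.(8) and the sign routing of this first part of the synapse concrete, here is a small behavioral sketch in Python. It is our own illustration, not the authors' VHDL: an accumulator-style model stands in for the gate-level binary rate multiplier (same average rate), and pulses are modeled as a synchronous 0/1 stream.

BITS = 9
MOD = 1 << BITS                     # 2^9

def rate_multiply(input_pulses, rate_value):
    # Pass roughly rate_value/2^9 of the input pulses, as in Eq.(8).
    acc, out = 0, []
    for p in input_pulses:
        fired = 0
        if p:
            acc += rate_value
            if acc >= MOD:          # overflow -> emit one output pulse
                acc -= MOD
                fired = 1
        out.append(fired)
    return out

def synapse_forward(input_pulses, input_sign, weight_rate, weight_sign):
    # Four-quadrant multiplication: magnitudes via the rate multiplier,
    # signs decide whether the product goes to the excitatory or inhibitory dendrite.
    product = rate_multiply(input_pulses, weight_rate)
    if input_sign == weight_sign:
        return product, [0] * len(product)      # excitatory, inhibitory
    return [0] * len(product), product

# Example: 1000 input pulses through a weight of 128/512 = 0.25 gives about 250 pulses.
exc, inh = synapse_forward([1] * 1000, +1, 128, +1)
print(sum(exc), "excitatory pulses,", sum(inh), "inhibitory pulses")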

The other part is the learning circuit, which consists of four rate multipliers and two up-down counters, as shown in Figure 3. It faithfully realizes the computation defined by Eq.(5) and Eq.(6). In this circuit Eq.(6) is solved in the following integral form:

\mu_{ij}(t) = \frac{1}{\tau_\mu}\int_0^t \left[ -\mu_{ij}(\tau) + \left( \mu_{i-1,j}(\tau) - V_i(\tau)w_{ij}(\tau) \right) \right] d\tau + \mu_{ij}(0),   (9)

where \mu_{ij}(0) is the initial value of the learning signal. The integration is carried out as follows. First, V_i w_{ij} is obtained by four-quadrant multiplication and the result is fed to either the up or the down input of the up-down counter denoted by \mu_{ij} in the figure. The learning signal \mu_{i-1,j} from the jth synapse of the (i-1)th neuron is also fed to either the up or the down input of the counter according to its sign. The absolute value of the counter is transformed into a pulse stream whose frequency is proportional to it. The negative feedback term -\mu_{ij} in the above equation is realized by feeding these output pulses to the up input of the counter when its content is negative, or to the down input when it is positive. The circuit block denoted by Sync. in the figure resolves the conflict between simultaneous up- and down-inputs to the counter. Since we use a 6-bit up-down counter, the time constant \tau_\mu is given by

\tau_\mu = \frac{2^6}{f_{\max}} = \frac{64}{10\,\mathrm{MHz}} = 6.4\,\mu\mathrm{sec},   (10)

where f_{\max} is the maximum output frequency. In realizing Eq.(5) by a digital circuit, the pulse stream of V_i is fed to another rate multiplier whose rate value is given by the binary content of the \mu_{ij} up-down counter. In order to multiply by the learning constant, the output pulses are fed to yet another rate multiplier whose rate value is given by the register denoted by \eta_{ij}. The output pulses from this rate multiplier are then fed, according to the signs of V_i and \mu_{ij}, to either the up or the down input of the other up-down counter, denoted by w_{ij}, which stores the weight. In the following, every \eta_{ij} is set to the same constant value.

3.3 Dendrite Circuit

A dendrite circuit spatially sums the synaptic output pulses by OR gates. Because a pulse stream carries no polarity, the spatial summation of excitatory and inhibitory synaptic output pulses must be done separately. Pulses that arrive simultaneously are counted as one, so summation is not strictly linear, but asynchronous circuits relax this problem. Because each neuron is driven by an individual clock, output pulses from different synapses tend to be asynchronous, especially when their frequencies are low. We have already analyzed the summation characteristics theoretically and experimentally, and linear summation has been shown to occur over a wide range of input frequencies [3].

3.4 Cell Body Circuit

In the cell body circuit, Eq.(7) is solved in the following integral form, in the same way as in the learning circuit:

V_i(t) = \frac{1}{\tau_V}\int_0^t \left[ -V_i(\tau) + \sum_{k=1}^{N} w_{ik}(\tau)\xi_k(\tau) \right] d\tau + V_i(0).   (11)

The structure of the circuit is shown in Figure 4. Since each neuron output must be linear, not only the output pulse stream, whose frequency is proportional to the absolute value of the counter, but also its sign is sent to the other components of the network. Since we used a 10-bit up-down counter, the time constant \tau_V is 102.4 \mu sec.
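The leaky integration of Eqs.(9) and (11) can be mimicked in software as follows. This is a behavioral sketch under our own assumptions (a synchronous simulation step stands in for the asynchronous pulse timing), not the actual counter logic; it shows how feeding a counter's own output pulses back with opposite sign yields a first-order lag with time constant 2^bits / f_max, e.g. 2^6 / 10 MHz = 6.4 microseconds for the 6-bit learning counter.

BITS = 6
FULL = 1 << BITS                    # 2^6 = 64

def leaky_integrate(exc_pulses, inh_pulses):
    # Signed up/down counter; its output pulse rate is |count|/2^BITS of the clock,
    # and each output pulse drives the count one step back toward zero (leak term).
    count, acc, trace = 0, 0, []
    for e_p, i_p in zip(exc_pulses, inh_pulses):
        count += e_p - i_p          # external up/down inputs
        acc += abs(count)           # accumulator models the output rate multiplier
        if acc >= FULL:
            acc -= FULL
            count += -1 if count > 0 else 1     # negative feedback pulse
        trace.append(count)
    return trace

# Step response: constant excitatory rate of one pulse every four clocks, no inhibition.
clocks = 2000
exc = [1 if t % 4 == 0 else 0 for t in range(clocks)]
trace = leaky_integrate(exc, [0] * clocks)
# Steady state: the feedback rate |count|/64 balances the input rate 1/4,
# so the count settles near 64/4 = 16, with a time constant of about 64 clocks.
print(trace[-1])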

4 Performance of the Network

4.1 Structure of the Network

The circuits described above were designed in VHDL (Very-high-speed-integrated-circuit Hardware Description Language) and were fabricated in an FPGA (Xilinx XC4085XL). In order to evaluate the real-time performance of the circuits, a network with two microphone inputs and two speaker outputs was built, as illustrated in Figure 5.

Figure 5: A PCA learning network with two microphone inputs and two speaker outputs.

The total number of gates used by the circuits was about 12,000. A 20 MHz common clock drives the cell body circuit of each output neuron and the synapse circuits connected to it. The two output neurons are driven by different crystal oscillators so that they operate asynchronously. A sixteen-bit A/D converter, which samples the microphone input at 44.1 kHz, is used for each input; the signal is resampled at 4 kHz and the 10 MSBs of the sixteen bits are used as the input to the learning network. A sixteen-bit D/A converter is used to produce the analog output signal for each speaker, and again the 10 MSBs are used, as in the A/D conversion.

Since a single sound source is used in this case, the composite vector of the two microphone signals moves along a diagonal line in the two-dimensional input space, as shown in the right part of the figure. When the sound source is moved left and right in front of the microphones, the angle of the composite vector changes accordingly. By applying PCA learning to this input space, the eigenvector corresponding to the first principal component will indicate the direction of the diagonal line and will follow the change in direction caused by the movement of the sound source. In this case, however, the second principal component degenerates, because the variance in the direction orthogonal to the first principal vector is minimal. Therefore, the output sound always appears at Speaker 1 and is minimal at Speaker 2. It has been shown that the network can successfully find the second principal component in nonsingular cases [11].

4.2 Performance

An example of the time evolution of the four synaptic weights in the network is shown in Figure 6. In this case identical signals are supplied to both inputs. The weights w_11 and w_12, constituting the first principal component, converged within 100 msec. They form a vector of unit length. On the other hand, the weights w_21 and w_22, constituting the second principal component, were degenerate: they converged to zero at a slower rate than the first principal component. Even when the second principal component is not degenerate, its convergence takes much longer than that of the first principal component [11].

Figure 6: Time evolution of synaptic weights during PCA learning.

The loci of the weight vectors for the first and the second principal components when the sound source was moved left and right in front of the microphones are shown in Figure 7. The first and second principal vectors obtained at the same sound position are designated by the same symbols. The loci on the unit circle are those of the first principal component, and those inside the circle are those of the second principal component. In all cases the output sound appeared only at Speaker 1.

Figure 7: Loci of weight vectors following environmental changes in real time.
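A purely illustrative software analogue of this experiment, not the authors' measurement: a single source with a slowly drifting direction drives two correlated inputs, and the first neuron's weight vector, updated with the first row of GHA (Oja's rule), tracks the drifting direction much as the first principal weight vector tracks the sound source. The drift range, noise level, and learning rate below are our own choices.

import numpy as np

rng = np.random.default_rng(1)
eta = 0.002
w = np.array([1.0, 0.0])                          # first neuron's weight vector

for step in range(60000):
    theta = np.deg2rad(20 + 50 * step / 60000)    # source direction drifts from 20 to 70 deg
    s = rng.normal()                              # source sample
    xi = s * np.array([np.cos(theta), np.sin(theta)]) + 0.01 * rng.normal(size=2)
    V = w @ xi                                    # linear neuron output
    w += eta * V * (xi - V * w)                   # Oja's rule, i.e. the first row of GHA
    if step % 15000 == 0 or step == 59999:
        print(f"source {np.rad2deg(theta):5.1f} deg, "
              f"weight {np.degrees(np.arctan2(w[1], w[0])):5.1f} deg")

After the initial transient the printed weight angle follows the source angle closely (up to sign), which is the software counterpart of the loci on the unit circle in Figure 7.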

5 Conclusions

We have designed and fabricated a PCA learning network in an FPGA by using an asynchronous PDM digital circuit. The performance of the circuit was evaluated with a learning network with two inputs and two outputs. It was verified that the circuit can find principal components and can follow changes in the statistical nature of multivariate inputs in real time. We are planning to apply the circuit to high-dimensional input cases and to signal processing functions such as a real-time whitening filter.

References

[1] Mead, C.: Analog VLSI and Neural Systems. Addison-Wesley Publishing Company, Massachusetts, 1989.
[2] Przytula, K.W. and Prasanna, V.K., Eds.: Parallel Digital Implementations of Neural Networks. Prentice Hall, New Jersey, 1993.
[3] Hirai, Y.: A 1,000-Neuron System with One Million 7-bit Physical Interconnections. In Advances in Neural Information Processing Systems 10, ed. Jordan, M.I., Kearns, M.J. and Solla, S.A., pp. 705-711, The MIT Press, 1998. Web site: http://www.viplab.is.tsukuba.ac.jp/
[4] Cauwenberghs, G.: A learning analog neural network chip with continuous-time recurrent dynamics. In Cowan, J.D., Tesauro, G. and Alspector, J., Eds., Advances in Neural Information Processing Systems 6, Morgan Kaufmann Publishers, San Mateo, CA, pp. 858-865, 1994.
[5] Jutten, C. and Herault, J.: Blind separation of sources, Part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, Vol. 24, pp. 1-10, 1991.
[6] Haykin, S.: Neural Networks: A Comprehensive Foundation. 2nd edition, Prentice Hall, New Jersey, 1999.
[7] Oja, E.: A simplified neuron model as a principal component analyzer. J. Math. Biology, Vol. 15, pp. 267-273, 1982.
[8] Sanger, T.D.: Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks, Vol. 2, pp. 459-473, 1989.
[9] Kung, S.Y. and Diamantaras, K.I.: A neural network learning algorithm for adaptive principal component extraction (APEX). Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 861-864, 1990.
[10] Chatterjee, C., Roychowdhury, V.P. and Chong, E.K.P.: On relative convergence properties of principal component analysis algorithms. IEEE Trans. on Neural Networks, Vol. 9, No. 2, pp. 319-329, 1998.
[11] Nishizawa, K. and Hirai, Y.: Hardware implementation of PCA neural network. Proceedings of ICONIP'98, pp. 85-88, 1998.