Implementing Multi-VRC Cores to Evolve Combinational Logic Circuits in Parallel

Jin Wang 1, Chang Hao Piao 2, and Chong Ho Lee 1

1 Department of Information & Communication Engineering, Inha University, Incheon, Korea
wangjin_liips@yahoo.com.cn
2 Department of Automation Engineering, ChongQing University of Posts and Telecommunications, Chongqing, China

L. Kang, Y. Liu, and S. Zeng (Eds.): ICES 2007, LNCS 4684, pp. 23-34, 2007. Springer-Verlag Berlin Heidelberg 2007

Abstract. To address the scalability issue of evolvable hardware (EHW), this paper proposes an evolvable system based on multiple virtual reconfigurable circuit (VRC) cores that evolves combinational logic circuits in parallel. The basic idea of the proposed scheme is to divide a combinational logic circuit into several subcircuits, each of which is evolved independently as a subcomponent by its own VRC core. The VRC architecture is designed for implementing real-world EHW applications in common FPGAs. In our approach, all the VRC cores are realized in a single Xilinx Virtex xcv2000e FPGA as one evolvable system to achieve parallel evolution. The proposed method is evaluated on the evolution of a 3-bit multiplier and a 3-bit adder, and it is compared with direct evolution and incremental evolution in terms of computational effort and hardware implementation cost.

Keywords: Intrinsic evolvable hardware, scalability, parallel evolutionary algorithm, incremental evolution.

1 Introduction

As an alternative to conventional specification-based circuit design, EHW has been introduced over the last decade as an important paradigm for automatic circuit design. However, there is still a long way to go before EHW becomes a real substitute for human designers. One of the problems is that most evolved circuits are limited in size [1, 2]; this is known as the scalability issue of EHW. Yao and Higuchi [2] indicated that existing evolvable systems were generally not scalable for two reasons:
(1) The chromosome length of EHW, which increases with the target circuit size. A long chromosome is required to represent a complex system, and this often makes the search space too large to explore even with evolutionary techniques.
(2) The computational complexity of the evolutionary algorithm (EA), which is a more pivotal factor than chromosome length. Generally, the number of individual evaluations required to find a desired solution can increase drastically with the complexity of the target system.

This paper investigates the scalability issue as it applies to the evolutionary design of combinational logic circuits. In our proposal, a combinational logic circuit is decomposed into several sub-circuits. An evolvable system containing several VRC cores is employed to evolve the separate sub-circuits in parallel. Finally, the evolved sub-circuits interact to perform the expected top-level logic function. Our method approaches the scalability issue of EHW by speeding up the EA computation, shortening the chromosome, and reducing the computational complexity of the task. Experiments evolving a 3-bit multiplier and a 3-bit adder are conducted to compare the execution time and hardware cost of the proposed evolutionary strategy with those of direct evolution and incremental evolution [3, 4, 5].

The rest of this paper is organized as follows. Section 2 briefly introduces existing approaches to the scalability issue of EHW. The proposed scheme, which implements parallel evolution with multi-VRC cores, is presented in Section 3. Section 4 describes the hardware realization of the proposed evolvable system. Experimental results are summarized in Section 5 and discussed in Section 6. Section 7 concludes our work.

2 Previous Scalable Approaches to EHW

Various approaches have been proposed to solve the scalability issue of EHW. Murakawa et al. [6] tackled the problem of scale in evolved circuits by using function-level evolution. Because higher-level functions rather than multi-input gates are employed as building blocks, an important property of the function-level approach is that the chromosome size remains limited while the circuit complexity can grow arbitrarily. This approach is reasonable in itself, and it has also been applied to evolving spatial image operators [7, 8]. While higher-level functions allow the designer to reduce the EA search space and make evolution easier, a disadvantage may be that the evolved solutions do not exhibit any innovation in their structure [9].
On the other hand, identifying suitable function blocks, and thus evolving efficient electronic circuits, is a difficult and time-consuming task.

Parallel genetic algorithms, one of the most promising ways to improve the computational ability of the EA, have been presented in different forms [10, 11]. Using parallelism to increase the speed of evolution seems to be an answer to the high computational cost. However, while parallel evolution offers limited relief from high computational cost, it does not provide any new capability from the standpoint of computational complexity theory. For example, the computational complexity of evolving combinational logic circuits, which grows exponentially with the number of circuit inputs for a traditional genetic algorithm, still grows exponentially with the number of inputs for a parallel genetic algorithm. When billions of candidate circuits must be evaluated to evolve even small combinational logic circuits (e.g., a 4-bit multiplier), relying solely on sheer EA speedup does not seem a reasonable solution to the computational complexity issue of EHW.

A possible way to reduce the computational complexity of EHW is incremental evolution, which is based on the principle of divide-and-conquer. Incremental evolution was first introduced by Torresen [3] as a scalability approach to EHW. Using the incremental evolution strategy, various non-trivial circuits have been successfully evolved with both extrinsic [3, 4] and intrinsic EHW [5]. In this approach, a circuit is evolved through its smaller components: evolution is first undertaken individually and serially on a set of sub-circuits, and the evolved sub-circuits are then employed as building blocks for further evolution or for constructing the target circuit. A variation of incremental evolution, a multiobjective genetic algorithm, has been suggested by Coello Coello et al. [12]. In their case, each output of a combinational logic circuit is handled as an objective that is evolved independently in its corresponding subcomponent. Another scheme related to the idea of system partitioning is cooperative coevolution, proposed by Potter and De Jong [13]. It consists of the serial evolution of coadapted subcomponents that interact to perform the function of the top-level system.

3 Description of the Proposed Approach

Our proposed scheme approaches the scalability issue of EHW from three aspects: speeding up the EA, limiting the chromosome length, and decreasing the computational complexity of the problem. The main idea is to use parallel intrinsic evolution to handle subcomponents, which are decomposed versions of the top-level system. The circuit decomposition and assembly are inspired by the divide-and-conquer principle, which has been used in [3, 4, 5, 12, 14] to limit the computational complexity of EHW. This paper introduces a generalized hardware architecture, which we call multi-VRC cores, that evolves decomposed subcomponents in parallel within a single commercial FPGA. Parallel intrinsic evolution of subcomponents is the most significant difference between our approach and existing extrinsic EHW approaches [3, 4, 12, 13, 14], in which subcomponents were evolved serially in software simulation.
3.1 Decomposition of Logic Circuit

Two strategies are commonly used to decompose combinational logic circuits: Shannon decomposition and system output decomposition [4, 14]. To simplify the hardware implementation, only the second scheme, system output decomposition, is employed in this paper. Fig. 1 shows the system output decomposition approach for evolving a 2-bit multiplier, which has a 4-bit input and a 4-bit output. Following this strategy, the 4-bit output of the multiplier is split into two groups, as indicated by the vertical line in the truth table. Each 2-bit output group is used to evolve its corresponding subcomponent (subcomponents 1 and 2), each of which is a 4-in/2-out circuit. The evolved subcomponents can then be assembled (as shown in Fig. 1) to perform the correct multiplier function. Although this illustration shows 2 subcomponents, more may be used. For example, a separate circuit can be evolved for each single output bit, in which case four subcomponents with a 4-bit input and a 1-bit output would be required.
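As a minimal sketch of system output decomposition, the Python below builds the truth table of an n-bit unsigned multiplier, splits its output columns into groups (one truth table per subcomponent, each keeping the full input space), and reassembles the subcomponent outputs into the system output. The function names are hypothetical, and grouping the 2-bit multiplier's outputs into the two high-order and two low-order bits is an assumption, since Fig. 1 is not reproduced here.

```python
from itertools import product

def multiplier_truth_table(n_bits):
    """Truth table of an n-bit x n-bit unsigned multiplier.

    Each row maps the operand pair (a, b) to its 2n-bit product, MSB first.
    """
    width = 2 * n_bits
    rows = []
    for a, b in product(range(2 ** n_bits), repeat=2):
        p = a * b
        out_bits = tuple((p >> (width - 1 - i)) & 1 for i in range(width))
        rows.append(((a, b), out_bits))
    return rows

def decompose_outputs(rows, groups):
    """One truth table per subcomponent; `groups` lists 0-based output columns."""
    return [[(inp, tuple(out[i] for i in g)) for inp, out in rows] for g in groups]

def assemble(sub_tables, groups, n_outputs):
    """Recombine the subcomponent outputs into the full system output."""
    full = []
    for rows in zip(*sub_tables):          # the same input row in every subtable
        out = [None] * n_outputs
        for (_, sub_out), g in zip(rows, groups):
            for bit, i in zip(sub_out, g):
                out[i] = bit
        full.append((rows[0][0], tuple(out)))
    return full

table = multiplier_truth_table(2)          # 4-in/4-out, 16 rows
groups = [(0, 1), (2, 3)]                  # assumed split: high and low output pairs
subs = decompose_outputs(table, groups)    # two 4-in/2-out subcomponents
assert assemble(subs, groups, 4) == table
```

Passing four single-column groups instead gives the four 4-in/1-out subcomponents mentioned above; the inputs are never partitioned, only the output columns.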

Fig. 1. Partitioning output function of 2-bit multiplier

3.2 Evolutionary Algorithm

In this work, an intrinsic EHW built on the multi-VRC cores structure is used to evolve the partitioned subcomponents in parallel. The virtual reconfigurable circuit architecture was first proposed by Sekanina [8] for implementing real-world EHW applications in common FPGAs. The VRC structure is flexible and can be designed to fit the requirements of a given application. For evolving combinational logic circuits, Sekanina implemented a geometric structure based on Cartesian genetic programming (CGP) on a VRC [15]. CGP was first introduced by Miller et al. [16]; its phenotype is a two-dimensional array of logic cells. In our approach, to reduce the chromosome length, we employ a revised two-dimensional gate array that imposes more connection restrictions than standard CGP. Whereas in CGP each cell can take its inputs from the external inputs of the cell array or from cell outputs in previous layers, each gate in our array can connect only to gate outputs in the immediately preceding layer. Very similar gate array structures have been reported by Torresen [3] and Coello Coello [12] for learning combinational logic circuits.

The basic framework of the parallel evolutionary algorithm employed in our evolvable system is illustrated in Fig. 2. Although this illustration shows a parallel EA designed for evolving two subcomponents, more subcomponents can be handled. In this model, the population is divided into two subpopulations, one per decomposed subcomponent. Each subcomponent is evolved with a (1 + λ) evolution strategy, where λ = 2. The only evolutionary operators are selection and mutation.
In the initial phase, each subpopulation of λ individuals is created at random. Once the fitness of each individual has been evaluated, the fittest individual is selected as the parent chromosome. The next generation of the subpopulation consists of this fittest individual and its λ mutants. This process repeats until one of the stop criteria of the subpopulation is met: (1) the EA finds the expected solution for its subcomponent, or (2) the predefined number of generations is exhausted. During evolution, each VRC core maintains the evolution of its corresponding subpopulation independently.
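The per-subpopulation loop can be sketched in software as a (1 + λ) evolution strategy over bit-string chromosomes. This is a hypothetical illustration rather than the hardware EA unit: Python's `random` module stands in for the cellular-automaton RNG, the default `rate=0.008` borrows the 0.8% mutation rate reported in Section 4, and turning that rate into a fixed count of inverted bits (rounded, at least one) is an assumption. As in the EA unit description, a mutant replaces the parent only if its fitness is strictly better.

```python
import random

def mutate(parent, rate, rng):
    """Invert a fixed number of randomly chosen bits (assumed: round(rate * length), >= 1)."""
    k = max(1, round(rate * len(parent)))
    child = parent[:]
    for i in rng.sample(range(len(parent)), k):
        child[i] ^= 1
    return child

def one_plus_lambda(fitness, length, max_fitness, lam=2, rate=0.008,
                    max_generations=2 ** 27, seed=0):
    """Evolve one subpopulation; returns (best, best_fitness, generations_used)."""
    rng = random.Random(seed)
    # Initial subpopulation: lam random chromosomes; the fittest becomes the parent.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(lam)]
    parent = max(population, key=fitness)
    parent_fit = fitness(parent)
    for gen in range(max_generations):
        if parent_fit == max_fitness:           # stop criterion (1): solution found
            return parent, parent_fit, gen
        # Next generation: the parent plus lam mutants of that same parent.
        scored = [(fitness(c), c) for c in (mutate(parent, rate, rng) for _ in range(lam))]
        best_fit, best = max(scored, key=lambda t: t[0])
        if best_fit > parent_fit:               # replace only if strictly better
            parent, parent_fit = best, best_fit
    return parent, parent_fit, max_generations  # stop criterion (2): budget exhausted

# Toy usage: OneMax (count of 1 bits) as a stand-in fitness function.
best, fit, gens = one_plus_lambda(sum, length=64, max_fitness=64)
```

In the multi-VRC system, one such loop would run per subcomponent, each with its own fitness function (the partitioned truth table), and the loops run concurrently rather than one after another.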

Fig. 2. Flow diagram of proposed parallel evolutionary algorithm for evolving two subcomponents

Each subcomponent processes a different part of the decomposed system output function, so the fitness of each individual in a given subpopulation is calculated by comparing the subcomponent output with its corresponding partition of the desired system output:

$$\mathit{Fitness} = \sum_{vector} \sum_{output} x, \qquad x = \begin{cases} 1, & output = expect \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

For each partitioned output vector, each single output bit is compared with its corresponding expected system output (labeled expect). If they are equal, x is 1 and is added to the fitness. The fitness is thus the sum of the comparison results over all outputs (output) in the partitioned output vectors (vector) of the truth table.

4 Hardware Implementation

A Celoxica RC1000 PCI board [17] equipped with a Xilinx Virtex xcv2000e FPGA [18] (see Fig. 3) is employed as our experimental platform for implementing and verifying the proposed multi-VRC cores architecture. This board has been successfully applied as a high-performance, flexible, and low-cost FPGA-based hardware platform for various computationally intensive applications [19, 20]. The proposed evolvable system is composed of two main components (as shown in Fig. 3): a control and interface unit, and several VRC cores. All operations of the VRC cores are controlled by the control and interface unit, which executes commands from the host PC and connects to the 8 MB of on-board SRAM. Each VRC core corresponds to one decomposed subcomponent as defined in the previous section. A VRC core consists of a virtual reconfigurable circuit unit, an EA unit, and a fitness unit. The EA unit implements the genetic operations and generates configuration bit strings (chromosomes) to configure the virtual reconfigurable circuit unit. The virtual reconfigurable circuit unit, whose function is virtually reconfigurable, processes the input data from four memory banks. The fitness unit calculates individual fitness by comparing the output of the virtual reconfigurable circuit unit with its corresponding partitioned output in the truth table.

Fig. 3. Organization of the proposed evolvable system with multi-VRC cores

Every virtual reconfigurable circuit unit in a VRC core can be considered a digital circuit acting as one decomposed subcomponent of the top-level system. As an example, Fig. 4 presents a virtual reconfigurable circuit unit designed for evolving a 6-input/3-output subcomponent. It consists of 43 uniform function elements (FEs) allocated in an array of 6 columns and 8 rows. In the last column of the FE array, the number of FEs equals the number of subcomponent outputs, and each FE drives one output (hence 5 × 8 + 3 = 43 FEs). Every FE has two inputs and can be configured to perform one of the 8 logic functions listed in Fig. 4. The input connections of each FE are selected by its two multiplexers. An input of an FE in columns 2 to 6 can be connected to any FE output in the immediately preceding column. In column 1, each FE input can be connected to any of the system inputs or set to a constant bias of 1 or 0. Each FE is equipped with a flip-flop to support pipeline processing, so each column of FEs forms a single pipeline stage. Each FE needs 9 bits: 3 + 3 bits to determine its input connections and 3 bits to select its function. Although column 6 requires fewer than 72 configuration bits, the configuration memory still uses 72 bits per column to simplify the hardware design. Hence, the configuration memory holds 72 × 6 = 432 bits.
In our approach, the eight FEs of a column are configured simultaneously, so the 432-bit configuration memory is divided into 6 configuration banks (cnfbank) of 72 bits each.
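To make the encoding concrete, here is a software model of a 6-input/3-output VRC unit with the fitness of Eq. (1). It is a sketch under several assumptions the text does not state: the gene order within each 9-bit FE field (input A, input B, function), the mapping of column-1 selector values 6 and 7 to the constants 0 and 1, and the function set `FUNCS`, which is hypothetical because the eight functions listed in Fig. 4 are not reproduced here. The pipeline flip-flops are ignored, so each input vector is evaluated combinationally.

```python
from itertools import product

N_IN, ROWS, COLS, N_OUT, GENE_BITS = 6, 8, 6, 3, 9

# Hypothetical 8-function set (the paper's list is in Fig. 4).
FUNCS = [
    lambda a, b: a & b, lambda a, b: a | b, lambda a, b: a ^ b,
    lambda a, b: 1 - (a & b), lambda a, b: 1 - (a | b), lambda a, b: 1 - (a ^ b),
    lambda a, b: a & (1 - b), lambda a, b: a,
]

def field(bits, start, width):
    """Read an unsigned integer (MSB first) from a list of 0/1 ints."""
    v = 0
    for b in bits[start:start + width]:
        v = (v << 1) | b
    return v

def evaluate(chromosome, inputs):
    """Evaluate the 8 x 6 FE array for one input vector (pipelining ignored)."""
    assert len(chromosome) == COLS * ROWS * GENE_BITS  # 432 bits
    prev = list(inputs) + [0, 1]  # column-1 sources: 6 inputs + assumed biases 0, 1
    for col in range(COLS):
        cur = []
        for row in range(ROWS):
            base = (col * ROWS + row) * GENE_BITS
            a = prev[field(chromosome, base, 3)]
            b = prev[field(chromosome, base + 3, 3)]
            cur.append(FUNCS[field(chromosome, base + 6, 3)](a, b))
        prev = cur  # each column feeds only the next one
    return prev[:N_OUT]  # first 3 FEs of column 6 drive the outputs

def fitness(chromosome, target):
    """Eq. (1): count output bits that match the partitioned truth table."""
    score = 0
    for vec in product((0, 1), repeat=N_IN):
        out = evaluate(chromosome, vec)
        score += sum(o == e for o, e in zip(out, target(vec)))
    return score
```

The unused FEs 3 to 7 of the last column are still decoded, mirroring the fixed 72 bits per column; a perfect chromosome scores 2^6 × 3 = 192.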

Fig. 4. Virtual reconfigurable circuit unit for the evolution of the 6-in/3-out subcomponent
Fig. 5. Architecture of EA unit

Fig. 5 describes the architecture of the EA unit designed for evolving a 6-input/3-output subcomponent. When the EA is activated, the population memory is filled with two chromosomes, which are mutated versions of two 6 × 72-bit random numbers generated by a random number generator (RNG) based on linear cellular automata [21]. The mutation unit processes the 6 × 72 bits of data in 6 clock cycles, 72 bits per cycle. Only randomly selected bits are inverted, and the number of mutated bits is determined by the predefined mutation rate (0.8% in this work). After all chromosomes of the initial population have been evaluated, the best one is chosen as the parent chromosome and stored in the best chromosome memory. The new population is generated from the parent chromosome and its 2 mutants. If the fitness of an offspring chromosome is better than that of the parent, it replaces the parent chromosome stored in the best chromosome memory.

Fitness calculation is realized in the fitness unit. The set of input training vectors is loaded from the on-board SRAM and fed to the VRC unit. The output vectors of the VRC unit are sent to the fitness unit and compared with the partitioned expected output set specified in the truth table (also stored in the on-board SRAM). The fitness value is incremented for every matching output bit. Therefore, the maximal fitness value is 64 × 3 = 192: the 6-input subcomponent has 2^6 = 64 input vectors, and its decomposed output is 3 bits wide in this scenario.

5 Experimental Results

Our multi-VRC cores-based evolvable system was designed in VHDL and synthesized for the Virtex xcv2000e FPGA using Xilinx ISE 8.1. According to our synthesis reports, the proposed evolvable system can operate at more than 90 MHz in all cases. However, the hardware experiments were actually run at 30 MHz for easier synchronization with the PCI interface, which operates correctly with PCI bus clocks from 0 to 33 MHz.

In this paper, a 3-bit multiplier and a 3-bit adder were employed as evolutionary targets to illustrate the proposed scheme. Three evolutionary strategies were used in the experiments: (1) direct evolution, as employed in [15, 22]; (2) incremental evolution as proposed by Torresen [4], using the partitioned training vector strategy only; and (3) the proposed multi-VRC cores-based scheme. The maximum number of generations of one EA run was set to 2^27 for all strategies. To achieve reasonable hardware cost and performance, a uniform 8 × 6 FE array was employed to evolve all decomposed subcomponents in both incremental evolution and our proposed strategy. A 12 × 6 FE array, chosen on the basis of our previous experiments, was used for the direct evolution of the 3-bit multiplier.
No feasible 3-bit multiplier solution could be evolved with smaller FE arrays (e.g., 10 × 6 or 8 × 8). To simplify the hardware design, the same 12 × 6 FE array was also used for the direct evolution of the 3-bit adder.

For the 3-bit multiplier, the proposed scheme was implemented separately with two-subcomponent and three-subcomponent system partitions. To balance the computational complexity of the decomposed subcomponents, the system outputs were partitioned as follows: (1,3,5) and (2,4,6) for 2 subcomponents, and (1,4), (2,5), and (3,6) for 3 subcomponents. The same decomposition rule was also used in incremental evolution. Table 1 compares direct evolution, incremental evolution, and the multi-VRC cores-based approach in terms of device cost, chromosome length of each subcomponent, number of successful EA runs that found a feasible logic circuit (times of success), average and standard deviation of the number of generations, and average total evolution time. We performed 100 runs for each case.

The evolved 3-bit adder has 6 inputs and 4 outputs; the input carry is not considered in this work. Only a two-subcomponent system decomposition was implemented in this experiment. Table 2 summarizes the results of evolving the 3-bit adder under the different settings. All averages are computed over 100 independent EA runs for each case.
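The benchmark truth tables and the output partitions above can be sketched as follows. Numbering the outputs from the least significant bit (output k is bit k-1 of the result) is an assumption, since the text does not state the bit order. The script builds the 6-in/6-out multiplier table and the 6-in/4-out adder table (no carry-in), applies the partitions used in the experiments, and prints the maximal fitness of each subcomponent under Eq. (1), that is, 64 input vectors times the number of outputs in the group.

```python
from itertools import product

N = 3  # operand width: both benchmarks take two 3-bit operands (6 inputs)

def outputs(op, width):
    """Truth table as rows of output bits; output k is bit k-1 (assumed LSB-first)."""
    table = []
    for a, b in product(range(2 ** N), repeat=2):
        r = op(a, b)
        table.append([(r >> k) & 1 for k in range(width)])
    return table

MULT = outputs(lambda a, b: a * b, 6)  # 3-bit multiplier: 6-in/6-out
ADD = outputs(lambda a, b: a + b, 4)   # 3-bit adder without carry-in: 6-in/4-out

def split(table, groups):
    """Keep only the listed (1-based) output columns for each subcomponent."""
    return [[[row[k - 1] for k in g] for row in table] for g in groups]

for name, table, groups in [
    ("multiplier, 2 subcomponents", MULT, [(1, 3, 5), (2, 4, 6)]),
    ("multiplier, 3 subcomponents", MULT, [(1, 4), (2, 5), (3, 6)]),
    ("adder, 2 subcomponents", ADD, [(1, 3), (2, 4)]),
]:
    subs = split(table, groups)
    # Maximal fitness of each subcomponent = input vectors x outputs in its group.
    print(name, [len(rows) * len(g) for rows, g in zip(subs, groups)])
```

It prints 192 per subcomponent for the two-way multiplier split and 128 per subcomponent for the three-way multiplier split and for the adder split.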

Table 1. The results of evolving 3-bit multiplier with different evolution strategies

EA type | Divided outputs | Device cost (slices) | Chromosome (bits) | Generations (avg.) | Generations (std. dev.) | Total evolution time (avg.) | Times of success
Direct evolution | 1-6 | 4274 | 792 | 18081238 | 19461400 | 77.147 sec | 50
Incremental evolution | 1,3,5 | 2984 | 432 | 1526060 | 1803625 | 15.569 sec | 58
 | 2,4,6 | | | 2122824 | 2297710 | | 61
Incremental evolution | 1,4 | 3423 | 432 | 505963 | 494748 | 4.400 sec | 70
 | 2,5 | | | 391404 | 353602 | | 67
 | 3,6 | | | 133943 | 165753 | | 60
Multi-VRC cores | 1,3,5 / 2,4,6 | 4505 | 432 | 2450171 | 2326822 | 10.454 sec | 61
Multi-VRC cores | 1,4 / 2,5 / 3,6 | 6422 | 432 | 610772 | 409938 | 2.606 sec | 57

Table 2. The results of evolving 3-bit adder with different evolution strategies

EA type | Divided outputs | Device cost (slices) | Chromosome (bits) | Generations (avg.) | Generations (std. dev.) | Total evolution time (avg.) | Times of success
Direct evolution | 1-4 | 4130 | 792 | 380424 | 465868 | 1.623 sec | 47
Incremental evolution | 1,3 | 2948 | 432 | 48011 | 63766 | 0.455 sec | 51
 | 2,4 | | | 58684 | 70317 | | 56
Multi-VRC cores | 1,3 / 2,4 | 4460 | 432 | 69256 | 62631 | 0.295 sec | 57

6 Discussion

We have presented the results of our initial experiments on the multi-VRC cores-based evolvable system, and the analysis is based on the two examples in this paper. In all cases, the numbers of successful EA runs that found feasible logic circuits are comparable for direct evolution, incremental evolution, and our proposed approach. Since the main motive of the proposed approach is an efficient evolvable system that overcomes the scalability issue of EHW, we also compare the computational cost required by the three approaches, measured as the average total system evolution time. By this measure, the multi-VRC cores-based EHW clearly outperforms the other approaches in all cases.
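The speedups discussed in this section can be reproduced from the average total evolution times in Tables 1 and 2; the short script below also computes the device cost ratios used in the hardware cost discussion. The dictionary keys are hypothetical labels. The `chromosome_bits` helper is an assumption: it gives 432 bits for the 8 × 6 array (9 bits per FE, as stated in Section 4), and it gives the 792 bits of the 12 × 6 direct-evolution array only if each selector in the larger array is taken to be 4 bits wide, which the text does not state.

```python
# Average total evolution times (sec) and device costs (slices) from Tables 1 and 2.
MULT_TIME = {"direct": 77.147, "incr2": 15.569, "incr3": 4.400,
             "vrc2": 10.454, "vrc3": 2.606}
ADDER_TIME = {"direct": 1.623, "incr2": 0.455, "vrc2": 0.295}
MULT_SLICES = {"direct": 4274, "vrc2": 4505, "vrc3": 6422}

def ratio(table, num, den):
    """table[num] / table[den]: a speedup (times) or a cost increase (slices)."""
    return table[num] / table[den]

def chromosome_bits(rows, cols, sel_bits, func_bits=3):
    """cols x rows FEs, each with two input selectors and one function field."""
    return cols * rows * (2 * sel_bits + func_bits)

for base, new in [("direct", "vrc2"), ("incr2", "vrc2"),
                  ("direct", "vrc3"), ("incr3", "vrc3")]:
    print(f"3-bit multiplier, {new} vs {base}: {ratio(MULT_TIME, base, new):.1f}x")
for base in ("direct", "incr2"):
    print(f"3-bit adder, vrc2 vs {base}: {ratio(ADDER_TIME, base, 'vrc2'):.1f}x")
for cores in ("vrc2", "vrc3"):
    print(f"device cost, {cores} / direct: {ratio(MULT_SLICES, cores, 'direct'):.2f}")
print(chromosome_bits(8, 6, 3), chromosome_bits(12, 6, 4))
```

For the multiplier, the script reports about 7.4, 1.5, 29.6, and 1.7, matching the figures below; for the adder it gives about 5.5 against direct evolution and 1.5 against incremental evolution, and the three-core design uses about 1.50 times the slices of direct evolution.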
All results indicate that the execution time for EA learning depends significantly on the level of system decomposition selected. For the evolution of the 3-bit multiplier, the speedup obtained with multi-VRC cores and two decomposed subcomponents is 7.4 against direct evolution and 1.5 against incremental evolution with two subcomponents. With three-subcomponent decomposition, the speedup of the multi-VRC cores-based EHW is 29.6 against direct evolution and 1.7 against incremental evolution with three subcomponents. Similar results are obtained for the evolution of the 3-bit adder.

The proposed multi-VRC cores-based EHW can be considered a hybrid of parallel intrinsic evolution and divide-and-conquer-based incremental evolution. Its better computational performance is mainly due to two features of the proposed evolvable system:
(1) Parallel multi-VRC cores with high computational power. In our approach, all the VRC cores are implemented in one FPGA, which executes the evolution of the subcomponents (e.g., fitness evaluation and genetic operations) in parallel. The most obvious advantage of this implementation is that the evolution of each subcomponent is fully pipelined and parallel. The proposed hardware implementation promises to overcome the overheads of slow inter-processor communication and setup time found in general multiprocessor-based parallel evolution.
(2) System decomposition strategy. The main advantage of decomposing the system output is that evolution is performed on smaller subcomponents with fewer outputs than the top-level system. Because each subcomponent has fewer outputs, fewer gates are needed to implement it, and a shorter chromosome can represent it; in our experiments, for example, the chromosome length of each subcomponent was reduced from 792 to 432 bits. In addition, decomposing the output function reduces the computational complexity of the problems to be solved. Therefore, partitioning the system output yields a simpler and smaller EA search space for each subcomponent.

Another interesting observation concerns the hardware implementation cost.
The system decomposition strategy does raise the device cost above that of the traditional direct evolution approach, because more VRC cores are required than in direct evolution, which employs only one VRC. However, the increase in device cost is not very significant in the multi-VRC cores-based EHW. According to our synthesis results, the hardware cost of the two-VRC-core approach is very close to that of direct evolution, and the device cost of the three-VRC-core approach is 1.5 times that of direct evolution. This is because the decomposed subcomponents have smaller input/output combinations: owing to Shannon's effect [15], a smaller FE array and chromosome memory can be used to implement each partitioned subcomponent. It should also be noted that our original motivation was to overcome the existing scalability problem of EHW. Device cost is not considered a serious issue in this work, because the number of transistors available on a chip continues to increase according to Moore's Law.

7 Conclusion

In this paper, we have presented a novel scalable evolvable hardware architecture, called the multi-VRC cores architecture, for synthesizing combinational logic circuits. The proposed approach uses a divide-and-conquer technique to decompose the top-level system into several subcomponents, which are then evolved independently and in parallel by their corresponding VRC cores. The experimental results show that the proposed scheme performs significantly better than both direct evolution and incremental evolution in terms of EA execution time. Both the 3-bit multiplier and the 3-bit adder can be evolved in less than 3 seconds, a result not matched by any other reported evolvable system. Future work will be devoted to applying this scheme to more complex real-world applications.

Acknowledgments. This work was supported by the Korean MOCIE under the research project 'Super Intelligent Chip Design'.

References

1. Torresen, J.: Possibilities and Limitations of Applying Evolvable Hardware to Real-world Application. In: Proc. of the 10th International Conference on Field Programmable Logic and Applications, FPL-2000, Villach, Austria, pp. 230–239 (2000)
2. Yao, X., Higuchi, T.: Promises and Challenges of Evolvable Hardware. IEEE Transactions on Systems, Man, and Cybernetics 29(1), 87–97 (1999)
3. Torresen, J.: A Divide-and-Conquer Approach to Evolvable Hardware. In: Sipper, M., Mange, D., Pérez-Uribe, A. (eds.) ICES 1998. LNCS, vol. 1478, pp. 57–65. Springer, Heidelberg (1998)
4. Torresen, J.: Evolving Multiplier Circuits by Training Set and Training Vector Partitioning. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 228–237. Springer, Heidelberg (2003)
5. Wang, J., et al.: Using Reconfigurable Architecture-Based Intrinsic Incremental Evolution to Evolve a Character Classification System. In: Hao, Y., Liu, J., Wang, Y.-P., Cheung, Y.-m., Yin, H., Jiao, L., Ma, J., Jiao, Y.-C. (eds.) CIS 2005. LNCS (LNAI), vol. 3801, pp. 216–223. Springer, Heidelberg (2005)
6. Murakawa, M., et al.: Hardware Evolution at Function Level. In: Ebeling, W., Rechenberg, I., Voigt, H.-M., Schwefel, H.-P. (eds.) PPSN IV 1996. LNCS, vol. 1141, pp. 62–71. Springer, Heidelberg (1996)
7. Zhang, Y., et al.: Digital Circuit Design Using Intrinsic Evolvable Hardware. In: Proc. of the 2004 NASA/DoD Conference on Evolvable Hardware, pp. 55–63. IEEE Computer Society Press, Los Alamitos (2004)
8. Sekanina, L.: Virtual Reconfigurable Circuits for Real-World Applications of Evolvable Hardware. In: Tyrrell, A.M., Haddow, P.C., Torresen, J. (eds.) ICES 2003. LNCS, vol. 2606, pp. 186–197. Springer, Heidelberg (2003)
9. Sekanina, L.: Evolutionary Design of Digital Circuits: Where Are Current Limits? In: Proc. of the First NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2006, pp. 171–178. IEEE Computer Society Press, Los Alamitos (2006)
10. Gordon, V.S., Whitley, D.: Serial and Parallel Genetic Algorithms as Function Optimizers. In: Proc. of the Fifth International Conference on Genetic Algorithms, pp. 177–183. Morgan Kaufmann, San Mateo, CA (1993)
11. Cantu-Paz, E.: A Survey of Parallel Genetic Algorithms. Calculateurs Parallèles 10(2), 141–171 (1998)
12. Coello Coello, C.A., Aguirre, A.H.: Design of Combinational Logic Circuits Through an Evolutionary Multiobjective Optimization Approach. Artificial Intelligence for Engineering, Design, Analysis and Manufacture 16(1), 39–53 (2002)
13. Potter, M.A., De Jong, K.A.: Cooperative Co-evolution: An Architecture for Evolving Coadapted Subcomponents. Evolutionary Computation 8(1), 1–29 (2000)
14. Kalganova, T.: Bidirectional Incremental Evolution in Extrinsic Evolvable Hardware. In: Proc. of the 2nd NASA/DoD Workshop on Evolvable Hardware, pp. 65–74. IEEE Computer Society Press, Los Alamitos (2000)
15. Sekanina, L., et al.: An Evolvable Combinational Unit for FPGAs. Computing and Informatics 23(5), 461–486 (2004)
16. Miller, J.F., Thomson, P.: Cartesian Genetic Programming. In: Poli, R., Banzhaf, W., Langdon, W.B., Miller, J., Nordin, P., Fogarty, T.C. (eds.) EuroGP 2000. LNCS, vol. 1802, pp. 121–132. Springer, Heidelberg (2000)
17. Celoxica Inc.: RC1000 Hardware Reference Manual V2.3 (2001)
18. http://www.xilinx.com
19. Martin, P.: A Hardware Implementation of a Genetic Programming System Using FPGAs and Handel-C. Genetic Programming and Evolvable Machines 2(4), 317–343 (2001)
20. Bensaali, F., et al.: Accelerating Matrix Product on Reconfigurable Hardware for Image Processing Applications. IEE Proceedings: Circuits, Devices and Systems 152(3), 236–246 (2005)
21. Wolfram, S.: Universality and Complexity in Cellular Automata. Physica 10D, 1–35 (1984)
22. Miller, J.F., et al.: Principles in the Evolutionary Design of Digital Circuits, Part I. Genetic Programming and Evolvable Machines 1(1), 7–35 (2000)