Globally Asynchronous Locally Synchronous (GALS) Microprogrammed Parallel FIR Filter


IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 6, Issue 5, Ver. II (Sep.-Oct. 2016), PP 15-21, e-ISSN: 2319-4200, p-ISSN: 2319-4197, www.iosrjournals.org

Gouri Wazurkar 1, Dr. S. L. Badjate 2
1 Department of Electronics Engineering, Shri Ramdeobaba College of Engineering & Management, Nagpur.
2 Principal, S. B. Jain Institute of Technology, Management & Research, Nagpur.

Abstract: In this paper, we propose the design of a globally asynchronous locally synchronous (GALS) microprogrammed parallel finite impulse response (FIR) filter using a pipelined GALS Baugh Wooley multiplier. The primary objective is to demonstrate a low power implementation of the microprogrammed parallel GALS FIR filter for digital signal processing applications. A fully synchronous microprogrammed parallel FIR filter and a GALS microprogrammed FIR filter are implemented on the same FPGA with almost the same number of logic cells for fair benchmarking. The four-tap synchronous and GALS microprogrammed parallel FIR filters are coded in VHDL and implemented on a Virtex-5 FPGA device. The GALS microprogrammed parallel FIR filter is more power efficient than the synchronous filter.

Keywords: Low power, GALS, Microprogrammed, Parallel, FIR Filter.

I. Introduction
Low power operation is desirable in all digital signal processing systems. Most digital signal processing systems are fully synchronous, but the International Technology Roadmap for Semiconductors (ITRS) [1] details a trend toward increasing use of asynchronous logic, from the present 15 % to 49 % in 2024. In some SoCs, asynchronous signaling schemes [2], [3] are used for synchronization between the different fully synchronous modules. This is in contrast to fully asynchronous systems, where asynchronous signaling schemes are used both between modules (inter-module) and within modules (intra-module).
This hybrid of inter-module asynchronous and intra-module synchronous operation, termed Globally-Asynchronous-Locally-Synchronous (GALS), may be advantageously exploited to simplify some challenging design issues [4]. For asynchronous-to-synchronous data transfer [2], [5] or vice versa, GALS approaches may be generally categorized by their clocking schemes: pausible clocking and data-driven clocking.

FIR filtering is the fundamental digital signal processing (DSP) operation in many DSP systems. It finds applications in audio, image and video processing, wireless communication, noise removal, etc. In most applications, digital filters are used to implement frequency-selective operations. Therefore, specifications are required in the frequency domain in terms of the desired magnitude and phase response of the filter. An FIR filter with constant coefficients is a linear time invariant digital filter. The output of an FIR filter of length N to an input time series x[n] is given by a finite version of the convolution sum, as in (1) and (2),

$$y[n] = x[n] \ast h[n] \tag{1}$$

$$y[n] = \sum_{k=0}^{N-1} x[k]\, h[n-k] \tag{2}$$

where h[n] is the set of filter coefficients (the impulse response) and y[n] is the output signal. For a linear time invariant system, this can be expressed in the Z domain as given in (3),

$$y(z) = x(z)\, h(z) \tag{3}$$

where h(z) is the FIR filter transfer function, defined in the Z domain by (4),

$$h(z) = \sum_{k=0}^{N-1} h[k]\, z^{-k} \tag{4}$$

The direct form implementation of a linear time invariant FIR filter using delay elements, adders and multipliers is shown in Fig. 1.

Fig. 1 Direct form FIR filter

DOI: 10.9790/4200-0605021521 www.iosrjournals.org
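As a rough illustration of the direct form structure in Fig. 1, the sketch below computes the convolution sum in its commuted, tap-indexed form y[n] = sum over k of h[k] x[n-k] (convolution is commutative, so this is the same operation as (1)), using a shift register as the tapped delay line. The function name `fir_direct_form`, the zero initial state and the coefficient values are illustrative assumptions and are not taken from the paper.

```python
from collections import deque


def fir_direct_form(x, h):
    """Direct-form FIR: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    # Tapped delay line, initially cleared (assumed zero initial state).
    delay_line = deque([0] * len(h), maxlen=len(h))
    y = []
    for sample in x:
        # Newest sample enters at tap 0; the oldest sample drops off the end.
        delay_line.appendleft(sample)
        # One multiplier per tap, followed by the adder chain.
        y.append(sum(hk * xk for hk, xk in zip(h, delay_line)))
    return y


# Hypothetical 4-tap coefficient set, chosen only to make the output easy to read.
h = [1, 2, 3, 4]
impulse_response = fir_direct_form([1, 0, 0, 0, 0], h)
step_response = fir_direct_form([1, 1, 1, 1, 1], h)
print(impulse_response)  # [1, 2, 3, 4, 0]
print(step_response)     # [1, 3, 6, 10, 10]
```

Feeding a unit impulse returns the coefficients in order, which is a quick sanity check for any FIR implementation.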

The difference equation for a 4-tap FIR filter (N = 4) is given in (5):

$$y[n] = \sum_{k=0}^{3} x[k]\, h[n-k] = x[0]\,h[n] + x[1]\,h[n-1] + x[2]\,h[n-2] + x[3]\,h[n-3] \tag{5}$$

Direct form FIR filters are also known as tapped delay line or transversal filters. The size of the FIR filter is determined by the number of coefficients h[n]. An N-tap FIR filter consists of N delay elements, N multipliers and N-1 adders or accumulators. Generally, a linear phase response in the pass band is desirable for many applications, especially in communication. It is shown in [6] that linear phase is achieved if the impulse response is symmetric or anti-symmetric, and hence it is preferable to use the anti-causal framework [7] given in (6), obtained from (4):

$$h(z) = \sum_{k=-(N-1)/2}^{(N-1)/2} h[k]\, z^{-k} \tag{6}$$

Due to advances in technology, many researchers are trying to design FIR filter architectures that offer one or more of the following design advantages: high speed, low power consumption and small area. DSP functions are generally implemented in general purpose DSP processors, where built-in multiply accumulate (MAC) engines are used to perform mathematical operations. Application specific integrated circuits (ASICs) can also be used where high performance is needed or the design volume is high enough to justify the non-recurring engineering (NRE) cost [8]. However, field programmable gate arrays (FPGAs) offer the best of the two technologies, in addition to the reconfigurability of the hardware platform. An important factor in a DSP processor is the limitation on hardware resources such as MAC engines. This is not an issue with FPGAs, since these devices offer sufficient capacity to fit plenty of MAC processors into a single device. The performance of the parallel FIR filter is determined by the multiplier.
Modified Booth (MB) encoding halves the number of partial products, resulting in reduced area, critical delay and power consumption. However, a dedicated encoding circuit is required and partial product generation is more complex [9]. The Baugh Wooley 2's complement multiplier offers better sign bit management, a uniform VLSI structure and no complex encoding circuits, which results in a compact circuit. The biggest advantage of the compact and uniform structure is that it is easy to pipeline: the partial product generation stages are easily divided, which increases the speed of operation [9].

In this paper, we propose an FPGA implementation of a GALS microprogrammed parallel 4-tap FIR filter and compare it with a fully synchronous parallel microprogrammed 4-tap FIR filter, using the GALS and synchronous Baugh Wooley multipliers, respectively, given in [10]. The primary objective of the design is to demonstrate a low power implementation of the GALS FIR filter. The paper is organized as follows. Section I introduces GALS and the FIR filter, section II describes the Baugh Wooley multiplier and section III describes the microprogrammed FIR filter architecture. Section IV describes the design of the synchronous and GALS microprogrammed parallel FIR filters in detail. Results are discussed in section V and the paper is concluded in section VI.

II. Baugh Wooley Multiplier
The Baugh Wooley multiplication algorithm [11] was developed to design regular 2's complement multipliers. It handles the sign bit effectively during the computation of partial products. Let a and b be two n-bit signed numbers, represented as

$$a = -a_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} 2^{i} a_i \tag{7}$$

$$b = -b_{n-1}\,2^{n-1} + \sum_{j=0}^{n-2} 2^{j} b_j \tag{8}$$

The result of the multiplication of a and b is represented as

$$\begin{aligned} p = a \times b &= \left(-a_{n-1}\,2^{n-1} + \sum_{i=0}^{n-2} 2^{i} a_i\right)\left(-b_{n-1}\,2^{n-1} + \sum_{j=0}^{n-2} 2^{j} b_j\right) \\ &= a_{n-1} b_{n-1}\,2^{2n-2} + \sum_{i=0}^{n-2} 2^{i} a_i \sum_{j=0}^{n-2} 2^{j} b_j - a_{n-1}\,2^{n-1} \sum_{j=0}^{n-2} 2^{j} b_j - b_{n-1}\,2^{n-1} \sum_{i=0}^{n-2} 2^{i} a_i \end{aligned} \tag{9}$$

The last two terms in equation (9) are n-1 bits each, extending from position $2^{n-1}$ to $2^{2n-3}$. We pad zeros in the remaining bit positions to obtain a 2n-bit number whose binary weights extend from $2^{0}$ to $2^{2n-1}$. Rather than subtracting the last two terms, we can take the 2's complement of each and add all the terms to obtain the final product. Let z be one of the last two terms; with zero padding, it can be represented as in equation (10).

$$z = 0 \times 2^{2n-1} + 0 \times 2^{2n-2} + 2^{n-1} \sum_{j=0}^{n-2} 2^{j} z_j + 0 \times \sum_{r=0}^{n-2} 2^{r} \tag{10}$$

After taking the 2's complement of z, the new bit values for -z are as shown in Table I.

Table I Bit values for -z
Bit position    Bit value
2n-1            1
2n-2            1
2n-3            $\bar{z}_{n-2}$
2n-4            $\bar{z}_{n-3}$
2n-5            $\bar{z}_{n-4}$
...             ...
n               $\bar{z}_{1}$
n-1             $\bar{z}_{0} + 1$
...             ...
1               0
0               0

Let $z_1$ and $z_2$ be the last two terms in equation (9). The addition $(-z_1) + (-z_2)$ then results in the bit patterns shown in Table II at the most significant bits and at bit position n.

Table II Bit patterns
Bit position:  2n-1, 2n-2, n, n-1
+ :            1 1 1 1 1 1
Carry in:      1, 0/1, 1, 0/1
Sum:           0/1, 0/1

Hence the product p in equation (9) can be written as

$$p = a_{n-1} b_{n-1}\,2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} 2^{i+j} a_i b_j + 2^{n-1} \sum_{j=0}^{n-2} 2^{j}\, \overline{b_j a_{n-1}} + 2^{n-1} \sum_{i=0}^{n-2} 2^{i}\, \overline{a_i b_{n-1}} - 2^{2n-1} + 2^{n} \tag{11}$$

where the overbar denotes the complement of a partial product bit. If a and b are 8-bit numbers, the product p is given as

$$p = a_{7} b_{7}\,2^{14} + \sum_{i=0}^{6} \sum_{j=0}^{6} 2^{i+j} a_i b_j + 2^{7} \sum_{j=0}^{6} 2^{j}\, \overline{b_j a_{7}} + 2^{7} \sum_{i=0}^{6} 2^{i}\, \overline{a_i b_{7}} - 2^{15} + 2^{8} \tag{12}$$

Fig. 2 shows the implementation structure of a 4-bit Baugh Wooley multiplier and Fig. 3 shows the corresponding internal structure of the cells.

Fig. 2 Structure of 4-bit Baugh Wooley multiplier
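The identity in (12) can be checked numerically. The sketch below is a software model only, not the pipelined hardware of [10]: it evaluates the right-hand side of (11) term by term, assuming that the partial product bits in the two boundary sums are complemented, and compares the result with ordinary signed multiplication for every pair of 8-bit operands. The helper names `to_bits` and `baugh_wooley_product` are hypothetical.

```python
def to_bits(value, n):
    """Return the n two's-complement bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def baugh_wooley_product(a, b, n=8):
    """Evaluate equation (11) term by term for n-bit signed operands a and b."""
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    a_msb, b_msb = a_bits[n - 1], b_bits[n - 1]

    # Sign-bit product a_{n-1} b_{n-1} 2^{2n-2}.
    p = (a_msb & b_msb) << (2 * n - 2)

    # Unsigned core: sum of a_i b_j 2^{i+j} over the low n-1 bits.
    for i in range(n - 1):
        for j in range(n - 1):
            p += (a_bits[i] & b_bits[j]) << (i + j)

    # Boundary rows: complemented partial products weighted by 2^{n-1}.
    for j in range(n - 1):
        p += (1 - (b_bits[j] & a_msb)) << (n - 1 + j)
    for i in range(n - 1):
        p += (1 - (a_bits[i] & b_msb)) << (n - 1 + i)

    # Constant correction terms -2^{2n-1} + 2^n.
    p += -(1 << (2 * n - 1)) + (1 << n)
    return p


n = 8
lo, hi = -(1 << (n - 1)), 1 << (n - 1)
# Exhaustive comparison against Python's built-in signed multiplication.
all_match = all(
    baugh_wooley_product(a, b, n) == a * b
    for a in range(lo, hi)
    for b in range(lo, hi)
)
print(all_match)  # True
```

Because every term is added rather than subtracted (apart from the fixed constants), the array can be built from identical AND/NAND cells and adders, which is the regularity the section refers to.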

Fig. 3 Internal structure of cells

III. Microprogrammed FIR Filter Architecture
The microprogrammed FIR filter architecture consists of a data path unit and a control unit [12]. The function of the data path unit is to perform the multiplication and addition operations on the applied input signal and the impulse response. The control unit generates the various control signals for the data path. Fig. 4 shows the block diagram of the microprogrammed FIR filter.

Fig. 4 Microprogrammed FIR filter architecture

The architecture of the data path unit can be classified as sequential or parallel, depending on the method adopted for computing the output signal. The architecture of the data path depends entirely on the nature of the application. Typically it consists of multipliers, adders, data registers and multiplexers. The data registers act as memory to hold the input signal and the filter coefficients for computation. Multiplexers are used to route the appropriate data to the multipliers in accordance with (2).

Two approaches can be adopted for designing the control unit: hardwired and microprogrammed. A microprogrammed control unit stores the microinstructions in a memory, from which they are fetched using address decoding logic. These microinstructions generate the control signals for the data path unit. The main advantage of the microprogrammed control unit is the flexibility to modify the microprogram in the memory [12]. The microprogrammed control unit consists of address decoding logic and memory. Fig. 5 shows the simplified block diagram of the microprogrammed control unit.

Fig. 5 Block diagram of microprogrammed control unit
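One way to picture the control unit of Fig. 5 is as a small memory of microinstructions stepped by an address counter. The sketch below is a behavioural model under assumed names: the control-signal fields (`load_regs`, `multiply`, `add`, `latch_output`), the name of the `stop` bit and the four-word microprogram are hypothetical and are not the authors' microcode.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MicroInstruction:
    """One word of microprogram memory (field names are illustrative)."""
    load_regs: bool = False     # load input sample and coefficient registers
    multiply: bool = False      # enable the multipliers
    add: bool = False           # enable the adder stage
    latch_output: bool = False  # latch the filter output
    stop: bool = False          # tell the address logic to stop sequencing


# Hypothetical microprogram: one control step per memory word.
MICROPROGRAM = (
    MicroInstruction(load_regs=True),
    MicroInstruction(multiply=True),
    MicroInstruction(add=True),
    MicroInstruction(latch_output=True, stop=True),
)


def run_control_unit(microprogram):
    """Step through memory and record which control signals fire at each address."""
    trace = []
    address = 0
    while address < len(microprogram):
        word = microprogram[address]
        active = [name for name, on in asdict(word).items() if on and name != "stop"]
        trace.append((address, active))
        if word.stop:
            break  # address generation halts on the stop bit
        address += 1  # otherwise fetch the next microinstruction
    return trace


trace = run_control_unit(MICROPROGRAM)
for address, signals in trace:
    print(address, signals)
```

Changing the filter's control sequence in this model only requires editing `MICROPROGRAM`, which mirrors the flexibility argument made for microprogrammed control over hardwired control.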

The control signals from the microprogrammed control unit are fed to the data path unit, which performs the necessary operations: loading the data registers with the input signal and the filter coefficients, multiplying the appropriate data, adding, and latching the output signal. The microinstruction also has a bit that tells the address decoding logic whether to stop or to continue generating memory addresses.

IV. Implementation of Microprogrammed Parallel FIR Filter
The 16 x 16 bit Baugh Wooley multiplier with 18 pipeline stages given in [10], implemented both in fully synchronous logic and in GALS logic using a clock divider and decoder module, is used in the implementation of the microprogrammed parallel FIR filter. A GALS parallel 4-tap FIR filter is implemented that consists of GALS 16-bit pipelined Baugh Wooley multipliers, a carry look ahead adder and a GALS microprogrammed control unit. For fair benchmarking, a synchronous parallel 4-tap FIR filter that consists of synchronous 16-bit pipelined Baugh Wooley multipliers, a carry look ahead adder and a synchronous microprogrammed control unit is also implemented on the same FPGA with almost the same number of logic cells.

A. Synchronous Microprogrammed FIR Filter
Fig. 6 illustrates the block diagram of the synchronous microprogrammed 4-tap FIR filter. It consists of synchronous pipelined Baugh Wooley multipliers, a carry look ahead adder, a synchronous microprogrammed control unit and data registers to hold the input signal and the filter coefficients. All the registers, the multipliers and the control unit are clocked simultaneously by the global clock signal. The pipelined 16-bit Baugh Wooley multiplier has 18 pipeline stages and therefore takes 18 clock cycles to generate its output. Four (4) clock cycles are required to load data into both registers simultaneously. Finally, two (2) clock cycles are required at the adder stages to maintain pipelining at each stage of the FIR filter. Thus 24 clock cycles are required to generate the final output of the filter.
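The cycle count above can be written out as a single sum. The symbol $L_{\text{sync}}$ is notation introduced here for the latency of the synchronous filter in clock cycles; it does not appear in the paper.

```latex
L_{\text{sync}}
  = \underbrace{4}_{\text{register load}}
  + \underbrace{18}_{\text{multiplier pipeline}}
  + \underbrace{2}_{\text{adder stages}}
  = 24 \ \text{clock cycles}
```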
Since all the pipeline registers are clocked simultaneously at the higher clock rate, a considerable amount of power is dissipated in the circuit.

Fig. 6 Synchronous microprogrammed 4-tap FIR filter

B. GALS Microprogrammed FIR Filter
Fig. 7 illustrates the block diagram of the GALS microprogrammed 4-tap FIR filter. It consists of GALS pipelined Baugh Wooley multipliers, a carry look ahead adder, a GALS microprogrammed control unit and data registers to hold the input signal and the filter coefficients. Unlike the synchronous design, the registers, multipliers and control unit are not all clocked simultaneously by the global clock signal. The GALS microprogrammed control unit receives the global clock signal and generates enable signals for all the pipeline stages and the memory. On reception of the enable signal, the memory generates the various control signals that load data into the registers. Enable signals to the multipliers and to the pipeline stages at the adder allow the operation in (2) to be performed to generate the output. The global clock signal is applied only to the control unit, which is therefore locally synchronous, while the sub-blocks of the FIR filter are not synchronized to one another, which makes the design globally asynchronous. The enable signals generated by the control unit run at a much lower rate than the global clock, so the switching power dissipation is reduced without affecting the speed of operation of the GALS FIR filter.

Fig. 7 GALS microprogrammed 4-tap FIR filter

Table III Results
FPGA Resources / Parameters    Fully Synchronous    GALS
Number of Slices               2347                 2154
Number of LUTs                 2340                 2448
Number of FFs                  2675                 2501
Delay                          8.011 ns             8.011 ns
Maximum Frequency              124.82 MHz           124.82 MHz
Clock Frequency                1 GHz                1 GHz
Total Power                    2.516 W              0.478 W
Dynamic Power                  2.17 W               0.156 W
Leakage Power                  0.346 W              0.322 W

V. Results & Discussion
Virtex-5 FPGAs offer the best solution for addressing the needs of high-performance logic designers, high-performance DSP designers, and high-performance embedded systems designers with unprecedented logic, DSP, hard/soft microprocessor, and connectivity capabilities [13]. Built on a 65-nm state-of-the-art copper process technology, Virtex-5 FPGAs are a programmable alternative to custom ASIC technology [13]. The 16 x 16 bit fully synchronous and GALS pipelined MAC unit is coded in VHDL and implemented on a Virtex-5 FPGA (xc5vlx20t-2ff323) device. The obtained results are also confirmed on other FPGA devices such as Spartan 5, Virtex 6, and Spartan 6. The output of each block of the FIR filter is verified using the Xilinx ISE WebPACK 13.1 simulation and synthesis tool. Table III summarizes the results obtained after simulation and implementation of the synchronous and GALS FIR filters. The results clearly indicate that the fully synchronous FIR filter dissipates 5.26 times as much total power as the GALS FIR filter (2.516 W versus 0.478 W). This comes at the cost of increased area: the GALS FIR filter requires 1.046 times as many slice LUTs (2448 versus 2340) as the fully synchronous FIR filter.

VI. Conclusion
The fully synchronous and GALS pipelined microprogrammed FIR filters are coded in VHDL and implemented on a Virtex-5 FPGA (xc5vlx20t-2ff323) device. The primary objective is to demonstrate a low power implementation of the microprogrammed parallel GALS FIR filter for digital signal processing applications. The fully synchronous microprogrammed parallel FIR filter and the GALS microprogrammed FIR filter are implemented on the same FPGA with almost the same number of logic cells for fair benchmarking. The results clearly indicate that the fully synchronous FIR filter dissipates 5.26 times as much power as the GALS FIR filter, at the cost of the GALS FIR filter requiring 1.046 times as many slice LUTs as the fully synchronous FIR filter. The GALS microprogrammed FIR filter can be used as a basic building block in a GALS implementation of a digital signal processor.

References
[1]. Semiconductor Industry Association, International Technology Roadmap for Semiconductors, http://www.itrs.net.
[2]. L. A. Plana et al., A GALS infrastructure for a massively parallel multiprocessor, IEEE Design and Test of Computers, vol. 24, no. 5, pp. 454-463, Sep.-Oct. 2007.
[3]. S. Dasgupta and A. Yakovlev, Comparative analysis of GALS clocking schemes, IET Computers & Digital Techniques, vol. 1, no. 2, pp. 59-69, Mar. 2007.
[4]. Kwen-Siong Chong et al., Synchronous-Logic and Globally-Asynchronous-Locally-Synchronous (GALS) Acoustic Digital Signal Processors, IEEE Journal of Solid-State Circuits, vol. 47, no. 3, pp. 769-780, March 2012.
[5]. R. Dobkin, R. Ginosar, and C. P. Sotiriou, Data synchronization issues in GALS SoCs, in Proc. Int. Symp. Async. Circuits Syst. (ASYNC), pp. 170-179, 2004.
[6]. V. K. Ingle, J. G. Proakis, Digital Signal Processing using Matlab, 2nd Edition, Cengage Learning, 2007.
[7]. Uwe Meyer-Baese, Digital Signal Processing using Field Programmable Gate Array, in Springer Series on Signal & Communication Technology, 2007.
[8]. Clive "Max" Maxfield, The Design Warrior's Guide to FPGAs, Elsevier Publication, 2006.
[9]. Gouri Wazurkar, S. L. Badjate, Power Efficient GALS Pipelined MAC Unit for FFT with Complex Numbers, in IEEE International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES), 2016.
[10]. Gouri Wazurkar, S. L. Badjate, Globally Asynchronous Locally Synchronous (GALS) Pipelined Signed Multiplier, in IEEE International Conference on Computing, Analytics and Security Trends (CAST), 2016.
[11]. C. R. Baugh and B. A. Wooley, A two's complement parallel array multiplication algorithm, IEEE Trans. Computers, vol. C-22, no. 12, pp. 1045-1047, Dec. 1973.
[12]. Syed Manzoor Qasim and Mohammed S. BenSaleh, Hardware Implementation of Microprogrammed Controller Based Digital FIR Filter, in Springer IAENG Transactions on Engineering Technologies, 2012.
[13]. Xilinx, Virtex-5 Family Overview, http://www.xilinx.com, v5.1, 2015.