
ISSN: 2278-1323, Volume 1, Issue 4, June 2012
All Rights Reserved © 2012 IJARCET

VLSI Implementation of Pipelined Fast Fourier Transform

K. Indirapriyadarsini¹, S. Kamalakumari², G. Prasannakumar³
¹,²Swarnandhra Engineering College; ³Vishnu Institute of Technology
{darsiniprasanna36, Kamalakumari6, godiprasanna}@gmail.com

Abstract: Digital signal processing (DSP) has become a very important and dynamic research area, and many integrated circuits are now dedicated to DSP functions. Unfortunately, existing designs are restricted to low accuracy and a small number of samples. The Fourier transform is widely used in industrial applications as well as in scientific research. Its most common use is to transform a function of time into a function of frequency. In this paper, we present an efficient implementation of a pipelined FFT. Our design adopts the single-path delay feedback (SDF) style as the proposed hardware architecture. To eliminate the read-only memories (ROMs) used to store the twiddle factors, the proposed architecture applies a reconfigurable complex multiplier and bit-parallel multipliers to achieve a ROM-less FFT processor, and thus consumes less power than existing designs.

Index Terms: FFT, ROM, complex multiplier.

I. INTRODUCTION

The discrete Fourier transform (DFT) is a very important technique in modern digital signal processing (DSP) and telecommunications, especially for orthogonal frequency-division multiplexing (OFDM) systems such as IEEE 802.11a/g [1], Worldwide Interoperability for Microwave Access (WiMAX) [2], Long Term Evolution (LTE) [3], and Digital Video Broadcasting-Terrestrial (DVB-T) [4]. However, the DFT is computationally intensive, with a time complexity of $O(N^2)$. The fast Fourier transform (FFT) proposed by Cooley and Tukey [5] reduces the time complexity to $O(N \log_2 N)$, where $N$ denotes the FFT size. For hardware implementation, various FFT processors have been proposed [6].
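To make the gap between $O(N^2)$ and $O(N \log_2 N)$ concrete, the Python sketch below counts complex multiplications for a direct DFT and for a radix-2 FFT. It assumes the usual textbook counts: $N$ products for each of the $N$ DFT outputs, and one twiddle product per radix-2 butterfly, including trivial ones such as 1 or $-j$. The FFT figure is therefore an upper bound. The function names are illustrative only.

```python
def dft_complex_mults(n: int) -> int:
    """Direct DFT: N complex products for each of the N outputs."""
    return n * n


def radix2_fft_complex_mults(n: int) -> int:
    """Radix-2 FFT upper bound: log2(N) stages x N/2 butterflies x one twiddle product.

    Trivial twiddles (1, -j, ...) are counted too, so real designs need fewer.
    """
    stages = n.bit_length() - 1
    if n < 2 or (1 << stages) != n:
        raise ValueError("a radix-2 FFT needs a power-of-two size N >= 2")
    return (n // 2) * stages


if __name__ == "__main__":
    for n in (64, 1024):
        print(f"N={n}: DFT {dft_complex_mults(n)}, radix-2 FFT <= {radix2_fft_complex_mults(n)}")
```

For $N = 64$, the FFT size used by IEEE 802.11a, the count drops from 4096 to at most 192 complex multiplications.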
These implementations can be classified mainly into memory-based and pipeline architecture styles. The memory-based architecture, also known as the single processing element (PE) approach, is widely adopted to design an FFT processor. This design style usually consists of a main PE and several memory units, so its hardware cost and power consumption are both lower than those of the pipeline style. However, it has long latency and low throughput, and it cannot be parallelized. The pipeline architecture style avoids these disadvantages at the cost of an acceptable hardware overhead. Pipeline FFT processors generally follow one of two popular design types: the single-path delay feedback (SDF) pipeline architecture and the multiple-path delay commutator (MDC) pipeline architecture. The SDF pipeline FFT [6], [7] requires little memory (about $N-1$ delay elements), its multiplier utilization is below 50%, and its control unit is easy to design. Such implementations are advantageous for low-power design, especially in portable DSP devices. For these reasons, the SDF pipeline FFT is adopted in our work. Instead of using ROMs to store twiddle factors, our proposed architecture includes a reconfigurable complex constant multiplier and bit-parallel complex multipliers, an approach suited to power-of-2 radix FFT/IFFT processors. A short version of the present work has been published in []. This paper provides a more detailed and complete description of the entire work. The rest of this paper is organized as follows. Section II briefly reviews the fast Fourier transform. Section III presents our proposed FFT architecture for wireless communication systems.
Section IV then discusses the performance evaluation of various FFT architectures. Finally, concluding remarks are given in Section V.

II. FFT AND IFFT ALGORITHMS

The discrete Fourier transform (DFT) $X_k$ of an $N$-point discrete-time signal $x_n$ is defined by

$$X_k = \sum_{n=0}^{N-1} x_n W_N^{kn}, \qquad 0 \le k \le N-1, \qquad (1)$$

where the twiddle factor is $W_N^{kn} = e^{-j2\pi kn/N}$ and $W_N$ denotes the $N$-th primitive root of unity. A straightforward implementation of (1) is impractical because of the huge amount of hardware it requires. The fast Fourier transform (FFT) [5] was therefore developed to speed up the computation and to significantly reduce the hardware cost. The FFT decomposes an input sequence using either decimation in frequency (DIF) or decimation in time (DIT) to construct an efficient signal-flow graph (SFG). Our work employs DIF decomposition because it matches the data flow of the single-path delay feedback pipeline. Fig. 1 depicts an example radix-2 DIF FFT SFG for N = 16.

Fig. 1 Radix-2 DIF FFT signal-flow graph of length 16

The radix-2 DIF FFT described above has a regular SFG and requires few complex multipliers. It is therefore well suited to hardware implementation, because some of its complex multiplications can be simplified to reduce chip area. For instance, an input signal multiplied by $W_{16}^{2}$ in Fig. 1 can be expressed as

$$(a + jb)\,W_{16}^{2} = \frac{\sqrt{2}}{2}\big[(a + b) + j(b - a)\big], \qquad (2)$$

where $a + jb$ denotes a discrete-time sample in complex form. Similarly, the complex multiplication by $W_{16}^{6}$ is given by

$$(a + jb)\,W_{16}^{6} = \frac{\sqrt{2}}{2}\big[(b - a) - j(a + b)\big]. \qquad (3)$$

Both equations ease hardware implementation, because each needs only two real additions and a multiplication by $\sqrt{2}/2$. The multiplication by $\sqrt{2}/2$ is itself easy to realize, and its circuit is introduced in Section III-C. The inverse discrete Fourier transform (IDFT) of length $N$ is given by

$$x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k W_N^{-kn}, \qquad 0 \le n \le N-1. \qquad (4)$$

To reuse the same hardware core and thus reduce chip area [6], (4) can be rewritten as

$$x_n = \frac{1}{N}\left(\sum_{k=0}^{N-1} X_k^{*} W_N^{kn}\right)^{*}, \qquad 0 \le n \le N-1, \qquad (5)$$

where the asterisk $*$ denotes complex conjugation. This new form can be viewed as a general DFT. In other words, the DFT and IDFT can share the same hardware core, while the IDFT requires some extra computations.
These extra computations are conjugating the input data $X_k$ and the DFT outputs, and dividing the result by $N$. This reuse form of the DFT/IDFT algorithm also simplifies the design of a DFT/IDFT processor and thus reduces chip area, provided that the DFT and IDFT are performed alternately rather than simultaneously.

III. PROPOSED ARCHITECTURE

Traditional hardware implementations of FFT/IFFT processors usually employ a ROM to look up the required twiddle factors, followed by word-length complex multipliers to perform the FFT computation. However, this approach increases hardware cost, so a bit-parallel complex constant multiplication scheme [8] is used to address this issue. In addition, because the twiddle factors are symmetric, each complex multiplication used in the FFT computation can be mapped to one of the following three operation types:

$$W_N^{k}(a + jb) = W_N^{k - N/4}(b - ja), \qquad N/4 \le k < N/2, \qquad (6)$$

$$W_N^{k}(a + jb) = W_N^{k - N/2}(-a - jb), \qquad N/2 \le k < 3N/4, \qquad (7)$$

$$W_N^{k}(a + jb) = W_N^{k - 3N/4}(-b + ja), \qquad 3N/4 \le k < N. \qquad (8)$$

Given these three equations, any twiddle factor can be obtained by combining a twiddle factor with index $0 \le k < N/4$ with one of these primary operations. In other words, every twiddle factor used in the FFT can be derived with these operation types, which significantly reduces the size of the ROM used to store the twiddle factors. Moreover, for hardware implementation, we add two extra operation types to further decrease the ROM size. Our method also shortens the critical path of the designed hardware, so the system clock can run faster. The two additional operation types are

$$W_N^{k}(a + jb) = \big[W_N^{N/4 - k}(b + ja)\big]^{*}, \qquad 0 \le k < N/4, \qquad (9)$$

$$W_N^{k}(a + jb) = -j\big[W_N^{N/2 - k}(b + ja)\big]^{*}, \qquad N/4 \le k < N/2. \qquad (10)$$
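The identities (6)-(9) can be checked and chained in software. The Python sketch below shows one way to fold any index $k$ so that only $W_N^k$ with $0 \le k \le N/8$ is ever looked up. This chaining is an assumption made for illustration and is not taken from the paper's hardware. `small_twiddle` is a hypothetical stand-in for the reduced table or constant multiplier, and `twiddle` evaluates $W_N^k$ directly as a reference.

```python
import cmath


def twiddle(n: int, k: int) -> complex:
    """Reference W_N^k = exp(-j*2*pi*k/N), evaluated directly."""
    return cmath.exp(-2j * cmath.pi * k / n)


def small_twiddle(n: int, k: int) -> complex:
    """Hypothetical reduced table: only indices 0 <= k <= N/8 are allowed."""
    assert 0 <= k <= n // 8, f"index {k} outside [0, N/8]"
    return twiddle(n, k)


def times_twiddle(n: int, k: int, z: complex) -> complex:
    """Return W_N^k * z using the symmetries (6)-(9); N must be a multiple of 8."""
    a, b = z.real, z.imag
    quadrant, r = divmod(k % n, n // 4)   # k = quadrant*N/4 + r, 0 <= r < N/4
    # (6)-(8): W_N^k (a+jb) = W_N^r * [(-j)^quadrant (a+jb)], using only swaps and sign flips
    if quadrant == 1:
        a, b = b, -a        # -j(a+jb) = b - ja
    elif quadrant == 2:
        a, b = -a, -b       # -(a+jb)
    elif quadrant == 3:
        a, b = -b, a        # j(a+jb) = -b + ja
    if r <= n // 8:
        return small_twiddle(n, r) * complex(a, b)
    # (9): W_N^r (a+jb) = [W_N^(N/4-r) (b+ja)]^*, and N/4 - r < N/8 here
    return (small_twiddle(n, n // 4 - r) * complex(b, a)).conjugate()
```

For $N = 64$, this folding needs only the nine values $k = 0, \dots, 8$ instead of a 64-entry table. The quadrant steps cost only swaps and sign changes, which is why they are cheap in hardware.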

A. Proposed Architecture

To improve on previous work in power reduction, we propose a low-power radix-2 pipeline FFT/IFFT processor. The proposed architecture is composed of three types of processing elements (PEs), a complex constant multiplier, delay-line (DL) buffers (drawn as rectangles with a number inside), and some extra processing units for computing the IFFT. The conjugation performed by these extra units is easy to implement, because it only takes the two's complement of the imaginary part of a complex value. In addition, for the complex constant multiplier in Fig. 2, we propose a novel reconfigurable complex constant multiplier that eliminates the twiddle-factor ROM. This multiplication structure is the key component in reducing the chip area and power consumption of the proposed FFT processor. The detailed functions of the modules appearing in Fig. 2 are described in the following subsections.

B. Processing Elements

Based on the radix-2 FFT algorithm, the three types of processing elements used in our design (PE3, PE2, and PE1) are illustrated in Fig. 2, Fig. 4, and Fig. 3, respectively. Each PE type corresponds to one of the butterfly stages shown in Fig. 1. The PE3 stage implements only a simple radix-2 butterfly structure and serves as a submodule of the PE2 and PE1 stages. In its circuit, the input and output data, as well as the input and output of the DL buffers, are each carried as separate real-part and imaginary-part signals. The PE2 stage must also compute the multiplication by $-j$ or $1$. Note that the multiplication by $-1$ in Fig. 3 is realized simply by taking the two's complement of its input value.
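The Python sketch below is a behavioral model of a generic radix-2 DIF SDF pipeline. It shows how one butterfly per stage and a feedback delay line of $N/2, N/4, \dots, 1$ words ($N-1$ in total) produce the FFT. It is an illustration under stated assumptions and does not model the proposed PE3/PE2/PE1 circuits. It uses floating-point arithmetic and computes the twiddle factors directly with `cmath.exp` instead of the ROM-less multiplier. All function names are hypothetical. `sdf_ifft` reuses the same core through the conjugation form (5), mirroring the extra IFFT processing units described above.

```python
import cmath
from collections import deque


def sdf_stage(stream, d):
    """One radix-2 DIF single-path delay-feedback stage (behavioral model).

    One complex sample arrives per clock; blocks of 2*d samples start at index 0.
    The d-word feedback delay line plays the role of a DL buffer. Valid output
    starts d clocks after the input (the stage latency).
    """
    fifo = deque([0j] * d)
    out = []
    for t, x in enumerate(stream):
        c = t % (2 * d)              # position inside the current 2*d block
        head = fifo.popleft()        # word leaving the delay line on this clock
        if c < d:
            # First half: store x, and emit the difference saved from the previous
            # block multiplied by its DIF twiddle factor W_{2d}^c.
            fifo.append(x)
            out.append(head * cmath.exp(-2j * cmath.pi * c / (2 * d)))
        else:
            # Second half: butterfly of x[c-d] (head) and x. The sum leaves now,
            # and the difference is fed back into the delay line.
            fifo.append(head - x)
            out.append(head + x)
    return out


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def sdf_fft(x):
    """N-point radix-2 DIF FFT through cascaded SDF stages with delays N/2, ..., 1."""
    n = len(x)
    bits = n.bit_length() - 1
    if n < 2 or (1 << bits) != n:
        raise ValueError("N must be a power of two >= 2")
    # N - 1 idle clocks flush the pipeline (total delay N/2 + ... + 1 = N - 1).
    stream = [complex(v) for v in x] + [0j] * (n - 1)
    d = n // 2
    while d >= 1:
        stream = sdf_stage(stream, d)[d:]   # drop this stage's d clocks of latency
        d //= 2
    # Results leave the last stage in bit-reversed order; restore natural order.
    return [stream[bit_reverse(k, bits)] for k in range(n)]


def sdf_ifft(X):
    """IDFT via (5): conjugate the input, reuse the FFT core, conjugate again, divide by N."""
    n = len(X)
    y = sdf_fft([complex(v).conjugate() for v in X])
    return [v.conjugate() / n for v in y]
```

Because the last stage emits results in bit-reversed order, a hardware SDF processor would typically need a separate reordering buffer wherever natural-order output is required.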
In the PE1 stage, the calculation is more complex than in the PE2 stage, because PE1 must compute the multiplications by $-j$, $W_N^{N/8}$, and $W_N^{3N/8}$. Since $W_N^{3N/8} = -j\,W_N^{N/8}$, the product by $W_N^{3N/8}$ can be obtained either by multiplying by $W_N^{N/8}$ first and then by $-j$, or in the reverse order. The designed hardware therefore uses this cascaded calculation together with multiplexers to realize all the calculations needed in the PE1 stage. This arrangement also saves one bit-parallel multiplier, which further lowers the hardware cost.

C. Bit-Parallel Multipliers

As noted in Section II, the multiplication by $\sqrt{2}/2$ can employ a bit-parallel multiplier in place of a word-length multiplier and square-root evaluation, reducing chip area. Expressed as a sum of powers of 2, the bit-parallel operation is

$$\text{Output} = \text{in} \times \frac{\sqrt{2}}{2} \approx \text{in} \times \left(2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8} + 2^{-14}\right). \qquad (11)$$

A straightforward implementation of (11) gives poor precision because of truncation error, and it also costs more hardware. Therefore, to improve precision and reduce hardware cost, (11) can be rewritten as

Output = in × √2/2 = in × [((2-2 )(2-6 -2-2 )], (12)

Fig. 2 Circuit diagram of our proposed PE3 stage.
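The precision remark about (11) can be illustrated with integer arithmetic. In the Python sketch below, `mul_inv_sqrt2_truncating` adds the six truncated shifted copies of (11) directly. The hypothetical alternative `mul_inv_sqrt2_wide` adds the same terms with 14 extra fraction bits and truncates only once. This alternative is an assumption for illustration and is not the paper's factored form (12). `mul_w_n_8` and `mul_w_3n_8` then apply the butterfly-then-scale form of (2) to the PE1 multiplications by $W_N^{N/8}$ and $W_N^{3N/8} = -j\,W_N^{N/8}$. All names are hypothetical.

```python
from typing import Tuple

# Exponents of the power-of-two terms in (11):
# sqrt(2)/2 ~ 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 + 2^-14 = 11585 / 2^14
SHIFTS = (1, 3, 4, 6, 8, 14)
FRAC_BITS = 14


def mul_inv_sqrt2_truncating(x: int) -> int:
    """Direct form of (11): each shifted copy of x is truncated before the additions.

    Python's >> on negative integers is an arithmetic shift (rounds toward -inf).
    """
    return sum(x >> s for s in SHIFTS)


def mul_inv_sqrt2_wide(x: int) -> int:
    """Hypothetical alternative: add the same terms with FRAC_BITS extra fraction
    bits, then truncate once at the end (not the paper's factored form (12))."""
    return sum(x << (FRAC_BITS - s) for s in SHIFTS) >> FRAC_BITS


def mul_w_n_8(a: int, b: int) -> Tuple[int, int]:
    """(a + jb) * W_N^{N/8} = (sqrt(2)/2) * [(a + b) + j(b - a)]:
    one butterfly (add and subtract) followed by a shared scale by sqrt(2)/2."""
    return mul_inv_sqrt2_wide(a + b), mul_inv_sqrt2_wide(b - a)


def mul_w_3n_8(a: int, b: int) -> Tuple[int, int]:
    """W_N^{3N/8} = -j * W_N^{N/8}: reuse mul_w_n_8, then multiply by -j,
    which only swaps the parts and negates one: -j(re + j*im) = im - j*re."""
    re, im = mul_w_n_8(a, b)
    return im, -re
```

For the 14-bit input 16383, the truncating form returns 11579 and the single-truncation form returns 11584, while the exact product is about 11584.5. This shows how per-term truncation accumulates error.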

Fig. 3 Circuit diagram of our proposed PE1 stage.

Fig. 4 Circuit diagram of our proposed PE2 stage.

Fig. 5 Circuit diagram of the bit-parallel multiplication by $\sqrt{2}/2$.

According to (12), the circuit diagram of the bit-parallel multiplier is illustrated in Fig. 5. The resulting circuit uses three additions and three barrel-shift operations. Fig. 6 shows the realization of the complex multiplication by $W_N^{N/8}$ using a radix-2 butterfly structure with both of its outputs multiplied by $\sqrt{2}/2$. This circuit is used in the PE1 stage.

However, we do not use bit-parallel multipliers to replace the word-length multiplier, for two reasons. The first concerns the operating rate: if bit-parallel multipliers were used, the clock rate would decrease because of the many cascaded adders. The second is high wiring complexity, because many bit-parallel multipliers would have to be switched to perform multiplications by different twiddle factors. In fact, this phenomenon also appears in [8]. For these two reasons, and considering operating speed and chip area, a word-length multiplier is still adopted to implement our complex constant multiplier. Note that our proposed complex constant multiplier does not introduce the high hardware cost described earlier, because no ROM is used.

IV. PERFORMANCE EVALUATION AND RESULTS

For performance evaluation, the normalized power per FFT is defined as follows:

$$\text{Normalized power per FFT} = \frac{\text{Power}}{(\text{Voltage})^{2}} \times \frac{\text{FFT size}}{\text{Frequency}} \qquad (13)$$

The functional correctness of the proposed architecture has been verified by simulation in Verilog HDL, and the results validate the proposed architecture. To validate it further, we implemented the architecture on a commercial FPGA chip. The results show that the proposed architecture works well.

Fig. 6 Circuit diagram of the multiplication by $W_N^{N/8}$

Fig. 7 FFT using proposed architecture

V. CONCLUSION

This paper has described a novel ROM-less, low-power pipeline FFT/IFFT processor for OFDM applications. Exploiting the symmetry of the twiddle factors in the FFT, we have designed a reconfigurable complex constant multiplier that significantly shrinks the twiddle-factor ROM. In our design, no ROM is needed at all. As a result, our design has lower hardware cost and power consumption than existing designs. The proposed scheme can also be adapted to larger FFT sizes with a smaller twiddle-factor ROM. Because our design is relatively low cost and consumes less power, it can serve as a powerful FFT/IFFT processor in many other wireless communication systems.

REFERENCES

[1] IEEE Std 802.11a-1999, Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: High-Speed Physical Layer in the 5 GHz Band.
[2] IEEE 802.16, IEEE Standard for Air Interface for Fixed Broadband Wireless Access Systems, The Institute of Electrical and Electronics Engineers, Inc., June 2004.
[3] Chu Yu, Yi-Ting Liao, Mao-Hsu Yen, Pao-Ann Hsiung, and Sao-Jie Chen, "A Novel Low-Power 64-point Pipelined FFT/IFFT Processor for OFDM Applications," in Proc. IEEE Int'l Conference on Consumer Electronics, Jan. 2011, pp. 452-453.
[4] ETSI, Digital Video Broadcasting (DVB); Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television, ETSI EN 300 744 V1.4.1, 2001.
[5] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, Apr. 1965.
[6] S. He and M. Torkelson, "Designing Pipeline FFT Processor for OFDM (de)modulation," in Proc. URSI Int. Symp. Signals, Systems, and Electronics, vol. 29, Oct. 1998, pp. 257-262.
[7] H. L. Groginsky and G. A. Works, "A pipeline fast Fourier transform," IEEE Transactions on Computers, vol. C-19, no. 11, pp. 1015-1019, Nov. 1970.

[8] Koushik Maharatna, Eckhard Grass, and Ulrich Jagdhold, "A 64-Point Fourier transform chip for high-speed wireless LAN application using OFDM," IEEE Journal of Solid-State Circuits, vol. 39, no. 3, pp. 484-493, Mar. 2004.
[9] Y. T. Lin, P. Y. Tsai, and T. D. Chiueh, "Low-power variable-length fast Fourier transform processor," IEE Proc. Comput. Digit. Tech., vol. 152, no. 4, pp. 499-506, July 2005.

K. Indirapriyadarsini: M.Tech student at Swarnandhra College of Engineering and Technology, Narsapuram. Major working areas are VLSI and embedded systems. Has presented a research paper at one national conference.

S. Kamalakumari: Associate Professor at Swarnandhra College of Engineering and Technology, Narsapuram. Major working areas are wireless communications, linear and digital ICs, and VLSI. Has seven years of teaching experience and has presented research papers at two national conferences.

G. Prasanna Kumar: Assistant Professor at Vishnu Institute of Technology. Has four years of teaching experience. Major working areas are digital signal processing, wireless communications, and embedded systems. Has presented a research paper at one national conference.