DESIGN AND IMPLEMENTATION OF FFT ARCHITECTURE FOR REAL-VALUED SIGNALS BASED ON RADIX-2^3 ALGORITHM

1 Pradnya Zode, 2 A. Y. Deshmukh and 3 Abhilesh S. Thor
1,3 Assistant Professor, Yeshwantrao Chavan College of Engineering
2 Professor, Department of Electronics Engg., G.H. Raisoni College of Engineering, Nagpur
(E-mail: 1 pradnya.u@rediff.com, 2 aydeshmukh@gmail.com, 3 abhileshthor@yahoo.com)

Abstract- A new FFT architecture for real-valued signals is proposed using the radix-2^3 algorithm. It is based on modifying the flow graph of the FFT algorithm such that it has both real and complex datapaths. Redundant operations in the flow graph are replaced by imaginary parts. Using the folding technique, an RFFT architecture with any level of parallelism can be achieved. This RFFT architecture leads to lower hardware complexity than the radix-2 and radix-2^2 algorithms in terms of adders, multipliers and delays. An N-point, 2-parallel radix-2^3 architecture requires (log_8 N - 1) complex multipliers, 2 log_2 N adders and 3N/2 - 2 delays. RFFT is used in real-time applications and in portable devices, for which low power consumption is the main requirement. Accordingly, the carry propagate adder, which has the least power consumption, and the CSD multiplier are selected for the proposed architecture.

Keywords: FFT, Parallel Processing, Pipelining, Real Signals, radix-2^3, Folding

1. Introduction
The fast Fourier transform (FFT) is one of the most widely used algorithms in digital signal processing [1]. Interest in computing the FFT of real-valued signals (RFFT) has increased, since most physical signals are real. RFFT is an important algorithm in various real-time applications and in the area of digital signal processing (DSP) [2]. Hardware complexity can be reduced in asymmetric digital subscriber line (ADSL) systems [3] by using RFFT. In applications such as spectral analysis and filtering [4], the FFT also plays an important role, since it helps to analyze spectral components.
RFFT is vital for analyzing signals such as electroencephalography (EEG) and electrocardiography (ECG). RFFT is also used in various portable devices, where it helps to keep power consumption low. The FFT is also used to estimate power spectral density, which can show whether a signal is as expected or contains a problem.

This paper describes the design of an RFFT architecture. The flow graph is modified and the redundant part is replaced by imaginary parts in order to reduce complexity. Using the folding technique, an RFFT architecture with any level of parallelism can be achieved. Because imaginary parts are injected into the butterfly structure, the architecture has both real and complex datapaths. RFFT is used in real-time applications and in portable devices, for which low power consumption is the main requirement. Accordingly, the carry propagate adder, which has the least power consumption, and the CSD multiplier are selected for our architecture.

The paper is organized as follows. Section 2 describes previous work related to RFFT. Section 3 describes the proposed architecture for the 16-point radix-2^3 DIF RFFT. Section 4 describes the FPGA implementation of the adder and the CSD multiplier. Section 5 discusses the experimental results, and Section 6 gives concluding remarks.

2. Previous Work
Various algorithms for the computation of RFFT have been presented in the past, but they lack a regular geometry, which is essential for deriving a pipelined architecture. The first pipelined architecture for real-valued signals was designed in [5], but it is restricted to radix-2 and to a 4-parallel RFFT architecture, and it has only real datapaths. A design that derives FFT architectures for both real and complex inputs is presented in [6]. However, even after the redundant operations are removed, that architecture still calculates the redundant samples, and its hardware is not fully utilized.
In [7] a pipelined RFFT architecture is derived, but it has higher hardware complexity than our proposed architecture, since it requires more adders, multipliers and delays than the radix-2^3 algorithm in [8].

ISSN (Print): 2249-9210 ISSN (Online): 2348-1862 IJREAS, Vol. 02, Issue 02, July 2014
3. Proposed Work
The N-point discrete Fourier transform (DFT) of a sequence x[n] is defined as

$$X[k] = \sum_{n=0}^{N-1} x[n] \, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1,$$

where $W_N = e^{-j(2\pi/N)}$. In RFFT the inputs are real. If x[n] is real, then the output X[k] is conjugate symmetric:

$$X[N-k] = X^{*}[k] \qquad (1)$$

Due to this symmetry, (N/2)-1 output calculations are redundant and can be removed. The proposed work involves the following two steps.

A. Modified butterfly structure
Redundant samples can be found by the approach in [5] and then removed. Removing them, however, leads to an irregular geometry, so efficient pipelining cannot be done. To obtain a regular geometry, the redundant operations are instead replaced by imaginary parts, after which efficient pipelining is possible. The modified flow graph of the 16-point radix-2^3 DIF RFFT is shown in Fig. 1.

Fig. 1. Modified flow graph of 16-point RFFT radix-2^3 DIF

Since the flow graph has both complex and real parts, it has both complex and real datapaths. To handle this there are two butterfly structures. The first is straightforward: it takes two real inputs and consists of a real adder and a real subtractor. This butterfly structure is shown in Fig. 2.

Fig. 2. Butterfly structure I for proposed architecture [8]
Fig. 3. Butterfly structure II for proposed architecture [8]

B. Folding
Using the folding technique in [9], a pipelined architecture can be derived from the data flow graph (DFG). Folding also leads to an optimized datapath [10]. The nodes in the DFG can be implemented by butterfly structures I and II. The proposed 2-parallel architecture is shown in Fig. 4.

Fig. 4. Proposed work

4. FPGA IMPLEMENTATION
Various adders are simulated on a Spartan-6 XC6SLX16 device in the CSG324 package, and the adder with the least power consumption is selected. The CSD multiplier is also simulated on Spartan-6.

A. Adder
RFFT is generally used in real-time applications and in portable devices such as ECG and EEG equipment. Low power and low area are the main requirements for such devices, so the adder has been selected accordingly.
Accordingly, the carry propagate adder [11], the carry skip adder [12] and the carry look-ahead adder [13] are simulated on a Spartan-6 XC6SLX16 device in the CSG324 package. Table I shows their performance in terms of area, power and delay. Among these, the carry propagate adder suits best, as it has the least power consumption. Its RTL view is shown in Fig. 5 and its output waveform in Fig. 6.
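As an illustration of the carry propagate adder chosen above, the following Python sketch models a 16-bit adder at the bit level. It assumes the adder is the plain ripple-carry form, in which each full-adder stage waits for the carry of the previous stage. The function name `ripple_carry_add` and its parameters are hypothetical and are not taken from the simulated design.

```python
def ripple_carry_add(a: int, b: int, width: int = 16, carry_in: int = 0):
    """Bit-level software model of a ripple-carry (carry propagate) adder.

    Assumption: one full adder per bit, carry passed from bit i to bit i + 1.
    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    carry = carry_in & 1
    total = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        # Full-adder equations for one bit position.
        sum_bit = a_bit ^ b_bit ^ carry
        carry = (a_bit & b_bit) | (carry & (a_bit ^ b_bit))
        total |= sum_bit << i
    return total, carry


if __name__ == "__main__":
    print(ripple_carry_add(1234, 4321))
    print(ripple_carry_add(0xFFFF, 1))
```

For example, `ripple_carry_add(0xFFFF, 1)` returns `(0, 1)`: the sum wraps to zero and the carry ripples through all 16 stages, which is the kind of worst-case path that determines the delay of such an adder.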
TABLE I
Adder (16 bit)      Power (mW)   Delay (ns)   Area (slices)
Carry propagate     30           15.243       25 / 9112
Carry look ahead    42           13.567       59
Carry skip          37           14.767       44

Fig. 5. RTL view of carry propagate adder
Fig. 6. Output waveform of carry propagate adder

B. Multiplier
In a normal multiplier, multiplication is done by shifting to produce partial products and then adding all the partial products. Multiplying two N-bit numbers generates N partial products of N bits each, and (N-1) adders are then required if two-input N-bit adders are used. The number of hardware components therefore increases, and so does the time required for multiplication. A multiplier that generates fewer partial products is needed, which in turn reduces both the time and the power consumed by multiplication. The CSD multiplier suits this requirement best. It reduces the number of partial products by exploiting the redundancy of the canonical signed digit (CSD) code [14] and provides an efficient way of multiplication, as in [15]. Because the number of partial products is reduced, it is more hardware efficient, requires less time for multiplication, and consumes less power than the normal multiplier.

The constant by which the multiplicand is multiplied is encoded as pairs of bits, and the multiplier has a multiplexer that is controlled by these pairs. Depending on the input pair, the multiplexer output is the input data, the inverse of the input data, or all zeros. Shifts are applied depending on these pairs of bits. Additional circuitry receives and combines these outputs and applies further shifts to generate the final output.

A normal multiplier and a CSD multiplier [16] (8 bit x 8 bit) are simulated on a Spartan-6 XC6SLX16 device in the CSG324 package, and their performance in terms of area, power and delay is shown in Table II.

TABLE II
Multiplier   Power (mW)   Delay (ns)   Area (slices)
Normal       35           11.7         116 / 9112
CSD          32           10.5         88

Clearly, the CSD multiplier has less delay and lower power consumption, so it is selected for our proposed architecture. Its output waveform is shown in Fig. 7 and its RTL view in Fig. 8.

Fig. 7. Output waveform of CSD
Fig. 8. RTL view of CSD
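The recoding idea behind CSD multiplication can be sketched in Python as follows. This is only a software model under stated assumptions, not the simulated hardware. The names `to_csd` and `csd_multiply` are hypothetical, digits are stored least significant first, and each digit +1, -1 or 0 plays the role of the multiplexer selecting the input data, its negation, or all zeros before the shift is applied.

```python
def to_csd(value: int) -> list[int]:
    """Recode an integer into canonical signed digits (-1, 0, +1), LSB first.

    No two adjacent digits are nonzero, which is what reduces the
    number of partial products in a shift-and-add multiplier.
    """
    digits = []
    while value != 0:
        if value & 1:
            # Low two bits 01 -> digit +1, low two bits 11 -> digit -1.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x: int, constant: int) -> int:
    """Multiply x by a constant using one add or subtract per nonzero CSD digit."""
    acc = 0
    for shift, digit in enumerate(to_csd(constant)):
        if digit == 1:
            acc += x << shift   # select the input data
        elif digit == -1:
            acc -= x << shift   # select the negated input data
        # digit == 0 contributes nothing (all zeros selected)
    return acc


if __name__ == "__main__":
    print(to_csd(7))
    print(csd_multiply(13, 7))
```

For the constant 7, the binary form 0111 needs three partial products, whereas its CSD form 8 - 1 needs only two, and `csd_multiply(13, 7)` returns 91.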
C. Simulation result of FFT architecture
Fig. 9 shows the inputs to the FFT and Fig. 10 shows the outputs of the FFT.

Fig. 9. Inputs to FFT
Fig. 10. Outputs of FFT

5. Comparison and Analysis
Table III compares the hardware complexity, in terms of adders, multipliers and delays, of the proposed architecture with previous architectures for an N-point FFT, with N = 64 as an example. It is observed that the number of adders, multipliers and delays for the radix-2^3 architecture is lower than for the other architectures, so the radix-2^3 architecture has lower hardware complexity than the previous designs.

TABLE III
(values in parentheses are for N = 64; C.M = complex multipliers)
Architecture                       Complex multipliers        Adders                     Delays
Radix-2 [8]                        2(log_4 N - 1)  (4 C.M)    4 log_2 N  (24)            2N  (128)
Radix-2^2 [8]                      (log_4 N - 1)  (2 C.M)     4 log_2 N - 2  (22)        2(N - 2)  (124)
Proposed radix-2^3 (2-parallel)    (log_8 N - 1)  (1 C.M)     2 log_2 N  (12)            3N/2 - 2  (94)

6. Conclusion
An efficient architecture for the computation of RFFT has been proposed in this paper. The datapaths are optimized by folding. The architecture also has fewer adders, multipliers and delays than previous architectures. It has low power consumption in its adders, since the carry propagate adder is used, and it is relatively fast, since the CSD multiplier is used.

References
[1] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Englewood Cliffs, NJ, USA: Prentice Hall, 1998.
[2] H. Chi and Z. Lai, "A cost-effective memory-based real-valued FFT and Hermitian symmetric IFFT processor for DMT-based wire-line transmission systems," in Proc. ISCAS, May 2005, vol. 6, pp. 6006-6009.
[3] W. Ko, J. Kim, Y. Park, T. Koh, and D. Youn, "An efficient DMT modem for the G.LITE ADSL transceiver," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 6, pp. 997-1005, Dec. 2003.
[4] S. He and M. Torkelson, "Designing pipeline FFT processor for OFDM (de)modulation," in Proc. Int. Symp. Signals Syst., Oct. 1998, pp. 257-262.
[5] M. Garrido, K. K. Parhi, and J. Grajal, "A pipelined FFT architecture for real-valued signals," IEEE Trans.
Circuits Syst. I, Reg. Papers, vol. 56, no. 12, pp. 2634-2643, Dec. 2009.
[6] M. Ayinala, M. Brown, and K. K. Parhi, "Pipelined parallel FFT architectures via folding transformation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 6, pp. 1068-1081, Jun. 2012.
[7] M. Ayinala and K. K. Parhi, "Parallel-pipelined radix-2^2 FFT architecture for real valued signals," in Proc. Asilomar Conf. Signals, Syst. Comput., Nov. 2010, pp. 1274-1278.
[8] M. Ayinala and K. K. Parhi, "FFT architectures for real-valued signals based on radix-2^3 and radix-2^4 algorithms," IEEE Trans. Circuits Syst. I, Reg. Papers, 2013.
[9] K. K. Parhi, C. Y. Wang, and A. P. Brown, "Synthesis of control circuits in folded pipelined DSP architectures," IEEE J. Solid State Circuits, vol. 27, no. 1, pp. 29-43, 1992.
[10] M. Garrido, J. Grajal, M. A. Sánchez, and O. Gustafsson, "Pipelined radix-2^k feedforward FFT architectures," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, Jan. 2013.
[11] V. G. Oklobdzija, "Design and analysis of fast carry-propagate adder under non-equal input signal arrival profile," IEEE, 1994.
[12] Cha Min and E. E. Swartzlander, "Modified carry skip adder for reducing first block delay," IEEE, 2000.
[13] R. W. Doran, "Variants of an improved carry look-ahead adder," IEEE Transactions, 1988.
[14] J. O. Coleman and A. Yurdakul, "Fractions in the canonical-signed-digit number system," in 2001 Conference on Information Sciences and Systems, The Johns Hopkins University, March 21-23, 2001.
[15] M. A. Soderstrand, "CSD multipliers for FPGA DSP applications," in Proc. 2003 Int. Symp. Circuits and Systems (ISCAS '03), vol. 5, 2003.
[16] P. Saha, A. Banerjee, I. Banerjee, and A. Dandapat, "High speed low power floating point design based on CSD (canonical sign digit)," VDAT 2010.