
Document downloaded from: http://hdl.handle.net/10251/44304

This paper must be cited as: Canet Subiela, MJ.; Valls Coquillat, J.; Almenar Terre, V.; Marín-Roig Ramón, J. (2012). FPGA implementation of an OFDM-based WLAN receiver. Microprocessors and Microsystems. 36(3):232-244. doi:10.1016/j.micpro.2011.11.004.

The final publication is available at http://dx.doi.org/10.1016/j.micpro.2011.11.004

Copyright Elsevier

FPGA implementation of an OFDM-based WLAN receiver

Authors: María José Canet, Javier Valls, Vicenç Almenar and José Marín-Roig

Affiliation: Instituto de Telecomunicaciones y Aplicaciones Multimedia, Universidad Politécnica de Valencia. Ctra. Nazaret-Oliva s/n, 46730 Gandia, Spain. macasu@iteam.upv.es

Abstract: This paper deals with the design and implementation on FPGA of a receiver for OFDM-based WLAN. The circuit is particularized for the IEEE 802.11a/g standards. The system includes frame detection, time and frequency synchronization, demodulation, equalization and phase tracking. The algorithms implemented for each task are selected taking into account performance, hardware cost and latency. A fixed-point analysis is also made for each algorithm. Our objective is to keep the PER loss below 0.5 dB for a PER = $10^{-2}$, 64-QAM and error correction. The whole system is composed of two main blocks (correlator and CORDIC) that are reused in different time intervals to perform all the necessary operations, so the required hardware resources are minimized. To verify it, the receiver is physically implemented and tested.

Keywords: WLAN receiver, OFDM, synchronization, FPGA.

I. INTRODUCTION

IEEE 802.11a/g are WLAN standards from IEEE [1], [2], which work in the 5 GHz and 2.4 GHz bands and achieve data rates up to 54 Mbps. In the physical layer, Orthogonal Frequency Division Multiplexing (OFDM) was selected as the modulation scheme due to its good performance on highly dispersive channels, like the indoor scenarios where these standards are used. Data are transmitted in bursts, always preceded by a preamble (Fig. 1). This preamble consists of ten identical short symbols (SS) of 16 samples and two identical long symbols (LS) of 64 samples with a guard interval (GI) of 32 samples. It is used for time and frequency synchronization and channel estimation. At the receiver, once signal detection and automatic gain control (AGC) are completed (at sample $n_i$ in Fig. 1), time synchronization begins: its purpose is to find the starting point of the GI (sample $n_{GI}$). Thus, a time reference is obtained and channel estimation can be correctly carried out by using the LS, once the carrier frequency offset (CFO) is corrected.

Fig. 1. IEEE 802.11a preamble.

Fig. 2 shows the block diagram of the WLAN base-band transceiver. In WLAN standards the base-band OFDM signal is built using a 64-point Inverse Fast Fourier Transform (IFFT). Then a guard interval (also called cyclic prefix) of 16 samples is added to make the system robust to multipath and to prevent Inter-Symbol Interference (ISI). This prefix also provides some tolerance to symbol timing errors. Since the sampling frequency is 20 MHz, each symbol is 4 µs long (80 samples), including a guard interval of 800 ns. To facilitate the implementation of filters and to achieve sufficient adjacent channel suppression, only 52 subcarriers are used: 48 are data carriers (with modulation types from BPSK to 64-QAM) and 4 are pilots for phase tracking. The subcarrier spacing is 312.5 kHz, and the spacing between the two outermost subcarriers is 16.25 MHz.

Fig. 2. WLAN transceiver.

The following stages are applied to the base-band received signal (see Fig. 3): frame detection, time synchronization, coarse and fine CFO estimation and correction, OFDM demodulation based on the Fast Fourier Transform (FFT), channel estimation and compensation, and phase tracking.

Fig. 3. Receiver structure.
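The figures quoted above for the OFDM signal (symbol length, guard interval, subcarrier spacing and occupied span) all follow from the 20 MHz sampling frequency and the 64-point IFFT. The short check below is only an illustrative sketch; the function and dictionary key names are hypothetical and do not come from the paper.

```python
# Numerology of the IEEE 802.11a/g OFDM signal as quoted in the text.
FS_HZ = 20e6   # sampling frequency
N_FFT = 64     # IFFT/FFT size
N_GI = 16      # guard interval (cyclic prefix) in samples
N_USED = 52    # 48 data subcarriers + 4 pilots


def ofdm_numerology():
    """Return symbol length, guard length, subcarrier spacing and outer span."""
    spacing_hz = FS_HZ / N_FFT
    return {
        "symbol_us": (N_FFT + N_GI) / FS_HZ * 1e6,
        "guard_ns": N_GI / FS_HZ * 1e9,
        "spacing_khz": spacing_hz / 1e3,
        # Subcarriers -26..+26 are used, so the two outermost ones
        # are 52 spacings apart.
        "outer_span_mhz": N_USED * spacing_hz / 1e6,
    }


if __name__ == "__main__":
    for name, value in ofdm_numerology().items():
        print(name, value)
```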

An important fact to take into account in the receiver design is that there is a latency limitation given by the Short Inter-Frame Space (SIFS) in IEEE 802.11a/g [1]. Therefore, the main parameters to be considered in the design or selection of synchronization, channel estimation and phase tracking algorithms are not only performance and hardware cost, but also latency. We placed more emphasis on the synchronizer design because the performance of the receiver strongly depends on it. Our proposal is compared with other synchronization algorithms found in the literature ([3], [4], [5], [6]). The algorithms selected for channel estimation and phase tracking are simple in order to minimize the hardware cost and also the receiver latency. In any case, the implemented receiver achieves the performance required by the standard (signal-to-noise ratio (SNR) at a 10% Packet Error Rate (PER), defined in [1] for different transmission modes) and also outperforms previous solutions found in the literature ([7], [8], [9]), as will be shown in Section VI.E. This performance was achieved thanks to the design flow followed, which minimizes the implementation losses.

The implementation of the receiver was done according to the following design flow. First, a floating-point simulator of the IEEE 802.11a/g physical layer without the synchronization stage (ideal synchronization is assumed) was used to obtain the PER performance of the ideal receiver. Then, floating-point models of the proposed algorithms for synchronization, demodulation, equalization and phase tracking were added to this simulator and the PER plot was checked. After that, a finite-precision model of each algorithm was designed and a fixed-point analysis of the complete receiver was made: the data path of the proposed receiver (Fig. 3) was quantized, starting at the receiver input. At each stage we selected the minimum number of bits that guarantees a PER loss lower than 0.5 dB with respect to the ideal receiver, for a PER = $10^{-2}$ and 64-QAM modulation, including the error correction stage (Viterbi with Channel State Information (CSI) [10]). Next, the fixed-point receiver was designed using an HDL (Hardware Description Language) and its output was compared to the fixed-point simulator output. Finally, the receiver was implemented on a prototype board connected to a logic analyzer, and its output was also compared to the fixed-point simulator output.

An important contribution of this work is how the hardware is reused in order to minimize hardware cost. In fact, the proposed receiver architecture is based on the reuse of the following blocks in different time intervals: the COordinate Rotation DIgital Computer (CORDIC), the FFT and the correlator.

This paper is organized as follows. Section II presents the proposed synchronization algorithm and compares it to other algorithms found in the literature. Sections III, IV and V describe the FFT processor, the channel estimation algorithm and the phase tracking algorithm, respectively. Section VI is devoted to the architecture proposed for the complete receiver, including a finite-precision analysis, a diagram of the temporal use of the hardware resources, the description of the receiver test, the performance results and a comparison with other receivers found in the literature. Finally, in Section VII, conclusions are derived.

II. FRAME, TIME AND FREQUENCY SYNCHRONIZATION

In this section the structure selected for the synchronizer is described. First, the IEEE preamble must be detected. This is done by means of an auto-correlation of the SSs and, as a result, a coarse time reference is obtained. This coarse time estimation is not precise enough to achieve the desired performance in multipath channels and at low SNR, so a fine time synchronization algorithm was applied: a cross-correlation between the received signal and the known training preamble LS. This cross-correlation does not work properly in the presence of CFO, so before calculating it, the CFO is estimated and compensated from the LS. Next, the algorithm proposed for each task (frame detection and coarse time synchronization, frequency synchronization and fine time synchronization) will be described, including the implementation details and the fixed-point analysis. Finally, our synchronizer will be compared with other synchronization schemes in terms of performance and hardware resources.

A. Frame detection, coarse time and frequency synchronization

Frame detection is the first step in the synchronization process. After that, it is necessary to estimate a time reference: $\hat{n}_{GI}$. This estimation must be as accurate as possible to avoid ISI with previous or later symbols when the FFT window is taken. Positive time offsets cause ISI because the FFT window takes samples of the next symbol. Moreover, some samples of the CP are affected by the multipath channel, by the analog filtering required in transmission and reception, and by the interpolation and decimation filters [11]. A common design rule [12] is to accept that the synchronization is correct if the deviation from the ideal initial sample $n_{GI}$ is between 0 and -4 samples; negative offsets larger than 4 samples can cause ISI in channels with moderate to high delay spread.

As stated above, the proposed algorithm is divided into two parts: coarse and fine time synchronization. Coarse synchronization is an adaptation of the auto-correlation method proposed by Schmidl and Cox [13]. The received signal is auto-correlated with a delay of 16 samples and averaged over 144 samples:

$$R_n = \mathbf{r}_n^H \mathbf{r}_{n-16},$$

where $\mathbf{r}_n = [r_n\ r_{n-1}\ \dots\ r_{n-143}]^T$ is a vector with 144 samples from the received signal. As a result, a single peak is obtained at sample $n = 160$, where the transition between the last SS and the GI is placed (see Fig. 4).
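The behaviour of this auto-correlation can be illustrated with a small noise-free experiment. The sketch below assumes a causal window (the sum at sample n uses the last 144 products) and uses placeholder data: ten repetitions of a random unit-modulus 16-sample symbol followed by unrelated samples, not the real IEEE 802.11a short and long symbols. The function names are hypothetical.

```python
import cmath
import random


def sliding_autocorr(r, delay=16, length=144):
    """Causal R_n = sum of conj(r[k]) * r[k - delay] over the last `length` samples."""
    out = []
    for n in range(len(r)):
        acc = 0j
        # Products that would need samples before the start are skipped.
        for k in range(max(delay, n - length + 1), n + 1):
            acc += r[k].conjugate() * r[k - delay]
        out.append(acc)
    return out


def toy_preamble(seed=1):
    """Ten repeated 16-sample symbols, then 160 unrelated samples (placeholder data)."""
    rng = random.Random(seed)

    def unit():
        return cmath.exp(2j * cmath.pi * rng.random())

    short_symbol = [unit() for _ in range(16)]
    return short_symbol * 10 + [unit() for _ in range(160)]


r = toy_preamble()
mag2 = [abs(v) ** 2 for v in sliding_autocorr(r)]
peak_sample = mag2.index(max(mag2)) + 1  # 1-based sample number
print("peak at sample", peak_sample)
```

With 1-based sample numbering the maximum falls on the last sample of the repeated part, which is where the text places the peak.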

Fig. 4. Output of the auto-correlator, $|R_n|^2$.

Next, the modulus of the auto-correlation output is normalized by the local mean power of the received signal, $P_n = \|\mathbf{r}_n\|^2$. The timing metric is defined as [13]:

$$|\bar{R}_n|^2 = \frac{|R_n|^2}{P_n^2}. \quad (1)$$

Also, $|\bar{R}_n|^2$ is averaged over 5 samples to improve the estimation of the maximum. To show this, we calculated the probability of detecting the maximum outside a range of ±2 samples. The SNR required to keep this probability below $10^{-2}$ is reduced from 13.25 dB to 10.5 dB thanks to the averaging. After the averaging, a threshold $Thr$ is set in order to find the position of the peak ($\hat{n}_{GI}$), which is obtained by searching for the maximum of the averaged $|\bar{R}_n|^2$ among those samples that fulfill the condition $|\bar{R}_n| > Thr$. In order to circumvent the costly division operation, the following condition was used in the implemented design:

$$|R_n|^2 > Thr^2 \cdot P_n^2. \quad (2)$$

Fig. 5 shows the block diagram of the proposed coarse time synchronization algorithm, which can be directly mapped onto a VLSI implementation. Also, a maximum detection algorithm with low cost and low latency was implemented: the present and previous samples are compared during two clock cycles. If the present sample is lower than the previous sample during two clock cycles, the maximum is considered to have been found.
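The division-free threshold test and the two-cycle maximum detector described above can be sketched in a few lines. This is one possible reading, not the authors' circuit: it assumes the condition compares the squared auto-correlation modulus against the squared threshold times the squared power term, and it assumes the detector declares a maximum after two consecutive decreases. The value 0.4375 is the threshold quoted in the paper; all function names are hypothetical.

```python
THR = 0.4375  # threshold value quoted in the paper


def moving_average(values, width=5):
    """Causal mean of the last `width` values (shorter windows at the start)."""
    out = []
    for n in range(len(values)):
        window = values[max(0, n - width + 1): n + 1]
        out.append(sum(window) / len(window))
    return out


def above_threshold(r_abs2, p):
    """Division-free form of |R| / P > Thr for P > 0 (assumed reading of the condition)."""
    return r_abs2 > (THR * THR) * (p * p)


def first_peak(metric, gate):
    """Index of the first gated sample that is followed by two consecutive decreases."""
    falls = 0
    for n in range(1, len(metric)):
        if metric[n] < metric[n - 1]:
            if falls == 0 and not gate[n - 1]:
                continue  # candidate maximum does not pass the threshold
            falls += 1
            if falls == 2:
                return n - 2
        else:
            falls = 0
    return None


if __name__ == "__main__":
    metric = [0, 1, 2, 5, 9, 8, 7, 3]
    print(first_peak(metric, [True] * len(metric)))
```

Waiting for two decreases costs two samples of latency but avoids storing the whole metric, which is the trade-off the text attributes to the implemented detector.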

Fig. 5. Block diagram of the proposed coarse time synchronization algorithm.

The performance of this algorithm strongly depends on the threshold selection, so a theoretical analysis of the algorithm was made: the probability distributions of the timing metric were used to set a threshold which minimizes the probability of not detecting the beginning of the GI [14]. We fixed the threshold to 0.4375 for SNR equal to or higher than 6 dB, which guarantees that the probability of not detecting the beginning of the GI is lower than $1\times10^{-3}$. Additionally, the multiplier needed in (2) can be efficiently implemented as shown in Fig. 6.

Fig. 6. Multiplier by $Thr^2$.

Fig. 7 shows the deviation error of the estimated $\hat{n}_{GI}$ with respect to the ideal point $n_{GI}$ at an SNR of 6 dB in a multipath channel. Three BRAN channel models were used [15]: A, B and C, with an RMS (Root Mean Square) delay spread of 50 ns, 100 ns and 150 ns, respectively. This figure plots the relative frequency of the deviation for $10^4$ test frames (each one transmitted through a different channel realization for each channel model). The minimum and maximum deviation hardly change for the tested channel models; in fact, more than 99.9% of the frames were detected within a deviation from $n_{GI}$ of between -4 and +15 samples. This deviation range necessitates the use of a fine time synchronization algorithm.

Fig. 7. Probability of the detection error for channel A (continuous line), B (dashed) and C (dotted).

Moreover, the CFO ($\hat{f}$) is estimated at $\hat{n}_{GI}$ as [12]:

$$\hat{f} = -\frac{\angle R_{\hat{n}_{GI}}}{2\pi \cdot 16\, T_s}, \quad (3)$$

where $T_s$ is the sampling period. We confirmed that this coarse synchronization algorithm works without any degradation when the maximum CFO allowed by the IEEE 802.11a/g standard [1], [2] occurs: 232 kHz (73% of the subcarrier spacing). We also confirmed that, thanks to the large averaging length selected for the auto-correlation, the achieved CFO estimation is precise enough for the highest modulation order used in the standard (64-QAM): the estimation error has a standard deviation of 0.35% of the subcarrier spacing for SNR higher than 20 dB, which gives a Bit Error Rate (BER) below $10^{-5}$ in the floating-point receiver. Therefore, fine frequency synchronization, which is usually estimated using an auto-correlation of the LS [12], is not necessary and, as a result, the final latency of the synchronizer is considerably reduced.

The angle of the auto-correlator output ($\angle R$) is obtained by using the CORDIC in [16]. In [16], the CORDIC is optimized to be used in OFDM-based WLAN designs, so it can be reused in different time intervals with different operation modes. Also, the CORDIC output is normalized to [-1, 1), so the division by $2\pi$ in (3) is avoided. The estimated CFO ($\hat{f}$) is used for the correction of the CFO in the LS samples (prior to fine time synchronization) and later for the correction of the CFO in the received OFDM symbols; this is carried out by reusing the same CORDIC. As the correlator and the CORDIC are reused for several tasks, the finite-precision analysis results for both blocks are discussed in Section VI, where the complete system design is described.
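As an illustration of the CFO estimate, the sketch below computes the angle of the auto-correlation with a floating-point circular-vectoring CORDIC whose output is the angle divided by π, so that only a scaling by a constant remains. It is a minimal model under stated assumptions: the auto-correlation is taken as the sum of conj(r[k])·r[k-16] (hence the minus sign in the estimate), the 11 iterations are borrowed from the CORDIC described later in the paper, and the test signal is a synthetic repeated symbol rather than the real preamble. Function names are hypothetical and this is not the CORDIC of [16].

```python
import cmath
import math

TS = 1.0 / 20e6  # sampling period for a 20 MHz sample rate
DELAY = 16       # auto-correlation delay in samples


def cordic_angle_over_pi(x, y, iterations=11):
    """Circular vectoring CORDIC (floating-point model); returns angle / pi."""
    angle = 0.0
    if x < 0:  # pre-rotate by 90 degrees to reach the right half-plane
        if y >= 0:
            x, y, angle = y, -x, math.pi / 2
        else:
            x, y, angle = -y, x, -math.pi / 2
    for i in range(iterations):
        d = 1 if y < 0 else -1  # rotate so that y is driven towards zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        angle -= d * math.atan(2.0 ** -i)
    return angle / math.pi


def estimate_cfo(r, n_peak, length=144):
    """CFO in Hz from the auto-correlation ending at 0-based index n_peak."""
    acc = sum(r[k].conjugate() * r[k - DELAY]
              for k in range(n_peak - length + 1, n_peak + 1))
    # angle / (2*pi*DELAY*TS) equals (angle/pi) / (2*DELAY*TS)
    return -cordic_angle_over_pi(acc.real, acc.imag) / (2 * DELAY * TS)


def max_cfo(delay):
    """Largest CFO magnitude that a given correlation delay can resolve."""
    return 1.0 / (2 * delay * TS)


if __name__ == "__main__":
    f_true = 232e3
    r = [cmath.exp(0.3j * (k % 16) ** 2) * cmath.exp(2j * math.pi * f_true * k * TS)
         for k in range(160)]
    print(estimate_cfo(r, 159), max_cfo(16), max_cfo(64))
```

The helper `max_cfo` shows why a 16-sample delay covers the 232 kHz offset allowed by the standard (the unambiguous range is 625 kHz), whereas a 64-sample delay is limited to 156.25 kHz.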

After frequency synchronization, there is a residual CFO that continuously rotates the phase of the received OFDM signal and causes a constellation rotation [12]. After only a few symbols, the constellation points have rotated over the decision boundaries, so correct demodulation is no longer possible. This effect forces the receiver to track the carrier phase each time a new OFDM symbol arrives. This will be discussed in Section V.

B. Fine time synchronization

The proposed algorithm for fine time synchronization is based on a cross-correlation between the received signal and the known long training symbol (LS). The LS is quantized to values {-1, 1} for the real and the imaginary parts to avoid the use of multipliers, so the cross-correlation is efficiently implemented as a wired complex filter similar to [17]. We evaluated the minimum length of this cross-correlation that gives approximately the same performance as the 64-sample cross-correlation with non-quantized LS. This length is 32 samples. The performance is measured in terms of timing error probability, defined as the probability of detection outside the five-sample window for those frames correctly detected. Our simulations show that the timing error probability of the cross-correlator with 32 quantized coefficients is only 0.6% lower than that of the cross-correlator with 64 floating-point coefficients at 2 dB SNR. For SNR higher than 6 dB, both cross-correlations have a timing error probability lower than 0.1%. The 32-sample cross-correlation with quantized LS only needs 63 real adders, whereas the 64-sample cross-correlation with non-quantized LS requires 64 complex multipliers and 63 complex adders. Thus, a cross-correlation of 32 samples is calculated as:

$$C_n = \mathbf{g}_{32}^H \mathbf{r}'_n,$$

where $\mathbf{r}'_n = [r'_{n-31}\ \dots\ r'_{n-1}\ r'_n]^T$ is a vector with 32 samples from the received signal after CFO compensation and $\mathbf{g}_{32} = [LS_0\ LS_1\ \dots\ LS_{31}]^T$ is a vector with the first 32 quantized samples of the LS. The objective of this length reduction is to minimize not only the hardware cost, but also the latency (a reduction of 1.6 µs is obtained).

In ideal conditions, the cross-correlation gives a large peak at sample $n = 224$, that is, in the middle of the long training symbol ($n_{GI} + 64$). Additionally, to reduce the computational complexity, this cross-correlation is only calculated over a window of 20 samples, which is the deviation given by our coarse time synchronization algorithm (between samples -4 and +15, see Fig. 7). Therefore, the position of the peak is estimated as:

$$\hat{n}_{GI} + 64 = \arg\max_{n_1 \le n \le n_2} |C_n|^2, \quad (4)$$

where $[n_1, n_2]$ is the interval of 20 samples in which the cross-correlation is computed ($n_1 = \hat{n}_{GI} + 64 - 15$ and $n_2 = \hat{n}_{GI} + 64 + 4$, with $\hat{n}_{GI}$ here being the coarse estimate).

Fig. 8 shows the implemented cross-correlator as well as the maximum detection circuit. After 20 clock cycles, the time offset estimation can be read from register D2. Additionally, the results of the fixed-point analysis are also included. The time reference given by this fine time estimation will be used to load the average of both LSs correctly into the FFT and to save the OFDM symbols correctly in the input buffer (input DPRAM in Fig. 16). As can be seen in Fig. 16, this maximum search does not increase the receiver latency.
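The fine timing search can be sketched as follows. The example assumes a causal 32-sample window that ends at the current sample, a reference quantized to the signs of its real and imaginary parts, and a 20-sample search interval derived from the coarse estimate (15 samples before and 4 samples after the expected peak). The long symbol used here is random placeholder data, not the IEEE 802.11a long training symbol, and all names are hypothetical.

```python
import cmath
import random


def quantize_sign(z):
    """Quantize real and imaginary parts to {-1, +1}."""
    return complex(1 if z.real >= 0 else -1, 1 if z.imag >= 0 else -1)


def fine_timing(r, ls, n_coarse, ref_len=32):
    """Refine the coarse index by maximizing |C_n|^2 over 20 candidate positions."""
    g = [quantize_sign(v) for v in ls[:ref_len]]
    best_n, best_val = None, -1.0
    for n in range(n_coarse + 64 - 15, n_coarse + 64 + 4 + 1):
        window = r[n - ref_len + 1: n + 1]
        # conj(g) only flips signs and swaps parts, so hardware needs adders only.
        c = sum(gk.conjugate() * rk for gk, rk in zip(g, window))
        if abs(c) ** 2 > best_val:
            best_n, best_val = n, abs(c) ** 2
    return best_n - 64


def toy_frame(seed=7):
    """160 unrelated samples, a 32-sample guard (tail of LS) and two copies of LS."""
    rng = random.Random(seed)

    def unit():
        return cmath.exp(2j * cmath.pi * rng.random())

    ls = [unit() for _ in range(64)]
    head = [unit() for _ in range(160)]
    return head + ls[32:] + ls + ls, ls


if __name__ == "__main__":
    r, ls = toy_frame()
    # Index 159 is the last sample before the guard; the coarse estimate is 10 late.
    print(fine_timing(r, ls, n_coarse=169))
```

Restricting the search to 20 positions keeps the cost at 20 correlator outputs, and it also excludes the second long symbol, which would otherwise produce an equally large peak 64 samples later.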

Fig. 8. Implementation of the proposed fine time synchronization algorithm.

C. Performance and algorithm comparison

In this section the performance of the proposed algorithm is compared to several synchronization algorithms which are specifically designed for the IEEE 802.11a/g standard: Troya et al. [5], Chang and Kelley [3] and the ML algorithm proposed in [4]. The algorithms proposed in [3] and [4] only include time synchronization, so they need an additional algorithm for frame detection. In these cases, we assume that AGC and signal detection are completed during the third SS ($n_i$ in Fig. 1 is randomly set following a uniform distribution in the range 32 to 47, as in [3]). Both algorithms, [3] and [4], work without CFO correction; nevertheless, an algorithm for CFO estimation is still required to perform the channel estimation correctly.

Troya et al. [5] and the proposed algorithm are similar, so we now explain the main differences between them. Like our proposal, Troya's solution [5] is divided into two parts: coarse and fine time synchronization based on auto-correlation and cross-correlation, respectively. Moreover, in [5] some simplifications to reduce the hardware cost for VLSI implementation are introduced, but at the expense of performance degradation. First, in [5] the received signal is auto-correlated with a delay of 64 samples and then averaged over 64 samples. As a result, a plateau 32 samples long is obtained. Then, the difference between the auto-correlator output and a delayed version of itself (delay of 32 samples) is calculated, so a peak is obtained at sample $n = 160$. To detect this peak ($\hat{n}_{GI}$), a group peak detector and an instantaneous peak detector are used as described in [5]. In contrast, our proposed coarse time estimation directly obtains a large peak at sample $n = 160$.
On the other hand, the maximum CFO that can be estimated by using the auto-correlation proposed in [5] is 156.25 kHz, which is lower than the maximum CFO allowed by the IEEE 802.11a/g standard (232 kHz), so another auto-correlation (with a delay of 16 samples and averaging over 16 samples) is required, as explained in [5]. Also, a combination of both estimations is calculated. In contrast, our proposal only requires one CFO estimation, which reduces the hardware cost and the receiver latency.

Like the algorithm we propose in this paper, [5] makes use of a cross-correlation with the first 32 complex samples of the LS for fine time synchronization. Nevertheless, in [5] the cross-correlation is calculated by using only the sign bits of the complex input values and the sign bits of the reference (positive samples are considered '1' and negative samples '0'). This simplification allows a drastic reduction of the hardware cost: complex multipliers are replaced by XNOR 1-bit complex multipliers [6]. Unfortunately, not only the hardware cost is reduced, but also the performance. Another difference between Troya's work [5] and the proposed fine time synchronization algorithm is where the cross-correlation is applied. In our proposal the cross-correlation is calculated only over 20 samples, where we know that its first peak is located thanks to the coarse time synchronization. So, the search for the position of the peak is limited to 20 samples, which improves the performance of the fine time synchronization algorithm. In contrast, Troya's cross-correlation [5] begins after $\hat{n}_{GI}$ is detected, once the CFO is removed from the preamble, so it is calculated over approximately 64 samples. Moreover, the position of the first peak is found by setting a threshold: the maximum is searched among those samples that exceed the threshold.

For performance comparison, we tested the different synchronization algorithms under the following conditions: BRAN channel models A, B and C [15]; additive white Gaussian noise with SNR values from 2 to 18 dB; and a CFO of 73% of the subcarrier spacing. For each synchronization algorithm, the success rate of the time estimation was measured using $10^4$ test frames (each one with a different channel realization). For those algorithms that make use of thresholds, the performance results provided in this paper were obtained considering the optimal threshold (that is, the one that gives the highest success rate) at each SNR. In this way, the given results are the best ones that can be obtained by these algorithms under the test conditions.

Fig. 9 shows the probability of detection error (probability of detecting the starting point of the GI outside a range of ±15 samples from the ideal sample). On the one hand, the poor results given by the algorithm proposed by Chang and Kelley [3] are due to the symbol synchronization stage. We observed that when the algorithm begins to work (instant $n_i$) at samples in the middle of an SS, the results are close to a 100% success rate (see [3]), but when it begins to work at samples near the transition between two consecutive SSs the performance is drastically reduced ($n_{GI}$ is often detected with an offset of 16 samples). On the other hand, the frame synchronization stage of the ML method [4] does not give a good estimation of the channel order $\hat{L}$ at low SNR.
As this estimation is used by the symbol synchronization stage, the algorithm performance is penalized, causing a high probability of detection error, especially at low SNR. In conclusion, the best algorithms in terms of probability of detection error are the proposed one and Troya's [5], although the latter has a higher probability of detection error than the former: around 1% at 6 dB SNR. This is due to the fact that our auto-correlation obtains a larger peak than the auto-correlation and differentiation proposed in [5].

Fig. 9. Probability of detection error for channel A (continuous line), B (dashed) and C (dotted).

Additionally, we tested the proposed algorithm in more pessimistic conditions for channel model A: the received preamble presents clipping during its first 48 samples (3 SSs). For a signal clipping level equal to the RMS value of the preamble, we achieve approximately the same probability of detection error as without clipping. However, this probability increases for a clipping level 5 dB below the RMS value of the preamble and low SNR: $9.5\times10^{-2}$ at 2 dB and $1.1\times10^{-3}$ at 6 dB. Moreover, we checked how the input clipping affects the coarse frequency estimation. Our simulations show that the minimum SNR needed to keep the Bit Error Rate (BER) below $10^{-5}$ for 64-QAM is 28 dB for channel model A. On the other hand, the standard deviation of the estimation error must be below 0.36% for an SNR loss lower than 0.4 dB [19]. For all the studied clipping conditions we obtain a standard deviation below 0.36%, for channel model A and 28 dB SNR. In conclusion, no fine frequency estimation is required.

Fig. 10 shows the timing error probability (probability of detection outside the five-sample window for those frames correctly detected). The proposed algorithm presents the best results: a timing error probability lower than 0.1% for channel A and 6 dB SNR. Troya's algorithm [5] presents a very poor performance: its error probability is higher than 10% at 10 dB SNR and channel A, due to the 32-sample cross-correlation based on XNOR 1-bit complex multipliers. This performance can be improved by extending the length of the reference to 64 samples, as mentioned in [5], but this would increase the latency and the hardware cost.

Fig. 10. Timing error probability for channel A (continuous line), B (dashed) and C (dotted).

In conclusion, the proposed algorithm has better performance (lower timing error probability and lower probability of detection error) than the rest of the studied algorithms.

III. FFT/IFFT

The designed FFT/IFFT processor is basically composed of 2 dual-port memories and a radix-2 decimation-in-frequency butterfly (BF), as shown in Fig. 11. The dual-port memories are used to store the FFT/IFFT input, intermediate and output data. Data can be saved to or read from the memories at the same time that the FFT is calculated. For each OFDM symbol, its 64 data samples are saved in the DPRAMs and, after 384 clock cycles, the FFT result is obtained. An FFT must be calculated every 4 µs, the duration of an OFDM symbol sampled at 20 MHz, to achieve a continuous data flow. So, the minimum clock frequency is 96 MHz, but we selected 100 MHz because it is an integer multiple of the input signal frequency. For this clock, an FFT result is obtained every 3.84 µs. The FFT is disabled during the other 0.16 µs.
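A software model helps to see what the single radix-2 decimation-in-frequency butterfly computes and where the clock figures come from. The sketch below is a plain floating-point, in-place DIF FFT with a final bit-reversal; it is not the authors' fixed-point processor. A 64-point transform needs 6 stages of 32 butterflies (192 in total), so 384 cycles would correspond to two clock cycles per butterfly, although the paper does not state this breakdown. Function names are hypothetical.

```python
import cmath


def fft_dif(x):
    """In-place radix-2 decimation-in-frequency FFT (length must be a power of 2)."""
    n = len(x)
    a = list(x)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle factor
                u, v = a[start + k], a[start + k + span]
                a[start + k] = u + v              # upper butterfly output
                a[start + k + span] = (u - v) * w  # lower output, then twiddle
        span //= 2
    # DIF leaves the result in bit-reversed order; undo it.
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        out[int(format(i, "0{}b".format(bits))[::-1], 2)] = a[i]
    return out


def fft_timing(cycles=384, symbol_us=4.0, clock_mhz=100.0):
    """Minimum clock for one FFT per symbol, and busy/idle time at the chosen clock."""
    busy_us = cycles / clock_mhz
    return {"min_clock_mhz": cycles / symbol_us,
            "busy_us": busy_us,
            "idle_us": symbol_us - busy_us}


if __name__ == "__main__":
    print(fft_timing())
```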

Fig. 11. FFT/IFFT implementation.

The fixed-point simulation shows that the FFT output precision must be 10 bits for the PER loss of our receiver to be lower than 0.5 dB (PER = $10^{-2}$).

IV. CHANNEL ESTIMATION

After time and frequency synchronization, the frequency response of the radio channel must be estimated. The long symbols (LS) of the preamble, whose initial sample is given by the fine time synchronization algorithm, are used for this estimation once the CFO is corrected. The selected channel estimation algorithm is performed in the frequency domain. The contents of the two long symbols are identical, so they can be averaged to improve the quality of the channel estimation [12]. This average can be calculated before the FFT, because the FFT is a linear operation. The estimated channel frequency response ($\hat{H}$) is obtained as follows:

$$C_X = \mathrm{FFT}\left(\frac{LS_1 + LS_2}{2}\right); \qquad \hat{H} = \frac{C_X}{C_L}, \quad (5)$$

where $LS_1$ and $LS_2$ are the first and the second received LS, respectively, and $C_L$ is the transmitted LS in the frequency domain. For simplicity, channel distortion is compensated by applying the zero forcing (ZF) solution: the received OFDM symbols are multiplied by the inverse of the channel estimation. To avoid the division by a complex number ($C_X$), the inverse is calculated as:

$$\frac{1}{\hat{H}} = \frac{C_L}{C_X} = \frac{C_L\, C_X^*}{C_X\, C_X^*} = \frac{C_L\, C_X^*}{|C_X|^2}. \quad (6)$$

The implementation of the channel estimation and compensation algorithms can be seen in Fig. 12. $C_L$, which is -1, 0 or 1, is saved in the ROM 64x2. The pre-calculated division 1/x is saved in the ROM 1024x16. First, the FFT of the averaged LSs, $(LS_1 + LS_2)/2$, is calculated. Next, $|C_X|^2$ is computed using the branch that calculates the input energy in the auto-correlator scheme. After that, $1/|C_X|^2$ is read from the ROM 1024x16. At the same time, $C_L$ is read from the ROM 64x2 and $C_X^* C_L$ is obtained with 2 multiplexers and 2 complementers. Finally, $C_X^* C_L / |C_X|^2$ is calculated by using two real multipliers and saved in two DPRAMs (64 values). These memories will be read for each received OFDM symbol after the FFT operation to equalize the received subcarriers (a complex multiplier is needed).

Fig. 12. Implementation of channel estimation and compensation.

Fig. 12 also shows the results of the fixed-point analysis. The precision detailed in each stage guarantees that the PER loss of our receiver is lower than 0.5 dB for a PER = $10^{-2}$.

V. PHASE TRACKING

As commented above, the residual CFO progressively rotates the phase of the received signal. This rotation is constant for the subcarriers of an OFDM symbol and it increases from one OFDM symbol to the next. It causes a rotation of the constellation, which makes it impossible to perform a correct demodulation after receiving a few OFDM symbols. To avoid this, the rotation must be estimated and compensated for each OFDM symbol, which is known as phase tracking.

The phase tracking scheme makes use of the four pilot subcarriers embedded in the OFDM symbols. The phase rotation is detected by comparing the received pilot subcarriers ($R_k$) against the known pilot subcarriers ($P_k$) in the frequency domain. The phase estimate $\hat{\phi}$ is obtained by using the estimated channel frequency response for the pilot subcarriers ($\hat{H}_k$) [12]:

$$\hat{\phi} = \angle\left[\sum_{k=1}^{4} R_k \left(\hat{H}_k P_k\right)^*\right] \quad (7)$$

The transmitter generates the pilot subcarriers of each OFDM symbol using the vector $S_{0 \ldots 126}$ = {1, 1, 1, 1, -1, -1, -1, 1, -1, -1, -1, -1, 1, 1, -1, 1, -1, -1, 1, 1, -1, 1, 1, -1, 1, 1, 1, 1, 1, 1, -1, 1, 1, 1, -1, 1, 1, -1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1, -1, 1, -1, -1, 1, -1, -1, 1, 1, 1, 1, 1, -1, -1, 1, 1, -1, -1, 1, -1, 1, -1, 1, 1, -1, -1, -1, 1, 1, -1, -1, -1, -1, 1, -1, -1, 1, -1, 1, 1, 1, 1, -1, 1, -1, 1, -1, 1, -1, -1, -1, -1, -1, 1, -1, 1, 1, -1, 1, -1, 1, 1, 1, -1, -1, 1, -1, -1, -1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1}. The pilot subcarriers for the $n$th transmitted OFDM symbol are [1 -1 1 1] if $S_n = 1$ and [-1 1 -1 -1] if $S_n = -1$.

The implementation of the phase tracking algorithm can be seen in Fig. 13. First, $\hat{H}_k$ is obtained for k = 7, 21, 43 and 57: the four pilots are read from the FFT of $(LS_1 + LS_2)/2$ (positions 7, 21, 43 and 57 of the output memory). As the transmitted pilots of the LS are $C_{L7} = C_{L21} = C_{L57} = 1$ and $C_{L43} = -1$, the channel estimation for the four pilots is obtained as $\hat{H}_7 = C_{X7}$, $\hat{H}_{21} = C_{X21}$, $\hat{H}_{43} = -C_{X43}$ and $\hat{H}_{57} = C_{X57}$, and they are saved in the registers D. The vector $S_{0 \ldots 126}$ is saved in the ROM 127x1 (the value 0 is saved if $S_n = 1$ and the value 1 is saved if $S_n = -1$) and the state machine generates [0 1 0 0] when $S_n = 1$ and [1 0 1 1] when $S_n = -1$. The output of the state machine controls both multiplexers, whose output is $\hat{H}_k P_k$. After that, $\sum_{k=1}^{4} R_k (\hat{H}_k P_k)^*$ is calculated by reusing the correlator (its complex multiplier and its moving average). Finally, the estimated phase $\hat{\phi}$ is obtained with the CORDIC. For each OFDM symbol, the phase is estimated and the equalized data subcarriers are compensated by this phase, again reusing the CORDIC.

Fig. 13. Implementation of phase estimation and compensation.

Fig. 13 also shows the results of the fixed-point analysis. The precision detailed in each stage guarantees that the PER loss of our receiver is lower than 0.5 dB for a PER = $10^{-2}$.

VI. COMPLETE SYSTEM

In this section, we describe the architecture selected for the complete receiver. The proposed architecture minimizes the hardware cost because the main blocks (the correlator and the CORDIC) are reused for several tasks in different time intervals. This reuse is controlled by a state machine. Each task needs a different quantization in order to achieve the desired performance (PER loss below 0.5 dB for a PER = $10^{-2}$). Next, we study the required precision of the correlator and the CORDIC when they are employed in each task and, finally, the precision of both blocks is selected (it will be the most restrictive one).

A. Correlator precision

The main purpose of the correlator is frame detection and time synchronization, but it is also used in channel estimation and phase tracking. This block must carry out the following operations:

1) Frame detection: it correlates the SSs of the preamble and detects the peak at the end of the last SS. 5 input bits are enough.
2) Frequency offset estimation: the output of the correlator ($R$) at the peak is used to obtain this estimation. It needs at least 6 input bits.
3) Channel estimation: the squared modulus is calculated with the branch of the correlator that obtains $P_n^2$. The input and the output must have 10 bits.
4) Phase tracking: the complex multiplier and the moving average of the correlator are used to accumulate the 4 complex products for phase tracking. The input must also have 10 bits, and the output 11 bits.

The fixed-point correlator can be seen in Fig. 14, as well as the extra hardware added to make the reuse possible.
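Tasks 3) and 4) above reuse the correlator for two small computations: a squared modulus for the channel inverse and a four-term complex accumulation for the pilot phase. The floating-point sketch below illustrates both under stated assumptions: the reciprocal is computed directly instead of being read from a ROM, the pilot pattern [1, -1, 1, 1] multiplied by the symbol polarity is taken from the text, and the function names are hypothetical.

```python
import cmath

PILOT_PATTERN = (1, -1, 1, 1)  # pilots at FFT positions 7, 21, 43, 57 for polarity +1


def zf_coefficients(cx, cl):
    """Inverse channel per subcarrier, C_L * conj(C_X) / |C_X|^2, without complex division."""
    coeffs = []
    for x, l in zip(cx, cl):
        energy = x.real * x.real + x.imag * x.imag  # squared modulus (task 3)
        inv = 1.0 / energy if energy else 0.0       # stands in for the 1/x ROM
        coeffs.append(l * x.conjugate() * inv)
    return coeffs


def pilot_phase(rx_pilots, h_pilots, polarity):
    """Common phase from the accumulated products R_k * conj(H_k * P_k) (task 4)."""
    acc = 0j
    for r, h, p in zip(rx_pilots, h_pilots, PILOT_PATTERN):
        acc += r * (h * p * polarity).conjugate()
    return cmath.phase(acc)  # the hardware would use the CORDIC here


if __name__ == "__main__":
    h = [1 + 0.5j, 0.8 - 0.2j, -0.3 + 1j, 0.9 + 0.1j]
    rx = [hk * p * -1 * cmath.exp(0.2j) for hk, p in zip(h, PILOT_PATTERN)]
    print(pilot_phase(rx, h, polarity=-1))
```

Because the accumulated products are weighted by the squared channel gain of each pilot, pilots in a deep fade contribute little to the phase estimate.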

Fig. 14. Correlator block.

B. CORDIC precision

The CORDIC in [16] is employed in two basic operations: the calculation of the angle of a complex number (configured in circular vectoring mode), and the rotation of a complex number by an angle (configured in circular rotation mode). In the synchronization process the CORDIC performs the following tasks:

1) CFO estimatio: it eeds 10-bit iput data (IN_I, IN_Q) ad 10 bits at the output agle (O_phase). The iput agle (I_phase) must be 0. 2) CFO compesatio from LS: the PDU trai is 10-bit legth, so iput data (IN_I, IN_Q) must be 10-bit legth ad the output data (OUT_I, OUT_Q) must be 11-bit legth. The iput agle (I_phase) is the CFO estimatio divided by 16, so it must be 14-bit legth. 3) CFO compesatio from the OFDM symbols: it requires the same precisio as 2), the oly differece is that, i this case, data are read from the DPRAM where the OFDM symbols are saved. 4) Phase trackig estimatio: it requires 11-bit iputs ad 12 bits at the output agle. 5) Phase trackig compesatio: the 11-bit legth equalized subcarriers are coected to the iput data, ad the 12-bit legth phase estimatio is coected to the iput agle. Output data must be 11-bit legth. To sum up, the iput ad output data are 11-bit legth, the iput agle is 14-bit legth, ad the output agle is 12-bit legth. To achieve this output precisio a CORDIC with 11 iteratios is used. Fig. 15 shows how the CORDIC is reused, as well as, the precisio of its iputs/outputs. Fig. 15. CORDIC block. C. Temporal diagram Fig. 16a shows the temporal diagram for frame detectio, time ad frequecy sychroizatio ad chael estimatio. Fig. 16b shows the temporal diagram for chael estimatio ad compesatio, demodulatio ad phase trackig. As ca be see, there are two basic blocks, which are reused several times: the correlator ad the CORDIC processor. First, PDU trai is correlated ad a large peak is detected whe the IEEE preamble is preset. After that, a coarse time referece is obtaied ad the CFO is calculated usig the CORDIC. Next, the estimated CFO is compesated from the LS s ad the, the corrected LS s are cross-correlated to achieve fie tie sychroizatio. By usig this fie time referece, the OFDM symbols of the PDU trai are correctly saved i the iput buffer (iput DPRAM). 
Then, channel estimation begins: first, the average of the two LSs is obtained and loaded into the FFT; after 3.2 µs the result is obtained (Cx in Fig. 16b) and the inverse of the channel estimate is calculated and saved in a DPRAM, as explained in Section IV.
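The channel estimation step can be sketched in a few lines: average the two long symbols, transform the average, and store the inverse of the resulting zero-forcing estimate so that equalization becomes one complex multiplication per subcarrier. The sketch below is a floating-point illustration under several assumptions: a naive DFT stands in for the FFT core, the training sequence is a random ±1 pattern rather than the one defined in the standard, and the two-tap channel, the noise level and all function names are hypothetical.

```python
import cmath
import random

N = 64  # FFT size of the long symbols and of the OFDM data symbols


def dft(x):
    """Naive DFT, used here only as a stand-in for the FFT block."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def inverse_channel_estimate(ls1, ls2, training):
    """Average both long symbols, transform, and return 1/H per subcarrier."""
    averaged = [(a + b) / 2 for a, b in zip(ls1, ls2)]
    spectrum = dft(averaged)
    # ZF estimate is H = Y / X, so the stored inverse is X / Y.
    return [x / y for x, y in zip(training, spectrum)]


def equalize(subcarriers, inv_h):
    """Channel compensation: one complex multiplication per subcarrier."""
    return [s * g for s, g in zip(subcarriers, inv_h)]


# Toy scenario (all values hypothetical).
rng = random.Random(0)
training = [rng.choice((-1.0, 1.0)) for _ in range(N)]
h_true = dft([1.0, 0.5j] + [0.0] * (N - 2))          # two-tap channel
clean_ls = idft([h * x for h, x in zip(h_true, training)])


def noisy(signal, sigma=1e-4):
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in signal]


inv_h = inverse_channel_estimate(noisy(clean_ls), noisy(clean_ls), training)
data = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(N)]
received = [h * d for h, d in zip(h_true, data)]
recovered = equalize(received, inv_h)
max_error = max(abs(r - d) for r, d in zip(recovered, data))
```

Because the DFT is linear, averaging the two long symbols before the transform gives the same estimate as averaging two separate transforms, while needing only one pass through the FFT.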

An OFDM symbol is received every 4 µs and the following operations are performed during this time: first, the 64 data samples of the OFDM symbol are read from the input DPRAM; next, the estimated CFO is removed and the corrected data samples are loaded into the FFT; after 4 µs the result is read: first, the four pilots are read and the phase is estimated by using the correlator and the CORDIC; next, the 48 subcarriers are read from the FFT and equalized; finally, the equalized subcarriers are rotated by the phase estimate using the CORDIC, and then the signal is ready to be demapped.

As can be seen in Fig. 16, the first part of the process (frame detection and CFO estimation) works at a sample rate of 20 MHz (the input data rate), whereas the rest works at 100 MHz. This is necessary to allow the FFT to obtain a result every 4 µs and the CORDIC to do all the required operations, and also to minimize the total latency of the system. The first output subcarriers are obtained 24.6 µs after the first sample of the preamble is received. Also, with this schedule, only two OFDM symbols must be saved in the input DPRAM. All blocks are disabled when they are not used in order to reduce the power consumption.
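The phase tracking part of this per-symbol schedule (accumulate the four pilot products, take the angle, rotate the equalized subcarriers) can be sketched as follows. This is a floating-point illustration only: it assumes that the pilots are correlated, unweighted, against known reference values, it uses `cmath` in place of the shared correlator and CORDIC, and the pilot values, the residual phase and the function names are hypothetical.

```python
import cmath


def estimate_common_phase(rx_pilots, ref_pilots):
    """Accumulate the pilot products and return the angle of the sum."""
    acc = sum(r * complex(p).conjugate() for r, p in zip(rx_pilots, ref_pilots))
    return cmath.phase(acc)


def compensate_phase(subcarriers, phase):
    """Rotate every equalized subcarrier by the negative of the estimate."""
    derotation = cmath.exp(-1j * phase)
    return [s * derotation for s in subcarriers]


# Toy symbol: four pilots and four QPSK data subcarriers with a common
# residual phase of 0.2 rad (illustrative values).
ref_pilots = [1, 1, 1, -1]
residual = 0.2
rx_pilots = [p * cmath.exp(1j * residual) for p in ref_pilots]
data = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
rx_data = [d * cmath.exp(1j * residual) for d in data]

phase = estimate_common_phase(rx_pilots, ref_pilots)
corrected = compensate_phase(rx_data, phase)
```

Only one angle is computed per OFDM symbol, so the vectoring and rotation steps can be time-multiplexed on a single CORDIC within the 4 µs symbol period.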

Fig. 16. Temporal use diagram.

D. Hardware implementation

Several tests were done to verify the designed receiver, with three different input signals: 1) QPSK (18 Mbits/s), SNR = 20 dB and a CFO between transmitter and receiver of 150 kHz; 2) 16-QAM (36 Mbits/s), SNR = 25 dB and CFO = 150 kHz; and 3) 64-QAM (54 Mbits/s), SNR = 30 dB and CFO = 150 kHz. Each test frame was composed of 320 noise samples, the IEEE preamble and 100 OFDM symbols. For each operation mode, we selected an SNR that guarantees a BER lower than 10^-5 with error correction and the floating-point receiver (ideal synchronization) in our simulations.

Firstly, the designed receiver was implemented on a Virtex-II FG676 Proto Board, which was connected to a logic analyzer (see Fig. 17). We checked that the output subcarriers of the implemented receiver were identical to the output subcarriers of the fixed-point model of the receiver, for the same digital test input.

Secondly, we implemented our receiver on the XtremeDSP Development Kit of Nallatech, in order to simulate a more realistic scenario and observe the received constellation. This development kit includes a clock FPGA, a user FPGA (Xilinx Virtex II XC2V6000), two high-speed ADCs (analogue-to-digital converters) and two high-speed DACs (digital-to-analogue converters). The analogue test inputs were generated by using a ROHDE&SCHWARZ I/Q Modulation Generation AMIC and a Signal Generator SMIQ04B (see Fig. 18). The output was connected to a TDS210 digital oscilloscope in XY mode to view the output constellation. As can be seen in Fig. 19, we obtained the desired constellation for each input signal.

Fig. 17. Virtex-II FG676 Proto Board test.

Fig. 18. XtremeDSP Development Kit test.

Fig. 19. Output constellations captured for (a) QPSK, (b) 16-QAM and (c) 64-QAM.

The implemented receiver only requires 2986 slices, 20 embedded multipliers and 8 DPRAMs.

E. Performance results and receiver structure comparison

The performance results of the proposed receiver were obtained by using a simulator of the IEEE 802.11a/g physical layer and including a fixed-point model of the designed receiver (frame, time and frequency synchronization, channel estimation, equalization and phase tracking). Simulation conditions were: channel model A [15][14], 64-QAM (54 Mbits/s), error correction (Viterbi with CSI [10]) and 1000 frames (each frame was composed of 50 data packets of 1000 bytes and had a different channel realization). Fig. 20 compares the receiver PER plot for ideal and real synchronization. The ideal synchronization plot was obtained by using a floating-point model of the receiver, supposing that the synchronization is perfect (no time or frequency synchronization errors) and zero-forcing channel compensation, whereas the real synchronization plot was calculated by means of a fixed-point model of the designed receiver (each frame has time and frequency errors). As can be seen, the PER loss is only 0.5 dB at PER = 10^-2. Our simulations show that this loss is due to the quantization of the receiver (0.25 dB) and to the synchronization errors (for example, the floating-point phase tracking algorithm introduces a loss of 0.2 dB in the receiver).

Fig. 20. PER performance.

Finally, we compare our receiver with other solutions proposed in the literature ([6], [7], [8] and [9]), in terms of algorithm implementation cost and performance. It should be remarked that it is extremely difficult to make a fair comparison with these solutions, because different technologies and implementation strategies were used. Table 1 summarizes the SNR at a 10% PER required by the 802.11a standard and by the studied receivers, for an AWGN channel and different data rates. Troya's PER performance is not given in [6], so it cannot be included in this comparison. As can be seen, our proposal outperforms the receivers implemented previously and it needs a smaller SNR at every data rate than the one required by the 802.11a standard.

Rate (Mb/s)                      9 (BPSK)   18 (QPSK)   36 (16-QAM)   54 (64-QAM)
802.11a required SNR [1] (dB)    10.7       14.7        21.7          26.7
Proposed (dB)                     3.9        8.1        14.6          19.6
Reference [7] (dB)                5.8        9.9        15.9          21.7
Reference [8] (dB)                5.8        9.5        14.9          20.6
Reference [9] (dB)                9.7       12.8        20.5          26.1

Table 1. Performance comparison at a 10% PER.

In [7] a transceiver architecture for OFDM-WLAN is designed. It includes synchronization, channel estimation and phase tracking. Few implementation details are given, so we only compare some aspects of the architecture in order to comment on our contributions. Coarse time synchronization in [7] is based on an auto-correlation with a delay of 16 samples (the moving average is not defined), and fine time synchronization is based on a matched filter of 64 coefficients. No performance results for the synchronization stage are given. This matched filter requires 64 complex multipliers and 63 complex adders, whereas the cross-correlator that we employ for fine time synchronization only needs 63 real adders. CFO estimation in [7] is done in three steps: coarse CFO estimation (using the auto-correlation with a delay of 16 samples), fine CFO estimation (using an auto-correlation with a delay of 64 samples) and CFO tracking using the pilot subcarriers. The CFO is corrected with a phase-locked loop (PLL). Our design only needs one CFO estimation, thanks to the large average used in the auto-correlation, and the CFO estimation and correction are done by reusing the same CORDIC, which reduces the hardware cost of our receiver. For phase tracking in [7], four phases are calculated, whereas only one phase is computed in our implemented algorithm.

In [8] and [9] the algorithms used for synchronization, channel estimation and phase tracking are not described, nor is their implementation. So, we only compare the PER performance of the receivers in Table 1. Additionally, in [8] the implementation losses are also given: 1.6 dB for 54 Mb/s at a 10% PER, whereas our implementation losses are only 0.45 dB under identical conditions.
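The SNR differences implied by Table 1 can be tabulated with a few lines of Python. The sketch below only re-expresses the figures of the table, assuming that its values are read with one row per receiver and one column per data rate; the dictionary layout and the function name are choices made for this illustration.

```python
RATES = (9, 18, 36, 54)  # data rates in Mb/s

# Required SNR in dB at a 10% PER, one tuple per row of Table 1.
TABLE_1 = {
    "802.11a required [1]": (10.7, 14.7, 21.7, 26.7),
    "Proposed": (3.9, 8.1, 14.6, 19.6),
    "Reference [7]": (5.8, 9.9, 15.9, 21.7),
    "Reference [8]": (5.8, 9.5, 14.9, 20.6),
    "Reference [9]": (9.7, 12.8, 20.5, 26.1),
}


def gain_of_proposed_over(row_name):
    """SNR saved by the proposed receiver with respect to another row, in dB."""
    return [round(other - ours, 1)
            for other, ours in zip(TABLE_1[row_name], TABLE_1["Proposed"])]


margin_vs_standard = dict(zip(RATES, gain_of_proposed_over("802.11a required [1]")))
gain_vs_ref8 = dict(zip(RATES, gain_of_proposed_over("Reference [8]")))

for name in TABLE_1:
    if name != "Proposed":
        print(name, dict(zip(RATES, gain_of_proposed_over(name))))
```

Read this way, the proposed receiver keeps a margin of 6.6 to 7.1 dB with respect to the SNR required by the standard, and the closest competitor is [8], which stays within 0.3 dB at 36 Mb/s.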
In Section III.C, the main differences between the synchronization stage of [5], [6] and the proposed algorithm have been described, and it has been shown that our synchronization algorithm outperforms the one proposed in [5], [6]. Now, we will compare the implementation details of the complete receiver.

For frequency synchronization, [6] uses a CORDIC for CFO estimation (16-iteration CORDIC, 16-bit inputs, 13-bit output) and a numerically controlled oscillator (NCO) for CFO compensation (768 full adders and 533 registers). In contrast, we only use an 11-iteration CORDIC to do the same operations.

For channel estimation, [6] describes the implementation of a simplified version of the channel estimation algorithm proposed in [5], but no performance comparison is included. First, a ZF estimate is obtained by using a single LS (a 3 dB penalty occurs with respect to averaging both LSs, as we do in our proposal). Afterwards, the ZF reference is updated by means of a decision-feedback mechanism using the decoded bits. This drastically increases the latency of the channel estimation algorithm, due to the delay of the Viterbi decoder, and also the hardware cost. The ZF estimation in [6] requires a complex divider, which is replaced by a complex multiplier and some memories where the limited number of values that the decoded bits can take (depending on the modulation scheme) are saved. The channel compensation based on a complex division is absolutely necessary in the solution proposed in [6] due to the proposed residual phase correction algorithm. This complex division is based on another CORDIC implementation (16 iterations, input data: 16 and 32 bits). The implementation of the channel estimator proposed in this paper is based on the ZF solution (averaging both LSs). It basically requires one ROM where the inverse of a constant is saved (the content of this memory does not depend on the modulation scheme) and a complex multiplier for channel correction. The latency and the hardware cost of our proposal are lower because we do not use any feedback and we do not need a complex division.

For phase tracking, [5] proposes a phase estimation algorithm which needs neither an arctangent block nor an NCO for phase correction. It requires 16 real adders and 1 complex multiplier [18]. In our proposal we basically need an accumulator of 4 complex products, an arctangent calculation and a phase rotator, but all these operations are implemented by reusing the correlator and the CORDIC, so no extra hardware is required.

In conclusion, our receiver requires fewer hardware resources than the receiver proposed in [6]. Finally, in contrast to [6], our design includes a complete fixed-point analysis, as well as the PER performance of the implemented receiver and the implementation losses.

VII. CONCLUSIONS

In this work, a practical solution is given for the implementation of an OFDM-based WLAN receiver on an FPGA. The receiver is composed of frame, time and frequency synchronization, FFT-based OFDM demodulation, channel estimation and equalization, and phase tracking. Emphasis is given to the synchronizer design because the performance of the receiver strongly depends on it.
The proposed synchronization algorithm outperforms the ones in [3], [4], [5] and [6]: it gives a probability of detection error and a probability of timing error below 0.1% at 6 dB SNR and channel model A [15]. Additionally, our synchronizer has low hardware cost and low latency, because only one CFO estimation is needed and thanks to the simplifications made in the cross-correlator.

The algorithms proposed for channel estimation and phase tracking are simple. In any case, the complete receiver achieves the desired performance: it outperforms the receivers implemented previously ([7], [8] and [9]) and needs a smaller SNR at every data rate than the one required by the 802.11a standard. This is mainly due to the design flow that was used, with a complete fixed-point analysis which guarantees at each stage a PER loss below 0.5 dB for a PER = 10^-2, with a modulation scheme of 64-QAM and including the error correction stage. Once the precision of the different stages is determined, the designed receiver is implemented on a prototype board in order to verify it: first, we check that the output subcarriers are identical to those obtained by simulation and then we observe the output constellations for different modulation schemes.

The design of the complete receiver is based on the reuse of the main blocks (CORDIC, correlator and FFT), and so the final hardware cost is reduced (only 2986 slices, 20 embedded multipliers and 8 DPRAMs are required on a Xilinx Virtex-II FPGA). This reuse is possible thanks to the detailed schedule of the complete receiver that we present in this paper.

In conclusion, the proposed implementation of an OFDM-based WLAN receiver achieves excellent performance with low cost and low latency (the first output subcarrier is obtained 7.8 µs after the first data sample of the first OFDM data symbol is received).

REFERENCES

[1] IEEE standard 802.11a, Wireless LAN medium access control (MAC) and physical layer (PHY) specifications: high-speed physical layer in the 5 GHz band, Dec. 1999.
[2] IEEE 802.11g: Wireless LAN specifications: Further Higher Data Rate Extension in the 2.4 GHz Band, June 2003.
[3] Sekchin Chang and B. Kelley, Time synchronization for OFDM-based WLAN systems, Electronics Letters, vol. 39, no. 13, pp. 1024-1026, June 2003.
[4] Yik-Chung, Kun-Wah Yip, Tung-Sang Ng, and Erchin Serpedin, Maximum-Likelihood symbol synchronization for IEEE 802.11a WLANs in unknown frequency-selective fading channels, IEEE Trans. on Wireless Communications, vol. 4, no. 6, November 2005.
[5] A. Troya, K. Maharatna, M. Krstic, E. Grass, U. Jagdhold and R. Kraemer, Efficient Inner Receiver Design for OFDM-based WLAN Systems: Algorithm and Architecture, IEEE Trans. on Wireless Communications, vol. 6, no. 4, pp. 1374-1385, April 2007.
[6] A. Troya, K. Maharatna, M. Krstic, E. Grass, U. Jagdhold and R. Kraemer, Low-power VLSI implementation of the inner receiver for OFDM-based WLAN systems, IEEE Trans. on Circuits and Systems - I, vol. 55, no. 2, pp. 672-686, March 2008.
[7] Wei-Hsiang Tseng, Ching-Chi Chang, Chorng-Kuang Wang, Digital VLSI OFDM Transceiver Architecture for Wireless SoC Design, ISCAS 2005.
[8] J. Thomson et al., An integrated 802.11a baseband and MAC processor, Dig. Tech. Papers IEEE ISSCC 2002, Feb. 2002, vol. 1, pp. 126-127.
[9] T. Fujisawa et al., A Single-Chip 802.11a MAC/PHY with a 32-b RISC processor, IEEE Journal of Solid State Circuits, vol. 38, no. 11, pp. 2001-2009, November 2003.
[10] Weon-Cheol Lee, Hyung-Mo Park, Kyung-Jin Kang and Kuen-Bae Kim, Performance analysis of Viterbi decoder using channel state information in COFDM system, IEEE Trans. on Broadcasting, vol. 44, no. 4, pp. 488-496, Dec. 1998.
[11] M.J. Canet, F. Vicedo, J. Valls, V. Almenar, Design of a digital front-end transmitter for OFDM-WLAN systems using FPGA, ISCCSP 2004, Hammamet, Tunisia, 2004.
[12] J. Heiskala, J. Terry, OFDM Wireless LANs: A theoretical and practical guide, SAMS Publishing, 2001.
[13] T. Schmidl and D. Cox, Robust Frequency and Timing Synchronization for OFDM, IEEE Trans. on Comm., vol. 45, no. 12, December 1997.
[14] M.J. Canet, V. Almenar, J. Marín-Roig and J. Valls, Low Complexity Time Synchronization for WLAN, submitted to Digital Signal Processing.
[15] J. Medbo and P. Schramm, Channel models for HIPERLAN/2 in different indoor scenarios, 3ERI085B, HIPERLAN/2 ETSI/BRAN contribution, 1998.
[16] F. Angarita, M.J. Canet, T. Sansaloni, A. Perez-Pascual, J. Valls, Efficient mapping of CORDIC Algorithm for OFDM-based WLAN, Journal of Signal Processing Systems, vol. 52, no. 2, pp. 181-191, Aug. 2008.
[17] M.J. Canet, I. Wassel, V. Almenar, J. Valls, Performance evaluation of fine time synchronizer for WLANs, Proc. 13th European Conference on Signal Processing (EUSIPCO 2005), Sept. 2005.
[18] A. Troya, M. Krstic, K. Maharatna, Simplified residual phase correction mechanism for the IEEE 802.11a standard, Proc. IEEE VTC-Fall 2003, vol. II, Oct. 2003.
[19] T. Pollet, M. Van Bladel, M. Moeneclaey, BER sensitivity of OFDM systems to carrier frequency offset and Wiener phase noise, IEEE Transactions on Communications, vol. 45, no. 2/3/4, February/March/April 1995, pp. 191-193.

Figure captions

Fig. 1. IEEE 802.11a preamble.
Fig. 2. WLAN transceiver.
Fig. 3. Receiver structure.
Fig. 4. Output of the auto-correlator 2 R.
Fig. 5. Block diagram of the proposed coarse time synchronization algorithm.
Fig. 6. Multiplier Thr 2.
Fig. 7. Cumulative distribution of the detection error for channel A (continuous line), B (dashed) and C (dotted).
Fig. 8. Implementation of the proposed fine time synchronization algorithm.
Fig. 9. Probability of detection error for channel A (continuous line), B (dashed) and C (dotted) without clipping the signal.
Fig. 10. Timing error probability for channel A (continuous line), B (dashed) and C (dotted).
Fig. 11. FFT/IFFT implementation.
Fig. 12. Implementation of channel estimation and compensation.
Fig. 13. Implementation of phase estimation and compensation.
Fig. 14. Correlator block.
Fig. 15. CORDIC block.
Fig. 16. Temporal use diagram.
Fig. 17. Virtex-II FG676 Proto Board test.
Fig. 18. XtremeDSP Development Kit test.
Fig. 19. Output constellations captured for (a) QPSK, (b) 16-QAM and (c) 64-QAM.
Fig. 20. PER performance.