SOFTWARE IMPLEMENTATION OF 802.11a BLOCKS ON SANDBLASTER DSP

Vaidyanathan Ramadurai, Sanjay Jinturkar, Sitij Agarwal, Mayan Moudgill, John Glossner
Sandbridge Technologies, 1 North Lexington Avenue, White Plains, NY 10601
vramadurai@sandbridgetech.com

ABSTRACT

In this paper, we describe the design and implementation of software blocks for an 802.11a receiver on the Sandblaster DSP. Compared to dedicated hardware, a software solution offers high reusability, low cost and short development time. The main challenge is meeting the high throughput and stringent latency requirements. 802.11a is an IEEE standard that operates in the 5 GHz band using Orthogonal Frequency Division Multiplexing (OFDM). OFDM divides a data signal across 48 separate sub-carriers to provide higher data rates and to minimize multi-path propagation effects. The standard supports multiple data rates from 6 Mbps to 54 Mbps and involves high computational complexity. The steady-state 802.11a receiver consists of an FFT, pilot/DC removal, demapper, deinterleaver, depuncturer, FEC decoder and CRC. We explore techniques for optimizing individual blocks and for combining multiple blocks to increase the overall performance and to meet real-time throughput and latency requirements. There is significant data movement between individual blocks such as the FFT, demapper, deinterleaver and depuncturer, and we explain how multiple blocks can be coupled to significantly reduce both the instruction cycle count and the data transfers. Traditional software deinterleavers have been implemented using table lookups. We explain how table lookups can be merged with other compute-intensive and data-intensive blocks such as the demapper and depuncturer, thereby speeding up the entire system. Instead of optimizing blocks for a specific data rate, we propose optimizations that can be exploited for any data rate specified by the 802.11a standard.

1. INTRODUCTION

The OFDM system provides a wireless LAN with data payload communication capabilities of 6, 9, 12, 18, 24, 36, 48, and 54 Mbit/s. Support for transmitting and receiving at data rates of 6, 12, and 24 Mbit/s is mandatory. The system uses 52 subcarriers that are modulated using binary or quadrature phase shift keying (BPSK/QPSK), 16-quadrature amplitude modulation (QAM), or 64-QAM. Forward error correction coding (convolutional coding) is used with a coding rate of 1/2, 2/3, or 3/4.

To handle the high data rate requirements, several hardware-based solutions such as ASICs and FPGAs exist for 802.11a/g. However, such solutions lack the flexibility and reusability of software-based solutions. Software-based methods also shorten time to market because modifications can be made quickly. In this paper, we consider a software implementation of 802.11a blocks on the SandBlaster DSP. Since 802.11a demands very high throughput under real-time latency constraints, a software implementation of the baseband functions is very challenging.

Section 2 gives a brief overview of the SandBlaster DSP architecture. Section 3 provides an introduction to the 802.11a receiver. In Section 4, we show the software implementation and optimization of the demapper. In Sections 5 and 6 we discuss the optimizations of the deinterleaver and depuncturer respectively. Finally, conclusions are drawn in Section 7.

2. SANDBLASTER DSP

Sandbridge Technologies has developed the Sandblaster architecture for a convergence device [1,2]. The Sandblaster architecture supports the data types necessary for convergence devices including RISC control code, DSP, and Java. As shown in Figure 1, the design includes a unique combination of modern techniques such as a SIMD Vector/DSP unit, a parallel reduction unit, and a RISC-based integer unit. Each processor core supports concurrent execution of up to eight threads.
All states may be saved from each individual thread and no special software support is required for interrupt processing. The machine is partitioned into a RISC-based control unit that fetches instructions from a set-associative instruction cache. Instruction space is conserved through the use of compounded instructions that are grouped into packets for execution. The memory subsystem has been designed carefully to minimize power dissipation. The pipeline design in combination with the memory design ensures that all memories are single ported and yet the processor can sustain nearly 4 taps per cycle for a filter (the theoretical maximum) in every thread unit simultaneously. A RISC-based execution unit, depicted in the center of Figure 1, assists with control processing.

[Fig 1: Sandblaster DSP]

For the control code, a 16-entry, 32-bit register file per thread unit provides for very efficient control processing. Common integer data types are typically stored in the register file. This allows branch bounds to be computed and addresses to be generated efficiently. Intensive loop processing is performed in the SIMD/Vector unit depicted on the right side of Figure 1. Each cycle, a 4x16-bit vector may be loaded into the register file while two vectors are being multiplied, saturated, reduced (e.g. summed), and saturated again. The branch bound may also be computed and the instruction looped on itself until the entire vector is processed. This may be specified in as little as 64 bits.

To enable signal processing in software, the processor supports many levels of parallelism. Thread-level parallelism is supported by providing hardware support for up to 8 independent programs to be simultaneously active on a single Sandblaster core. This minimizes the latency in physical layer processing. Since many algorithms have stringent requirements on response time, multithreading is an integral technique in reducing latencies. Data-level parallelism (SIMD) is supported through the use of a vector unit.
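The multiply, saturate, reduce, saturate sequence described above can be modeled in portable C. The sketch below is only an illustrative scalar model, not Sandblaster code: the names sat32 and sat_dot4 are hypothetical, and the 32-bit accumulator width is an assumption, since the text does not state the widths used by the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate value to the signed 32-bit range.
 * Hypothetical helper; the accumulator width is an assumption. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Scalar model of a saturating dot product that consumes the inputs
 * four 16-bit elements at a time, mirroring one 4x16-bit vector per step. */
int32_t sat_dot4(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;

    assert(n % 4 == 0);                      /* whole vectors only */
    for (size_t i = 0; i < n; i += 4) {
        int64_t sum = 0;
        for (size_t lane = 0; lane < 4; lane++) {
            /* multiply, then saturate each product */
            sum += sat32((int64_t)a[i + lane] * b[i + lane]);
        }
        /* reduce (sum of four lanes) and saturate again while accumulating */
        acc = sat32((int64_t)acc + sat32(sum));
    }
    return acc;
}
```

On a real vector unit the inner loop would collapse into a single instruction; the model is only meant to make the order of the saturation steps explicit.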
3. 802.11a RECEIVER

The OFDM modulation scheme used in 802.11a distributes the data over 52 subcarriers on a 20 MHz channel to mitigate the effects of multipath. Among the 52 subcarriers, 48 carry data and 4 carry pilot signals used for tracking. Each subcarrier is 312.5 kHz wide, giving raw data rates from 125 kbits/s to 1.125 Mbits/s per subcarrier depending on the modulation type (binary phase shift keying (BPSK), quaternary PSK (QPSK), 16-quadrature amplitude modulation (QAM), or 64-QAM) and the error-correcting code rate (1/2, 2/3, or 3/4). The composite signal therefore has a data rate ranging from 6 Mbits/s to 54 Mbits/s in the 20 MHz channel [11]. Table 1 lists the rate-dependent parameters of the 802.11a standard.

Table 1. Rate-dependent parameters (IEEE 802.11a, Table 78)

    Data rate   Modulation  Coding    Coded bits per  Coded bits per  Data bits per
    (Mbits/s)               rate (R)  subcarrier      OFDM symbol     OFDM symbol
                                      (NBPSC)         (NCBPS)         (NDBPS)
    6           BPSK        1/2       1               48              24
    9           BPSK        3/4       1               48              36
    12          QPSK        1/2       2               96              48
    18          QPSK        3/4       2               96              72
    24          16-QAM      1/2       4               192             96
    36          16-QAM      3/4       4               192             144
    48          64-QAM      2/3       6               288             192
    54          64-QAM      3/4       6               288             216

The baseband physical layer block diagram of an 802.11a receiver is shown in Fig 2.

[Fig 2. 802.11a receiver physical layer (block diagram: remove GI, FFT, channel equalizer, demapper, deinterleaver, depuncturer, convolutional decoder, descrambler)]

The digital data is frequency corrected and, after the guard intervals are removed, passed to an FFT block. The FFT converts the time domain samples to frequency domain sub-carriers. 802.11a uses a total of 52 sub-carriers, of which 48 are data symbols and 4 are pilots. A channel equalizer is then employed to mitigate inter symbol interference (ISI) and inter carrier interference (ICI). The demapper takes the 48 complex symbols and, depending on the type of modulation, demodulates the sub-carriers into data bits (hard or soft bits). The demodulated bits are deinterleaved, depunctured, decoded using a convolutional decoder and descrambled. In this paper, we consider the following blocks implemented on SandBlaster DSP:

1. demapper
2. deinterleaver
3. depuncturer

4. DEMAPPER

We employ the soft demapper with channel state information (CSI) for 16QAM and 64QAM demodulation as described in [6]. It has been shown that the simplified soft-output demapper greatly outperforms the hard-decision demapper [6]. The Matlab code for 64QAM is shown in Fig 3, where chi, chq are the CSI and rxi, rxq are the inputs to the demapper. Every sub-carrier produces 6 soft output bits after demodulation.

    for k = 1:48
        %% determine the 3 inphase bits
        chmagsq(k) = chi(k).^2 + chq(k).^2;   % mag sq of each entry of the vector
        bit0 = rxi(k)*chi(k) + rxq(k)*chq(k);
        bit1 = 4 * chmagsq(k) - abs(bit0);
        bit2 = 2 * chmagsq(k) - abs(bit1);
        %% Repeat computations to get the 3 quadrature bits
        bit3 = -rxi(k)*chq(k) + rxq(k)*chi(k);
        bit4 = 4 * chmagsq(k) - abs(bit3);
        bit5 = 2 * chmagsq(k) - abs(bit4);
    end

Fig 3. Matlab code for 64QAM demapper

The equivalent C code implemented on SandBlaster DSP is shown in Fig 4.

    /* chr[], chi[]: in-phase and quadrature parts of the channel estimate
       (chi, chq in Fig 3); soft bit n of sub-carrier k is stored at 48*n + k */
    for(k=0; k<48; k++){
        softBits_t[48*0+k] = (rxi[k]*chr[k] + rxq[k]*chi[k]) >> QAM64_SF;
        softBits_t[48*1+k] = 4*chMagSq[k] - SB_ABS(softBits_t[48*0+k]);
        softBits_t[48*2+k] = 2*chMagSq[k] - SB_ABS(softBits_t[48*1+k]);
        softBits_t[48*3+k] = (rxq[k]*chr[k] - rxi[k]*chi[k]) >> QAM64_SF;
        softBits_t[48*4+k] = 4*chMagSq[k] - SB_ABS(softBits_t[48*3+k]);
        softBits_t[48*5+k] = 2*chMagSq[k] - SB_ABS(softBits_t[48*4+k]);
    }

Fig 4. C code for 64QAM demapper

The loop is written in such a way that it gets vectorized; for example, bit0 for 4 sub-carriers can be computed at the same time. Four input elements and four CSI elements can be loaded, multiplied and added to get bit0 for each of the 4 sub-carriers. The other bits of all the sub-carriers are calculated in the same way.

5. DEINTERLEAVER

In 802.11a, all encoded data bits are interleaved by a block interleaver with a block size corresponding to the number of bits in a single OFDM symbol. The interleaver is defined by a two-step permutation. The first permutation ensures that adjacent coded bits are mapped onto nonadjacent subcarriers. The second ensures that adjacent coded bits are mapped alternately onto less and more significant bits of the constellation, so that long runs of low reliability (LSB) bits are avoided.

The deinterleaver performs the inverse relation and is also defined by two permutations. Let j be the index of a received bit before the first permutation, i its index after the first permutation, and k its index after the second. The first permutation is defined by the rule:

    $i = s \lfloor j/s \rfloor + (j + \lfloor 16 j / N_{CBPS} \rfloor) \bmod s, \quad j = 0, 1, \ldots, N_{CBPS} - 1$

The value of s is determined by the number of coded bits per subcarrier, NBPSC, according to

    $s = \max(N_{BPSC}/2, 1)$

The second permutation is defined by the rule:

    $k = 16 i - (N_{CBPS} - 1) \lfloor 16 i / N_{CBPS} \rfloor, \quad i = 0, 1, \ldots, N_{CBPS} - 1$

NCBPS is the number of coded bits per symbol and NBPSC is the number of coded bits per sub-carrier.

Let us consider the deinterleaver for data modulated by 16QAM. Here the number of soft bits from the demapper is 192, and we number them 0 to 191. We observe the deinterleaver permutation obtained by combining permutations 1 and 2 above. Given below are the first 60 input elements, arranged in rows of 12, and the final deinterleaved output of those 60 elements. Note that permutation 2 also implicitly produces a (12x16) transpose of the final output, which is not shown.

Input:

     0  1  2  3  4  5  6  7  8  9 10 11
    12 13 14 15 16 17 18 19 20 21 22 23
    24 25 26 27 28 29 30 31 32 33 34 35
    36 37 38 39 40 41 42 43 44 45 46 47
    48 49 50 51 52 53 54 55 56 57 58 59

Output:

     0  1  2  3  4  5  6  7  8  9 10 11
    13 12 15 14 17 16 19 18 21 20 23 22
    24 25 26 27 28 29 30 31 32 33 34 35
    37 36 39 38 41 40 43 42 45 44 47 46
    48 49 50 51 52 53 54 55 56 57 58 59

If we ignore the transpose and look at the pattern of the data shuffling, rows 0, 2 and 4 are unchanged, while in rows 1 and 3 every group of 4 elements {x0,x1,x2,x3} is shuffled to {x1,x0,x3,x2}. Hence we can deinterleave 4 elements at a time, using either just a vector copy or a vector shuffle followed by a vector copy. Fig 5 shows the code snippet for the 16QAM case. The code shows only the shuffling of 4 elements in each of 4 rows; it is executed 4x3 times to cover all rows and all columns. With inlining and software pipelining, the 16QAM deinterleaving is done in ~200 cycles. A table lookup or even a direct copy would have taken at least 384 cycles. Also, by deinterleaving in place, we could perform the data shuffling only for the odd rows. This would improve the cycle count further by 50%. A similar approach has been used for the other modulation schemes (BPSK, QPSK and QAM64), which produce different numbers of soft bits.

6. DEPUNCTURER

Puncturing is a procedure for omitting some of the encoded bits in the transmitter (thus reducing the number of transmitted bits and increasing the coding rate) and inserting a dummy zero metric into the convolutional decoder on the receive side in place of the omitted bits. In 802.11a, there are two puncturing modes, rates r = 2/3 and r = 3/4.
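The two-step deinterleaver rule of Section 5 can be checked with a few lines of C. The sketch below is a reference model for building or verifying index tables, not the optimized Sandblaster routine; the function names deinterleave_step1 and deinterleave_index are hypothetical. For 16QAM (NCBPS = 192, NBPSC = 4) it reproduces the pattern described in Section 5: step 1 swaps neighbouring elements in the odd rows of 12, and step 2 maps row r, column c to position r + 16c, which is the transpose.

```c
#include <assert.h>

/* First deinterleaver permutation: received-bit index j -> intermediate index i.
 * ncbps = coded bits per OFDM symbol, nbpsc = coded bits per sub-carrier. */
int deinterleave_step1(int j, int ncbps, int nbpsc)
{
    int s = (nbpsc / 2 > 1) ? nbpsc / 2 : 1;     /* s = max(NBPSC/2, 1) */

    assert(j >= 0 && j < ncbps);
    return s * (j / s) + (j + (16 * j) / ncbps) % s;
}

/* Both permutations combined: received-bit index j -> deinterleaved index k. */
int deinterleave_index(int j, int ncbps, int nbpsc)
{
    int i = deinterleave_step1(j, ncbps, nbpsc);

    /* second permutation: k = 16*i - (NCBPS - 1) * floor(16*i / NCBPS) */
    return 16 * i - (ncbps - 1) * ((16 * i) / ncbps);
}

/* Fill table[j] with the destination of soft bit j, so that a table-driven
 * deinterleaver can write out[table[j]] = in[j]. table must hold ncbps entries. */
void build_deinterleave_table(short *table, int ncbps, int nbpsc)
{
    for (int j = 0; j < ncbps; j++) {
        table[j] = (short)deinterleave_index(j, ncbps, nbpsc);
    }
}
```

For 16QAM, deinterleave_step1(12, 192, 4) gives 13 and deinterleave_index(12, 192, 4) gives 17, that is, row 1, column 1 of the 16x12 arrangement lands at 1 + 16*1 after the transpose.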
    void qam16_deinterleave(short * restrict X, short * restrict Y,
                            int instride, int outstride)
    {
        int i;
        int j;
        short A[16];

        /* rows 0 and 2: straight copy; rows 1 and 3: swap neighbouring elements */
        A[4*0+0] = X[instride*0+0];
        A[4*0+1] = X[instride*0+1];
        A[4*0+2] = X[instride*0+2];
        A[4*0+3] = X[instride*0+3];
        A[4*1+0] = X[instride*1+1];
        A[4*1+1] = X[instride*1+0];
        A[4*1+2] = X[instride*1+3];
        A[4*1+3] = X[instride*1+2];
        A[4*2+0] = X[instride*2+0];
        A[4*2+1] = X[instride*2+1];
        A[4*2+2] = X[instride*2+2];
        A[4*2+3] = X[instride*2+3];
        A[4*3+0] = X[instride*3+1];
        A[4*3+1] = X[instride*3+0];
        A[4*3+2] = X[instride*3+3];
        A[4*3+3] = X[instride*3+2];

        /* write the 4x4 block to the output transposed */
        for(i=0; i<4; i++){
            for(j=0; j<4; j++){
                Y[outstride*j+i] = A[i*4+j];
            }
        }
    }

Fig 5. C code for 16QAM deinterleaving

    A0 A1 A2 A3 A4 A5 A6 A7 A8
    B0 B1 B2 B3 B4 B5 B6 B7 B8

Fig 6. r=3/4 depuncture

In the r = 3/4 mode, two zero bits are inserted for every 4 received bits, as shown in Fig 6. The C code for the QAM16, r = 3/4 depuncturer is also shown. The input to the depuncturer is 192 soft bits from the QAM16 demapper. The 3/4 depuncturer produces 288 output bits after inserting zeros. Here again, the copies {B2,A3,B3,A4}, {B5,A6,B6,A7}, etc. vectorize.
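The r = 2/3 mode can be handled in the same style. The sketch below assumes the 802.11a stolen-bit pattern for r = 2/3, in which every second B bit (B1, B3, ...) is omitted, so one zero metric is inserted after every three received soft bits. The function name depuncture_r23 is hypothetical and the code is not taken from the paper.

```c
#include <assert.h>

/* Depuncture a rate-2/3 stream: for every 3 received soft bits (A, B, A)
 * emit 4 decoder inputs (A, B, A, 0), where 0 is the dummy metric that
 * stands in for the omitted B bit.
 * in  : n_in soft bits, n_in must be a multiple of 3
 * out : must hold n_in / 3 * 4 entries
 * Returns the number of entries written to out. */
int depuncture_r23(const short *in, short *out, int n_in)
{
    int k = 0;   /* read index into in[]   */
    int m = 0;   /* write index into out[] */

    assert(n_in % 3 == 0);
    while (k < n_in) {
        out[m++] = in[k++];   /* A(2n)   */
        out[m++] = in[k++];   /* B(2n)   */
        out[m++] = in[k++];   /* A(2n+1) */
        out[m++] = 0;         /* B(2n+1) was not transmitted */
    }
    return m;
}
```

For the 48 Mbit/s mode (64-QAM, r = 2/3) the input would be the 288 soft bits of one OFDM symbol and the output 384 decoder inputs, twice the NDBPS value of 192 listed in Table 1.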

    /* out[] holds 288 entries and is assumed to be pre-filled with zero metrics */
    k = 0;
    out[0] = in[k++];
    out[1] = in[k++];
    out[2] = in[k++];
    for(m=5; m<287; m+=6){
        for(i=0; i<4; i++){
            out[m+i] = in[k++];
        }
    }
    out[287] = in[k];   /* the last group contributes a single bit */

Fig 6. QAM16 r=3/4 depuncture

A table lookup based method can also be used to combine the three blocks: demapper, deinterleaver and depuncturer. The output soft bits from the demapper can be routed directly to their final locations after depuncturing by using a lookup table. This method is optimal in terms of memory access because it avoids the data transfers from demapper to deinterleaver and from deinterleaver to depuncturer. It is also very useful for parallelizing the blocks. One can partition the sub-carriers across multiple threads, and each thread can independently process its sub-carriers from demapping up to depuncturing. On SandBlaster DSP we use 8 threads to process the 48 sub-carriers from demapper to depuncturer in order to keep up with the real-time requirements. Since an 802.11a symbol arrives every 4 µs, a 75 MHz thread processor has to complete processing a symbol in 300 cycles. In this case, each thread works on 6 sub-carriers and the threads run completely in parallel.

7. CONCLUSION

In this paper, we have described a software implementation of 802.11a receiver blocks on SandBlaster DSP. We have discussed critical blocks such as the demapper, deinterleaver and depuncturer, and their software optimizations. Instead of using traditional table lookup based methods for deinterleaving, we have identified special patterns in the data shuffling performed by these blocks and exploited them to optimize the blocks. In addition, table lookup based methods have been used to merge multiple blocks and achieve thread-level parallelism.

REFERENCES

[1] John Glossner et al., Sandblaster low power DSP, IEEE 2004 Custom Integrated Circuits Conference, 2004, pp. 575-581.
[2] Sanjay Jinturkar, John Glossner, Mayan Moudgill, Erdem Hokenek, Programming the Sandblaster Multithreaded Processor, GSPx 2003.
[3] IEEE 802.11b-1999, Wireless LAN medium access control (MAC) and Physical layer (PHY) Specifications: High Speed Physical Layer Extension in the 2.4 GHz Band, 1999.
[4] IEEE 802.11a-1999, Wireless LAN medium access control (MAC) and Physical layer (PHY) Specifications: High Speed Physical Layer in the 5 GHz band, 1999.
[5] M.J. Meeuwsen, O. Sattari, and B.M. Baas, A full-rate software implementation of an IEEE 802.11a compliant digital baseband transmitter, Proc. of IEEE Workshop on Signal Processing Systems (SIPS 2004), pp. 124-129, Oct. 13-15, 2004.
[6] F. Tosato and P. Bisaglia, Simplified soft-output demapper for binary interleaved COFDM with application to HIPERLAN/2, in Proc. IEEE ICC 2002, vol. 2, 2002, pp. 664-668.