A WiMAX/LTE Compliant FPGA Implementation of a High-Throughput Low-Complexity 4x4 64-QAM Soft MIMO Receiver

Vadim Smolyakov 1, Dimpesh Patel 1, Mahdi Shabany 1,2, P. Glenn Gulak 1
1 The Edward S. Rogers Sr. Department of Electrical & Computer Engineering, University of Toronto
2 Department of Electrical Engineering, Sharif University of Technology
Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4
{svadim, pateldim, mahdi, gulak}@eecg.toronto.edu

Abstract: This paper presents a prototype of a high-throughput 4x4 64-QAM MIMO receiver consisting of a channel matrix QR decomposition, a soft-output K-Best MIMO detector and a Convolutional Turbo Code decoder. The proposed MIMO receiver provides low processing latency and a pipelined architecture scalable to a larger number of antennas and a higher constellation order. Therefore, it is suitable for LTE-Advanced and IEEE 802.16m broadband wireless standards. A rapid prototyping platform interfacing MATLAB with Xilinx ISE was used in the development of the 4x4 64-QAM MIMO receiver. The receiver utilizes 96% of the slice LUTs and 78% of the slice registers on a Virtex-5 FX130T FPGA, operating at a maximum frequency of 125 MHz.

Keywords: Architecture and Implementation: 1. Programmable and Reconfigurable Architectures - MIMO receiver, OFDM, Soft-output K-Best detector, QR decomposition, CTC decoder, VLSI, FPGA, 802.16m, LTE Advanced.

I. INTRODUCTION

Multiple-Input Multiple-Output (MIMO) systems are commonly included in 4G wireless standards that require high data rates and high reliability of data transmission. Multiple antenna systems offer a number of advantages such as spatial multiplexing for higher capacity and diversity gain for improved bit error rate performance. However, the design of MIMO receivers carries a number of challenges.
First, a MIMO detection scheme must be scalable as the number of antennas and the constellation order increase in order to meet the high throughput requirements of emerging wireless standards, such as IEEE 802.16m and LTE-Advanced. Second, the performance of the detection algorithm must be close to that of a Maximum Likelihood (ML) detector. Third, an appropriate VLSI architecture is required to yield a high-throughput, energy-efficient implementation of the MIMO receiver. An FPGA prototyping platform that enables architectural verification of the receiver is particularly useful in addressing the above challenges. Moreover, a rapid prototyping system that interfaces Matlab with Xilinx ISE allows for accelerated simulations and systematic hardware development.

The MIMO receiver presented in this paper was prototyped using a Virtex-5 FX130T FPGA platform. It consists of three main processing blocks: a K-Best soft-decision MIMO detector, a QR channel matrix decomposition (QRD), and a Convolutional Turbo Code (CTC) decoder. The presented receiver provides a low-complexity, high-throughput architecture with near-optimum ML performance that is scalable to a higher number of antennas and a higher constellation order.

978-1-4244-9721-8/10/$26.00 ©2010 IEEE, Asilomar 2010

II. SYSTEM MODEL

Consider a MIMO system with $N_T$ transmit and $N_R$ receive antennas and the equivalent baseband model of the Rayleigh fading channel described by a complex-valued $N_R \times N_T$ channel matrix $\tilde{H}$. The complex equivalent baseband model can be expressed as

$$\tilde{Y} = \tilde{H}\tilde{S} + \tilde{V} \tag{1}$$

where $\tilde{S} = [\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_{N_T}]^T$ is the $N_T$-dimensional transmitted signal vector, in which each element is drawn from a symmetric M-QAM constellation $\Omega = \{-\sqrt{M}+1, \ldots, -1, +1, \ldots, +\sqrt{M}-1\}$, $\tilde{Y} = [\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_{N_R}]^T$ is the received symbol vector and $\tilde{V} = [\tilde{v}_1, \tilde{v}_2, \ldots, \tilde{v}_{N_R}]^T$ is an independent identically distributed (iid) Gaussian noise vector. We assume that the channel is quasi-static and is updated every four channel uses.

Figure 1. System Diagram: 4x4 64-QAM MIMO Receiver.

A complex $N_R \times N_T$ MIMO system can be modeled as an equivalent $2N_R \times 2N_T$ real system, $Y = HS + V$, via Real Value Decomposition (RVD). Thus, the objective of the MIMO detection system is to find the closest transmitted vector $\hat{S}$ based on the observation $Y$ such that the Euclidean distance $\|Y - HS\|^2$ is minimized:

$$\hat{S} = \arg\min_{S \in \Omega^{2N_T}} \|Y - HS\|^2 \tag{2}$$

The K-Best detector solves the above problem with linear complexity [1]. Furthermore, a soft-output K-Best detector provides a likelihood estimate of whether the transmitted bit was a zero or a one by computing a Log-Likelihood Ratio (LLR) for each received bit, defined as:

$$\mathrm{LLR}(x_i \mid y) = \ln\frac{P[x_i = 1 \mid y]}{P[x_i = 0 \mid y]} \approx \min_{X_i^1}\left\{\frac{\|y - Hs\|^2}{\sigma^2}\right\} - \min_{X_i^0}\left\{\frac{\|y - Hs\|^2}{\sigma^2}\right\} = \mathrm{MinPED}_i^1 - \mathrm{MinPED}_i^0 \tag{3}$$

where (3) is derived based on the standard simplifications [3], and $X_i^1$ and $X_i^0$ represent all vectors $X$ with bit $X_i$ being 1 and 0, respectively; the minimum Partial Euclidean Distance (PED) for the $i$-th bit in $X$ being 1 or 0 is denoted by $\mathrm{MinPED}_i^1$ and $\mathrm{MinPED}_i^0$, respectively. LLR output is important in iterative Error Correction Codes (ECC) that require soft input for decoding, such as Low-Density Parity-Check (LDPC) codes and Convolutional Turbo Codes (CTC).

The Soft K-Best detector operates on an upper triangular matrix $R$, generated from the channel matrix $H$ by the QR decomposition block: $H = QR$. After a nulling multiplication of the received signal by $Q^H$, the equivalent real-valued model can be expressed as:

$$Z = RS + W \tag{4}$$

where $Z = Q^H Y$, $W = Q^H V$, and $Q^H Q = I$ since $Q$ is a unitary matrix.
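
The real-valued decomposition and the minimization in (2) can be illustrated with a small sketch. It is not part of the presented receiver: it assumes a stacked ordering (real parts first, then imaginary parts), a toy 2x2 channel with 16-QAM symbols, no noise, and an exhaustive search in place of the K-Best detector. All names and values (`rvd_matrix`, `ml_detect`, `H_c`, `S_c`) are hypothetical.

```python
from itertools import product

def rvd_matrix(h):
    """Real Value Decomposition of a complex matrix: [[Re, -Im], [Im, Re]]."""
    re = [[z.real for z in row] for row in h]
    im = [[z.imag for z in row] for row in h]
    top = [r + [-x for x in i] for r, i in zip(re, im)]
    bottom = [i + r for r, i in zip(re, im)]
    return top + bottom

def rvd_vector(v):
    """Stack real parts on top of imaginary parts (assumed ordering)."""
    return [z.real for z in v] + [z.imag for z in v]

def matvec(a, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def ml_detect(h_real, y_real, omega):
    """Exhaustive search for the S in omega^(2*N_T) minimizing ||Y - H S||^2."""
    n = len(h_real[0])
    best, best_dist = None, float("inf")
    for s in product(omega, repeat=n):
        hs = matvec(h_real, s)
        dist = sum((yi - hi) ** 2 for yi, hi in zip(y_real, hs))
        if dist < best_dist:
            best, best_dist = s, dist
    return list(best), best_dist

# Hypothetical 2x2 complex channel and 16-QAM symbols (toy values).
H_c = [[1.0 + 0.5j, 0.3 - 0.2j],
       [-0.4 + 0.1j, 0.9 + 0.7j]]
S_c = [3 - 1j, -1 + 3j]
OMEGA = [-3, -1, 1, 3]  # real-valued alphabet for 16-QAM (sqrt(M) = 4)

# Noise-free received vector in the complex model (1).
Y_c = [sum(h * s for h, s in zip(row, S_c)) for row in H_c]

H = rvd_matrix(H_c)   # 4x4 real matrix
Y = rvd_vector(Y_c)   # 4x1 real vector
S_hat, dist = ml_detect(H, Y, OMEGA)
print(S_hat, dist)
```

The search visits all 4^4 = 256 candidates here; for the 4x4 64-QAM case of this paper the same search would need 8^8 candidates per received vector, which is why a reduced-complexity tree search is used instead.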
The triangular nature of the matrix $R$ is used to find the closest transmitted vector $\hat{S}$ in (2) as follows:

$$\hat{S} = \arg\min_{S \in \Omega^{2N_T}} \sum_{i=1}^{2N_T} \left| z_i - \sum_{j=i}^{2N_T} R_{ij} s_j \right|^2 \tag{5}$$

which can be thought of as a detection problem in a tree with $2N_T$ levels and $\sqrt{M}$ children per parent node. The MIMO detection problem in (5), the LLR computation in (3), the QR decomposition and the CTC decoder are implemented in the presented FPGA prototype.

III. SYSTEM IMPLEMENTATION

A VLSI implementation of the 4x4 64-QAM MIMO receiver is presented in this section. The receiver consists of three main processing blocks: the K-Best soft-output MIMO detector (Soft K-Best) [1], the QR decomposition (QRD) [4], and the CTC decoder [5], reviewed below.

A. K-Best Soft-Output MIMO Detector

The architecture for the K-Best MIMO detector is presented in Figure 2.

Figure 2. K-Best Soft-Output MIMO Detector Architecture [2].

The K-Best detector consists of $2N_T = 8$ stages, denoted L8 through L1, corresponding to the 8-level detection tree. The first level of the tree processes the last row in equation (5) due to the upper triangular structure of the matrix $R$. The Level-I block expands all $\sqrt{M} = 8$ possible children in $\Omega$ and calculates the Partial Euclidean Distance (PED) for each path, forming $K_1$ at the output. For each of the paths in $K_1$, a child with the lowest PED, called the First Child (FC), is found in Level-II. The Sorter block sorts all 8 FCs from each path in 4 clock cycles and transfers the sorted FCs from L7 to the next level L6. Note that every level contains two Processing Elements (PEs): PE-I and PE-II. The PE-I block takes the FCs of each level and, with a PED sorter, generates the K-Best paths of the current level, while retaining K-1 discarded paths (DP). The PE-II block receives the K-Best paths of the previous level one-by-one, computes their First Child (FC) and sorts the K-Best paths as they arrive. The process repeats for subsequent stages. The On-Demand Hard K-Best MIMO detection algorithm [1] is summarized in Table I.

Table I. The On-Demand Hard K-Best MIMO Detection Algorithm [1]
1) Find the K-Best elements with minimum PED in level 1 ($K_1$).
2) For $l = 2 : 1 : 2N_T$
   Find elements with minimum PED in $K_{l-1}$. Call this set $C_l$.
   3) For $k = 1 : K$
      3.1) Select an element in $C_l$ with the lowest PED.
      3.2) Announce this element as the next K-Best candidate in level $l$ (add it to $K_l$).
      3.3) Replace this element with the next lowest PED element.
      End
   End

The soft-output extension is implemented based on the Modified K-Best Schnorr-Euchner (MKSE) scheme [9] by utilizing information contained in the discarded paths at intermediate tree levels to compute the LLR. The presented Soft K-Best detection scheme implements three improvement ideas that reduce the computational complexity of MKSE: relevant discarded paths selection, last stage on-demand expansion, and a relaxed LLR computation scheme [2].

B. QR Decomposition

A QR decomposition (QRD) block decomposes the channel matrix $H$ into a unitary matrix $Q$ and an upper-triangular matrix $R$. The QRD core was designed for a K-Best 4x4 MIMO detector with K=10. Thus, the K-Best detector requires a new input $Z$, according to (4), every K=10 clock cycles. Assuming that the channel is quasi-static and is updated every four channel uses, the QRD core was designed to generate a new $8 \times 8$ real-valued matrix $R$ and four $8 \times 1$ vectors $Z$ every 40 clock cycles. In order to meet the challenging 40 clock cycle latency requirement, a novel pipelined architecture that uses unrolled CORDIC processors iteratively was developed for QR decomposition [4]. The architecture, shown in Figure 3, consists of six pipelined stages. The Input Controller and Output Controller stages interface the QRD core with the preceding and succeeding stages in the MIMO receiver. The four central Stages 1-4 decompose $H$ into $QR$ and compute the four $Z$ vectors.
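
The decomposition $H = QR$ and the nulling step $Z = Q^H Y$ of (4) can be sketched in floating point. The sketch below is an illustration only: it uses conventional 2D Givens rotations on a hypothetical 4x4 real matrix, and it models neither the CORDIC arithmetic nor the pipelined stages of the QRD core in [4]. The names `givens_qr` and `matvec` and all numeric values are assumptions.

```python
import math

def givens_qr(h):
    """QR of a real matrix by 2D Givens rotations.

    Returns (qt, r) with qt * h = r, where qt is orthogonal (Q transposed)
    and r is upper triangular.
    """
    m, n = len(h), len(h[0])
    r = [row[:] for row in h]
    qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for col in range(n):
        # Annihilate entries below the diagonal, working from the bottom up.
        for row in range(m - 1, col, -1):
            a, b = r[row - 1][col], r[row][col]
            norm = math.hypot(a, b)
            if norm == 0.0:
                continue
            c, s = a / norm, b / norm
            # Apply the same rotation to R and to the accumulated Q^T.
            for mat in (r, qt):
                upper, lower = mat[row - 1], mat[row]
                mat[row - 1] = [c * u + s * l for u, l in zip(upper, lower)]
                mat[row] = [-s * u + c * l for u, l in zip(upper, lower)]
    return qt, r

def matvec(a, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

# Hypothetical real-valued channel matrix and transmitted vector (toy values).
H = [[2.0, -1.0, 0.5, 0.3],
     [1.0, 3.0, -0.7, 0.2],
     [0.4, -0.6, 1.5, 1.1],
     [-0.8, 0.9, 0.2, 2.4]]
S = [3, -1, 1, -3]

Y = matvec(H, S)        # noise-free observation
Qt, R = givens_qr(H)
Z = matvec(Qt, Y)       # nulling: Z = Q^T Y
RS = matvec(R, S)       # right-hand side of (4) without noise

max_below_diag = max(abs(R[i][j]) for i in range(4) for j in range(i))
max_mismatch = max(abs(z - rs) for z, rs in zip(Z, RS))
print(max_below_diag, max_mismatch)
```

Both printed values should be at the level of floating-point rounding error, which confirms that $R$ is upper triangular and that $Z = RS$ holds in the noise-free case.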
The QRD computation is carried out using multi-dimensional Givens rotations, Householder transformations, and the conventional (2D) Givens rotations to reduce computational complexity and achieve higher execution parallelism. The increase in throughput arises from annihilating multiple $H^{Re}_{i,j}$ elements simultaneously via multi-dimensional Givens rotations and Householder reflections. The circuit complexity is reduced by implementing multi-dimensional rotations with a series of shift and add operations. Stages 1-4 contain independent Stage Controllers that provide control signals to direct appropriate data in and out of the CORDIC processor. The CORDIC modules are designed to minimize gate count by performing CORDIC iterations in each half of the clock cycle [4]. Thus, the QRD implementation meets the processing latency specification of 40 clock cycles, while maximizing resource utilization and minimizing gate count.

Figure 3. QR Decomposition Architecture [4].

C. CTC Decoder

The Convolutional Turbo Code (CTC) decoder performs iterative decoding of the channel data encoded in compliance with the IEEE 802.16 specification [6]. A Xilinx LogiCORE IP [5] consisting of parallel Soft-Input Soft-Output (SISO) processors was used to realize the CTC decoder. The top-level architecture of the decoder IP is shown in Figure 4, in which A and B represent non-interleaved systematic LLR data, and $Y_1$, $W_1$, $Y_2$, $W_2$ are de-punctured and de-interleaved LLR parity data. The two SISO decoders exchange information using extrinsic signals ex1 and ex2 as shown in Figure 4. One complete iteration of the CTC occurs when both SISO decoders finish generating new extrinsics. Although only 2 SISO decoders are shown in Figure 4, 8 SISO decoders were used in the FPGA prototype.

Figure 4. CTC Decoder Architecture [5].

IV. METHODOLOGY

The MIMO receiver prototype was implemented on an Alpha-Data ADM-XRC-5T2 FPGA platform [7] using a Virtex-5 FX130T FPGA.
A rapid prototyping methodology similar to [8] was employed in architectural development and verification of the MIMO receiver blocks. This prototyping methodology provides:
1) Experimental grounds for improving the design at the algorithmic and architectural levels
2) A Matlab integrated environment that simplifies prototype verification
3) An API interface developed in Matlab and C providing access to FPGA and external-to-FPGA resources.

Figure 5 shows the Signal Flow Graph (SFG) of a PED sorter used in the Soft K-Best detector to sort K minimum PED elements in K/2 clock cycles, implemented using the methodology described above. The methodology enables a direct connection between a Signal Flow Graph and a hardware prototype for fast algorithmic and architectural level verification.

Figure 5. FPGA Methodology: Signal Flow Graph of a PED Sorter; Implemented in Simulink and Prototyped with Xilinx ISE.

Figure 6 shows the MIMO Receiver instantiated as a black box along with the source and sink interfaces used to generate and capture real-time data. The Matlab/Simulink environment is interfaced with the FPGA platform via the PCI bus. The interface logic is provided as an IP module by Alpha-Data, including API interfaces in Matlab and C. Additional functionality such as single step testing, user design templates and user register maps is included.

Figure 6. FPGA Methodology: Matlab/Simulink with Xilinx ISE/SysGen.

V. RESULTS

The 4x4 64-QAM MIMO receiver prototype was implemented on a Virtex-5 FX130T FPGA. The design summary, including the main partitions, is presented in Table II. In addition to the soft-output K-Best detector (Soft K-Best), QR decomposition (QRD), and Convolutional Turbo Code (CTC) decoder, the MIMO receiver includes qrd-soft and soft-ctc interfaces. The interface logic implements the required input/output scheduling sequence for the three processing blocks.

Table II. FPGA Resource and Performance Summary (Virtex-5 FX130T FPGA).

                                       Soft K-Best     QRD            CTC             MIMO Receiver (4x4 64-QAM)
Resource Summary:
  Number of Flip-Flops                 23,970          5,013          N/A             29,810
  Number of Adders                     438             181            N/A             620
  Number of Multipliers                14              0              N/A             14
  Number of Block-RAM (298)            0% (0)          0% (0)         8% (24)         14% (44)
  Number of DSP48Es (320)              4% (14)         0% (0)         5% (16)         9% (30)
  Number of IOs                        58              34             135             57
Device Utilization:
  Number of Slice Registers (81,920)   30% (24,761)    6% (5,026)     34% (28,341)    78% (64,272)
  Number of Slice LUTs (81,920)        39% (32,550)    7% (6,329)     39% (32,694)    96% (79,219)
Timing Analysis:
  Maximum Frequency                    125 MHz         125 MHz        125 MHz         125 MHz

The MIMO receiver utilizes 96% of slice LUTs and 78% of slice registers, operating at a maximum frequency of 125 MHz. The critical path of the receiver was found in the CTC decoder IP. The Soft K-Best detector contributes 70% of all addition operations and requires fourteen 13 × 13-bit multipliers that perform all multiplication operations, in contrast to the shift-and-add CORDIC processor arithmetic of the QRD. The CTC decoder IP occupies 8% of the FPGA Block-RAM, while the additional 6% is consumed by the qrd-soft and soft-ctc interfaces. Notice that the Soft K-Best [2] and the QRD [4] architectures integrated in the MIMO receiver do not require embedded memory. The DSP48E resources were used to accelerate computation in the Soft K-Best and CTC decoder blocks. In addition, the IO count of the MIMO receiver is suitable for on-chip implementation.

The following set of parameters was used in the implementation of the 4x4 64-QAM MIMO receiver prototype. The K-Best soft-output detector operated with K=10 and generated two LLR output streams on each edge of the clock, quantized to a 13-bit word length with 7 fractional bits. The QRD core was configured for eight CORDIC iterations with a fixed-point word length of 16 bits with 7 fractional bits, based on extensive Matlab simulations. The CTC decoder was parameterized to use eight parallel SISO decoders with a maximum data-block size of 600 bytes (2,400 bit-pairs) and an 8-bit LLR input resolution. The CTC decoder IP, which supports a maximum throughput of 189 Mbps [5], was configured for 8 iterations. The decoder was parameterized for increased throughput at the cost of FPGA resources.

Figure 7 shows the BER performance comparison of K-Best MIMO detection schemes based on fixed-point simulation results of the MIMO receiver. A 4x4 64-QAM modulation scheme with a code rate of 1/2 was assumed for a data-block size of 600 bytes and a quasi-static channel updated every 4 channel uses.
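
The fixed-point formats quoted above (13-bit words with 7 fractional bits for the LLR outputs, 16-bit words with 7 fractional bits in the QRD core) can be illustrated with a short quantization sketch. The paper does not state the number representation, rounding mode or overflow handling, so the signed two's-complement range, round-half-up rounding and saturation used below are assumptions, and `quantize` is a hypothetical helper.

```python
import math

def quantize(value, word_length, frac_bits):
    """Quantize to a signed fixed-point grid (assumed two's complement).

    word_length counts all bits including the sign bit; frac_bits of them
    are fractional. Rounding is round-half-up and overflow saturates; both
    are assumptions, not taken from the paper.
    """
    scale = 1 << frac_bits
    max_code = (1 << (word_length - 1)) - 1
    min_code = -(1 << (word_length - 1))
    code = math.floor(value * scale + 0.5)
    code = max(min_code, min(max_code, code))
    return code / scale

# 13-bit word, 7 fractional bits: step 1/128, range [-32, 32 - 1/128].
print(quantize(1.2345, 13, 7))
print(quantize(100.0, 13, 7))    # saturates at the positive limit
print(quantize(-100.0, 13, 7))   # saturates at the negative limit

# 16-bit word, 7 fractional bits: same step, range [-256, 256 - 1/128].
print(quantize(100.0, 16, 7))
```

With 7 fractional bits both formats share the same resolution of 1/128; the longer 16-bit word only extends the representable range.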
The BER plot shows that the presented soft-output K-Best MIMO detector improves BER performance by approximately 2.3 dB at BER = $10^{-3}$ compared to the Hard K-Best implementation [1]. Figure 7 also shows that the presented QR decomposition achieves BER performance very close to the ideal QR decomposition based on Givens rotations, while providing the advantage of low latency.

Figure 7. BER Performance of K-Best MIMO Detection Schemes (4x4, 64-QAM, K=10).

VI. CONCLUSIONS

An FPGA prototype of a 4x4 64-QAM MIMO receiver was presented. The prototype integrates a high-throughput low-complexity soft-output K-Best detector, QR channel matrix decomposition, and a Convolutional Turbo Code (CTC) decoder. The implemented MIMO receiver is scalable to large QAM constellations and is thus suitable for applications defined by the emerging IEEE 802.16m and LTE Advanced wireless standards. A rapid prototyping methodology that integrates Matlab with Xilinx ISE was used in the development of the MIMO receiver.

ACKNOWLEDGMENTS

The financial support by NSERC and IP blocks provided by Xilinx are gratefully acknowledged.

REFERENCES

[1] M. Shabany and P. G. Gulak, "A 0.13um CMOS, 655Mbps, 4x4 64-QAM, K-Best MIMO Detector," IEEE International Solid-State Circuits Conference (ISSCC), pp. 256-257, 2009.
[2] D. Patel, V. Smolyakov, M. Shabany and P. G. Gulak, "VLSI Implementation of a WiMAX/LTE Compliant Low-Complexity High-Throughput Soft-Output K-Best MIMO Detector," IEEE International Symposium on Circuits and Systems (ISCAS), pp. 593-596, May 2010.
[3] B. M. Hochwald and S. ten Brink, "Achieving Near-Capacity on a Multiple-Antenna Channel," IEEE Transactions on Communications, pp. 389-399, March 2003.
[4] D. Patel, M. Shabany, and P. G. Gulak, "A Low-Complexity High-Speed QR Decomposition Implementation for MIMO Receivers," IEEE International Symposium on Circuits and Systems (ISCAS), pp. 33-36, May 2009.
[5] Xilinx Inc., "IEEE 802.16e CTC Decoder v3.1," Product Specification, pp. 1-19, September 2008.
[6] IEEE Standard for local and metropolitan area networks. Part 16: Air Interface for Broadband Wireless Access Systems, pp. 1035-1045, May 2009.
[7] Alpha Data Inc., www.alpha-data.com, 2010.
[8] L. G. Barbero and J. S. Thompson, "FPGA Design Considerations in the Implementation of a Fixed-Throughput Sphere Decoder for MIMO Systems," International Conference on Field Programmable Logic and Applications (FPL), pp. 28-30, August 2006.
[9] Z. Guo and P. Nilsson, "Algorithm and Implementation of the K-Best Sphere Decoding for MIMO Detection," IEEE Journal on Selected Areas in Communications, pp. 491-503, March 2006.