A Flexible VLSI Architecture for Extracting Diversity and Spatial Multiplexing Gains in MIMO Channels


Chia-Hsiang Yang
University of California, Los Angeles

Challenges:
1. A unified solution to span the entire diversity-multiplexing tradeoff curve
2. Tradeoff between two search methods
   - Depth-first: ML performance with variable throughput and long latency
   - K-best: near-ML performance with constant throughput and short latency
3. Antenna array size beyond 4×4
   - Area increases quadratically with the number of transmit antennas
   - Critical path increases linearly with the number of transmit antennas
4. Modulations beyond 16-QAM
   - Hardware increases quickly with the constellation size
   - Longer latency introduced by the minimum search circuit
5. Multiple sub-carriers

Research Contributions:
1. A unified sphere decoder architecture for extracting diversity and spatial multiplexing gains in MIMO channels
2. Signal processing techniques to support antenna sizes up to 16×16
   - Folding: hardware area increases linearly with antenna array size
   - Loop retiming: reduces the critical path
   - Data interleaving: supports multiple independent sub-carriers
   - A region partition enumeration method for constellations up to 64-QAM
3. A flexible architecture
   - Antenna array: 2×2 to 16×16
   - Modulations: BPSK to 64-QAM
   - Number of sub-carriers: 16 to 128
   - Search method: K-best or depth-first search
4. A simplified multiplier
   - Numerical strength reduction
   - Gray coding to reduce the number of operations
5. A multi-core architecture for enhanced performance

Abstract

The sphere decoding algorithm is widely used in MIMO communications because of its ability to approach maximum likelihood detection with significantly reduced computational complexity. This makes it attractive for hardware implementation; however, prior work focused only on solutions with a fixed number of antennas or fixed modulations. This work presents a unified sphere decoder architecture that exploits the diversity-multiplexing tradeoff in MIMO channels by taking advantage of flexibility in the number of antennas and in the modulation scheme. Several signal processing and circuit techniques are combined to reduce the hardware complexity: a 20× area reduction is achieved compared to a direct-mapped architecture, even without interleaving of sub-carriers. The proposed flexible architecture supports antenna arrays from 2×2 to 16×16 and modulations from BPSK to 64-QAM, over 16 to 128 sub-carriers. The peak estimated data rate exceeds 1.5 Gbps (ideal throughput) using a 16 MHz bandwidth, in just 0.55 mm² in a standard 90 nm CMOS process.

I. INTRODUCTION

Multi-input multi-output (MIMO) communication has recently received significant attention due to its potential to increase link robustness and channel capacity. Hardware realization of MIMO signal processing algorithms is quite challenging because it requires multi-dimensional, matrix-based computations. However, with the growing demand for higher data transmission rates over wireless links, the need for devices equipped with multiple antennas is increasing. Among the various MIMO algorithms, sphere decoding is one of the most promising solutions. It approaches the information-theoretic bound set by maximum likelihood (ML) detection with several orders of magnitude lower computational complexity [1] [2].
This means that, for a given hardware cost, the reduced complexity can be used to increase the size of the antenna array and thereby improve performance beyond the ML performance of a system with a smaller array. The complexity reduction is achieved by transforming the exhaustive search of ML decoders into the tree search procedure of sphere decoding. Tree search is also popular in other communications areas such as multi-user detection (MUD) for CDMA systems, block-based demodulation, and linear block error control code decoding [3]. Other potential applications include speech recognition, data compression, protein sequence exploration, and neural signal detection.

The sphere decoding algorithm is a multi-dimensional signal processing task dealing with vector and matrix arithmetic. The required computation involves hundreds of add and multiply operations, and may also need divide and trigonometric functions. Such high complexity limits system specifications such as antenna array size and modulations. In addition, prior work focused only on solutions with a fixed number of antennas or fixed modulations [16][17][19][21][22][24].

In this work, we evaluate the architectures proposed in prior work and advance the state of the art in multi-dimensional, matrix-based signal processing hardware. A number of signal processing techniques [23] are considered jointly with the technology parameters to greatly reduce hardware area (cost) and power while maximizing performance. This work develops an architecture that further simplifies sphere decoding implementation by jointly considering tradeoffs at the algorithm, architecture, and circuit layers of abstraction, with the goal of minimizing chip power and area. At the same time, additional degrees of freedom are considered in the design in order to take full advantage of the diversity and spatial multiplexing gains available in MIMO wireless channels [5]. Tuning over a range of diversity-multiplexing points is possible by varying, for example, the antenna array size and the modulation scheme. Flexibility and scalability are thus key additional requirements in the design of multi-mode, multi-standard systems. Our work also uses the Matlab/Simulink framework to improve design productivity in mapping DSP algorithms onto silicon. The BEE2 platform [38] is used to verify system functionality before entering physical ASIC design.

This proposal is organized as follows. Section II reviews the fundamental diversity-multiplexing tradeoff in MIMO communications and describes the sphere decoding algorithm. Several signal processing techniques, evaluated in the power-area-performance space, and architecture details are presented in Section III. Section IV describes the Simulink design environment and the BEE2 emulation platform. Conclusions are summed up in Section V. Finally, Section VI proposes future work and the timeline.

II. ALGORITHM SPACE EXPLORATION

A MIMO system can improve the reliability of a wireless link through increased diversity or improve the channel capacity through spatial multiplexing. Diversity gain and spatial multiplexing gain are related to system coverage range and data rate, respectively. Both gains can be improved using a larger antenna array. However, for a given MIMO system, there is a fundamental tradeoff between these two gains [4] [5]. In the diversity-multiplexing space, repetition codes, the Alamouti code, and space-time codes use data redundancy to increase diversity at the price of losing spatial multiplexing gain. In contrast, the Bell Labs Layered Space Time (BLAST) algorithm, Singular Value Decomposition (SVD), and QR decomposition allocate data streams in different eigen-modes to maximize spatial multiplexing gain while sacrificing diversity gain, as shown in Fig. 1.

Sphere decoding is a decoding scheme that can extract both diversity and multiplexing gains. With flexibility in coding and modulation, a sphere decoder can effectively explore the entire tradeoff curve, as shown in Fig. 1. The original data type for sphere decoding is uncoded data. By manipulating the input data, sphere decoding is capable of decoding space-time block codes (STBC), which improves the error probability and increases diversity gain. The data rate can be maximized by transmitting different modulations over different MIMO substreams to increase spatial multiplexing gain. Also, with proper preprocessing, the decoding process starts by decoding the symbols with the highest SNR first, and then cancels the effect of the decoded symbols on the remaining symbols until the final symbol is decoded. This decoding sequence is equivalent to that in BLAST [41]. A unified sphere decoder model is illustrated in the following section.

Fig. 1. Diversity-multiplexing tradeoff in MIMO communications (axes: diversity gain (range) versus spatial multiplexing gain (rate)).

A. Sphere Decoding Algorithm

Consider a multiple antenna system with M transmit antennas and N receive antennas. The received vector y can be represented by

$$y = Hs + n \qquad (1)$$

where y is an N×1 vector of received symbols, and H denotes an N×M channel matrix whose elements are i.i.d. complex Gaussian with zero mean and unit variance. Vectors s and n (M×1 and N×1, respectively) represent the transmitted symbols and zero-mean, circularly symmetric white Gaussian noise, respectively. The transmitted vector $s \in \mathcal{Q}^M$ with the smallest Euclidean distance is selected as the ML estimate in (2). The channel matrix can be decomposed further using QR factorization; the equivalent ML estimate can thus be written as

$$\hat{s} = \arg\min_{s \in \mathcal{Q}^M} \|y - Hs\|^2 = \arg\min_{s \in \mathcal{Q}^M} \|\hat{y} - Rs\|^2 \quad \text{with} \quad \hat{y} = Q^H y = R\, s_{ZF} \qquad (2)$$

where Q is a unitary matrix, R is an upper triangular matrix, and $s_{ZF} = (H^H H)^{-1} H^H y$ is the zero-forcing (unconstrained ML) estimate. The signal model is presented in Fig. 2.

Fig. 2. Signal model of sphere decoding algorithm.

The most commonly used methods for QR decomposition are Gram-Schmidt decomposition, Householder transformation, and Givens rotations [7]. Several modifications, such as division-free or square-root and division reduction methods, have been proposed to simplify the operations in the original algorithm [45] [46]. For hardware realization, [8] proposed an algorithm suitable for fixed-point implementation and [9] proposed a CORDIC-based triangular systolic array architecture to reduce latency. Under the assumption of a block fading channel, QR decomposition is computed at the packet rate.

Using the upper triangular nature of R, symbol decoding begins from the last row and proceeds in several steps. The decoded symbols are used in successive decoding steps until all symbols are decoded. This decoding algorithm can be mapped to finding the shortest path (with minimum Euclidean distance) in a tree topology: each possible constellation point is a node, and each row of the R matrix is mapped to one level of the tree, whose edges are weighted by channel coefficients. The whole solution space of this tree is equivalent to an exhaustive search in the trellis diagram of the original problem; the total number of combinations of transmitted symbols is $Q^M$, where Q is the constellation size. By properly choosing a search radius and a search method, the ML solution can be approached by visiting only nodes within a hyper-sphere, rather than performing an exhaustive search. This complexity reduction is feasible because the Euclidean distance is a cumulative sum of square terms. This means that, for each node, if its Euclidean distance is larger than the search radius, all branches below it are outside the search radius as well. The conceptual view of the sphere decoding algorithm is illustrated in Fig. 3. Tree pruning allows sphere decoding to achieve ML performance with polynomial complexity (highlighted nodes in Fig. 3) rather than exponential complexity (all nodes in Fig. 3) [1].

Fig. 3. Concept of sphere decoding. Unlikely nodes and branches are indicated with gray shade.

B. Performance Improvements

Several simple yet effective methods, such as detection ordering, candidate enumeration, and search radius setting, are applied to improve the error performance and/or reduce the complexity of basic sphere decoding [3]. For instance, the sphere decoding algorithm for a QAM system results in an over $10^5$ times reduction in computational complexity compared to exhaustive search [10].

1) Detection Ordering: The idea behind detection ordering is to detect the symbols with the largest SNR first: to avoid discarding the ML solution, the first decoded symbols should be the most reliable. Various ordering algorithms have been proposed for the preprocessing stage: V-BLAST-ZF ordering, V-BLAST-MMSE ordering, and norm ordering [3] [25]. Assuming a packet-based wireless communication system, the ordering only needs to be performed once at the beginning of each received frame.

2) Candidate Enumeration: Detection ordering is applied across levels in the tree topology. Within each level, the order of constellation point enumeration is another important factor in improving search speed. Schnorr-Euchner (SE) enumeration suggests traversing the constellation candidates in ascending order of the cumulative distance increment [2].
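The tree search with pruning and SE ordering described above can be sketched in a few lines of Python. This is only a minimal, real-valued illustration under stated assumptions, not the architecture developed in this work: R and ŷ = Q^H y are assumed to be precomputed, the function names (`sphere_decode`, `brute_force_ml`) and the argument `alphabet` are hypothetical, and the initial radius defaults to infinity.

```python
import math
from itertools import product

def sphere_decode(R, y_hat, alphabet, radius_sq=math.inf):
    """Depth-first search for the s that minimizes ||y_hat - R s||^2.

    R is an upper-triangular M x M list of lists (real-valued for brevity),
    y_hat plays the role of Q^H y, alphabet lists the constellation values.
    """
    M = len(R)
    best = {"s": None, "d": radius_sq}  # best leaf so far, current squared radius
    s = [0] * M

    def search(i, ped):
        # b_i: y_hat_i with the already decided symbols s_{i+1..M} cancelled
        b = y_hat[i] - sum(R[i][j] * s[j] for j in range(i + 1, M))
        # Schnorr-Euchner order: smallest distance increment first
        for cand in sorted(alphabet, key=lambda c: (b - R[i][i] * c) ** 2):
            d = ped + (b - R[i][i] * cand) ** 2
            if d >= best["d"]:
                break  # remaining candidates are farther still: prune them
            s[i] = cand
            if i == 0:
                best["s"], best["d"] = s[:], d  # leaf reached: shrink the radius
            else:
                search(i - 1, d)

    search(M - 1, 0.0)
    return best["s"], best["d"]

def brute_force_ml(R, y_hat, alphabet):
    """Exhaustive search over all Q^M candidates, for comparison only."""
    M = len(R)

    def dist(s):
        return sum((y_hat[i] - sum(R[i][j] * s[j] for j in range(i, M))) ** 2
                   for i in range(M))

    return min(product(alphabet, repeat=M), key=dist)

# Example values (assumed, for illustration only)
R = [[2.0, 0.5, -0.3], [0.0, 1.5, 0.4], [0.0, 0.0, 1.0]]
y_hat = [1.1, -2.0, 0.7]
s_hat, dist_sq = sphere_decode(R, y_hat, [-3, -1, 1, 3])
```

Because a branch is abandoned only when its partial distance already exceeds the best complete path found so far, the result matches the exhaustive search while visiting far fewer nodes.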
Under SE enumeration, the first candidate $s_i$ for each row is the one with the minimum distance between $b_i$ and $R_{ii}\mathcal{Q}_i$ (the constellation points scaled by $R_{ii}$) in (3). Finding a good admissible solution early means that the initial radius can be shrunk early.

$$\hat{y}_i - \sum_{j=i}^{M} R_{ij}s_j = b_i - R_{ii}s_i \quad \text{with} \quad b_i = \hat{y}_i - \sum_{j=i+1}^{M} R_{ij}s_j \qquad (3)$$

3) Search Radius Setting: One major feature of sphere decoding is radius shrinking. Once a solution is found with a smaller Euclidean distance, the search radius is updated to this value so that more unlikely branches can be pruned. However, the initial choice of search radius is not easy, because it influences the complexity of the algorithm. When the search radius is too large, a very high number of nodes in the solution space is visited, which causes high detection complexity. Conversely, when the search radius is too small, the sphere may be empty and no solution is available. Based on the AWGN model, the sum of squared noise terms is central chi-square distributed with 2M degrees of freedom [11] [47]. Given the channel SNR, the search radius can be chosen from this distribution for a target confidence level. If the channel SNR is unknown, the Euclidean distance of the zero-forcing solution can be used as an initial guess. An algorithm with an increasing search radius has also been proposed: it starts the search with a strict radius and expands the radius if no solution is available within it [12] [48].

C. Tradeoff in Diversity-Multiplexing Space

A unified sphere decoder architecture is illustrated here for extracting diversity and spatial multiplexing gains along the tradeoff curve. We demonstrate that flexibility in antenna array size and in modulation are the key features for this purpose. Antenna array size provides added flexibility to shift the tradeoff curve in the diversity-multiplexing space. In order to maximize diversity gain, we have to supply the receiver with multiple independently faded replicas of the same symbol, so that the error probability is reduced [13] [14]. The data replicas can be sent in the space and/or time directions.
Since a unified signal model can be developed for these space-time (ST) coding schemes, the same sphere decoder architecture can be used with some data rearrangement. Sphere decoding supporting algebraic ST codes [48], linear dispersion codes [49], and space-time block codes (STBC) [15] was reported in prior work. The ML estimate can be written as

$$\hat{s} = \arg\min \|y - Bs\|^2 \qquad (4)$$

where the matrix B depends on the code generators and the channel matrix. By interpreting B as H in the original signal model, the sphere decoding algorithm can be applied. Since the matrix dimension changes due to the data rearrangement in the preprocessing stage, the equivalent antenna array size changes accordingly. For example, repetition coding by 2 in the space domain for an 8×8 system is transformed into data processing in a 4×4 system (only one half of the symbols need to be decoded). This requirement strengthens the need for flexibility in antenna array size.

Spatial multiplexing gain is characterized by data rate. To maximize spatial multiplexing gain, we should allow the data rate to scale with the SNR, or assign different data rates to different substreams for a fixed SNR [5][15]. To this end, the modulation scheme should adapt to the channel condition: a larger constellation is applied to substreams with higher SNR, and a smaller constellation is applied to substreams with lower SNR. In principle, this transmission strategy amounts to water-filling in the space domain. The system performance perspective therefore further motivates the need for flexibility in modulation schemes.

III. ARCHITECTURE SPACE EXPLORATION

The optimal architecture is decided by jointly considering tradeoffs at the algorithm, architecture, and circuit layers of abstraction, with the goal of minimizing chip power and area. As shown in Fig. 4, a layered design approach is adopted to merge algorithm and circuit decisions. An efficient multiplier is proposed to reduce area and delay at the same time. Savings in area translate directly to power reduction, since the power spent in charging and discharging parasitic capacitances is also reduced. At the processing element (PE) architecture level, we evaluate the existing architectures [16][17][19][21][22][24] and propose a solution with improved area and throughput. Unlike prior work, flexibility is also considered in the design stage.
Antenna size, modulation scheme, number of sub-carriers, and search method are designed with flexibility and scalability to cover multiple communication scenarios. A multi-core architecture, which consists of many PEs ("small cores"), is developed to support the tradeoff between range and data rate at the system architecture level. We finally summarize the flexibility, scalability, and system specification.

Fig. 4. Illustration of layered design approach (layers: system architecture, PE architecture, metric calculation, multiplier).

A. Numerical Strength Reduction

From an algorithm perspective, the complexity of sphere decoding is evaluated by the number of nodes visited in the tree search process. When considered for hardware implementation, decoding algorithms are generally compared in terms of the number of multiplications. Down at the circuit level, the size of the multipliers is the key factor in estimating the area, speed, and power of the sphere decoder. We start by reducing the cost of the multiply operation to lower hardware complexity. Multiplication is required to calculate the Euclidean distance, which can be written in two mathematically equivalent forms, Eqs. (5) and (6).

$$\hat{s}_{ML} = \arg\min \|R(s - s_{ZF})\|^2 \qquad (5)$$

$$\hat{s}_{ML} = \arg\min \|Q^H y - Rs\|^2 \qquad (6)$$

Seemingly, the number of multiplications in Eq. (5) is less than in Eq. (6): one multiplication for Eq. (5) and two multiplications for Eq. (6). Hence, Eq. (5) was most commonly used in prior work [16]-[21] as a baseline for implementation. However, a careful investigation shows that Eq. (6) is a better choice from a hardware perspective for at least two reasons. First, we observe that $s_{ZF}$ and $Q^H y$ can be precomputed and hence have negligible impact on the total number of operations. Also, the computation effort for $s_{ZF}$ is not less than that for $Q^H y$. Second, the wordlength of s is usually much shorter than that of $s_{ZF}$. Separating the terms as in Eq. (6) results in multipliers with reduced wordlength. Without loss of generality, the normalized size of a multiplier can be estimated by the product of the wordlengths of the multiplier and multiplicand. The normalized delay of a multiplier can be estimated by the sum of the wordlengths of the multiplier and multiplicand if an array multiplier is used [39]. The array multiplier approximation works well for first-order comparison purposes. Table I summarizes the relative area and delay reduction of a multiplier due to numerical strength reduction in a 64-QAM system, where the wordlength (WL) of s is 3 for a real multiplier. We see that the area reduction is at least 50%, and that the delay reduction also reaches 50% for large-wordlength inputs. The absolute area difference between these two types of multipliers is amplified by the total number of multiplications in the entire decoding process, which is approximately $O(M^3)$.

TABLE I
AREA AND DELAY REDUCTION DUE TO NUMERICAL STRENGTH REDUCTION

WL of s_ZF                  ...       ...       ...       ...
WL of R = 12, area/delay    0.5/...   .../...   .../...   .../0.63
WL of R = 16, area/delay    0.5/...   .../...   .../...   .../0.54

The multiplier can be simplified further by taking advantage of two characteristics of communication signal processing: Gray coding and quantization effects. Gray code is a more compact representation in the constellation plane since only odd numbers are used. Conventionally, the number is transformed to 2's complement representation for the purpose of arithmetic operations. On careful examination of the Gray code representation, the corresponding multiplication can be implemented with simple shift, add, and invert operations. The code mapping, the associated operations, and the simplified multiplier are shown in Fig. 5. The neg operator stands for bit inversion. The 1-bit carry-in of 2's complement negation can be absorbed as a carry-in (shaded in gray) in the following adders, or simply be discarded as a quantization error on the LSB, which can be recovered by wordlength optimization. The shifter has no direct area cost apart from routing, while the cost of the inverters and multiplexers is relatively low because they are simple operations.
Overall, it is possible to implement one complex multiplier with 6 adders plus inverters and multiplexers, resulting in a total 40% area reduction compared to the traditional approach (area is estimated by Synopsys Design Compiler). This implementation does not imply that Gray coding must be used in the constellation plane; Gray coding is only used inside the sphere decoder to simplify metric calculation and candidate enumeration. The decoded symbols can be converted into any arithmetic representation at the sphere decoder outputs.
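The shift, add, and invert idea can be illustrated on one real coordinate of a 64-QAM symbol, which takes the odd values ±1, ±3, ±5, ±7. The sketch below is an assumption-laden illustration rather than the circuit of Fig. 5: the decomposition of each odd value into shifted terms and the function names are hypothetical, and Python integers stand in for fixed-point words.

```python
def mul_by_symbol(r, s):
    """Multiply integer r by s in {+-1, +-3, +-5, +-7} without a multiplier."""
    mag = abs(s)
    if mag == 1:
        p = r
    elif mag == 3:
        p = (r << 1) + r   # 2r + r
    elif mag == 5:
        p = (r << 2) + r   # 4r + r
    elif mag == 7:
        p = (r << 3) - r   # 8r - r
    else:
        raise ValueError("s must be an odd 64-QAM coordinate")
    return -p if s < 0 else p

def mul_by_symbol_inverting(r, s):
    """Same product, but negation is a bare bit inversion (carry-in dropped)."""
    p = mul_by_symbol(r, abs(s))
    return ~p if s < 0 else p   # ~p equals -p - 1: one LSB of error

exact = mul_by_symbol(13, -5)              # same value as 13 * -5
approx = mul_by_symbol_inverting(13, -5)   # one LSB below the exact product
```

The second function shows why dropping the carry-in costs exactly one LSB: bit inversion of p yields -p - 1 in 2's complement, so the error is bounded and independent of the operand.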

Fig. 5. Gray code representation and the simplified multiplier.

B. Architecture Tradeoff

In prior work, two major types of tree search methods are reported: depth-first (DF) [23] [24] and K-best [16]-[22]. The depth-first algorithm starts the search from the root of the tree and explores as far as possible along each branch until a leaf node is found, and then back-traces. The K-best algorithm approximates a breadth-first search by keeping only the K branches with the smallest partial Euclidean distance (PED) at each level [26], which is similar to the M-algorithm in trellis decoding [27]. The major advantages of DF are that ML performance can be achieved and that radius shrinking can be used for tree pruning. On the other hand, the advantages of K-best are its uniform data path and constant throughput.

Examined in more detail, depth-first ensures ML performance only if the complete solution space is explored. This might not be feasible in practice because of limited buffer size and processing cycles. Some termination scheme must therefore be used, and ML performance is then no longer guaranteed. Since the default input is uncoded data, achieving sub-optimal performance while keeping constant throughput is more important. Space-time codes or error correction codes can then be used to improve the performance. An iterative decoding scheme that combines a MIMO decoder and an error correction code decoder was shown to achieve near-capacity performance [2].

In hardware implementation, depth-first is realized in a folding-like architecture because only one node is visited at a time during the tree search process. In this case, extra memory is required to record the visited nodes for the trace-back operation. K-best is realized in a multi-stage pipelined way, because no trace-back is needed. To process K data paths at the same time, a parallel architecture is applied.
Figure 6 illustrates the basic architectures of these two search schemes, and Table II summarizes their comparison in terms of circuit metrics and algorithmic performance.
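For comparison with the depth-first search, the K-best search can be sketched as below. This is a hedged, real-valued illustration and not the hardware data path: R and ŷ = Q^H y are assumed to be precomputed, a full sort is used where hardware would use a dedicated selection circuit, and the name `k_best_decode` is hypothetical.

```python
def k_best_decode(R, y_hat, alphabet, K):
    """Breadth-first search that keeps the K smallest-PED paths per level."""
    M = len(R)
    # Each survivor is (ped, tail), where tail holds s_{i+1}, ..., s_{M-1}
    survivors = [(0.0, [])]
    for i in range(M - 1, -1, -1):
        children = []
        for ped, tail in survivors:
            # Cancel the symbols already decided on this path
            b = y_hat[i] - sum(R[i][i + 1 + k] * tail[k] for k in range(len(tail)))
            for cand in alphabet:
                children.append((ped + (b - R[i][i] * cand) ** 2, [cand] + tail))
        children.sort(key=lambda c: c[0])
        survivors = children[:K]   # fixed work per level, no trace-back
    ped, s = survivors[0]
    return s, ped

# Example values (assumed, for illustration only)
R = [[2.0, 0.5, -0.3], [0.0, 1.5, 0.4], [0.0, 0.0, 1.0]]
y_hat = [1.1, -2.0, 0.7]
s_hat, dist_sq = k_best_decode(R, y_hat, [-3, -1, 1, 3], K=4)
```

The loop visits every level exactly once with at most K times the constellation size children, which is the source of the constant throughput; a small K may discard the ML path, which is why the performance is only near-ML.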

Fig. 6. Basic architecture of (a) depth-first (folding) and (b) K-best (parallel and multi-stage) algorithm.

TABLE II
COMPARISON OF DEPTH-FIRST AND K-BEST ALGORITHM

              Area    Throughput   Latency   Radius shrinking / tree pruning   Performance
Depth-first   Small   Variable     Long      Yes                               ML
K-best        Large   Constant     Short     No                                Near-ML

For a sphere decoder operating with a large antenna array, the biggest implementation challenge is reducing the area of the design. Using the number of (complex) multipliers as a first-order area estimate, the numbers of multipliers needed in the folding and multi-stage architectures are M and M(M+1)/2, respectively, where M is the number of transmit antennas. Expanding a 4×4 system to a 16×16 system, the relative area increases from 4 to 16 for the folding architecture and from 10 to 136 for the multi-stage architecture. The folding architecture is 2.5× to 8.5× more area efficient than the multi-stage architecture, as shown in Fig. 7 (a). To keep the area within a reasonable value, the folding technique is adopted.

The second design challenge is the operating frequency of the folded architecture. As the array size increases, the number of operands in the Multiply-Accumulate (MAC) operation in the metric calculation unit increases proportionally to the number of antennas. Assuming a tree adder design, the critical path delay increases roughly linearly with the number of transmit antennas. However, the time allowed for the MAC operation should be scaled down by the number of antennas in order to increase the throughput proportionally to the number of antennas. This timing requirement for a fixed bandwidth is shown in Fig. 7 (b). The situation is actually worse when metric enumeration is included in the loop. Since pipelining inside the loop is considered a difficult task, this architecture cannot operate at a high frequency even for a 4×4 system [24].
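The first-order multiplier counts quoted above follow directly from the two expressions M and M(M+1)/2. A short check, with hypothetical function names, reproduces the figures for the 4×4 and 16×16 cases:

```python
def multipliers_folded(M):
    """Complex multipliers in the folding architecture (first-order estimate)."""
    return M

def multipliers_multistage(M):
    """Complex multipliers in the multi-stage architecture: M(M+1)/2."""
    return M * (M + 1) // 2

for M in (4, 16):
    folded = multipliers_folded(M)
    staged = multipliers_multistage(M)
    print(f"{M}x{M}: folded={folded}, multi-stage={staged}, ratio={staged / folded:.1f}")
# prints:
# 4x4: folded=4, multi-stage=10, ratio=2.5
# 16x16: folded=16, multi-stage=136, ratio=8.5
```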
To facilitate pipeline insertion, the inputs are up-sampled by a factor m, and one register in the loop can then be replaced with m pipeline registers using the noble identity [42]. In this case, only one out of every m samples is useful data, and the rest are either repeated values of an original sample or padding zeros. By applying data-stream interleaving, samples of other independent data streams can be introduced into the loop in place of the repeated values or padding zeros. Interleaving is therefore used both to improve area efficiency through logic sharing and to provide the flexibility needed to support a varying number of data sub-carriers. In a multi-carrier communication system, data streams are transmitted over narrow-band sub-carriers [28].

Fig. 7. Design challenge and tradeoff for large antenna size. Impact of antenna array size (4×4, 8×8, 16×16) on (a) area: area reduction using the folding technique, and (b) critical path delay: growing timing gap in the folding architecture.

C. PE structure

The function of the PE is to find the $s_i$ with the minimum $T_i$ ($T_i = |b_i - R_{ii}s_i|^2$) for each level in the tree search, and to provide a candidate list with $T_i$ in ascending order, since a path with a smaller $T_i$ has a higher probability of being the ML estimate. A straightforward algorithm mapping is to enumerate all possible constellation points and sort the $T_i$ values to find $s_i$ and the candidate list [24]. The hardware cost and computational latency of this architecture are very high for a large constellation size due to the circuit parallelism and the inevitable latency of the sorting circuit. To resolve this problem, we propose another strategy. First, the closest point is found through the geometric relationship, since the $s_i$ with minimum $T_i$ is the closest point between $b_i$ and $R_{ii}\mathcal{Q}_i$. Second, the selected $s_i$ is used to calculate $T_i$. Finally, the candidate list is generated by the constellation arrangement, as described in Section III-C-2, Fig. 12.

We decompose the PE into two parts: the Metric Calculation Unit (MCU) and the Metric Enumeration Unit (MEU).
Each submodule can be mapped to the Area-Energy-Delay space to explore optimal design parameters for the top-level integration. Decomposing a design problem along these three axes provides insight into design techniques and their impact on power, area, and throughput. Concurrency versus latency is one of the basic tradeoffs that need to be considered. Maximizing data throughput calls for a parallel architecture, which results in a large area. Conversely, time-multiplexing improves area efficiency but increases latency. For example, the decoding algorithm operating on complex numbers can be transformed into a real-valued problem, which results in a tree that is twice as deep as the original tree, with a smaller number of children per node [16]. Since the multipliers are reused, the number of multipliers is reduced to one half at the cost of an equal reduction in throughput.

Flexibility is another issue in circuit design. Ideally, the circuit should be flexible enough to support different search schemes (depth-first or K-best). In general, the overhead of flexibility reduces both energy efficiency and area efficiency. This overhead should be minimized while maintaining system performance.

Fig. 8 shows the circuit diagram of one PE. There are m pipeline register stages inserted in the loop, so the critical path can be shortened to meet the timing constraint by choosing a larger m. Since m data streams are interleaved into the PE, the hardware is always active, achieving the same throughput as if the m pipeline registers had been inserted in a design without a loop. The area overhead of the up-samplers for R can be removed if R is invariant for each sub-carrier during one packet transmission. The flexibility of the search scheme is provided by the shift-register chain, which can be configured for forward trace or backward trace. By placing K PEs in one sphere decoder, K search paths are explored at the same time to implement the K-best algorithm, while each PE retains the flexibility to trace back as in depth-first search. The flexibility to support a varying antenna size is provided by the folding architecture, which reuses the same hardware at a higher clock frequency as the antenna size increases. The sub-modules are described in detail in the following.

Fig. 8. Circuit diagram of one PE (MCU with shift-register chain, partial products, and adder tree; m pipeline stages in the loop; MEU with symbol selection).
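The effect of placing m pipeline registers in a feedback loop and filling them with m interleaved streams can be illustrated with a toy model. In the sketch below, a plain accumulator stands in for the PE's search loop, which is an assumption made purely for illustration; the function name and the requirement that all streams have equal length are likewise hypothetical.

```python
from itertools import accumulate

def interleaved_loop(streams):
    """Run m independent accumulations through one loop with m registers."""
    m = len(streams)
    length = len(streams[0])          # all streams are assumed equally long
    pipeline = [0] * m                # the m registers in the feedback path
    outputs = [[] for _ in range(m)]
    for n in range(length * m):
        k = n % m                     # stream whose sample enters this cycle
        fed_back = pipeline.pop(0)    # value written m cycles ago (stream k)
        result = fed_back + streams[k][n // m]
        pipeline.append(result)
        outputs[k].append(result)
    return outputs

streams = [[1, 2, 3], [10, 20, 30], [5, 5, 5], [0, -1, -2]]
per_stream = interleaved_loop(streams)
reference = [list(accumulate(s)) for s in streams]   # each stream on its own
```

Every cycle carries a useful sample, and each stream sees exactly the recursion it would see in a dedicated, unpipelined loop, so `per_stream` and `reference` agree.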

1) Metric Calculation: The Metric Calculation Unit (MCU) computes $\sum_{j=i+1}^{M} R_{ij} s_j$. Essentially, it executes a multiply-accumulate (MAC) operation. To accumulate the maximum of 16 operands and achieve the highest throughput, 15 simplified complex multipliers and 1 simplified real multiplier are followed by an adder tree that merges the partial products. The number of multipliers can be reduced by time-multiplexing at the price of lower throughput [30]. For example, 4 complex multipliers can be time-multiplexed by 4 to do the work of 16 multipliers, with the throughput reduced by a factor of 4. Such tuning at the architecture level is used to position the design along the throughput and power axes, together with optimal tuning of variables such as the supply voltage.

Since the search process advances one stage per clock cycle, we propose an FIR-like architecture to facilitate metric calculation, as shown in Fig. 9. If only forward trace is allowed, the BER performance is limited by the number of parallel processors, as in the K-best algorithm. Even if more processing cycles are provided, the K-best algorithm has no room to improve the BER performance. Observing that the trace-back goes up by only one or two layers rather than making a random jump, we embed a bidirectional shift-register chain to adjust the search depth. Since the search state is recorded in the shift registers, no extra memory, such as stack memory, is needed to keep all the states [26] [40]. Because of the trace-back requirement, the transpose-form FIR architecture is not suitable for reducing the critical path; instead, the critical path is reduced by data interleaving.

Fig. 9. Circuit diagram of MCU. [Shift-register chain holding $s_i, s_{i+1}, s_{i+2}, \ldots, s_M$; coefficients $R_{i,i}, R_{i,i+1}, R_{i,i+2}, \ldots, R_{i,M}$; adder tree.]

Coefficients of the R matrix are stored in memory in an area-efficient way. The diagonal terms of the R matrix are real, while the rest are complex numbers. Using the upper-triangular structure, the real parts (including the diagonal) and the imaginary parts of the triangular data are organized into one square memory, which saves around 50% of the area.
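A minimal sketch of the two ideas above, the square-memory packing of R and the MAC performed by the MCU, is given below. The text does not specify the memory layout, so the one used here is an assumption: real parts go in the upper triangle (diagonal included) and imaginary parts of the off-diagonal terms go in the mirrored lower-triangle cells. The names `pack_r`, `read_r`, and `mcu_sum` are hypothetical.

```python
def pack_r(R):
    """Pack an upper-triangular complex matrix with a real diagonal into
    one M x M array of real words (assumed layout, see lead-in)."""
    M = len(R)
    mem = [[0.0] * M for _ in range(M)]
    for i in range(M):
        for j in range(i, M):
            mem[i][j] = R[i][j].real        # real part: upper triangle
            if j > i:
                mem[j][i] = R[i][j].imag    # imaginary part: mirrored cell
    return mem

def read_r(mem, i, j):
    """Read R[i][j] (j >= i) back from the packed memory."""
    if i == j:
        return complex(mem[i][i], 0.0)      # diagonal terms are real
    return complex(mem[i][j], mem[j][i])

def mcu_sum(mem, s, i):
    """MAC of the MCU: sum over j = i+1 .. M-1 of R[i][j] * s[j]
    (0-based indices)."""
    M = len(mem)
    return sum((read_r(mem, i, j) * s[j] for j in range(i + 1, M)), 0j)

if __name__ == "__main__":
    R = [[2 + 0j, 1 + 1j],
         [0j,     3 + 0j]]
    s = [1 + 0j, 1 - 1j]
    print(mcu_sum(pack_r(R), s, 0))
```

Storing M*M real words instead of M*M complex words (2*M*M real words) is where the roughly 50% saving comes from in this layout.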

2) Metric Enumeration: The Metric Enumeration Unit (MEU) enumerates the possible constellation points in ascending order of their Partial Euclidean Distance (PED) $T_i$. Exhaustive search is the straightforward implementation: it calculates the PEDs of all constellation points and uses a sorting circuit to find the minimum, as shown in Fig. 10 (a). The number of distance calculation units is proportional to the constellation size (64 units are required for 64-QAM, for example). This requirement alone makes it hard to support a large constellation, in addition to the extra latency introduced by the minimum-search circuit.

In the constellation plane, metric enumeration is equivalent to ordering the scaled constellation points $R_{ii} Q_i$ by their distance from $b_i$, from the closest to the farthest [2]. This is the underlying principle of the Schnorr-Euchner (SE) algorithm. SE enumeration was originally applied to one-dimensional signals, such as real-valued PAM or PSK constellations; it was therefore modified to arrange QAM constellations in $P_Q$ concentric groups to fit the original algorithm. For example, a 16-QAM constellation can be expressed as an arrangement of points on 3 concentric circles. The problem is then reformulated as finding the closest point in each subgroup and then the closest point across subgroups, as shown in Fig. 10 (b) [24].

Fig. 10. Closest point selection scheme: (a) exhaustive search architecture, (b) SE enumeration for QAM, (c) region partition based search approach. Real value is represented by gray line.

The original algorithm [2] uses the phase relationship to find the closest point on a concentric circle. This approach is not suitable for hardware implementation, so [24] proposed a decision-boundary-based method to simplify the SE enumeration. One type of decision boundary is set by straight lines passing through the origin and the midpoint between two adjacent constellation points on a concentric circle; it specifies the starting point. Another type of decision boundary is set by straight lines passing through the origin and the midpoint between the two constellation points around the starting point on a concentric circle; it determines the initial search direction. However, this simplification is only applicable to small constellations such as 16-QAM. Larger constellations are hard to support for several reasons. First, the decision boundary algorithm is quite complex: many multiplications are needed to generate the decision boundaries. Second, the number of subgroups grows quickly, which increases the latency of the min-search circuit. For example, 64-QAM is decomposed into 9 subgroups. Third, the concentric group partition does not scale as the QAM constellation size changes, which makes it infeasible for the architecture to support different modulations.

We propose a simple partition method based on Cartesian coordinates. The constellation plane is partitioned into 64 regions for 64-QAM (8 regions in the real part and 8 regions in the imaginary part). The closest point (with minimum distance) can be decided from the location of $b_i / R_{ii}$, since the real part and the imaginary part can be decoded separately, as shown in Fig. 10 (c). In fact, the same idea is applied to symbol decision. For instance, to make a decision in a QPSK system, we do not need to calculate the distances from the received signal to the 4 constellation points; we only need to examine the signs of the real and imaginary parts.

The area complexity of the three architectures in Fig. 10 is evaluated using the number of add-equivalent operators (add, subtract, compare) as the area estimate. For a 64-QAM constellation, exhaustive search needs 64 subtractors, 64 square operators, and a min-search circuit. Assuming the square operators are simplified to absolute-value operators with a small performance loss [24] and that min-search uses a serial comparison circuit, a total of 192 add-equivalent operators are needed. SE enumeration for 64-QAM needs 64 boundary decision comparisons and a min-search across 9 subgroups, so 73 add-equivalent operators are needed, assuming the boundaries are given. The proposed region partition search needs 8 comparators for the real part and 8 for the imaginary part, which is only 16 add-equivalent operators. Therefore, a 4.6× area reduction is achieved compared to SE enumeration for 64-QAM (73/16) and a 12× reduction compared to exhaustive search (192/16).

A similar approach is applied to the delay comparison: the number of adder delays is used as the delay estimate. Here we assume that the delay of the min-search circuit equals $\lceil \log_2 n \rceil$ adder delays, where n is the number of sorting elements. A serial comparison circuit would need n adder delays to finish the comparison, so a more area-consuming parallel architecture must be used to reduce the delay. The delay of exhaustive search is approximated by the sum of the delays of 1 adder, 1 absolute-value operator, and $\log_2 64 = 6$, which equals 8. The delay of SE enumeration equals 1 operator plus $\lceil \log_2 9 \rceil = 4$, which equals 5. Our design needs only 1 comparator delay, which is 1/5 of the delay of SE enumeration without pipelining.

TABLE III. Area and delay comparison (normalized) of exhaustive search, SE enumeration, and our work.

One challenge in the MEU implementation is that a divider or an inverse operator seems inevitable for calculating $b_i / R_{ii}$, which usually introduces longer latency and higher hardware complexity. The fact that the diagonal element $R_{ii}$ of the R matrix is real simplifies the problem, but still introduces hardware overhead. One possible method is to calculate $R_{ii}^{-1}$ in the preprocessing stage, since these values are updated at the packet rate [16] [21]. If $R_{ii}^{-1}$ is given, only one multiplier is needed. In our approach, this inverse operation is not necessary. Instead of deciding $b_i R_{ii}^{-1}$ in the constellation plane, it is equivalent to decide $b_i$ in a constellation plane scaled by $R_{ii}$. With the decision boundaries (db) denoted as $db \in \{-6, -4, -2, 0, 2, 4, 6\}$, the $b_i R_{ii}^{-1}$ calculation is simplified to a $db \cdot R_{ii}$ calculation. It may seem that replacing one multiplier with 6 multipliers, in order to execute the boundary comparisons in parallel, is not a good tradeoff from the area standpoint. However, a careful examination reveals that one large multiplier is replaced with small multipliers, and that these small multipliers can be simplified to shift-add operators. Therefore, only one adder is needed to implement $db \cdot R_{ii}$ (namely $6R_{ii} = 4R_{ii} + 2R_{ii}$); the other products can be implemented by hard-wired shifting and inversion. The negative values can be computed by bit inversion without the carry-in bit, because the missing carry appears only as a negligible quantization error from the signal decision perspective. The area reduction is substantial. If the wordlength of $R_{ii}$ is $L$, the multiply operation with large-wordlength operands ($R_{ii}^{-1}$ and $b_i$) is replaced with an add operation that also has a smaller number of bits ($L + 3$). The simple region decision circuit is shown in Fig. 11.
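The division-free region decision can be sketched for one axis of 64-QAM as follows. This is a behavioral illustration under stated assumptions, not the circuit of Fig. 11: integer fixed-point inputs, a positive $R_{ii}$, levels $\{-7, -5, \ldots, 7\}$, exact negation instead of bit inversion, and the hypothetical function name `region_decide`.

```python
def region_decide(b, r):
    """Decide the closest 8-PAM level for one axis (real or imaginary).

    b : integer fixed-point value of Re{b_i} or Im{b_i}
    r : integer fixed-point value of R_ii (assumed positive here)

    b is compared against db * r for db in {-6, -4, -2, 0, 2, 4, 6},
    so b / r is never computed.
    """
    r2 = r << 1              # 2 * R_ii: hard-wired shift
    r4 = r << 2              # 4 * R_ii: hard-wired shift
    r6 = r4 + r2             # 6 * R_ii: the single adder
    thresholds = (-r6, -r4, -r2, 0, r2, r4, r6)
    # Thermometer code: number of boundaries that b exceeds.
    region = sum(b > t for t in thresholds)
    return 2 * region - 7    # map region 0..7 to level -7..7

if __name__ == "__main__":
    print(region_decide(35, 10))   # b / r = 3.5
```

Running the same function once on the real part and once on the imaginary part yields the closest 64-QAM point; no distance calculation or sorting is involved.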

Fig. 11. Region decision circuit. [Re{$b_i$} or Im{$b_i$} is compared against $6R_{ii}$, $4R_{ii}$, $2R_{ii}$, 0, $-2R_{ii}$, $-4R_{ii}$, and $-6R_{ii}$; the comparator outputs, the sign of $R_{ii}$, and the constellation size drive a symbol remapping block with outputs s[2], s[1], s[0].]

An extra symbol remapping block is inserted at the end to remap constellation points when a different constellation size is used. Decision outputs are mapped to Gray code directly, without an intermediate 2's complement representation and Gray code transformation. Table IV shows the mapping rules. Although $R_{ii}$ could be chosen to be always positive to simplify this circuit further, we keep the flexibility to support negative values as well in order to relax the QR decomposition processing. With the proposed approach, no sorting is needed and it is easy to expand to a large constellation size. Additionally, the use of bit-level arithmetic results in only a linear complexity increase as the constellation size grows exponentially.

TABLE IV. Symbol remapping and decision for 64-QAM (6-bit), 16-QAM (4-bit), QPSK (2-bit), and BPSK (1-bit); real and imaginary outputs s[2], s[1], s[0].

After the closest point is found, the remaining candidates are also decided by the distance between $b_i$ and the constellation points, in ascending order. The decoded symbol $s_i$ is used to enumerate the remaining candidates through geometric relationships rather than sorting, in either trace-back or parallel search mode. The complexity of the sphere decoding algorithm is independent of the lattice constellation size [48]; therefore, we can enumerate the adjacent constellation points instead of the whole constellation plane. We extract 9 points in the constellation plane, as illustrated in Fig. 12. The eight surrounding constellation points have either a 1-bit error (Fig. 12 (a-b)) or 2-bit errors (Fig. 12 (c-d)) if Gray coding is used. The 2nd closest point of each solution set is decided based on the decision boundaries indicated by the dashed lines in Fig. 12 (a), (c). The remaining points are decided by the search direction, which is specified by other decision boundaries, starting from the 2nd point, as shown in Fig. 12 (b), (d). These two kinds of decision boundaries are easy to calculate by sign checks and comparisons on the real and imaginary parts. The search sequence within each group is well defined, but the boundary between the two groups is not easy to calculate. For example, which of the 3rd search points of the two groups, Fig. 12 (b) and (d), is closer cannot be decided by a simple boundary. Therefore, we adopt a mixed method: the two solution sets are compared to find the final enumeration sequence with respect to the central point.

Fig. 12. Candidate enumeration scheme. Decision boundaries are dashed lines in the central region. [Panels (a), (b): 1-bit error subset; panels (c), (d): 2-bit errors subset; markers distinguish the 2nd closest point from the 3rd to 5th points.]

Fig. 13 shows the overall area reduction for one PE. An overall 20× area reduction is achieved through various signal processing and circuit techniques, from the arithmetic level down to the circuit level. If 16 sub-carriers are processed through data-stream interleaving, the equivalent area reduction would be more than 260 times.

So far, we have built a one-PE sphere decoder. To speed up the search and improve the error probability, multiple PEs need to be utilized to span the search range. A multi-core architecture is proposed to coordinate all the functional blocks in a power- and area-efficient way.
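The statement that the eight points surrounding the decoded symbol differ from it by one or two bits under Gray coding can be checked with a short sketch. It assumes the standard binary-reflected Gray code applied independently to the real and imaginary level indices of 64-QAM; the actual bit mapping of Table IV is not reproduced here, and the function names are hypothetical.

```python
def gray(n):
    """Binary-reflected Gray code of a level index."""
    return n ^ (n >> 1)

def neighbor_bit_errors(ix, iy, levels=8):
    """Bit errors between the point with level indices (ix, iy) and each
    of its up to eight neighbors, keyed by the offset (dx, dy)."""
    errors = {}
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            jx, jy = ix + dx, iy + dy
            if 0 <= jx < levels and 0 <= jy < levels:
                diff_re = bin(gray(ix) ^ gray(jx)).count("1")
                diff_im = bin(gray(iy) ^ gray(jy)).count("1")
                errors[(dx, dy)] = diff_re + diff_im
    return errors

if __name__ == "__main__":
    for offset, bits in sorted(neighbor_bit_errors(3, 4).items()):
        print(offset, bits)
```

Horizontal and vertical neighbors change one axis by one level and so differ by one bit, while diagonal neighbors change both axes and differ by two bits, which matches the two subsets in Fig. 12.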

Fig. 13. Summary of area reduction for one PE. [Bar labels: initial, folding, MEU simplification, simplified multiplier, memory reduction, wordlength reduction. Annotated reductions: 8.5×, 30%, 20%, 5%, 20%; total 20× reduction.]

D. Multi-Core Architecture

A multiple-PE architecture inherently improves the search speed by a factor equal to the number of processing elements. Moreover, the search speed is further increased because shorter paths can be found earlier, which prunes the tree more efficiently. In addition, the number of processing elements offers the flexibility to trade performance for area. Virtually all K-best architectures use parallelism to search several branches at the same time [16]-[22], but they do not take advantage of two important features of sphere decoding: radius shrinking and tree pruning. When search paths run outside the search radius, they should be discarded instead of continuing with a deeper search. Intuitively, we should assign a new search branch within the search radius to the processors whose search paths fall outside the search radius. To maximize the probability of finding the ML estimate, the children of the branch with the smaller Euclidean distance at that level are assigned as the new search candidates. Therefore, the required functions include: (1) a sorting circuit to record the branch with the minimum Euclidean distance, (2) a radius checking block to examine whether the Euclidean distance is larger than the search radius, and (3) a candidate enumeration circuit, illustrated in Fig. 12. Since the radius checking block is included in the sphere decoder, any of the many algorithms for effective radius shrinking can be utilized [2] [3] [10] [12].

1) Sorting Circuit: Sorting algorithms are extensively studied in computer science. In hardware, several architectures are widely used: serial sorting, parallel sorting (Batcher sorter), and the Single Instruction Multiple Data (SIMD) architecture [16][33-36]. The serial sorter executes the bubble sort algorithm [16]. Its serial comparisons result in a longer latency. The parallel sorter is widely used in packet-switched networks; it uses parallelism to speed up sorting at the cost of increased area. SIMD provides the greatest flexibility, but its interconnect network is very complex. A comparison of these architectures is summarized in Table V. For N inputs, $n = \log_2 N$. Latency and area are estimated as the number of comparator delays and the number of comparators, respectively.

TABLE V
SUMMARY OF SORTING CIRCUITS

| | Serial | Parallel | SIMD |
|---|---|---|---|
| Latency | N | n(n+1)/2 | n(n+1)/2 |
| Area | N/2 | (n^2+n)N/4 | N/2 |
| Routing complexity | Low | Medium | High |

Area is the first priority in the design of the sorting circuit, because the sorting circuit needs to be replicated to support multiple sub-carriers. Thanks to the data-interleaving operation, N−1 time slots are available while the other sub-carriers are processed, which makes serial comparison possible within a symbol period. Therefore, the serial sorter is selected in our design. Since the first input is loaded into the register of the first stage, the latency is N−1 cycles (one cycle is saved). Fig. 14 shows the circuit of the serial sorter. At each comparator, the larger operand is sent to the lower branch and the smaller one is sent to the upper branch. The final sorted Euclidean distances from the PEs can be used by the outer receiver for iterative decoding.

Fig. 14. Circuit diagram of a serial sorter. [Chain of compare stages, stage 1 to stage M/2, each with low (L) and high (H) outputs.]

2) Radius Checking: Radius checking is executed in parallel with sorting. The Euclidean distances stored in all PEs are checked serially. If the Euclidean distance is larger than the search radius, a new search path is assigned. On the other hand, if the Euclidean distance is smaller than the search radius, the search radius is updated to this smaller value and the corresponding branch is chosen as the ML estimate.

A multi-core architecture is proposed to coordinate all functional blocks. The number of PEs is decided from the BER-area-power tradeoff. A 16-PE architecture is shown in Fig. 15. For each PE, the decoded symbols and the associated Euclidean distances for 16 sub-carriers are fed into registers serially after processing. In each cycle, only the metrics of one sub-carrier are computed, while the other sub-carriers undergo sorting, radius checking, and candidate enumeration across PEs. A sorting circuit connects the 16 registers that belong to the same sub-carrier. Radius checking is conducted serially using a multiplexer and is followed, conditionally, by a new path assignment.

Fig. 15. Multi-PE sphere decoder architecture. [16 PEs (PE-1 to PE-16), each with MCU, MEU, and memory; registers for sub-carriers SC-1 to SC-16 in the sub-carrier space; demux and mux; radius checking and updating; I/O interface.]

With this compact multi-PE architecture, the sphere decoder provides very high performance. At 256 MHz, each PE provides 46.5 GOPS (12-bit equivalent add), and the total for the 16-PE architecture amounts to 800 GOPS (including the sorting and radius checking circuits) for the whole system when operating in the QAM mode. In addition to high performance, the design provides flexibility and scalability. We describe the design specifications next.

E. Design Specifications

The sphere decoder is designed to support different system configurations with respect to antenna array size, modulation and detection schemes, and the number of sub-carriers. Table VI summarizes the configuration modes. Since varying antenna array sizes and modulations are supported, this design is also capable of trading off diversity gain for spatial multiplexing if STBC is used. Because of interleaving by 16, the supported number of sub-carriers can be any multiple of 16 through data rearrangement.

TABLE VI
OVERVIEW OF SYSTEM CONFIGURATION MODES

| Configuration | Modes |
|---|---|
| Antenna array size | Any square matrix from 2×2 to 16×16 |
| Modulation | BPSK, QPSK, 16-QAM, 64-QAM |
| # sub-carriers | 16, 32, 64, 128 |
| Detection | Depth-first, K-best |
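The serial radius check described under the multi-core architecture can be sketched as below. This is a simplified behavioral model under assumptions not stated in the text: the distances belong to complete search paths, a new-path assignment is represented only by recording the PE index, and the name `radius_check` is hypothetical.

```python
def radius_check(distances, radius):
    """Serially check the Euclidean distance held by each PE.

    Returns the updated radius, the index of the PE holding the best
    estimate found in this pass (or None), and the list of PEs that
    need a new search path.
    """
    best_pe = None
    reassign = []
    for pe, dist in enumerate(distances):
        if dist > radius:
            reassign.append(pe)      # path left the sphere: assign new path
        elif dist < radius:
            radius = dist            # shrink the sphere
            best_pe = pe             # this branch becomes the estimate
    return radius, best_pe, reassign

if __name__ == "__main__":
    print(radius_check([5.0, 2.0, 7.0, 3.0], radius=4.0))
```

Because the check is serial, a radius update by one PE immediately tightens the test applied to the PEs that follow it in the same pass.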

The main design specification is the throughput constraint for the algorithm. Since a total bandwidth of 16 MHz is used, each sub-channel requires 1 MS/s to process the data in the case of 16 sub-carriers. The requirement is thus to process 16 parallel streams of data at a 1 MHz rate. The clock specification for the resulting architecture then becomes 256 MHz (1 MHz × 16 sub-carriers × 16 antennas). Note that all modes can operate at 256 MHz; for smaller array sizes, the clock frequency is reduced because the channel bandwidth is fixed. Detailed system specifications are listed in Table VII for array sizes 4×4 to 16×16. The system supports an ideal throughput of up to 1.536 Gbps, which corresponds to a spectral efficiency of up to 96 bps/Hz. When the system is operated in a smaller array mode, the clock frequency and supply voltage can be reduced to minimize power consumption.

TABLE VII
SUMMARY OF SYSTEM SPECIFICATION (baseline BW: 16 MHz)

| Antenna array | Modulation | Clock freq. | Throughput (bps) | Spectral efficiency (bps/Hz) |
|---|---|---|---|---|
| 4×4 | QPSK | 64 MHz | 128M | 8 |
| 4×4 | 16-QAM | 64 MHz | 256M | 16 |
| 4×4 | 64-QAM | 64 MHz | 384M | 24 |
| 8×8 | QPSK | 128 MHz | 256M | 16 |
| 8×8 | 16-QAM | 128 MHz | 512M | 32 |
| 8×8 | 64-QAM | 128 MHz | 768M | 48 |
| 16×16 | QPSK | 256 MHz | 512M | 32 |
| 16×16 | 16-QAM | 256 MHz | 1.024G | 64 |
| 16×16 | 64-QAM | 256 MHz | 1.536G | 96 |

A comparison of hardware is given in Table VIII. The estimated chip area is 0.55 mm² in a standard 90 nm CMOS process, using the approximation that 10,000 FPGA slices correspond to about 1 mm² of layout area in 90 nm CMOS [28]. To make a fair comparison, the area is normalized by the number of transmit antennas (this is a conservative estimate, because the hardware complexity could grow quadratically with the number of transmit antennas). The data indicate that the proposed architecture is the most area-efficient compared to prior work. Furthermore, our design outperforms all previously published designs in terms of supported antenna array size and constellation size, as shown in Fig. 16. Unlike previous work, the proposed architecture also supports multiple sub-carriers and search methods. Finally, this is the first design that offers the flexibility required to fully traverse the diversity vs. spatial-multiplexing tradeoff curve.

TABLE VIII. Hardware complexity comparison of [19], [17], [21], [22], [24], and this work (area and area normalized by the number of transmit antennas). This work: 154k gate count (GC), 0.55 mm² (90 nm), or 5.5k slices.
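The clock and throughput figures quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes ideal (uncoded) throughput equal to antennas × bits per symbol × bandwidth, which matches the quoted 1.536 Gbps and 96 bps/Hz; the function name `system_mode` and its parameter names are hypothetical.

```python
def system_mode(n_antennas, bits_per_symbol,
                n_subcarriers=16, subcarrier_rate_hz=1_000_000):
    """Return (clock in Hz, ideal throughput in bps, spectral efficiency
    in bps/Hz) for a square antenna array and a given modulation."""
    bandwidth_hz = n_subcarriers * subcarrier_rate_hz
    # Folding: one tree level per clock cycle, for every interleaved sub-carrier.
    clock_hz = subcarrier_rate_hz * n_subcarriers * n_antennas
    throughput_bps = n_antennas * bits_per_symbol * bandwidth_hz
    return clock_hz, throughput_bps, throughput_bps / bandwidth_hz

if __name__ == "__main__":
    for n_ant in (4, 8, 16):
        for name, bits in (("QPSK", 2), ("16-QAM", 4), ("64-QAM", 6)):
            clk, thr, se = system_mode(n_ant, bits)
            print(f"{n_ant}x{n_ant} {name}: {clk / 1e6:.0f} MHz, "
                  f"{thr / 1e6:.0f} Mbps, {se:.0f} bps/Hz")
```

The largest mode, 16×16 with 64-QAM, gives 256 MHz, 1536 Mbps, and 96 bps/Hz, in agreement with the text.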

Fig. 16. Comparison of this work with previous work. [Axes: antenna array size (4×4, 8×8, 16×16) versus modulation (BPSK, QPSK, 16-QAM, 64-QAM); designs shown: [19], [17], [21], [22], [24], and this work.]

IV. DESIGN METHODOLOGY

An integrated design methodology is adopted in our work to incorporate algorithm, architecture, and circuit implementation in a highly automated environment. Since the design is complex, we start with a layered design approach that decomposes the whole system hierarchically, from the top-level architecture down to the fundamental modules. Considerations such as area and throughput are evaluated at each layer for architecture optimization. A graphical Simulink/Matlab development environment offers bit-true, cycle-accurate, hardware-equivalent modules for simulation, and then translates the design to FPGA emulation without hardware description language (HDL) coding. Because of the limited capacity of a single FPGA, the BEE2 platform [38] is used to accommodate the whole system and speed up emulation.

A. Simulink-Based Design Environment

We use the Simulink/Matlab design environment [44]. Traditionally, circuit design for communication signal processing is divided into two stages: algorithm design and circuit implementation. Algorithm designers use C/C++ or Matlab for system simulation, and the designed architecture is then implemented by circuit designers using HDL. There are usually several iterations between the two design stages to ensure that the final design satisfies the specifications. In this work, Xilinx System Generator (XSG) block-sets are used to build hardware-equivalent modules, which enables cycle-accurate software simulation. In addition, quantization effects due to finite wordlength are considered in the simulation. Area information is extracted by the resource estimator (XSG) or Design Compiler (Synopsys), in terms of number of slices or area, in the early design stage, since the equivalent HDL can be generated automatically. The drawback of the Simulink-based design flow is its lengthy simulation time, which can be mitigated by FPGA-based hardware emulation [43].

B. Emulation Platform

FPGA-based hardware emulation and rapid prototyping have become an attractive solution, providing up to $10^6$ times faster simulation than software simulation [37]. A Xilinx University Program (XUP) board (Virtex-2 Pro 30 part) [50] is used to develop the hardware/software co-simulation environment for small circuits. In this case, the hardware modules built in Simulink are replaced with the configured FPGA to speed up simulation. Because of the limited capacity of the XUP board, the BEE2 platform is used for whole-system emulation. The BEE2 consists of 5 Virtex-2 Pro 70 FPGAs (~10M equivalent logic gates in total). Each FPGA embeds a PowerPC core, which minimizes the latency between the microprocessor and the reconfigurable logic. Four FPGAs (user FPGAs) are used for computation and one for control (control FPGA), as shown in Fig. 17. With high-bandwidth, low-latency links, the BEE2 provides a virtual single FPGA with five times the capacity [38].

Fig. 17. BEE2 emulation platform. [User FPGA-1 to User FPGA-4 connected to a central control FPGA.]

C. Simulation Results

The BER performance of one PE is verified in the hardware/software co-simulation environment. In this preliminary experiment, the ZF-DFE/BLAST algorithm is adopted, i.e., at each level of the search tree, only the closest lattice point is chosen as the decoded symbol [41] [49]. Since only a small portion of the solution space is examined, there is a performance gap between this scheme and the ML solution. However, we demonstrate that a system with a larger antenna array and repetition coding can easily outperform the ML performance of a smaller antenna array. The BER performance can be further improved to achieve ML performance without repetition coding by using multiple PEs, which is currently being designed. Fig. 18 (a) shows the BER performance of 64-QAM modulation for different


More information
