A GPU Implementation for two MIMO OFDM Detectors


Teemu Nyländen, Janne Janhunen, Olli Silvén, Markku Juntti
Computer Science and Engineering Laboratory, University of Oulu, Finland
Centre for Wireless Communications, University of Oulu, Finland
{teemu.nylanden, {janne.janhunen,

Abstract

Two real-valued signal models based on the selective spanning with fast enumeration (SSFE) and layered orthogonal lattice detector (LORD) algorithms are implemented on an Nvidia graphics processing unit (GPU). A 2 × 2 multiple-input multiple-output (MIMO) antenna system with 16-quadrature amplitude modulation (16-QAM) is assumed. The level update vector chosen for SSFE is based on computer simulations carried out in the MATLAB environment. We implemented the algorithms on an Nvidia Quadro FX 1700 GPU and achieved a throughput of Mbps for SSFE and 16.8 Mbps for LORD. The results show that the general-purpose graphics processing unit (GPGPU) has the potential to achieve high throughput, provided that the detection algorithm allows efficient parallel processing. The latency of the control code and of the partial Euclidean distance (PED) calculations is very small, but the latency of memory loads from and stores to the GPU's global memory is significant. We also compare our results with a trellis-based detector implementation for a GPU, in which a more powerful GPU and a different detection algorithm are used. GPUs offer superior computing power and programmability compared to the application-specific software defined radio (SDR) designs implemented so far.

I. INTRODUCTION

In the third generation partnership project (3GPP) long term evolution (LTE) targets, it is planned to transmit 100 Mbps through wireless connections [1]. High-rate wireless communication needs power-efficient solutions to process the increasing amounts of data with limited hardware and low power consumption.
The multiple-input multiple-output (MIMO) antenna system combined with the orthogonal frequency division multiplexing technique (MIMO OFDM) has been included in multiple wireless telecommunication standards, such as the IEEE wireless local area network (WLAN), the IEEE wireless metropolitan area network (WMAN), Worldwide Interoperability for Microwave Access (WiMAX) and the 3GPP LTE. The multipath environment causes the MIMO channel to be frequency-selective, and OFDM can transform such a channel into a set of parallel frequency-flat MIMO channels. This transform decreases the computational complexity of the receiver.

Because multiple standards are being proposed for wireless and wired communication, flexibility is required from the terminal device. The advantage of software defined radio (SDR) is its flexibility, both in an environment with multiple standards and within a single standard. However, the increased computing loads of future applications pose new challenges to SDR implementations. GPUs possess the programmable flexibility and the computing power to rise to these challenges.

The three major computational blocks in a MIMO OFDM receiver are the fast Fourier transform (FFT), detection and channel decoding. As illustrated in [2], a graphics processing unit (GPU) implementation of a channel decoding block with low-density parity-check (LDPC) coding achieves a high throughput. The CUDA architecture offers built-in functions that provide efficient FFT processing on a GPU. As this study shows, the detection block can also be mapped onto a GPU, provided that a detection algorithm that can be efficiently parallelized is chosen. On the other hand, GPUs have not been designed to be used as SDR processors. Consequently, one of the goals of this work is to identify the shortcomings for future improvements.

The maximum likelihood (ML) detector is optimal for finding the closest lattice point [3].
However, it is often not feasible for real implementations, because its computational complexity increases exponentially with the number of transmit antennas. Selective spanning with fast enumeration (SSFE) [4] and the layered orthogonal lattice detector (LORD) [5] calculate a near-ML solution with reduced computational complexity compared to full-complexity exhaustive search ML detectors. We studied whether the Nvidia Quadro FX 1700 can fulfill the real-time requirements of the MIMO OFDM detector.

The SSFE algorithm is a near-ML MIMO detection algorithm that produces a deterministic and regular data flow [4]. For a real-valued system, the SSFE can be characterized by a level update vector $m = [m_1, \ldots, m_{2N}]$, where $N$ is the number of transmit antennas. The level update vector also defines the computational complexity of the algorithm. The LORD algorithm offers maximum a posteriori (MAP) performance with a 2 × 2 antenna system [6], and it achieves this with rather low computational complexity if 16-quadrature amplitude modulation (16-QAM) is assumed.

Both algorithms proceed one level at a time by calculating the partial Euclidean distances (PEDs) for each level. Most of the computational complexity of the algorithms originates from the PED calculations and slicing operations. These operations can, however, be performed in parallel for each level. High computing power is nonetheless required to meet the real-time requirements of the detector. The algorithm presented in [4] was modified to be real-valued

instead of using the complex-valued model presented in the original algorithm study. In addition, the original real-valued LORD [5] was implemented.

GPUs are designed for graphics processing and therefore cannot be efficiently used as SDR processors as such. However, the purpose of this paper is to study the possibilities of massively parallel processing for MIMO OFDM detection. Restrictions, such as power consumption and data transfers to and from the device, are acknowledged by the authors but ignored, since they are not in the scope of this study.

The rest of the paper is organized as follows. The system model and maximum likelihood detection are briefly presented in Section II. The algorithms, SSFE and LORD, alongside the simulation results are presented in Section III. The Nvidia GPU, our detector implementations and the results are presented in Sections IV to VI. Section VII compares our implementations with the implementation in [7]. The final section concludes the paper.

II. SYSTEM MODEL

A MIMO OFDM based multiple antenna system is assumed with $N$ transmit and $M$ receive antennas. Figure 1 presents the block diagram of the MIMO OFDM transmission architecture. In this study, the detector block of the receiver is implemented on a single-precision floating-point GPU.

Fig. 1. Block diagram of the MIMO OFDM transmission.

The received signal on the $s$th subcarrier can be presented as

$$y_s = H_s x_s + n_s, \quad s = 1, 2, \ldots, S \tag{1}$$

where $S$ is the number of subcarriers, $y_s \in \mathbb{C}^{M}$ is the received signal vector, $x_s \in \mathbb{C}^{N}$ is the transmitted symbol vector and $n_s \in \mathbb{C}^{M}$ is the noise vector. The symbol $H_s \in \mathbb{C}^{M \times N}$ denotes the channel matrix. The entries of $x_s$ are chosen independently of each other from a QAM constellation.
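Both detectors operate on a real-valued form of (1). The paper does not spell out the conversion, so the following is only a minimal sketch of the standard construction, in which real and imaginary parts are stacked so that an $N$-antenna system yields $2N$ real levels. The function names and the example channel values are hypothetical.

```python
def to_real_model(H, y):
    """Stack real and imaginary parts of a complex M x N system y = H x.

    Returns a 2M x 2N real matrix [[Re H, -Im H], [Im H, Re H]] and a
    length-2M real vector [Re y; Im y].
    """
    M, N = len(H), len(H[0])
    H_r = [[0.0] * (2 * N) for _ in range(2 * M)]
    for i in range(M):
        for j in range(N):
            re, im = H[i][j].real, H[i][j].imag
            H_r[i][j] = re            # top-left block: Re(H)
            H_r[i][j + N] = -im       # top-right block: -Im(H)
            H_r[i + M][j] = im        # bottom-left block: Im(H)
            H_r[i + M][j + N] = re    # bottom-right block: Re(H)
    y_r = [v.real for v in y] + [v.imag for v in y]
    return H_r, y_r


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Hypothetical noise-free 2 x 2 example with 16-QAM symbols.
H = [[0.8 + 0.3j, -0.2 + 0.5j],
     [0.1 - 0.7j, 0.9 + 0.2j]]
x = [3 - 1j, -1 + 3j]
y = [sum(H[i][j] * x[j] for j in range(2)) for i in range(2)]

H_r, y_r = to_real_model(H, y)
x_r = [v.real for v in x] + [v.imag for v in x]

# The real-valued system reproduces the complex one: H_r x_r equals y_r.
residual = max(abs(a - b) for a, b in zip(matvec(H_r, x_r), y_r))
```

With this stacking, each real level carries one 4-PAM axis of a 16-QAM symbol, which is why a 2 × 2 system has a level update vector with four entries.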
The ML detector selects the lattice point $Hx$ that minimizes the Euclidean distance to the received vector $y$, i.e.,

$$\hat{x} = \arg\min_{x \in \mathcal{A}^N} \| y - Hx \|^2, \tag{2}$$

where $\mathcal{A}$ is the symbol alphabet and $\| \cdot \|$ denotes the Frobenius norm of a vector. An exhaustive search can be used to solve the ML detection problem. However, it becomes computationally infeasible as the set of lattice points increases. The SSFE and LORD algorithms approximate the ML solution of (2) by limiting the search to the lattice points within a search tree specified by the algorithms.

The received symbol is placed somewhere between the lattice points due to additive noise in the channel. At this point, the maximum likelihood method would calculate the Euclidean distance between the received symbol and every constellation point, whereas the SSFE and LORD algorithms only calculate the distances to a limited set of constellation points within the search tree. The depth of the search tree depends on the number of receive antennas.

III. DETECTORS

The channel matrix $H$ can be QR decomposed (QRD) into two parts. If the numbers of transmit and receive antennas are equal, the channel matrix can be presented as $H = QR$, where $Q$ denotes an $N \times N$ orthogonal matrix and $R$ is an $N \times N$ upper triangular matrix. After the QR decomposition, the metric $\| y - Hx \|^2$ can be rewritten as

$$\| y - QRx \|^2 = \| Q^H y - Rx \|^2, \tag{3}$$

where $Q^H$ denotes the Hermitian transpose of matrix $Q$. By denoting $Q^H y = \hat{y}$, we get

$$\| \hat{y} - Rx \|^2. \tag{4}$$

Both implemented algorithms include QRD as preprocessing. The upper triangular matrix $R$ and the vector $\hat{y}$ generated in the QRD were set to be fixed in the GPU's constant memory.

A. Selective Spanning with Fast Enumeration

The SSFE algorithm provides a fixed throughput and computational complexity. It is also easily parallelized and is thus an interesting alternative for implementation.
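Before the tree search itself, the QR preprocessing of (3) and (4) can be illustrated with a small sketch. It assumes a square, full-rank, real-valued channel matrix and uses classical Gram-Schmidt purely for readability; the paper does not state which QRD method was used, and the matrix, noise values and function names below are hypothetical. The exhaustive search over all 256 candidates plays the role of the ML detector in (2).

```python
import math
from itertools import product


def qr_gram_schmidt(H):
    """QR decomposition of a square, full-rank real matrix (classical Gram-Schmidt)."""
    n = len(H)
    cols = [[H[i][j] for i in range(n)] for j in range(n)]   # columns of H
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for k in range(j):
            # Projection of column j onto the already computed q_k.
            R[k][j] = sum(q_cols[k][i] * cols[j][i] for i in range(n))
            v = [v[i] - R[k][j] * q_cols[k][i] for i in range(n)]
        R[j][j] = math.sqrt(sum(t * t for t in v))
        q_cols.append([t / R[j][j] for t in v])
    Q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return Q, R


def sq_dist(A, y, x):
    """Squared Euclidean distance ||y - A x||^2."""
    return sum((yi - sum(a * xj for a, xj in zip(row, x))) ** 2
               for yi, row in zip(y, A))


# Hypothetical real-valued 4 x 4 channel (2 x 2 complex system) and received vector.
H = [[0.8, -0.2, -0.3, -0.5],
     [0.1, 0.9, 0.7, -0.2],
     [0.3, 0.5, 0.8, -0.2],
     [-0.7, 0.2, 0.1, 0.9]]
x_true = (3.0, -1.0, -1.0, 3.0)
noise = (0.05, -0.03, 0.02, -0.04)
y = [sum(h * s for h, s in zip(row, x_true)) + w for row, w in zip(H, noise)]

Q, R = qr_gram_schmidt(H)
y_hat = [sum(Q[i][j] * y[i] for i in range(4)) for j in range(4)]   # Q^T y

# Exhaustive ML search over the real-valued 16-QAM alphabet (4-PAM per level).
alphabet = (-3.0, -1.0, 1.0, 3.0)
candidates = list(product(alphabet, repeat=4))
ml_direct = min(candidates, key=lambda x: sq_dist(H, y, x))
ml_after_qr = min(candidates, key=lambda x: sq_dist(R, y_hat, x))

# Largest difference between the two metrics over all candidates.
max_gap = max(abs(sq_dist(H, y, x) - sq_dist(R, y_hat, x)) for x in candidates)
```

Because $R$ is upper triangular, the metric in (4) splits into one term per level, which is what allows the PEDs to be accumulated level by level in the tree search.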
The computational complexity of the algorithm depends only on the number of antennas and the level update vector. The level update vector also determines the output list size. By setting the level update vector m = [1111], only four PED calculations and slicing operations are performed. At the other extreme, the algorithm with level update vector m = [4444] achieves MAP performance, but a total of 256 PED calculations and slicing operations need to be performed. By carefully choosing the level update vector, a compromise between error rate and computational complexity is achieved.

The SSFE algorithm is based on the tree search strategy, i.e., the algorithm traverses a search tree by calculating all

the admissible PEDs and storing them in an intermediate list in memory. The search continues on the next level with the nodes determined by the level update vector until the leaf nodes are reached. After the final level, the final candidate list is used for the log likelihood ratio (LLR) calculation. However, it should be noted that the final candidate list may not include the lowest Euclidean distances. Figure 2 presents the search tree of the SSFE algorithm, where level update vector m = [1144] results in an output list size of 16. A real-valued signal model, a 2 × 2 antenna system and 16-QAM are assumed.

Fig. 2. Example of an SSFE search tree.

B. The Layered Orthogonal Lattice Detector

A layered orthogonal lattice detector (LORD) is a soft-output near-optimal lattice detector that relies on a channel orthogonalization process [6]. The LORD algorithm is very similar to the SSFE algorithm. The greatest difference is that the channel matrix is reordered in the preprocessing, and separate QRDs as well as separate tree searches are required for each transmit antenna.

Assuming a 2 × 2 system, the LORD algorithm achieves MAP performance. With a real-valued 16-QAM system, the SSFE algorithm would need a full search with m = [4444] to achieve the same FER performance, resulting in an eight times higher number of calculations. However, the LORD algorithm is heavily dependent on the constellation and the number of antennas used. With higher order modulations the computational complexity increases rapidly. Compared to the SSFE algorithm, the LORD algorithm also uses more memory resources, which can cause problems with GPU-type parallel processing.

Figure 2 also presents the search tree pruning for the LORD algorithm. A real-valued signal model, a 2 × 2 antenna system and 16-QAM are assumed. The search tree is exactly the same as with the SSFE algorithm with m = [1144]. The only difference is that with the LORD algorithm there are two search trees, one for each transmit antenna. Assuming a $V$-QAM modulation, the LORD algorithm covers all $\sqrt{V}$ values for the in-phase (I) and quadrature-phase (Q) components of the lowest level antenna. Each of the covered values is decoded with spatial decision feedback equalizing (DFE) processing at the higher level antennas. This is performed by a slicing operation somewhat similar to the SSFE slicing operation. The high dependency on the constellation used quickly becomes the limiting factor with the LORD algorithm.

C. Performance Example

A good compromise on the list size for SSFE was decided on by running simulations. The list size has a significant impact on the computational complexity of the SSFE algorithm. Parameter studies were performed on a MIMO OFDM simulator running in the MATLAB environment. In the simulator, one frame corresponds to one OFDM symbol and consists of 300 individual symbol vectors, each mapped to one OFDM subcarrier. Table I presents the simulation parameters inspired by the 3G LTE specifications [8], [9], [10]. The corresponding frame error rate (FER) curves are presented in Figure 3.

TABLE I
SIMULATION PARAMETERS

Number of subcarriers              512, of which 300 used
Bandwidth                          5 MHz
Carrier frequency                  2.4 GHz
Cyclic prefix (CP) duration        4.69 μs
Symbol duration                    μs
MIMO scheme                        VBLAST
Channel code                       Turbo code with six iterations
Code rate                          1/2
Channel model                      Typical urban, 6 taps
User velocity                      120 km/h
Base station antenna separation    4λ
Mobile antenna separation          0.5λ

Fig. 3. FER comparison for real-valued QAM systems (2 × 2 MIMO system, 16-QAM, correlated channel; curves for SSFE, LORD and MAP; FER versus SNR in dB).

Based on the simulations, the list size of 16 was found to offer a good compromise between computational complexity and FER performance. Further simulations were performed to discover the best configuration for the level update vector. It was found that m = [1224] offers the best FER performance with a list size of 16.
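The relation between the level update vector and the list size can be made concrete with a small SSFE sketch. This is only an illustration under stated assumptions: the search starts from the last row of $R$ and applies the last entry of m first, the `sorted` call is a simple stand-in for the fast enumeration of [4], and the matrix `R`, the noise values and the function name `ssfe` are hypothetical rather than taken from the paper.

```python
PAM4 = (-3.0, -1.0, 1.0, 3.0)   # one real axis of 16-QAM per tree level


def ssfe(R, y_hat, m, alphabet=PAM4):
    """Return the leaves (ped, symbols) of an SSFE-style tree search.

    R is upper triangular, y_hat = Q^T y, and m[i] is the number of
    constellation points spanned at level i.
    """
    n = len(R)
    paths = [(0.0, ())]          # (accumulated PED, symbols of levels i+1..n-1)
    for i in range(n - 1, -1, -1):
        next_paths = []
        for ped, tail in paths:
            # Cancel the interference of the levels decided so far.
            interference = sum(R[i][i + 1 + k] * s for k, s in enumerate(tail))
            b = y_hat[i] - interference
            estimate = b / R[i][i]
            # Stand-in for fast enumeration: the m[i] points closest to the estimate.
            spanned = sorted(alphabet, key=lambda a: abs(a - estimate))[:m[i]]
            for s in spanned:
                increment = (b - R[i][i] * s) ** 2
                next_paths.append((ped + increment, (s,) + tail))
        paths = next_paths
    return paths


# Hypothetical upper triangular system with a small perturbation as noise.
R = [[2.0, 0.3, -0.4, 0.1],
     [0.0, 1.8, 0.2, -0.3],
     [0.0, 0.0, 1.5, 0.25],
     [0.0, 0.0, 0.0, 1.2]]
x_true = (1.0, -3.0, 3.0, -1.0)
noise = (0.1, -0.05, 0.08, -0.1)
y_hat = [sum(r * s for r, s in zip(row, x_true)) + w for row, w in zip(R, noise)]

leaves_1144 = ssfe(R, y_hat, [1, 1, 4, 4])   # 16 candidates
leaves_1224 = ssfe(R, y_hat, [1, 2, 2, 4])   # 16 candidates
leaves_full = ssfe(R, y_hat, [4, 4, 4, 4])   # 256 candidates, exhaustive

best_1144 = min(leaves_1144)                 # lowest PED in the reduced list
best_full = min(leaves_full)                 # ML solution for this example
```

In this low-noise example the reduced list already contains the ML solution, but as noted above this is not guaranteed in general. Both m = [1144] and m = [1224] give a list of 16 candidates; they differ in how the spanning is distributed over the levels.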
However, m = [1144] can be better mapped for GPU processing, with only a small increase in FER. Figure 3 also shows that the LORD algorithm achieves MAP performance with the 2 × 2 antenna system and 16-QAM. The

computational complexity compared to SSFE with m = [1144] is, however, doubled.

IV. COMPUTE UNIFIED DEVICE ARCHITECTURE

The Nvidia Quadro FX 1700 is one of the mid-range products of the Nvidia Quadro product family. It consists of four streaming multiprocessors (SMs). Each of the SMs contains eight pipelined scalar processor (SP) cores. The cores run at 920 MHz. The maximum number of active threads running on the Quadro FX 1700 is 3072, i.e., 768 per SM. The peak rate supported by the Quadro FX 1700 is about 89 GFLOPS [11]. The Quadro FX 1700 has a global memory of 512 MB of graphics double data rate 2 (GDDR2) memory running at 400 MHz. It has a 128-bit memory interface and a 12.8 GB/s memory bandwidth. The total amount of constant memory available is 64 KB, and 16 KB of shared memory is offered per block. The maximum power consumption of the Quadro FX 1700 is 42 W [11].

CUDA is a software programming model for writing scalable parallel programs using C. CUDA extensions are also available for some other standard programming languages, for example FORTRAN. CUDA is developed by Nvidia and it requires an Nvidia GPU [12].

In the CUDA programming model, a GPU is viewed as a computing device that works as a co-processor for the main central processing unit (CPU). The CPU is often called the host and the GPU is called the device. The massive computational capability of the GPU is based on its high degree of hardware parallelism. A GPU can have several SMs. Parallel portions of the program are executed as kernel functions on the device. A kernel is a function that is called from the CPU but executed on the GPU. Only a single kernel is executed at a time, but thousands of threads can be executed in parallel inside a single kernel function. A kernel is composed of a grid that consists of a set of equal-size thread blocks. At every kernel launch, the grid and block dimensions to be used are fed to the kernel as an input.
One block can contain up to 512 threads. The grid can consist of multiple equally sized thread blocks, so the total number of threads is equal to the number of threads per block times the number of blocks. However, the number of thread blocks depends more on the processed data than on the number of streaming multiprocessors available [13]. Figure 4 illustrates the composition of a kernel grid [14].

Fig. 4. An example of a kernel grid.

To manage the thousands of threads being processed simultaneously, the SMs use the single-instruction multiple-thread (SIMT) architecture. It maps each thread to an SP core and executes the threads independently, assigning each its own instruction address and register state [13]. The SIMT architecture concentrates on the execution of a single thread. The threads are gathered by the SIMT unit into groups of 32 parallel threads called warps. When a kernel function with one or more thread blocks is being executed, the SIMT unit divides the threads into warps and schedules them for execution. The threads inside a warp start the execution simultaneously at the same program address, but they are free to branch and execute independently. The threads are also assigned unique, increasing thread IDs.

A GPU can have a large amount of off-chip memory, referred to as global memory. In addition to the global memory, GPUs also have fast on-chip memory and register resources. Although the global memory can be large, it is an off-chip resource and thus substantially slower than the on-chip resources. The latency penalty due to memory transfers to and from global memory can be avoided to some extent by efficient use of on-chip resources. When mapping the algorithms on CUDA, it is important to minimize the global memory reads and writes, due to the long latency they incur.
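The launch arithmetic described above can be sketched as follows. The constants are the figures quoted in this section for the Quadro FX 1700; the function name `launch_geometry` and the notion of "idle lanes" (warp slots left unused when a block is smaller than a warp) are illustrative assumptions, not terms from the paper.

```python
WARP_SIZE = 32                    # threads per warp
MAX_THREADS_PER_BLOCK = 512       # block size limit quoted in the text
MAX_ACTIVE_THREADS_PER_SM = 768   # Quadro FX 1700, per streaming multiprocessor
NUM_SMS = 4                       # Quadro FX 1700


def launch_geometry(threads_per_block, num_blocks):
    """Summarize a one-dimensional kernel launch configuration."""
    if not 1 <= threads_per_block <= MAX_THREADS_PER_BLOCK:
        raise ValueError("threads_per_block must be between 1 and 512")
    # Ceiling division: a partially filled warp still occupies a whole warp.
    warps_per_block = -(-threads_per_block // WARP_SIZE)
    return {
        "total_threads": threads_per_block * num_blocks,
        "warps_per_block": warps_per_block,
        "idle_lanes_per_block": warps_per_block * WARP_SIZE - threads_per_block,
    }


device_capacity = NUM_SMS * MAX_ACTIVE_THREADS_PER_SM   # 3072 active threads

full_warp_blocks = launch_geometry(32, 32)   # 32 threads per block, 32 blocks
half_warp_blocks = launch_geometry(16, 32)   # 16 threads per block, 32 blocks
```

A block of 16 threads still occupies one warp, so half of that warp's lanes do no work; this is one reason why block sizes that are multiples of 32 are usually preferred.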
Before starting the execution of a kernel, the required data has to be copied from the CPU's system memory to the GPU's global and constant memories. The data transfers to and from the device are slow due to the slow PCI Express bus, which is why the data should be kept in the device memory as long as possible. This is a limiting factor with GPU detector implementations, but since the purpose of this study was to explore the computational capacity of the GPU for MIMO OFDM detection, the data transfer issues were set aside and the main focus was on computing power.

V. MAPPING SSFE ON CUDA

The massive parallelism offered by GPUs makes it possible to run numerous parallel independent tree searches on a single GPU. The computations required in the SSFE and LORD algorithms can be efficiently parallelized and mapped for GPU processing. By mapping the SSFE algorithm with vector m = [1144], the computations can be efficiently performed in parallel with 16 threads. However, to fill a full warp of 32 threads, at least two subcarrier detections need to be performed in parallel.

The simplest way to map the parallel searches would be to perform one subcarrier detection per thread block. However, fewer active threads would then be performing the calculations, since the maximum number of active thread blocks per SM is eight. To increase the number of parallel subcarrier detections without the need to stall any

warps, better performance can be achieved by mapping two or more parallel subcarrier detections in a single block.

In parallel programming such as CUDA, conditional execution of code should be avoided if possible. When mapped for parallel processing with CUDA, the SSFE algorithm requires conditional execution of code at least in the slicing operations. The slicing operations in both algorithms are performed by exploiting the threadid and/or blockid variables. When branching occurs, the branches are executed serially, which naturally deteriorates overall performance. In our implementation, the threadid is mainly used in slicing operations and the blockid is mostly used to select the received partial symbol vector for calculations. Because branching could not be totally avoided with either of the algorithms, some portions of the code were executed serially.

A number of computer simulations with different grid and block configurations were performed. The detection kernel execution time was averaged over 1000 runs and the results were recorded using the CUDA Visual Profiler. The simulation results are presented in Table II.

TABLE II
SSFE DETECTOR CONFIGURATIONS
(columns: grid size as threads per block × blocks, throughput in Mbps, occupancy in %)

The peak performance of Mbps was achieved by mapping 64 parallel subcarrier detections on the GPU. The parallel subcarrier detections were performed with 32 thread blocks consisting of 32 threads each, which gives a total of 1024 active threads per kernel. As the number of parallel subcarrier detections increased, the amount of branching required also increased. However, each single branch only performs a single memory fetch operation from the constant memory and is therefore executed with a very small latency. As illustrated in Table II, the performance starts to deteriorate when the number of thread blocks exceeds 32 or the number of threads per block exceeds 64.
It is also shown that a higher occupancy of the GPU does not guarantee better performance. The performance deterioration results from the limited amount of fast register resources. The block size of 64 threads increases the occupancy level of the GPU, but only six thread blocks were active, since the GPU ran out of register resources. By using a GPU with a higher number of SMs, more thread blocks could be used and better throughput achieved.

Figure 5 illustrates the composition of the kernel grid used in this implementation. Due to the characteristics of the algorithm, a one-dimensional grid with one-dimensional thread blocks was used. Since only 16 threads were needed for one subcarrier detection, one thread block detects two subcarriers. The first 16 threads were used to detect the first subcarrier and the rest of the threads inside the block detected the second subcarrier. Block and thread indices were used to select which subcarrier was to be detected.

Fig. 5. Grid and block composition for the SSFE implementation.

The Quadro FX 1700 would be capable of running 3072 active threads simultaneously. Thus, only one third of its full capacity was harnessed, due to the algorithm characteristics.

A. Memory Usage

After preprocessing, the generated values for $R$ and $\hat{y}$ were set to be fixed in the constant memory to avoid unnecessary and costly data transfers between the host and the device. The fast register and shared memory resources were used in the computations, and only the final candidate and PED lists were written to the global memory and then transferred to the host. As discussed earlier, the focus of this study is on the computational power of GPUs, which is why the costly data transfers receive less attention. The shared memory was used for variables that could be shared among all the threads inside a single block. The registers were used to store variables and intermediate results that were only used by a single thread.
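The mapping of block and thread indices to subcarriers in the peak SSFE configuration (32 blocks of 32 threads, 16 threads per detection) can be sketched as below. The exact index arithmetic of the implementation is not given in the paper, so the formula in `ssfe_work_item` is an assumption that merely reproduces the described behaviour: the first 16 threads of a block handle one subcarrier and the remaining 16 handle the next.

```python
THREADS_PER_DETECTION = 16   # list size for m = [1144]
THREADS_PER_BLOCK = 32       # peak SSFE configuration
NUM_BLOCKS = 32              # peak SSFE configuration
MAX_ACTIVE_THREADS = 3072    # Quadro FX 1700


def ssfe_work_item(block_idx, thread_idx):
    """Map a (block, thread) pair to (subcarrier, candidate index)."""
    detections_per_block = THREADS_PER_BLOCK // THREADS_PER_DETECTION
    subcarrier = (block_idx * detections_per_block
                  + thread_idx // THREADS_PER_DETECTION)
    candidate = thread_idx % THREADS_PER_DETECTION
    return subcarrier, candidate


# Enumerate every thread of the launch, as the hardware would in parallel.
work = [ssfe_work_item(b, t)
        for b in range(NUM_BLOCKS)
        for t in range(THREADS_PER_BLOCK)]

subcarriers = sorted({s for s, _ in work})
utilization = len(work) / MAX_ACTIVE_THREADS
```

Each of the 64 subcarriers is covered by exactly 16 candidate indices, and the 1024 threads amount to one third of the 3072 threads the device can keep active.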
Table III shows the memory allocation for the different grid configurations.

TABLE III
SSFE MEMORY UTILIZATION
(columns: grid size, shared memory per block in bytes, registers per thread)

The block size dictates how efficiently the threadid and blockid variables can be exploited in the computations. In this implementation, these variables are also involved in the branching necessitated by the algorithm. The threadid is mainly used for slicing operations, and the blockid is mainly used for sorting out the subcarriers to be processed. It is also

shown that the memory usage of the SSFE algorithm is small, making it a promising candidate for mobile solutions. The computational complexity is also quite small with a suitably selected level update vector.

VI. MAPPING LORD ON CUDA

As already mentioned, the LORD algorithm offers MAP performance with a 2 × 2 antenna system. The computational complexity is, however, highly dependent on the constellation used. In our implementation, the real-valued 16-QAM was used, which kept the computational complexity rather low. With a 2 × 2 antenna system, two tree searches per subcarrier detection are required, compared to the one search in the SSFE algorithm. If 16-QAM and a 2 × 2 antenna system are assumed, the computational complexity is doubled compared to SSFE with the vector m = [1144]. Then again, the LORD achieves MAP performance, whereas the SSFE algorithm falls about 2 dB short of MAP performance with the preceding configuration.

We implemented the two tree searches to be performed in a single kernel. One whole subcarrier detection for the LORD algorithm can be efficiently mapped with 32 threads, assuming a real-valued QAM system. The first 16 threads of a thread block performed the first tree search and the next 16 threads performed the second tree search concurrently. Due to the structure of the LORD algorithm, more branching was required compared to the SSFE algorithm. The additional and more complex branching, the higher number of calculations and the less efficient memory utilization result in lower throughput compared to the SSFE algorithm.

Table IV presents the simulation results for the LORD algorithm. Fewer simulations were performed with the LORD algorithm, since the branching and higher memory utilization deteriorate the performance at a rapidly accelerating pace. Table IV also shows that the GPU allocation starts to fall as the block size and number of blocks are increased to 32.
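The complexity ratios quoted for the 2 × 2, 16-QAM case can be checked with a short calculation. The sketch below assumes that complexity is measured by the number of final candidates (the product of the level update vector entries), which matches the counts used in the text but is a simplification; the function name `list_size` is hypothetical.

```python
import math


def list_size(m):
    """Number of leaf candidates produced by level update vector m."""
    return math.prod(m)


ssfe_1144 = list_size([1, 1, 4, 4])      # one tree search
ssfe_full = list_size([4, 4, 4, 4])      # full search with MAP performance
lord_2x2 = 2 * list_size([1, 1, 4, 4])   # two tree searches, one per transmit antenna

lord_vs_ssfe = lord_2x2 / ssfe_1144      # LORD relative to SSFE with m = [1144]
full_vs_lord = ssfe_full / lord_2x2      # full SSFE search relative to LORD
```

Under this measure LORD needs 32 candidates per subcarrier, which is also the number of threads used for one LORD detection, twice the 16 of SSFE with m = [1144] and one eighth of the 256 needed by the full search.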
According to the CUDA Occupancy Calculator, the occupancy should be 33 percent for this configuration, but the CUDA Visual Profiler reveals that the inefficient branching only allows an occupancy of 25 percent.

TABLE IV
LORD DETECTOR CONFIGURATIONS
(columns: grid size as threads per block × blocks, throughput in Mbps, occupancy in %)

The composition of the grid used in the LORD algorithm implementation is very similar to that presented in Figure 5. The LORD algorithm also uses 32 blocks in the implementation that results in peak performance, but the block size is reduced to 16 threads. This means that the LORD algorithm needs two blocks instead of one to perform a single subcarrier detection and that only 512 active threads are in use. Only the block index was used to select which subcarrier was being detected, but the thread indices were used in the slicing operations as well as in selecting which tree search was being performed. Figure 6 illustrates the composition of the kernel grid and thread blocks for the LORD algorithm.

Fig. 6. Grid and block composition for the LORD algorithm.

A. Memory Usage

While the SSFE uses only a small amount of memory, the memory requirements of the LORD algorithm are considerably larger. The computations for the two search trees themselves almost double the memory requirements compared to SSFE. The characteristics of the LORD algorithm also require more variables for the calculations, which increases the memory requirements even more. The memory allocation for the LORD algorithm is presented in Table V.

TABLE V
LORD MEMORY UTILIZATION
(columns: grid size, shared memory per block in bytes, registers per thread)

In addition, with higher-order modulations and with antenna configurations greater than 2 × 2, the high memory usage becomes the bottleneck of LORD. In particular, the scarce register resources are insufficient for the LORD algorithm to be efficiently mapped on the GPU with higher antenna and constellation configurations.
The SSFE algorithm does not utilize as much memory as the LORD, presuming a proper selection of the level update vector. More parallel tree searches can therefore be performed by using SSFE.

VII. COMPARISON

In [15] and [7], GPU implementations of a MIMO OFDM detector were presented. In [7] the implementation achieves a peak throughput of Mbps with a complex-valued QAM system. Table VI [11] presents the major differences between the GPUs used in the implementations. Although the implementation results of this study fall short of the results presented

in [7], the GPU in [7] was much more powerful than that used in this work. The GeForce 9600 GT used in [7] has twice the number of cores of the Quadro FX 1700, and its core speed and memory clock speed are also double those of the Quadro FX 1700. Taking the differences in GPU performance and the scalable programming model into consideration, the results presented in this work outperform the results achieved in [7]. However, it has to be noted that the LLR computations were not included in our implementation, unlike in [7].

TABLE VI
GPU RESOURCE COMPARISON

                    GeForce 9600 GT    Quadro FX 1700
Core clock          650 MHz            460 MHz
Shader clock        1625 MHz           920 MHz
Memory clock        900 MHz            400 MHz
Memory bandwidth    57.6 GB/s          12.8 GB/s
FLOPS               208 GFLOPS         GFLOPS

Table VII presents a comparison between the implementation results in terms of throughput (Mbps), goodput (Mbps) and execution time (ms). Throughput defines how much data the hardware can output in a time unit. Goodput also takes into account the error probability, which is typical for the detector in a certain channel condition. The goodput is the detection rate times (1 − FER) at the given SNR. Since the goodput is calculated after the decoder, the code rate is taken into account, which in this case is assumed to be 1/2. Note that the LORD algorithm performs better in bad channel conditions. However, when a better channel is available, the SSFE detector achieves a higher goodput. Our implementation can easily adapt to the changing channel conditions by switching between the detection algorithms.

TABLE VII
COMPARISON OF THE RESULTS
(columns: SSFE with m = [1144], LORD, trellis based [7]; rows: throughput, goodput, execution time per number of subcarriers; entries recoverable from the source: "n/a n/a", "14.2 us", "2200 subcarriers")

In [7], the implementation used 16 threads for one subcarrier detection with a QAM system, similar to our SSFE implementation.
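The goodput definition given above (detection rate times (1 − FER), scaled by the code rate of 1/2) can be written out as a small helper. The function name and the FER value in the example are hypothetical; only the 16.8 Mbps LORD throughput comes from the paper.

```python
def goodput_mbps(detection_rate_mbps, fer, code_rate=0.5):
    """Goodput after decoding: rate x (1 - FER) x code rate.

    detection_rate_mbps: detector throughput in Mbps.
    fer: frame error rate at the operating SNR, between 0 and 1.
    code_rate: channel code rate (1/2 in the simulations of the paper).
    """
    if not 0.0 <= fer <= 1.0:
        raise ValueError("fer must be between 0 and 1")
    return detection_rate_mbps * (1.0 - fer) * code_rate


# LORD throughput from the paper with a hypothetical FER of 10 percent.
lord_goodput = goodput_mbps(16.8, 0.1)

# Error-free upper bound for the same detector.
lord_goodput_ideal = goodput_mbps(16.8, 0.0)
```

Because the FER of each detector depends on the SNR, the detector with the higher raw throughput does not necessarily deliver the higher goodput, which is the basis for switching between SSFE and LORD as channel conditions change.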
Whereas two parallel subcarrier detections were mapped in each block in our SSFE implementation, and one subcarrier detection per block with the LORD algorithm, [7] mapped four parallel subcarrier detections in each block. The block size of the implementation in [7] is therefore 64, compared to 32 and 16 with our SSFE and LORD implementations, respectively. As presented earlier, any block size larger than 32 with SSFE and 16 with LORD decreased the performance of our implementation due to the limitations in fast memory resources. However, our implementation could be scaled for more powerful GPUs by increasing the number of thread blocks and therefore the number of parallel subcarrier detections. The overall threads used in our implementations were 1024 for SSFE and 512 for LORD compared to the in [7]. Both of the implementations allocated only 33 percent of the GPU's resources. A higher GPU allocation was achieved with some configurations in this study, but with reduced performance.

VIII. CONCLUSION

Two MIMO OFDM detector implementations for single-precision floating-point GPU processing were presented. The implementations were designed for maximum throughput, but the GPU utilization was also taken into account. Some drawbacks, such as power consumption and costly data transfers, were ignored in this study, due to the fact that GPUs are not designed for SDR processing as such. The limited size of the fast on-chip memory resources and the required branching were found to be the limiting factors for the GPU implementations. An interesting future solution would be a GPU that is designed specifically for baseband solutions. The emergence of the open computing language (OpenCL) will ease the realization of such a GPU.

The implementations presented suit SDR processing well. For example, the SSFE algorithm can easily adapt to different channel conditions simply by changing the level update vector.
The GPU detector provides flexible solutions to support the different configurations included in future LTE systems. By remolding the memory and the I/O architectures of the GPUs, the GPUs can meet the LTE performance requirements. The GPU-based MIMO OFDM detector implementations proposed in this paper offer a promising solution for software defined radio, and GPUs specifically designed for baseband processing will make them even more promising.

REFERENCES

[1] 3rd Generation Partnership Project (3GPP); Technical Specification Group Radio Access Network, Physical layer aspects for evolved UTRA (TR version (release 7)), 3rd Generation Partnership Project (3GPP), Tech. Rep.
[2] G. Falcão, V. Silva, and L. Sousa, How GPUs can outperform ASICs for fast LDPC decoding, in Proceedings of the 23rd International Conference on Supercomputing (ICS '09), New York, USA, Jun. 2009, pp.
[3] M. O. Damen, H. E. Gamal, and G. Caire, On maximum likelihood detection and the search for the closest lattice point, IEEE Transactions on Information Theory, vol. 49, no. 10, pp., Oct.
[4] M. Li, B. Bougard, L. V. D. Perre, and F. Catthoor, Optimizing near-ML MIMO detector for SDR baseband on parallel programmable architectures, in Proc. of the Conference on Design, Automation and Test in Europe, Munich, Germany, Mar.
[5] M. Siti and M. Fitz, Layered orthogonal lattice detector for two transmit antenna communications, in Proceedings of the Forty-Third Annual Allerton Conference on Communication, Control, and Computing, Sep., pp.
[6] A. Tomasoni, M. Siti, M. Ferrari, and S. Bellini, T-LORD: a MAP-approaching soft-input soft-output detector for iterative MIMO receivers, in Proceedings of the IEEE GLOBECOM 2007, Nov., pp.
[7] M. Wu, Y. Sun, and J. R. Cavallaro, Reconfigurable real-time MIMO detector on GPU, in IEEE 43rd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, Oct.
[8] 3rd Generation Partnership Project (3GPP).
[9] 3rd Generation Partnership Project (3GPP); Technical Specification Group Radio Access Network, Physical layer aspects for evolved UTRA (TR version (release 7)), 3rd Generation Partnership Project (3GPP), Tech. Rep.
[10] 3rd Generation Partnership Project (3GPP), TSGR1#41 R, EUTRA downlink numerology, 3rd Generation Partnership Project (3GPP), Tech. Rep.
[11] GPUReview, Tech. Rep.
[12] T. R. Halfhill, Parallel processing with CUDA, Microprocessor, Jan.
[13] NVIDIA, Programming guide version 2.1, NVIDIA Corporation, Tech. Rep.
[14] NVIDIA, CUDA basics, NVIDIA Corporation, Tech. Rep.
[15] M. Wu, Y. Sun, and J. R. Cavallaro, A GPU implementation of a realtime MIMO, in IEEE Workshop on Signal Processing Systems, Oct. 2009, pp.


More information

Carrier Frequency Offset Estimation Algorithm in the Presence of I/Q Imbalance in OFDM Systems

Carrier Frequency Offset Estimation Algorithm in the Presence of I/Q Imbalance in OFDM Systems Carrier Frequency Offset Estimation Algorithm in the Presence of I/Q Imbalance in OFDM Systems K. Jagan Mohan, K. Suresh & J. Durga Rao Dept. of E.C.E, Chaitanya Engineering College, Vishakapatnam, India

More information

Research Article Application-Specific Instruction Set Processor Implementation of List Sphere Detector

Research Article Application-Specific Instruction Set Processor Implementation of List Sphere Detector Hindawi Publishing Corporation EURASIP Journal on Embedded Systems Volume 2007, Article ID 54173, 14 pages doi:10.1155/2007/54173 Research Article Application-Specific Instruction Set Processor Implementation

More information

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads

CUDA Threads. Terminology. How it works. Terminology. Streaming Multiprocessor (SM) A SM processes block of threads Terminology CUDA Threads Bedrich Benes, Ph.D. Purdue University Department of Computer Graphics Streaming Multiprocessor (SM) A SM processes block of threads Streaming Processors (SP) also called CUDA

More information

Enhanced SIC and Initial Guess ML Receivers for Collaborative MIMO of the LTE Uplink

Enhanced SIC and Initial Guess ML Receivers for Collaborative MIMO of the LTE Uplink Enhanced SIC and Initial Guess ML Receivers for Collaborative MIMO of the LTE Uplink Karim A. Banawan Electrical Engineering Department Faculty of Engineering, Alexandria University Alexandria, Egypt karimbanawan@yahoo.com

More information

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 2.114

IJESRT. Scientific Journal Impact Factor: (ISRA), Impact Factor: 2.114 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY PERFORMANCE IMPROVEMENT OF CONVOLUTION CODED OFDM SYSTEM WITH TRANSMITTER DIVERSITY SCHEME Amol Kumbhare *, DR Rajesh Bodade *

More information

Performance Analysis of Concatenated RS-CC Codes for WiMax System using QPSK

Performance Analysis of Concatenated RS-CC Codes for WiMax System using QPSK Performance Analysis of Concatenated RS-CC Codes for WiMax System using QPSK Department of Electronics Technology, GND University Amritsar, Punjab, India Abstract-In this paper we present a practical RS-CC

More information

BER Performance of CRC Coded LTE System for Various Modulation Schemes and Channel Conditions

BER Performance of CRC Coded LTE System for Various Modulation Schemes and Channel Conditions Scientific Research Journal (SCIRJ), Volume II, Issue V, May 2014 6 BER Performance of CRC Coded LTE System for Various Schemes and Conditions Md. Ashraful Islam ras5615@gmail.com Dipankar Das dipankar_ru@yahoo.com

More information

LDPC Coded OFDM with Alamouti/SVD Diversity Technique

LDPC Coded OFDM with Alamouti/SVD Diversity Technique LDPC Coded OFDM with Alamouti/SVD Diversity Technique Jeongseok Ha, Apurva. Mody, Joon Hyun Sung, John R. Barry, Steven W. McLaughlin and Gordon L. Stüber School of Electrical and Computer Engineering

More information

Configurable Joint Detection Algorithm for MIMO Wireless Communication System

Configurable Joint Detection Algorithm for MIMO Wireless Communication System Configurable Joint Detection Algorithm for MIMO Wireless Communication System 1 S.Divyabarathi, 2 N.R.Sivaraaj, 3 G.Kanagaraj 1 PG Scholar, Department of VLSI, AVS Engineering College, Salem, Tamilnadu,

More information

SOFTWARE IMPLEMENTATION OF THE

SOFTWARE IMPLEMENTATION OF THE SOFTWARE IMPLEMENTATION OF THE IEEE 802.11A/P PHYSICAL LAYER SDR`12 WInnComm Europe 27 29 June, 2012 Brussels, Belgium T. Cupaiuolo, D. Lo Iacono, M. Siti and M. Odoni Advanced System Technologies STMicroelectronics,

More information

A Novel of Low Complexity Detection in OFDM System by Combining SLM Technique and Clipping and Scaling Method Jayamol Joseph, Subin Suresh

A Novel of Low Complexity Detection in OFDM System by Combining SLM Technique and Clipping and Scaling Method Jayamol Joseph, Subin Suresh A Novel of Low Complexity Detection in OFDM System by Combining SLM Technique and Clipping and Scaling Method Jayamol Joseph, Subin Suresh Abstract In order to increase the bandwidth efficiency and receiver

More information

DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN,

DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN, DSP Design in Wireless Communication LIANG LIU AND FREDRIK EDMAN, LIANG.LIU@EIT.LTH.SE Data Rate The Evolving Wireless Scene More bit/($ nj) More bit/sec 100Mb 10Mb 1Mb 100Kb 10Kb 1Kb 802.1a 802.11 (LAN)

More information

10 Gbps Outdoor Transmission Experiment for Super High Bit Rate Mobile Communications

10 Gbps Outdoor Transmission Experiment for Super High Bit Rate Mobile Communications Super High Bit Rate Mobile Communication MIMO-OFDM Outdoor Transmission Experiment 10 Gbps Outdoor Transmission Experiment for Super High Bit Rate Mobile Communications To further increase transmission

More information

2. LITERATURE REVIEW

2. LITERATURE REVIEW 2. LITERATURE REVIEW In this section, a brief review of literature on Performance of Antenna Diversity Techniques, Alamouti Coding Scheme, WiMAX Broadband Wireless Access Technology, Mobile WiMAX Technology,

More information