Parallel Programming Design of BPSK Signal Demodulation Based on CUDA


Int. J. Communications, Network and System Sciences, 2016, 9, 126-134
Published Online May 2016 in SciRes. http://www.scirp.org/journal/ijcns
http://dx.doi.org/10.4236/ijcns.2016.95011

Parallel Programming Design of BPSK Signal Demodulation Based on CUDA

Yandu Liu, Baoling Zhang, Haixin Zheng
Equipment Academy, Beijing, China

Received 12 April 2016; accepted 24 May 2016; published 30 May 2016

Abstract

Realizing digital signal demodulation on a general-purpose computer has been an important research direction in signal processing in recent years. This paper studies a BPSK signal demodulation algorithm with demanding real-time requirements on a general-purpose computer. Based on the characteristics of CPU + GPU heterogeneous computing, a parallel computation model for digital communication is put forward, and BPSK signal demodulation is implemented on the CUDA platform. Test results show a computing time ratio of 1:1.7, and a bit error rate of $10^{-5}$ is achieved at $E_b/N_0 = 9.6$ dB.

Keywords

BPSK, Demodulation, CUDA, Parallel

1. Introduction

In recent years, as the performance of general-purpose computers has steadily improved and software radio technology has moved from dedicated hardware platforms toward digital platforms, the platform for digital signal processing in communication systems has begun to change direction: the signal is processed in real time directly after A/D conversion, entirely in software, on a general-purpose computer. Digital phase modulation, namely phase shift keying (PSK), is an important basic digital modulation technique that uses the phase of the carrier to represent the input information. Under stable channel conditions, phase shift keying offers better noise immunity than amplitude shift keying and frequency shift keying and uses the band more efficiently; it also performs well in fading and multipath channels [1].
Therefore, BPSK is an excellent modulation method and has been widely applied in medium- and high-speed data transmission. This paper studies a real-time BPSK signal demodulation algorithm and its CUDA-based parallel implementation on a CPU + GPU heterogeneous platform. Tests of the parallel program verify the feasibility of the system.

How to cite this paper: Liu, Y.D., Zhang, B.L. and Zheng, H.X. (2016) Parallel Programming Design of BPSK Signal Demodulation Based on CUDA. Int. J. Communications, Network and System Sciences, 9, 126-134. http://dx.doi.org/10.4236/ijcns.2016.95011

2. BPSK Signal Demodulation Algorithm

BPSK signals are mostly demodulated coherently with a phase-locked loop, for example by the squaring loop method, the decision feedback method, or the Costas loop method. Differential demodulation, which uses the phase jump between adjacent symbols, is also used [2]. Differential demodulation does not need to recover a coherent carrier and the algorithm is relatively simple, but its noise performance is significantly worse than that of coherent demodulation. As the GPU is now widely used in signal processing, coherent demodulation, with its better performance, is easy to implement. The Costas loop is the suppressed-carrier tracking loop most widely used in engineering; [3] shows that it is the best device for tracking a suppressed-carrier signal at low SNR. Its structure is shown in Figure 1. The input BPSK modulated signal is [4]:

$$s(t) = m(t)\sin(\omega_c t + \theta_1(t)) = \sum_n a_n g(t - nT_s)\sin(\omega_c t + \theta_1) \qquad (1)$$

Here, $m(t)$ is the digital modulation signal and $\omega_c$ is the carrier angular frequency. The local oscillator outputs are, respectively:

$$v_q = \cos(\omega_c t + \theta_2), \qquad v_i = \sin(\omega_c t + \theta_2) \qquad (2)$$

Here, $\omega_c$ is the variable frequency produced by the local oscillator, and $\theta_1(t)$ and $\theta_2(t)$ are the reference phases. After orthogonal down-conversion, the outputs are:

$$z_q = K_{p1}\sum_n a_n g(t - nT_s)\sin(\omega_c t + \theta_1)\cos(\omega_c t + \theta_2)$$
$$z_i = K_{p2}\sum_n a_n g(t - nT_s)\sin(\omega_c t + \theta_1)\sin(\omega_c t + \theta_2) \qquad (3)$$

where $K_{p1}$ and $K_{p2}$ are the multiplier coefficients. Let $\theta_e = \theta_1 - \theta_2$; after low-pass filtering:

$$y_q = \frac{1}{2}K_{p1}K_{11}\sum_n a_n g(t - nT_s)\sin\theta_e(t)$$
$$y_i = \frac{1}{2}K_{p2}K_{12}\sum_n a_n g(t - nT_s)\cos\theta_e \qquad (4)$$

Here, $K_{11}$ and $K_{12}$ are the low-pass filter coefficients. After filtering, phase discrimination between the in-phase and orthogonal branches gives:

$$v_c = \frac{1}{8}K_p K_{p1}K_{p2}K_{11}K_{12}\sin 2\theta_e(t) = K_d\sin 2\theta_e \qquad (5)$$

$K_p$ is the gain of the phase discriminator and $K_d$ is the loop gain. The output of the loop filter is the error signal used to track $\theta_e$.

Figure 1. Costas loop structure.

Following the principle of coherent demodulation, the recovered coherent carrier is multiplied directly by the input modulated signal and the product is filtered to obtain the baseband signal waveform (Figure 2). The loop can follow a dynamic Doppler shift of 25 kHz (Figure 3).

Figure 2. BPSK signals after demodulation. (Curves: the original signal and the signal after demodulation; axes: time (µs), amplitude (V).)

Figure 3. Following dynamic Doppler. (Axes: frames, Doppler (Hz).)
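The step from Eq. (3) to Eqs. (4) and (5) can be written out with the product-to-sum identities. The following is only a sketch under assumptions the text does not state explicitly: the low-pass filters are taken to remove the $2\omega_c$ terms completely, the phase discriminator is taken to be a multiplier with gain $K_p$, and $m(t) = \sum_n a_n g(t - nT_s)$ is taken to have $a_n = \pm 1$ with unit-amplitude rectangular pulses, so that $m^2(t) = 1$.

```latex
\begin{align*}
z_q &= K_{p1}\, m(t)\,\sin(\omega_c t+\theta_1)\cos(\omega_c t+\theta_2)
     = \tfrac{1}{2}K_{p1}\, m(t)\,\bigl[\sin\theta_e + \sin(2\omega_c t+\theta_1+\theta_2)\bigr] \\
z_i &= K_{p2}\, m(t)\,\sin(\omega_c t+\theta_1)\sin(\omega_c t+\theta_2)
     = \tfrac{1}{2}K_{p2}\, m(t)\,\bigl[\cos\theta_e - \cos(2\omega_c t+\theta_1+\theta_2)\bigr] \\
y_q &= \tfrac{1}{2}K_{p1}K_{11}\, m(t)\,\sin\theta_e, \qquad
y_i  = \tfrac{1}{2}K_{p2}K_{12}\, m(t)\,\cos\theta_e \\
v_c &= K_p\, y_q\, y_i
     = \tfrac{1}{4}K_p K_{p1}K_{p2}K_{11}K_{12}\, m^2(t)\,\sin\theta_e\cos\theta_e
     = \tfrac{1}{8}K_p K_{p1}K_{p2}K_{11}K_{12}\,\sin 2\theta_e
\end{align*}
```

Because $m(t)$ enters $v_c$ only as $m^2(t)$, the data sign does not disturb the error signal. The same $\sin 2\theta_e$ dependence means the error is unchanged when $\theta_e$ shifts by $\pi$, which is the usual 180° phase ambiguity of a Costas loop.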

3. The Parallel Computing Model Based on CUDA

3.1. CUDA

CUDA, launched by NVIDIA, is a general-purpose parallel computing architecture. It was initially designed to speed up real-time image processing on the GPU, and it makes full use of the GPU's high memory bandwidth and very large number of floating-point units. It can handle large parallel problems, especially large-scale floating-point computation [5]. The CUDA hardware architecture is shown in Figure 4. The GPU is designed for compute-intensive, highly parallel computation, so more of its transistors are devoted to data processing than to data caching and flow control. In particular, the GPU is well suited to problems in which the same program is executed on many data elements in parallel, so the CUDA platform is well suited to digital signal processing.

3.2. Parallel Computing Model

Parallel computing uses multiple cores to solve a problem at the same time. For digital signal processing with demanding real-time requirements, parallel computing is an effective way to improve real-time performance. Currently, the most widely used parallel computing model is a layered model consisting of three layers [6] [7].

Parallel Algorithm Design Layer: the computing parameters are abstracted from different parallel computers and a parallel algorithm design model is established. This layer is aimed mainly at algorithm researchers.

Parallel Programming Design Layer: a specific parallel algorithm is implemented in a parallel programming language according to the software and hardware interfaces. This layer is aimed mainly at program designers.

Parallel Program Execution Layer: the target code is compiled and run with the support of the parallel machine's system software, and the actual performance of the program is optimized (Figure 5).

Given the hardware design of the GPU, CUDA refines the parallel algorithm design layer in more detail. The model assumes that CUDA threads execute on a physically separate GPU that acts as a coprocessor to the host. In this heterogeneous parallel mode, the parallel part of the program executes on the GPU as kernels, and the rest of the program executes on the CPU. Parallel program execution is mainly a matter for the compiler, so this paper concentrates on the parallel programming problem.

3.3. Digital Communication Parallel Computing Model

The parallel algorithm is the core issue of parallel programming, and the algorithms of a digital communication system are numerical parallel algorithms. There are generally two ways to design them: 1) direct parallelization of a serial algorithm, which exploits the parallelism in an existing serial algorithm and converts it directly into a parallel algorithm; 2) redesign as a parallel algorithm from the principles of numerical computation, without regard to the corresponding serial algorithm [8].

Figure 4. CUDA architecture. (CPU and GPU, each with DRAM, cache, control logic and ALUs.)
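The first design method, direct parallelization of a serial algorithm, can be illustrated with an element-wise loop whose iterations are independent. The sketch below is plain Python that emulates a one-dimensional CUDA launch on the CPU; it is not GPU code and not the authors' program. The names `scale_serial`, `scale_kernel` and `launch_1d`, and the block size of 4, are hypothetical and chosen only for illustration.

```python
def scale_serial(x, gain):
    """Serial reference: one loop over all samples."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = gain * x[i]
    return out


def scale_kernel(block_idx, block_dim, thread_idx, x, gain, out):
    """Work done by one emulated thread: one sample per thread."""
    i = block_idx * block_dim + thread_idx   # global index, as in a CUDA kernel
    if i < len(x):                           # guard threads past the end of the data
        out[i] = gain * x[i]


def launch_1d(kernel, n, block_dim, *args):
    """Emulate a 1-D launch with enough blocks to cover n elements."""
    grid_dim = (n + block_dim - 1) // block_dim   # ceiling division
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)
    return grid_dim


# Hypothetical input data; 5 samples with 4 threads per block needs 2 blocks.
x = [0.5, -1.0, 2.0, 4.0, -3.0]
out = [0.0] * len(x)
blocks = launch_1d(scale_kernel, len(x), 4, x, 2.0, out)
print(blocks, out == scale_serial(x, 2.0))
```

No emulated thread reads another thread's output, so the iterations could run in any order. That independence is what allows such a loop to be spread over GPU threads without changing the algorithm.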

A digital communication system is highly modular and computationally intensive; its typical structure is shown in Figure 6. Because the GPU device cannot display data itself, data must be exchanged between host memory and GPU memory over the PCIe bus. Limited by the speed of a general-purpose computer, data transfer takes up most of the program execution time in large-scale numerical computation. Figure 7 shows the parallel computation time for different data sizes: when the data size is small, transfer time accounts for more than 99% of the total program execution time. The advantage of GPU computation therefore shows only when the computation is large.

Given this characteristic of the CUDA platform, the parallel computing model of a digital communication system should reduce data transfer as far as possible and make full use of the GPU's computing power. At the start of the program, all the data to be processed are transferred to GPU memory, and all the heavy computation is performed by the GPU. During execution only a small amount of data passes between the CPU and the GPU, and the CPU runs only small computations and the data monitoring and display functions (Figure 8).

Figure 5. Parallel computing model. (Layers: parallel algorithm design model, parallel programming model, parallel program execution model.)

Figure 6. Digital communication model. (Blocks: orthogonal downconversion, low-pass filter, demodulation, decode.)

Figure 7. Computing time compared with transmission time. (Curves: program execution time and data transfer time; axes: data size, computing time (µs).)

Figure 8. Parallel computing model of digital communication. (GPU: processing functions 1 to 4; CPU: data monitoring and data display; bulk data transfer at the start and end, a small amount of data transfer in between.)

4. BPSK Signal Demodulation Parallel Programming

4.1. BPSK Signal Demodulation Algorithm Structure

Based on the parallel computing model of Section 3.3 and the Costas loop demodulation structure, the parallel BPSK demodulation algorithm is shown in Figure 9. The intermediate-frequency sampled data are read by the CPU and transferred to the GPU, which performs digital orthogonal down-conversion, low-pass filtering, bit synchronization, phase detection, loop filtering and decoding. The phase error signal filtered by the loop filter is transferred back to the CPU to compute the Doppler frequency shift. The Doppler frequency shift is then transferred to the GPU again to correct the sine and cosine waveforms produced by the NCO. Finally, the data decoded by the GPU are transferred back to the host, which computes the BER statistics.

Figure 9. Parallel computing model of BPSK signal demodulation. (Device: orthogonal downconversion, NCO, low-pass filter, code synchronization, phase discrimination, loop filter, decode. Host: read data and transfer, compute Doppler frequency, BER statistics, save and display.)

4.2. The Mixer Design

The mixer converts the signal from the intermediate frequency to baseband and is the core of a software-defined radio. In hardware, a numerically controlled oscillator (NCO) is usually used to produce the local digital carrier for mixing. In the parallel mixing program, each data point is multiplied by the sine and cosine waveform samples at the corresponding phase.

4.3. The FIR Filter Design

The FIR filter is widely used in digital communication systems for its good group-delay behavior: it can realize an arbitrary amplitude-frequency characteristic while keeping a strictly linear phase-frequency characteristic, and its impulse response is finite. The transfer function of an FIR filter of order $M$ is:

$$H(z) = \sum_{k=0}^{M} h(k) z^{-k} \qquad (6)$$

In the time domain, the corresponding relation between input and output is:

$$y(n) = \sum_{k=0}^{M} h(k) x(n-k) \qquad (7)$$

4.4. The Phase Discriminator Design

The phase discriminator detects the phase difference of the input signal and is the key component of the phase-locked loop (PLL). In the parallel program, it relies mainly on computing the difference between successive sample points.

5. Conclusion

The test hardware platform is a Tesla K20 graphics card. The input is 1 ms of analog data, and the measured computation time is within 1.7 ms. As shown in Figure 10, the program correctly demodulates the original data; the BER statistics are shown in Figure 11.

Figure 10. Signals after GPU demodulation. (Curves: the original baseband signal and the signal after GPU demodulation; axes: time (µs), amplitude (V).)

Figure 11. BER statistics. (Axes: $E_b/N_0$, BER; marked point X: 9, Y: 6e-5.)
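Returning to Sections 4.2 to 4.4, the per-sample work of the mixer, the FIR filter and the phase discriminator can be sketched as three small functions, each standing in for the body of one GPU thread. This is a serial Python illustration, not the authors' CUDA code. The function names, the 16-sample test signal and the moving-average filter are assumptions made for the example, and the discriminator uses the product form of Eq. (5) because the text does not give the differencing formula of Section 4.4.

```python
import math


def mix_sample(n, x, fs, f_nco, phase0):
    """Mixer, one thread: sample n times the NCO cosine and sine at its phase."""
    phase = 2.0 * math.pi * f_nco * n / fs + phase0
    return x[n] * math.cos(phase), x[n] * math.sin(phase)


def fir_sample(n, x, h):
    """FIR filter, one thread, Eq. (7): sum over k of h(k) * x(n - k)."""
    acc = 0.0
    for k in range(len(h)):
        if n - k >= 0:          # samples before the start of the block count as zero
            acc += h[k] * x[n - k]
    return acc


def phase_error_sample(n, y_q, y_i):
    """Discriminator, one thread, in the product form of Eq. (5)."""
    return y_q[n] * y_i[n]


# Hypothetical test signal: one carrier period of a single BPSK symbol.
fs, f_c, theta_1, bit = 16.0, 1.0, 0.3, -1.0
N = 16
s = [bit * math.sin(2.0 * math.pi * f_c * n / fs + theta_1) for n in range(N)]

# Each list comprehension stands in for one kernel launch over N threads.
mixed = [mix_sample(n, s, fs, f_c, 0.0) for n in range(N)]
z_q = [pair[0] for pair in mixed]
z_i = [pair[1] for pair in mixed]

h = [1.0 / N] * N               # moving average as a stand-in low-pass filter
y_q = [fir_sample(n, z_q, h) for n in range(N)]
y_i = [fir_sample(n, z_i, h) for n in range(N)]
v_c = [phase_error_sample(n, y_q, y_i) for n in range(N)]

# Last sample: the filter spans one full carrier period.
print(v_c[N - 1], math.sin(2.0 * theta_1) / 8.0)
```

With the NCO phase set to zero, the phase error equals `theta_1`, and the last output sample agrees with $\frac{1}{8}\sin 2\theta_e$ from Eq. (5) with all gains equal to one. Flipping `bit` from -1 to +1 leaves `v_c` unchanged, since the two branch outputs change sign together.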

Realizing BPSK signal demodulation on a general-purpose computer platform reduces the difficulty and cost of system design and development. Processing in software also increases the flexibility of the system, since loading different software provides additional functions. System performance can be further improved through hardware upgrades and reorganization [9]-[11]. Digital signal processing on general-purpose computer platforms, and on CUDA in particular, is an important development direction for signal processing, as well as a new trend and a new research area in computer applications.

References

[1] Riter, S. (1969) An Optimum Phase Reference Detector for Fully Modulated Phase Shift Keyed Signal. IEEE AES-5, 4.
[2] Core, M.T. and Tan, H.H. (2002) BER for Optical Heterodyne DPSK Receivers Using Delay Demodulation and Integration Detection. IEEE Transactions on Communications, 50.
[3] Li, G.X., An, Z.Q. and Yuan, S.J. (2008) Study on Software Demodulation of DQPSK Signal Based on Digital Phase Measurement. Journal of Spacecraft TT&C Technology, 27.
[4] Mitra, S.K. (2001) Digital Signal Processing, A Computer-Based Approach. 2nd Edition, McGraw-Hill Companies, Inc.
[5] NVIDIA CUDA Programming Guide 5.0. http://www.nvidia.com/object/cuda_develop.html
[6] Sankaralingam, K., Keckler, S.W., Mark, W.R. and Burger, D. Universal Mechanisms for Data-Parallel Architectures. 36th Annual International Symposium on Microarchitecture.
[7] Chen, G.L., Sun, G.Z., Xu, Y. and Lu, M. (2008) Methodology of Research on Parallel Algorithms. Chinese Journal of Computers, 31.
[8] Chen, Y., Wang, Y.Q. and Liu, Y. (2011) Research on the Technology of Software Space TTC System Based on Computer Platform. The Measurement and Control Technology, 30.
[9] Bose, V.G. (1999) Design and Implementation of Software Radios Using a General Purpose Processor. Ph.D. Thesis, Massachusetts Institute of Technology.
[10] Bose, V.G. and Morris, R. (2001) Dynamic Physical Layers for Wireless Networks Using Software Radio. International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, May 2001. http://dx.doi.org/10.1109/icassp.2001.940393
[11] Vaidyanathan, P.P. (1990) Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications. Proceedings of the IEEE, 78, 56-93.