Porting the 802.11p receiver on the ExpressMIMO Platform (LabSession OAI 2)
Introduction and Motivation
OpenAirInterface Platform: Prototype Design for Software Defined Radio (SDR) Applications
  Support for a wide range of wireless communication standards such as UMTS, WLAN 802.11a/g/p, WiMAX, GSM, ...
  Design can easily be adapted to future standards such as LTE
  Simultaneous (multimodal) processing of different standards
Software Model to Emulate all Processes on the OpenAirInterface Platform
  Transceiver validation in a pure SW environment
What we will show you in this Lab
1) 802.11p receiver emulation in a pure SW environment using the library for ExpressMIMO baseband (libembb)
2) 802.11p receiver demonstration on the real HW platform
Transceiver Mapping - steps to follow
1. Emulation of the transceiver using the Library for ExpressMIMO baseband (libembb)
  Functional verification of the transceiver code in a pure SW environment (C/C++)
  Bit-accurate representation of the real DSP engines
  Synchronous execution of the DSPs
  Identification of the required DSPs and design of algorithms that can be executed by the available resources
  First runtime analysis considering the pure processing time of the DSP engines
    o Can the standard be processed in realtime?
    o Identification of bottlenecks
2. Cycle-accurate HW/SW co-simulation (Modelsim)
  Only appropriate for standards with a short packet / frame length
  Performance improvement of latency-critical standards by using execution parallelism on the platform
    o parallel processing of DSP engines
    o programming of the next operation while the previous one is not finished
    o DMA transfers in parallel to IP processing
  More realistic estimation of the execution time, as the required control flow can be measured
  Analysis of the difference between a distributed and a global control flow
  Questions to answer
    o Which operations can run in parallel?
    o Can the next operation already be programmed while the current one is still running?
    o Is it possible to precompute the commands and to store them in a local memory? (depends on the number of parameters changing dynamically during runtime)
3. Receiver validation on the HW platform
  First, validation using a known snapshot integrated in the source code
  Second, validation using real test signals received through the RF interface
  The only appropriate validation method for standards with a long packet / frame length (e.g. DAB)
802.11p PHY Overview
  Standard currently in draft
    o latest version: D7.00, May 2009
  PHY layer defined in IEEE 802.11-2007, Section 17
    o same as 802.11g (OFDM) but with 10 MHz bandwidth
  Latency critical
    o short packet lengths
    o acknowledgement packet must be sent within a specific time
  [Figure: LTE, FM, DAB/DMB, SDARS, 802.11abg, Bluetooth, 802.11p. Active Modules: WLAN Car2Car (e.g. 802.11p), Long range & Broadcast (e.g. DAB)]
802.11p OFDM Parameters
802.11p Coding and Modulation Parameters
802.11p Frame Structure
802.11p Receiver Structure
802.11p Receiver Structure
  Energy Detection
    o Where is most probably the beginning of the next packet?
  Time Synchronization over the short training symbols
    o Where is the beginning of the next packet?
  Calculation of the channel estimate using the long training symbols
    o How did the channel influence the received packet?
802.11p Receiver Structure
  Decoding of the SIGNAL packet to retrieve RATE (= type of modulation and coding rate) and LENGTH (= length of the DATA packet)
    o Channel Estimation & Carrier Phase Offset Estimation
    o OFDM demodulation (BPSK 1/2)
    o deinterleaving
    o decoding using the Viterbi algorithm
  Depending on the RATE, the modulation parameters (type of modulation, coding rate, coded bits per subcarrier, normalization factor) are set
  Calculation of the number of transmitted OFDM symbols in the DATA packet
802.11p Receiver Structure
  Decoding of the DATA packets
    o Channel Estimation & Carrier Phase Offset Estimation
    o OFDM demodulation
    o deinterleaving
    o decoding using the Viterbi algorithm
    o descrambling
    o CRC decoder
ExpressMIMO Platform
Required DSP Engines
  Preprocessor
    o Frequency correction (NCO)
    o I/Q imbalance correction (phase / amplitude)
    o Digital retiming for arbitrary sampling rates
    o Real-time interrupt generation
  Front-End Processor
    o Time Synchronization
    o Channel / Coarse Frequency Estimation
    o Data Detection
    o FFT
  Channel Decoder
    o Viterbi decoder
    o Deinterleaver
    o Descrambling
Generic IP Shell
  DMA transfers in parallel to IP processing
  Programming of the next command while processing the current one
  In the future: UC usage to enable a distributed control on the platform
  Preparation of the commands before receiver execution
    o commands are stored in the VCI RAM and copied into the CTRL registers before they are executed
    o only dynamically changing parameters are modified by the LEON3 during runtime
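The command preparation idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function name, the word-based command layout and the single "dynamic" word are hypothetical and not taken from libembb; a plain array stands in for the CTRL registers.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a command that was prepared before the receiver started into the
 * control registers, replacing only the one word that changes at runtime.
 * All names and the layout are hypothetical. */
void issue_prepared_command(volatile uint32_t *ctrl_regs,
                            const uint32_t *prepared, size_t n_words,
                            size_t dynamic_word, uint32_t dynamic_value)
{
    for (size_t i = 0; i < n_words; i++) {
        /* Static words come from the prepared copy, the dynamic word from the caller. */
        ctrl_regs[i] = (i == dynamic_word) ? dynamic_value : prepared[i];
    }
}
```

The point of the sketch is that the per-command work at runtime shrinks to a copy plus one patched word, instead of recomputing every field.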
Mapping the 802.11p receiver on the ExpressMIMO Platform
Emulation vs. HW
  [Figure: Receiver Code -> Libembb generic interface -> two paths]
    o Emulation: getter and setter functions for command words (bfgen generated) -> Functional Model (C++) of the DSP Engine
    o HW: getter and setter functions for command words (bfgen generated) -> Command registers of the DSP Engine
PART 1: 802.11p Receiver Emulation using Libembb
Code Structure (Emulation)
  libembb
    o expressmimo_emu (expressmimo_emu.cc)
  Receiver Code
    o dot11_main.cc
    o dot11_phy_procedures.cc
    o dot11_sts_lts_detection.cc
    o dot11_signal_field.cc
    o dot11_data_packets_bpsk.cc
    o dot11_data_packets_qpsk.cc
    o dot11_data_packets_16qam.cc
    o dot11_data_packets_64qam.cc
Mapping the 802.11p receiver on the ExpressMIMO Platform
  dot11_main.cc
  dot11_phy_procedures.cc
  dot11_sts_lts_detection.cc
  dot11_signal_field.cc
  dot11_data_packets_bpsk.cc
  dot11_data_packets_qpsk.cc
  dot11_data_packets_16qam.cc
  dot11_data_packets_64qam.cc
Time Synchronization
  The Preprocessor generates an interrupt as soon as 640 new input samples are available
  Energy detection in combination with an overlapping FFT-based correlation between 256 incoming samples (4 bytes each) and the Short Training Symbol (160 samples, zero-extended to 256 samples)
  The FFTs overlap by 176 samples (repetition every 80 samples)
  Peak detection to find the starting point of the packet
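The window arithmetic follows directly from the numbers on this slide: 256 - 176 = 80 samples between window starts, and 640 / 80 = 8 windows per Preprocessor interrupt. A minimal sketch, with constant and function names chosen for illustration only:

```c
#include <stddef.h>

enum {
    BLOCK_SAMPLES = 640, /* new samples per Preprocessor interrupt */
    FFT_SIZE      = 256, /* length of one correlation window */
    FFT_OVERLAP   = 176  /* overlap between consecutive windows */
};

/* Distance between the starts of two consecutive windows. */
size_t window_hop(void)
{
    return FFT_SIZE - FFT_OVERLAP;
}

/* Number of windows that start inside one block of new samples. */
size_t windows_per_block(void)
{
    return BLOCK_SAMPLES / window_hop();
}

/* Start of window w, relative to the first new sample of the block. */
size_t window_start(size_t w)
{
    return w * window_hop();
}
```

Note that the last windows of a block reach past its 640 samples (560 + 256 > 640), so some buffering across interrupts is needed; how the receiver handles this is not stated on the slide.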
Time Synchronization
  FEP operations
    o energy calculation (256 samples)
    o 2x 256-point FFT / IFFT
    o component-wise product (256 samples)
    o max-argmax operation (256 samples)
  Comparison to threshold value
    o LEON3: global control flow
    o uc: distributed control flow (preferred)
  Total < 8 us at 100 MHz
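Time Synchronization
Receiver Code

    /* forward FFT of 256 input samples */
    emm_fep_fft(256, dot11_vars->fft_offset, DFT_ADDR, 0);
    /* component-wise product with the conjugate of the second operand */
    emm_fep_component_wise_product_conj2(256, DFT_ADDR, STS_ADDR, CWP_ADDR,
        FEPTYPE_in1_cpx32 | FEPTYPE_in2_cpx32 | FEPTYPE_out_cpx32);
    /* inverse FFT */
    emm_fep_fft(256, CWP_ADDR, IDFT_ADDR, 1);
    /* energy of each sample, then maximum and its position */
    emm_fep_energy_max_argmax(256, IDFT_ADDR, RESULT_ADDR,
        FEPTYPE_in_cpx32 | FEPTYPE_out_cpx32);
    /* copy the result from the FEP memory to the LEON3 */
    emm_memcpy_fep2leon((uint32_t *)max_argmax, (RESULT_ADDR + 2) * 4, 4 * 4);
    dot11_vars->sts_synch_val[0] = max_argmax[0] & 0xffff;
    dot11_vars->sts_max_pos[0]   = max_argmax[2] & 0xffff;
    /* value comparison: sts_synch_val[0] is compared to the threshold value */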
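What the FFT / conjugate product / IFFT / max-argmax chain computes can be written as a plain time-domain reference model: the circular cross-correlation r[k] = sum_n x[(n + k) mod N] * conj(ref[n]), followed by a search for the lag with the largest energy. The sketch below is not the FEP implementation; the sample type and all function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample type: 16-bit I and 16-bit Q (4 bytes per sample). */
typedef struct { int16_t re, im; } cpx16;

/* Energy |r[lag]|^2 of the circular cross-correlation at one lag. */
double corr_energy(const cpx16 *x, const cpx16 *ref, size_t len, size_t lag)
{
    int64_t re = 0, im = 0;
    for (size_t n = 0; n < len; n++) {
        const cpx16 a = x[(n + lag) % len];
        const cpx16 b = ref[n];
        /* a * conj(b) */
        re += (int64_t)a.re * b.re + (int64_t)a.im * b.im;
        im += (int64_t)a.im * b.re - (int64_t)a.re * b.im;
    }
    return (double)re * (double)re + (double)im * (double)im;
}

/* Lag with the largest correlation energy; the energy is written to *peak. */
size_t corr_argmax(const cpx16 *x, const cpx16 *ref, size_t len, double *peak)
{
    size_t best = 0;
    double best_e = -1.0;
    for (size_t k = 0; k < len; k++) {
        const double e = corr_energy(x, ref, len, k);
        if (e > best_e) {
            best_e = e;
            best = k;
        }
    }
    *peak = best_e;
    return best;
}

/* Final value comparison: a packet start is declared above the threshold. */
int packet_detected(double peak, double threshold)
{
    return peak > threshold;
}
```

This direct form costs on the order of N^2 operations per window, which is why the receiver uses the FFT-based form on the FEP; the direct form is mainly useful for cross-checking results in the emulation.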
Time Synchronization
libembb (expressmimo_emu.cc)
Call: emm_fep_fft(256, dot11_vars->fft_offset, DFT_ADDR, 0);

    void emm_fep_fft(uint32_t size, uint32_t offset_src, uint32_t offset_dst, uint32_t inverse)
    {
        FEP_CONTEXT *ctx = &ctxf;
        FEP_FFT;          /* macro from expressmimo.h, sets the command fields */
        fep_start(ctx);
    }
Time Synchronization
libembb (expressmimo.h)
Call: emm_fep_fft(256, dot11_vars->fft_offset, DFT_ADDR, 0);

    #define FEP_FFT { \
        uint32_t src_addr_index, dst_addr_index, src_memquarter, dst_memquarter; \
        src_addr_index = get_addr_index_fft(offset_src); \
        dst_addr_index = get_addr_index_fft(offset_dst); \
        src_memquarter = get_fep_mss_bank(offset_src); \
        dst_memquarter = get_fep_mss_bank(offset_dst); \
        fep_set_l (ctx, size); \
        fep_set_i (ctx, inverse); \
        fep_set_bx(ctx, src_addr_index); \
        fep_set_bz(ctx, dst_addr_index); \
        fep_set_qx(ctx, src_memquarter); \
        fep_set_qz(ctx, dst_memquarter); \
        fep_set_wx(ctx, 3); \
        fep_set_wz(ctx, 3); \
        fep_set_op(ctx, FEP_OP_FT); \
    }
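Setter and getter functions for command words usually boil down to masking and shifting a field inside a register word. The sketch below shows that pattern; the field positions and widths are invented for the example and do not describe the real FEP command layout or the code that bfgen generates.

```c
#include <stdint.h>

/* Hypothetical layout of one 32-bit command word:
 * bits 0..12 hold the vector length, bit 13 holds the inverse flag. */
#define LEN_SHIFT 0u
#define LEN_MASK  0x1FFFu
#define INV_SHIFT 13u
#define INV_MASK  0x1u

/* Write `value` into the field, leaving all other bits of the word unchanged. */
void cmd_set_field(uint32_t *word, unsigned shift, uint32_t mask, uint32_t value)
{
    *word = (*word & ~(mask << shift)) | ((value & mask) << shift);
}

/* Read the field back. */
uint32_t cmd_get_field(uint32_t word, unsigned shift, uint32_t mask)
{
    return (word >> shift) & mask;
}
```

Because the setter only touches its own field, a command prepared in advance can be updated at runtime by changing a single parameter.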
Calculation of the Channel Estimate
  Calculated once for the whole packet
  FEP operations
    o 64-point FFT of the Long Training Symbol (can be prestored in the FEP MSS)
    o 64-point FFT
    o component-wise product (64 sample vector)
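The component-wise product can be sketched as h[k] = y[k] * conj(ref[k]), where y is the FFT of the received long training symbol and ref the FFT of the known one. With reference values of +1 / -1 on the used carriers and 0 on the unused ones, this equals y[k] / ref[k] on the used carriers. The type and function name below are assumptions for illustration, not libembb code.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int32_t re, im; } cpx;

/* Per-carrier channel estimate: h[k] = y[k] * conj(ref[k]). */
void channel_estimate(const cpx *y, const cpx *ref, cpx *h, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        h[k].re = y[k].re * ref[k].re + y[k].im * ref[k].im;
        h[k].im = y[k].im * ref[k].re - y[k].re * ref[k].im;
    }
}
```

Carriers where the reference is zero (nulled carriers) come out as zero, so they need no special handling in this step.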
SIGNAL Field
  Contains RATE and LENGTH
    o RATE: type of modulation, coding rate
    o LENGTH: between 0 and 4095
  Characteristics
    o Single OFDM symbol
    o BPSK modulated
    o Convolutional encoder with rate = ½ and generator polynomials g0 = 133 and g1 = 171 (octal)
  Calculation or use of LUTs
    o number of bits per subcarrier
    o number of coded bits / data bits per OFDM symbol
    o number of transmitted symbols
    o number of data bits
    o number of padding bits
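The quantities derived from RATE and LENGTH can be sketched with a small lookup table plus the relations from the 802.11 OFDM PHY: N_SYM = ceil((16 + 8 * LENGTH + 6) / N_DBPS), N_DATA = N_SYM * N_DBPS and N_PAD = N_DATA - (16 + 8 * LENGTH + 6). The `mode` index used here is a hypothetical table index, not the RATE bit pattern of the SIGNAL field, and the function and type names are illustrative.

```c
#include <stdint.h>

/* Data bits per OFDM symbol (N_DBPS) in the order BPSK 1/2, BPSK 3/4,
 * QPSK 1/2, QPSK 3/4, 16-QAM 1/2, 16-QAM 3/4, 64-QAM 2/3, 64-QAM 3/4. */
static const uint16_t N_DBPS[8] = { 24, 36, 48, 72, 96, 144, 192, 216 };

typedef struct {
    uint32_t n_sym;  /* number of OFDM symbols in the DATA field */
    uint32_t n_data; /* number of bits in the DATA field */
    uint32_t n_pad;  /* number of padding bits */
} data_layout;

/* mode: index into N_DBPS (0..7), length: LENGTH in octets. */
data_layout dot11_data_layout(unsigned mode, uint32_t length)
{
    /* 16 SERVICE bits + PSDU bits + 6 TAIL bits */
    const uint32_t payload = 16u + 8u * length + 6u;
    data_layout d;
    d.n_sym  = (payload + N_DBPS[mode] - 1u) / N_DBPS[mode]; /* ceiling division */
    d.n_data = d.n_sym * N_DBPS[mode];
    d.n_pad  = d.n_data - payload;
    return d;
}
```

For example, a 100-octet PSDU at 16-QAM 3/4 (N_DBPS = 144) needs 822 bits, which gives 6 OFDM symbols and 42 padding bits.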
SIGNAL Field
  Sequential processing of the invoked DSP engines (no parallelism possible)
  FEP Operations
    64-point FFT
    Multiplication with the Channel Estimate
      o component-wise multiplication
    Carrier phase offset estimation
      o dot product over the four pilots
      o correction of the amplitude is not necessary, as the real part of the result is directly taken as input of the deinterleaver
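A minimal sketch of the pilot-based phase correction: the four equalized pilots are multiplied by their expected signs and summed, and each data carrier is then multiplied by the conjugate of that sum. Only the real part is kept and the result is left unnormalised, which matches the remark that no amplitude correction is needed here. Type and function names are assumptions for illustration.

```c
#include <stdint.h>

typedef struct { int32_t re, im; } cpx;

/* Dot product of the four equalized pilots with their expected signs (+1 or -1). */
cpx pilot_phase(const cpx pilots[4], const int sign[4])
{
    cpx acc = { 0, 0 };
    for (int i = 0; i < 4; i++) {
        acc.re += sign[i] * pilots[i].re;
        acc.im += sign[i] * pilots[i].im;
    }
    return acc;
}

/* Real part of d * conj(phase): phase-corrected, amplitude not normalised. */
int64_t derotate_real(cpx d, cpx phase)
{
    return (int64_t)d.re * phase.re + (int64_t)d.im * phase.im;
}
```

For BPSK only the sign and relative size of the soft value matter to the Viterbi decoder, so the common scale factor introduced by the pilot sum does no harm.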
SIGNAL Field
  Deinterleaving
    o 64 input samples, 48 output samples
    o combination of permutation tables in the standard + removal of the nulled and pilot carriers
  Decoding
    o Viterbi Decoder
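The two interleaver permutations of the 802.11 OFDM PHY can be combined into a single table, as sketched below. The sketch covers only the standard permutation over the N_CBPS coded bits of one symbol; the additional removal of nulled and pilot carriers mentioned on the slide is not included, and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* perm[k] = j: coded bit k is transmitted at position j.
 * First permutation:  i = (N_CBPS/16) * (k mod 16) + floor(k/16)
 * Second permutation: j = s * floor(i/s) + (i + N_CBPS - floor(16*i/N_CBPS)) mod s
 * with s = max(N_BPSC/2, 1). */
void interleaver_table(size_t n_cbps, size_t n_bpsc, size_t *perm)
{
    const size_t s = (n_bpsc / 2 > 1) ? n_bpsc / 2 : 1;
    for (size_t k = 0; k < n_cbps; k++) {
        const size_t i = (n_cbps / 16) * (k % 16) + k / 16;
        perm[k] = s * (i / s) + (i + n_cbps - (16 * i) / n_cbps) % s;
    }
}

/* Undo the interleaving of one symbol of soft values: out[k] = in[perm[k]]. */
void deinterleave(const int8_t *in, int8_t *out, const size_t *perm, size_t n_cbps)
{
    for (size_t k = 0; k < n_cbps; k++) {
        out[k] = in[perm[k]];
    }
}
```

Since the table depends only on the modulation, it can be computed once per mode and stored, which fits the prestored-table approach of the slide.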
DATA Field
  Contains the SERVICE field, the PSDU, TAIL and PAD bits
  Characteristics
    o Encoded message split over several OFDM (DATA) symbols
    o BPSK, QPSK, 16-QAM or 64-QAM modulated
    o Convolutional encoder with rate = ½ and generator polynomials g0 = 133 and g1 = 171 (octal)
    o Higher rates (2/3, 3/4) are obtained by puncturing; the receiver re-inserts the missing bits as zeros (zero-insertion)
    o All bits are scrambled (length-127 frame-synchronous scrambler)
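The length-127 scrambler uses the generator polynomial x^7 + x^4 + 1, and the same operation scrambles and descrambles. A minimal sketch with one bit per array element; the function names are illustrative, and the recovery of the initial state from the SERVICE field in the receiver is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* One step of the x^7 + x^4 + 1 scrambler.
 * The state holds x1 in bit 0 up to x7 in bit 6. */
uint8_t scrambler_step(uint8_t *state)
{
    const uint8_t out = (uint8_t)(((*state >> 6) ^ (*state >> 3)) & 1u);
    *state = (uint8_t)(((*state << 1) | out) & 0x7Fu);
    return out;
}

/* XOR n bits (one bit per element) with the scrambler sequence.
 * Applying the function twice with the same seed restores the input. */
void scramble_bits(uint8_t *bits, size_t n, uint8_t seed)
{
    uint8_t state = seed & 0x7Fu;
    for (size_t i = 0; i < n; i++) {
        bits[i] ^= scrambler_step(&state);
    }
}
```

With the all-ones initial state the sequence starts 00001110 11110010, which is a convenient check against the sequence listed in the standard.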
DATA Field
  Symbol grouping of FEP operations to process larger vectors (performance gain)
    o Max. group size is 8 (corresponds to the number of OFDM symbols transferred after each Preprocessor interrupt)
    o Important for standards operating on short data sets
    o Reduction of the time required to program the FEP operations
DATA Field
  FEP Operations
    64-point FFT
    Multiplication with the Channel Estimate
      o component-wise multiplication
    Carrier phase offset estimation
      o dot product over the four pilots
      o correction of the amplitude is not necessary for BPSK and QPSK, as the result is directly taken as input of the deinterleaver
Data Detection using the FEP BPSK/QPSK: 16-QAM: 64-QAM:
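One common way to obtain per-bit soft values for 16-QAM, not necessarily the formulas used on the FEP, is the simplified (max-log style) demapper sketched below for one axis (I or Q). It assumes the Gray mapping of the 802.11 OFDM PHY, where the two bits of an axis map to the levels 00 -> -3, 01 -> -1, 11 -> +1, 10 -> +3, and a positive soft value means the bit is 1. The parameter `unit` is the amplitude that corresponds to level 1 after equalization; it is a hypothetical name and shows why QAM, unlike BPSK and QPSK, needs the amplitude.

```c
#include <stdlib.h> /* abs */

/* Soft values for the two bits carried on one axis of a 16-QAM symbol.
 * y: received axis value, unit: amplitude of constellation level 1. */
void qam16_axis_soft(int y, int unit, int soft[2])
{
    soft[0] = y;                 /* first bit: positive half of the axis */
    soft[1] = 2 * unit - abs(y); /* second bit: inner levels (+1 / -1) */
}
```

The decision threshold for the second bit sits at 2 * unit, halfway between the inner and outer levels.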
DATA Field
  Deinterleaving (per DATA symbol)
    o combination of permutation tables in the standard + removal of the nulled and pilot carriers
  Decoding
    o Viterbi Decoder

  Modulation  Rate  INTL IN (Byte)  INTL OUT (Byte)
  BPSK        1/2               64               48
  BPSK        3/4               64               72
  QPSK        1/2              128               96
  QPSK        3/4              128              144
  16-QAM      1/2              512              192
  16-QAM      3/4              512              288
  64-QAM      2/3             1024              384
  64-QAM      3/4             1024              432
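The INTL OUT sizes for the higher rates are consistent with zero-insertion at the deinterleaver output: 48 coded bits at rate 3/4 expand to 72 values, and 288 coded bits at rate 2/3 expand to 384. A minimal sketch of that depuncturing step, assuming the puncturing patterns of the 802.11 OFDM PHY (per period of the mother code output A0 B0 A1 B1 A2 B2, rate 3/4 omits B1 and A2; per period A0 B0 A1 B1, rate 2/3 omits B1). The names are illustrative and the sketch is not the Channel Decoder implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* 1 = transmitted, 0 = punctured, over one period of the rate-1/2 output. */
const uint8_t KEEP_3_4[6] = { 1, 1, 1, 0, 0, 1 };
const uint8_t KEEP_2_3[4] = { 1, 1, 1, 0 };

/* Re-insert neutral soft values (0) at the punctured positions.
 * n_in is assumed to cover whole puncturing periods.
 * Returns the number of values written to out. */
size_t depuncture(const int8_t *in, size_t n_in,
                  const uint8_t *keep, size_t period, int8_t *out)
{
    size_t r = 0, w = 0;
    while (r < n_in) {
        out[w] = keep[w % period] ? in[r++] : 0;
        w++;
    }
    /* Complete the last period if it ends with punctured positions. */
    while (w % period != 0) {
        out[w++] = 0;
    }
    return w;
}
```

A soft value of 0 tells the Viterbi decoder that the position carries no information, so one rate-1/2 decoder can serve all coding rates.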
LAB: Emulation of the 802.11p receiver
  Directory: ~/home/winterschool/emulation
  In dot11_phy_definitions.h:
    o Print-outs of different intermediate results
    o Variation of the number of DATA symbols to be grouped
  Different test signals
    o USRP snapshot
    o Test signals generated according to the encoding procedure given by the 802.11-2007 standard specification
  make clean <testsignal> run
    o <testsignal>: bpsk_1_2 / bpsk_3_4 / qpsk_1_2 / qpsk_3_4 / qam16_1_2 / qam16_3_4 / qam64_2_3 / qam64_3_4 / usrp
PART 2: Running the 802.11p Receiver on ExpressMIMO
Emulation vs. HW
  [Figure: Receiver Code -> Libembb generic interface -> two paths]
    o Emulation: getter and setter functions for command words (bfgen generated) -> Functional Model (C++) of the DSP Engine
    o HW: getter and setter functions for command words (bfgen generated) -> Command registers of the DSP Engine
  Getter and setter functions on the HW
    o not executable in realtime for standards with short data sets (huge overhead)
    o command preparation before launching the receiver
Code Structure (HW)
  Libembb
    o expressmimo_hw (expressmimo_hw.c)
  Receiver Code
    o dot11_main.c
    o dot11_phy_procedures.c
    o dot11_sts_lts_detection.c
    o dot11_signal_field.c
    o dot11_data_packets_bpsk.c
    o dot11_data_packets_qpsk.c
    o dot11_data_packets_16qam.c
    o dot11_data_packets_64qam.c
Realtime Code Enhancements
  Parallel processing of the invoked DSP engines & DMA transfers
    o Round robin scheduler
  Command preparation before starting the receiver to improve performance
    o Only a few parameters change dynamically during runtime
  Preparation of the next command while the current one is still running
ExpressMIMO Platform
LAB: Running the 802.11p receiver on ExpressMIMO
  Directory: ~/home/winterschool/hw
  In dot11_phy_definitions.h:
    o Print-outs of different intermediate results
    o Variation of the number of DATA symbols to be grouped
  Directory: ~/home/mutekh
    o ./simstart.sh (to generate the plots like for Emulation)
    o ./simstart.sh text (to obtain results in text format)