Symposium: Real-time Digital Signal Processing for Optical Transceivers

FPGA-based Prototyping of Next-Generation Forward Error Correction

T. Mizuochi, Y. Konishi, Y. Miyata, T. Inoue, K. Onohara, S. Kametani, T. Sugihara, K. Kubo, T. Kobayashi, H. Yoshida and T. Ichikawa
Mitsubishi Electric Corporation, Information Technology R&D Center

ECOC2009, Vienna, September 22nd, 2009, 16:45-17:10

2009, Mitsubishi Electric Corporation

Outline
- Expectations of stronger FECs for 100Gb/s transmission
- Soft decision based LDPC + RS
- FPGA prototyping
- Error correction experiment
- LSI for 100G digital coherent

Expectations of Stronger FECs for 100Gb/s Transmission

100G Needs Higher OSNR
- We should not lose sight of the fact that multi-level modulation needs a higher SNR than binary formats.
- As the number of levels in an M-ary modulation scheme increases, the Euclidean distance between states decreases, and it becomes more difficult to distinguish between them.
- The Euclidean distance decreases faster than the noise bandwidth is reduced.
Figure: OSNR penalty (dB, 0 to 25) versus bit/symbol (1 to 8) at constant bit rate, for M-PSK and M-QAM; labelled points are PSK, QPSK, 8-PSK, 16-QAM and 64-QAM.

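The distance argument on this slide can be made concrete with a minimum-distance estimate. The sketch below is not taken from the slides. It assumes ideal M-PSK and square M-QAM constellations with unit average symbol energy, and a high-SNR approximation in which the required SNR at constant bit rate scales as 1/(k x d_min^2), with k = log2(M) bits per symbol. All function names are illustrative.

```python
import math

def dmin_sq_psk(m: int) -> float:
    """Squared minimum distance of M-PSK with unit symbol energy."""
    return (2.0 * math.sin(math.pi / m)) ** 2

def dmin_sq_qam(m: int) -> float:
    """Squared minimum distance of square M-QAM with unit average energy."""
    return 6.0 / (m - 1)

def penalty_db(m: int, dmin_sq: float) -> float:
    """Asymptotic SNR penalty versus QPSK at constant bit rate (assumed model).

    The factor k stands for the narrower noise bandwidth (lower symbol rate),
    and dmin_sq for the shrinking distance between states.
    """
    k = math.log2(m)
    ref = 2 * dmin_sq_psk(4)  # k * d_min^2 for the QPSK reference
    return 10.0 * math.log10(ref / (k * dmin_sq))

for name, m, d2 in [("QPSK", 4, dmin_sq_psk(4)), ("8-PSK", 8, dmin_sq_psk(8)),
                    ("16-QAM", 16, dmin_sq_qam(16)), ("64-QAM", 64, dmin_sq_qam(64))]:
    print(f"{name}: {penalty_db(m, d2):.2f} dB")
```

Under these assumptions the penalty grows with the number of bits per symbol because d_min^2 falls faster than k rises. The measured curves in the figure may differ from this asymptotic estimate.
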
Toward 100Gb/s
- In order to deploy 100G over existing 40G systems, an OSNR higher by 1.3 dB to 2.7 dB becomes mandatory.
- Stronger FEC can be a great help here.
Figure: Required OSNR (dB in 0.1 nm, 2 to 18) versus bit rate (10, 20, 40, 100 Gb/s) for OOK, DPSK, DQPSK, DP-QPSK and DP-16QAM; differences of 1.3 dB and 2.7 dB are annotated near the DP-QPSK and DP-16QAM points.

FEC Deployment in Optical Communications
- The product of (linear) NCG and bit rate (in Gb/s) shows a clear trend: it has improved by a factor of 1.4 every year.
- This improvement has been achieved not by FEC algorithm improvements, but by LSI technology evolution.
- Very strong FECs can be a key enabler for DSP-based 100G transmission.
Figure: NCG x bit rate product (Gb/s, log scale from 10^0 to 10^4; NCG defined in terms of a post-FEC BER of 10^-15) versus year (1986 to 2018). Series: RS(255,239) and deployed FECs at 2.5, 10, 40 and 100 Gb/s, the 100 Gb/s target, a "x1.4 every year" trend line, and the Shannon limit (soft decision, 25% redundancy). Generations: 1st gen. RS(255,239); 2nd gen. concatenated codes, iterative decoding; 3rd gen. soft decision, iterative decoding.
Source: T. Mizuochi, et al., IEEE Photonics Society Summer Topicals, WC1.1

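As a rough illustration of the metric on this slide, the sketch below converts an NCG in dB to linear units, multiplies it by the bit rate, and estimates how long a given improvement takes at x1.4 per year. The 9.9 dB at 100 Gb/s figure is the expected value quoted later in this deck. The function names are illustrative and not from the slides.

```python
import math

def ncg_bitrate_product(ncg_db: float, bitrate_gbps: float) -> float:
    """Linear NCG multiplied by the bit rate in Gb/s."""
    return 10.0 ** (ncg_db / 10.0) * bitrate_gbps

def years_for_factor(factor: float, yearly_growth: float = 1.4) -> float:
    """Years needed for the product to grow by `factor` at a fixed yearly growth."""
    return math.log(factor) / math.log(yearly_growth)

# Expected 100G figure from this deck: 9.9 dB NCG at 100 Gb/s.
print(f"product: {ncg_bitrate_product(9.9, 100.0):.0f} Gb/s")
# Time for one decade on the log-scale axis at x1.4 per year.
print(f"years per decade: {years_for_factor(10.0):.1f}")
```

With these inputs the product comes out just under 10^3 Gb/s, and one decade on the vertical axis corresponds to a little under seven years of x1.4 growth.
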
Soft Decision based LDPC + RS

Low-Density Parity-Check (LDPC) Codes
- A linear code, defined by a very sparse parity-check matrix
- Invented by Robert Gallager in his 1960 MIT Ph.D. dissertation. Long ignored.
  - R. G. Gallager, IRE Trans. Inform. Theory, Jan. 1962.
- Re-discovered by D. MacKay in 1996.
- Can achieve very strong error correction capability
- First calculation for optical communications
  - B. Vasic and I. B. Djordjevic, IEEE Photon. Technol. Lett., Aug. 2002.

Decoding Algorithms and Circuit Complexity
Conventional algorithms
- Shuffled belief propagation (BP): high performance, but quite complex
- Min-sum algorithm: easy calculation, but poor performance
Cyclically approximated δ-min algorithm (proposed)
- Simple LLR calculation
- Mathematical function approximated by δ and minimum functions
- Nearly identical performance to shuffled BP
- Nearly 1/5 the circuit size of shuffled BP (circuit configuration for one codeword's bit, a weight of 3)
Figure: Complexity (small to large) versus performance (poor to good). Min-sum: easy calculation, poor performance. Shuffled BP: large complexity, good performance. Proposed: small complexity, good performance. A difference of about 1.5 dB is annotated on the performance axis.
Source: Y. Miyata, et al., OFC/NFOEC2007, OWE5

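For orientation, the conventional min-sum check-node update listed above can be sketched as follows. This is the textbook min-sum rule, not the proposed algorithm, whose details are not given on the slide. The function name is illustrative, and at least two incoming LLRs are assumed.

```python
from typing import List

def check_node_min_sum(llrs: List[float]) -> List[float]:
    """Min-sum check-node update: each output combines all *other* inputs.

    Output magnitude = minimum |LLR| over the other edges,
    output sign      = product of the signs of the other edges.
    """
    signs = [-1.0 if v < 0 else 1.0 for v in llrs]
    mags = [abs(v) for v in llrs]
    total_sign = 1.0
    for s in signs:
        total_sign *= s
    # Only the smallest and second-smallest magnitudes are ever needed.
    order = sorted(range(len(mags)), key=mags.__getitem__)
    i_min, min1, min2 = order[0], mags[order[0]], mags[order[1]]
    out = []
    for i, s in enumerate(signs):
        mag = min2 if i == i_min else min1
        # Multiplying by the edge's own sign removes it from the total product.
        out.append(total_sign * s * mag)
    return out

print(check_node_min_sum([2.0, -3.0, 0.5, -1.5]))  # [0.5, -0.5, 1.5, -0.5]
```

The update needs only comparisons and sign logic, which is why min-sum is cheap in hardware. The magnitude it produces is an overestimate relative to full belief propagation, which is one common explanation for its performance loss.
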
How to Eliminate the Error Floor: Concatenated LDPC + RS
Combating the error floor
(1) Increase the codeword length: 20,000 bits or longer are needed
(2) Increase the redundancy: 35% or more is needed
These cannot be allowed in high-speed optical communication systems.
- Concatenating another weak code can effectively eliminate the unwanted error floor, without increasing circuit complexity.
- LDPC(9216,7936) + RS(992,956), 20.5% redundancy
Figure: Transmit side: input information -> RS encoder (outer code) -> interleave -> LDPC encoder (inner code) -> E/O. Receive side: O/E -> soft decision -> LDPC decoder (iteration) -> de-interleave -> RS decoder -> output information.
Source: Y. Miyata, et al., OFC/NFOEC2008, OTuE4

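The 20.5% figure can be checked from the two code parameters alone. The sketch below assumes plain serial concatenation with no additional framing overhead or padding, and defines redundancy as parity bits per information bit. The helper name is illustrative.

```python
from fractions import Fraction

def concatenated_rate(codes):
    """Overall rate of serially concatenated (n, k) block codes."""
    rate = Fraction(1)
    for n, k in codes:
        rate *= Fraction(k, n)
    return rate

# Inner LDPC(9216,7936) and outer RS(992,956), as quoted on the slide.
rate = concatenated_rate([(9216, 7936), (992, 956)])
redundancy = 1 / rate - 1  # parity per information bit

print(rate, f"{float(redundancy):.1%}")  # 239/288 20.5%
```

The overall rate reduces exactly to 239/288, so the redundancy is 49/239, or about 20.5%.
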
OTU4V Frame
OTU4V frame for LDPC + RS
- The length of the payload is the same as in the OTUk frame
- Enables transparent transmission of 100GbE client signals
- Enables asynchronous multiplexing of multiple 10 Gb/s signals
- Efficient parallel processing of FEC encoding/decoding and interleaving as a multiple of 128-parallelism
Figure: OTU4V frame layout with four OTU rows (1 to 4), each with overhead, an RS part (R1 to R4) and an LDPC part (L1 to L4), interleaved with 128-parallelism. Column labels: 239, 248, 288. RS parity: 9. LDPC parity: 40.
Source: Y. Miyata et al., OFC/NFOEC2009, NThB2

FPGA Prototyping

FPGA Prototyping
- Real-time emulation using high-speed FPGAs
Figure labels: LDPC+RS FEC in FPGAs; 125G MUX; de-skew; soft decision.

Set-up for FPGA Prototyping
Figure: Block diagram of the test set-up. Labels, in signal order as far as recoverable:
- BER test set: 10.3 Gb/s, PRBS31
- Gear boxes: 12.5 Gb/s
- High-speed FPGA prototyping boards: RS ENC, LDPC ENC, LDPC DEC (iteration), RS DEC, interleavers (IL), de-interleavers (dIL), FIFOs, dummy copy, Fsync
- Pre-skew (15.6 Gb/s), MUX, LN modulator: 31.3 Gb/s OOK
- ASE noise, PD, TIA, 2-bit soft-decision LSI (31.3 Gb/s), de-skew
Pipelined architecture: 1st, 2nd and 3rd stages, with terminate-and-input and terminate-and-output interfaces.
I/O interfaces: SFI-4 (700 Mb/s x 16 ch) at 12.5 Gb/s; CML (2 Gb/s x 16 ch) at 31.3 Gb/s.

31 Gsample/s Soft Decision LSI
LSI chip
- 0.13 µm SiGe BiCMOS (fT = 200 GHz)
- 9.7 mm x 6.9 mm
- 14 W (+3.3 V)
Package
- Low temperature co-fired ceramic
- 30 mm x 29 mm x 2.15 mm
- 570 I/O pads
Source: T. Kobayashi, et al., OFC/NFOEC2009, OWeE2

Pipelined Architecture
- In order to emulate the operation of a massive circuit, e.g. iterative decoding, a pipelined architecture was constructed from concatenated FPGAs.
Figure: FPGA boards #1 to #n in cascade. Each FPGA block contains an input RAM, a decoder with a through path, and an output RAM. The data sequence (frames 1 to 8) is distributed over decoder cores 1 to 8, and p is inserted after an n-frame time.

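The idea of unrolling iterative decoding over cascaded hardware can be illustrated with a toy software model. This is only a sketch under simple assumptions: each stage applies one iteration per frame time and hands the frame to the next stage. It does not model the 8 cores per board, the RAMs, or the parity insertion shown in the figure, and all names are hypothetical.

```python
def run_pipeline(frames, n_stages, iterate):
    """Push frames through n_stages cascaded stages, one iteration per stage.

    Returns a list of (tick, frame) pairs in output order.
    """
    stages = [None] * n_stages
    out = []
    # Append idle ticks so the last frames can drain out of the pipeline.
    stream = list(frames) + [None] * n_stages
    for tick, incoming in enumerate(stream):
        # The last stage emits the frame it finished in the previous tick.
        if stages[-1] is not None:
            out.append((tick, stages[-1]))
        # Every stage takes over its predecessor's frame and iterates once.
        for i in range(n_stages - 1, 0, -1):
            prev = stages[i - 1]
            stages[i] = iterate(prev) if prev is not None else None
        stages[0] = iterate(incoming) if incoming is not None else None
    return out

# A frame is modelled as (frame_id, iterations_done).
one_iteration = lambda f: (f[0], f[1] + 1)
result = run_pipeline([(k, 0) for k in range(3)], n_stages=4, iterate=one_iteration)
print(result)  # [(4, (0, 4)), (5, (1, 4)), (6, (2, 4))]
```

In this model every frame receives n_stages iterations, the latency is n_stages frame times, and one frame still leaves the pipeline per frame time, which is the property that makes real-time emulation possible.
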
FPGA Boards
- Altera Stratix II, 2 Mgates
- 100 MHz x 128 = 10 Gb/s throughput
- Pipeline: 8 FPGAs on a board
- n-concatenation = n x 8 x 2 (Mgates)
Figure labels: RS DEC, RS ENC, LDPC ENC, LDPC DEC.

Error Correction Experiment

Experimental Results
- 31.3 Gb/s, AWGN, OOK, 2-bit soft decision, 4 iterations
Figure: Output BER (10^0 to 10^-13) versus pre-FEC BER (10^-1 to 10^-3). Curves: calculated; experimental, LDPC(9216,7936) only; experimental, LDPC(9216,7936) + RS(992,956). Marked point: a pre-FEC BER of 8.9x10^-3 (input Q = 7.5 dB) is corrected to an output BER of 2.5x10^-13.

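The pairing of input Q = 7.5 dB with a pre-FEC BER of 8.9x10^-3 follows from the usual Gaussian-noise relation BER = 0.5 erfc(Q / sqrt(2)) with Q in dB defined as 20 log10(Q). The sketch below assumes that relation; the slide does not state which definition was used, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def q_db_to_ber(q_db: float) -> float:
    """BER for a given Q factor in dB, assuming Gaussian noise."""
    q = 10.0 ** (q_db / 20.0)
    return 0.5 * math.erfc(q / math.sqrt(2.0))

def ber_to_q_db(ber: float) -> float:
    """Inverse mapping: Q factor in dB for a given BER."""
    q = -NormalDist().inv_cdf(ber)
    return 20.0 * math.log10(q)

print(f"{q_db_to_ber(7.5):.2e}")   # close to the 8.9e-3 marked in the figure
print(f"{ber_to_q_db(8.9e-3):.2f} dB")
```

Under this definition the two values quoted on the slide agree to within rounding.
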
Hard/Soft Decision and Number of Iterations
- Soft decision is 2.4 dB better than hard decision
- Number of iterations: 4, 8, 16
- Expected NCG at 10^-15: 9.9 dB (1.2x10^-2 -> 1x10^-15), 2-bit soft decision, 16 iterations
Figure: Output BER (10^-1 to 10^-13) versus pre-FEC Q (6 to 11 dB). Curves: 2-bit soft decision, calculated, for 4, 8 and 16 iterations; 2-bit soft decision, measured, 4 iterations; hard decision, measured, 4 iterations.

Comparison with Shannon Limit
- An NCG of more than 9.9 dB can be expected for 100G DP-QPSK (2-bit soft decision, 16 iterations)
- Unseen limit = "economical Shannon limit": 1 dB more gain needs more than 10 M$
Figure: Net coding gain at an output BER of 10^-15 (dB, 5 to 14) versus FEC redundancy (0 to 30%), with Shannon limits for soft decision (S) and hard decision (H) and markers at 7%, 20% and 25% redundancy. Points: 40 Gb/s RS(255,239); 40 Gb/s EFEC; 10 Gb/s EFEC; 10 Gb/s Turbo; LDPC+RS, 4 iterations (this work); LDPC+RS, 2-bit soft decision, 16 iterations (9.9 dB expected, 100 Gb/s DP-QPSK). The economical Shannon limit is also drawn.

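A hard-decision limit of the kind plotted against redundancy can be estimated from the capacity of a binary symmetric channel. The sketch below is an assumption-laden illustration, not the authors' calculation. It assumes binary signalling in Gaussian noise, redundancy defined as (n - k)/k, and NCG defined as the Q improvement in dB minus the rate penalty 10 log10(n/k). The soft-decision limit needs a numerical capacity integral and is not covered. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def q_db(ber: float) -> float:
    """Q factor in dB for a given BER, assuming Gaussian noise."""
    return 20.0 * math.log10(-NormalDist().inv_cdf(ber))

def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def hard_decision_limit_ncg_db(redundancy: float, ber_out: float = 1e-15) -> float:
    """NCG at the hard-decision Shannon limit for a given redundancy."""
    rate = 1.0 / (1.0 + redundancy)
    # Bisection for the largest crossover probability p with 1 - H(p) >= rate.
    lo, hi = 1e-12, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 1.0 - binary_entropy(mid) > rate:
            lo = mid  # capacity still above the code rate: allow more errors
        else:
            hi = mid
    p_max = 0.5 * (lo + hi)
    return q_db(ber_out) - q_db(p_max) + 10.0 * math.log10(rate)

for red in (0.07, 0.20, 0.25):
    print(f"{red:.0%}: {hard_decision_limit_ncg_db(red):.1f} dB")
```

Under these assumptions the limit rises from roughly 10 dB at 7% redundancy to somewhat under 12 dB at 25%, so extra redundancy buys progressively less gain.
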
LSI for 100G Digital Coherent

Implementation in 100 Gb/s LSI
Figure: Two-LSI partition. OTU4V framer LSI: 100GbE client (25G x 4), OTU4 framing, hard-decision (HD) FEC with RS ENC and RS DEC. Coherent transceiver LSI: soft-decision (SD) FEC (LDPC), Tx DSP, Rx DSP, ADCs, connected to the 100G optics.
FEC redundancy, latency, balance between hard decision and soft decision
- 20% is preferable, 4~5x OTU4
- xx% hard decision in OTU4 framer + yy% soft decision in coherent transceiver LSI
LSI technology, expected performance
- 45nm CMOS, < 30~50 Mgates
- Hopefully NCG of 10~11 dB at BER = 10^-15

Conclusions
- Expectations of stronger FECs for 100Gb/s transmission discussed
  - 1.3~2.7 dB stronger NCG than 40G EFEC is expected
- Soft decision FEC for coherent systems proposed
  - Concatenated LDPC + RS, cyclically approximated δ-min algorithm
- FPGA prototyping developed
  - Pipelined architecture, 31.3 Gb/s throughput
- Error correction experiment carried out
  - 7.5 dB input Q can be corrected to 10^-13 (2-bit soft dec., 4 iterations)
  - 9.9 dB NCG @ 10^-15 is expected for 100G DP-QPSK
- 100G LSI issues discussed
  - Hard decision FEC in OTU4 framer LSI + soft decision FEC in coherent LSI
  - Further improvement of NCG: 10~11 dB expected
  - Real ASIC for 100G may emerge in 2010~2011
This work was in part supported by the Lambda Utility Project of the National Institute of Information and Communications Technology (NICT) of Japan.