Differential Turbo Coded Modulation with APP Channel Estimation

Sheryl L. Howard and Christian Schlegel
Dept. of Electrical & Computer Engineering
University of Alberta
Edmonton, AB Canada T6G 2V4
Email: sheryl,schlegel@ece.ualberta.ca
December 21, 2003

Abstract

A serially concatenated coding system which can operate without channel state information (CSI) with use of a simple channel estimation technique is presented. This channel estimation technique utilizes the inner decoder's a posteriori probability (APP) information about the transmitted symbols to form a channel estimate for each symbol interval, and is termed APP channel estimation. The serially concatenated code (SCC) is comprised of an outer rate 2/3 binary error control code, separated by a bit interleaver from an inner code consisting of an 8-PSK bit mapping and differential 8-PSK modulation. Coherent decoding results in performance 0.6 dB from 8-PSK capacity for large interleaver sizes. APP channel estimation decoding without CSI over constant, random walk and linear channel phase models shows near-coherent results, with fractions of a dB loss in turbo cliff for the random walk phase model.

Keywords: Differential modulation, serial concatenated codes, channel phase estimation, iterative decoding.

1 Introduction

With the advent of turbo codes [1], [2] and iterative decoding of serially concatenated codes [4], the push for near-capacity performance of error control codes has become a reality. Use of these near-capacity codes with iterative decoding now allows receiver operation in very high noise, low signal-to-noise ratio (SNR) environments. Turbo trellis coded modulation (TTCM) has shown performance gains with higher order modulation similar to those of binary turbo coding. Both parallel concatenated TCM, with bit interleaving [5] and symbol interleaving [6], using 8-PSK, 16-QAM and 64-QAM modulation, and serial concatenated TCM for 8-PSK [7], [8] have been examined.

However, at the low SNR values achievable with turbo codes, issues such as phase synchronization become critical, especially for higher order modulations. Conventional phase synchronization utilizes a PLL (phase-locked loop) or Costas loop [9], resulting in phase ambiguities for PSK constellations. In addition, the squaring loss for higher order PSK modulation becomes significant, effectively prohibiting the use of these mechanisms for phase synchronization at the low SNRs at which turbo codes can operate. For 8-PSK suppressed-carrier signalling, the squaring loss of an eighth-power-law device at $E_s/N_0 = 9$ dB is upper-bounded by 10 dB [10] with respect to PLL operation on an unmodulated carrier. Typical loop SNRs must be at least 6 dB to achieve synchronization [11], [12]; thus the eighth-power squaring device must see a minimum loop SNR of 16 dB to possibly achieve lock. This requires narrow loop bandwidth, which is not compatible with fast tracking and acquisition as needed for wireless packet systems. Effective phase synchronization for iteratively decoded systems utilizing higher order PSK modulation becomes highly problematic.
The classical method of eliminating phase synchronization is differential M-PSK encoding with differential demodulation; however, a 3 dB loss in SNR vs. bit error rate (BER) occurs for M-PSK, with $M > 2$ [13]. This also applies to turbo coding. Differential BPSK modulation resulted in a 2.7 dB loss in SNR [14] for a rate 1/2 turbo code using differential detection. Such significant loss is counter to the near-capacity performance expected of turbo codes.

Various techniques to mitigate the penalty incurred by differential detection have been used, such as multiple symbol differential detection [15], [16], [17], which has been applied to serially concatenated codes with iterative decoding, using linear prediction to obtain channel estimates [18]. This technique results in an exponential expansion of the decoding trellis. An expanded-state decoding trellis is also presented in [19] for serial concatenation of a rate 1/2 convolutional code with differential M-PSK modulation. The channel phase is discretized into $N$ states, resulting in a linear trellis expansion to $MN$ states. Similarly, iterative decoding of turbo codes with QPSK modulation incorporates channel estimation for fading channels by using quantized phase in an expanded supertrellis in [20].

Instead of expanding the already complex APP decoding trellis, channel estimation of PSAM (pilot-symbol-assisted modulation) BPSK turbo codes over fading channels is accomplished outside the APP decoder in [21], yet still within the iterative decoding block to allow for iteratively improving channel estimates. A BER of $10^{-4}$ within 0.5 dB of coherent detection for slow fading is achieved.

Turbo-embedded estimation (TEE) is an alternate approach that has been investigated for BPSK in [22] and extended to QPSK and 8-PSK [23]. This technique uses the most probable state during the forward recursion of the APP decoder to obtain a symbol estimate, which is fed to a simple tracking loop to compute an updated phase estimate. No state expansion of the APP trellis, and no corresponding increase in complexity, occurs. However, TEE requires an initial phase estimate to begin decoding; this phase estimate is obtained from a known preamble whose length is approximately 1.5% of the total packet length.
Results over the AWGN channel, with phase noise according to a recent DVB-S2 proposal [24], for rate 1/3 8-PSK modulation are 0.3 dB from coherent decoding at a BER of $10^{-5}$, increasing with rate 1/2 8-PSK modulation to 0.7 dB from coherent decoding at a BER of $2 \times 10^{-3}$.

From a coding perspective, differential modulation may be viewed as a recursive convolutional code [25] and, as such, can be used as a viable inner code in a serially concatenated coding (SCC) system which will exhibit interleaver gain [4]. This idea has been used for binary SCCs [26], [25], and shown to provide good performance when coherently decoded.

In this paper, we approach the construction of an SCC for higher order modulations from a code design perspective. We also show how use of an inner differential M-PSK code leads in a natural way to a decoder which functions without any prior channel phase information, providing performance nearly indistinguishable from that of an ideal receiver with complete phase knowledge. This decoder relies on the ability of the APP algorithm for the differential inner modulation code to provide useful soft information on the information symbols without any prior phase knowledge, even in the presence of significant time-varying channel phase offset, which is confirmed via EXIT analysis [27], [28]. This soft information is used by a low-complexity phase estimator to iteratively improve the channel metrics. Through iterations, decoding performance close to the complete phase knowledge scenario is achieved with integrated APP channel phase estimation. This method uses APP soft information from the differential decoder without expanding trellis complexity. Neither external phase estimation, such as a training sequence preamble or non-data-aided (NDA) phase estimation, nor differential demodulation is needed to begin the decoding process.

An EXIT analysis approach [27], [28] is used to find matched outer codes that work optimally with the inner modulation code. The [3,2,2] parity check code is such a code, providing turbo cliff results only 0.6 dB away [38] from the 8-PSK capacity at a system rate of 2 bits/symbol [33], [41]. An 8-PSK mapping specifically designed to optimize this system's error floor [29] is presented.
We show that the performance of our serially concatenated system compares favorably with two coherently decoded serially concatenated systems, using 8-PSK modulation at a system rate of 2 bits/symbol. These are 1) serial concatenated trellis coded modulation (SCTCM) [7], [8] and 2) soft-decision iteratively decoded bit-interleaved coded modulation (BICM-ID) [32].

The paper is organized as follows: Section 2 describes the serially concatenated binary error control code/differential 8-PSK system, examining the encoder, simulated channel, and decoder. Section 3 discusses analysis techniques used to optimize the system; distance spectrum analysis allows improvement of the error floor through a new mapping, while EXIT analysis predicts turbo cliff behavior and completes the error analysis. Section 4 presents our method of obtaining channel estimates when decoding without CSI, which we term APP channel estimation. Section 5 presents simulation results. Conclusions are discussed in Section 6.

2 System Description

2.1 Encoding System

The encoder for our proposed serially concatenated transmission system is shown in Figure 1. A sequence of information bits $u = [u_0, \ldots, u_{L-1}]$ of length $L$ is encoded by an outer rate 2/3 binary error control code, such as the [3,2,2] parity code, into a sequence of coded bits $v = [v_0, \ldots, v_{3L/2-1}]$ of length $3L/2$. The coded bits $v$ are interleaved bitwise through a random interleaver [2], [4]. These interleaved coded bits $v'$ are mapped to 8-PSK symbols $w$, with $w_k = \sqrt{E_s}\, e^{j\theta_k}$; the symbol energy is normalized to $E_s = 1$, and the 8-PSK symbols have unit magnitude and phase angle $\theta_k \in \{0, \pi/4, \ldots, 7\pi/4\}$ rads. Initially, we assume natural mapping, which labels the bit combinations 000 to 111 increasing consecutively counterclockwise from 0 rads.

Figure 1 goes here.

The 8-PSK symbols are then differentially encoded, so that the current transmitted differential symbol is $x_k = w_k x_{k-1}$. This differential 8-PSK encoding serves as the inner code of the serially concatenated system, and is viewed as a recursive non-systematic convolutional code [25]. A recursive inner code is necessary to provide interleaving gain in a serially concatenated system [4]. This inner code has a regular, fully-connected 8-state trellis, in which the trellis states correspond to the current transmitted differential symbol. Transmission of a block of encoded symbols is initialized with $x_0 = w_0$, and terminated with a reference symbol $x_{L/2} = \sqrt{E_s}\, e^{j0}$ to end in the zero state.

A simplified three-stage differential 8-PSK trellis section (corresponding to three symbol time intervals) is shown in Figure 2. The actual trellis is fully connected but, for clarity, only the first stage shows all connections. Two decoding paths are depicted in the trellis section. The solid bold path indicates the correct path, with the dashed bold path above it resulting from a phase rotation of $\pi/2$ radians. Both the original 8-PSK symbols $w$ and the transmitted D8-PSK symbols $x$ are shown as $w, x$ above each corresponding trellis branch. Notice that while the $\pi/2$ phase rotation affects the transmitted D8-PSK symbols $x$, the information 8-PSK symbols $w$ are identical on both paths. The trellis and code are rotationally invariant. This rotational invariance will be significant when considering system decoding without phase synchronization.

Figure 2 goes here.

2.2 Channel Model

We use a complex AWGN channel model with noise variance $N_0/2$ in each dimension. Initially, the channel phase is assumed known at the receiver. The received symbols $y$ consist of the transmitted symbols plus complex noise of variance $N_0$, i.e., $y = x + n$.

Next, the assumption of phase knowledge is dropped, and we consider an unknown phase offset at the receiver. The channel now consists of complex AWGN $n$ plus a possibly time-varying symbol phase offset $\phi$: $y_k = x_k e^{j\phi_k} + n_k$. Three different phase models are considered:

Constant but unknown phase offset, $\phi_k = \Phi$ for all $k$. This could model short packet transmission systems or frequency hopping, where the phase could be assumed approximately constant over one transmission block.

Gaussian random walk model. The unknown phase evolves as $\phi_k = \phi_{k-1} + \Phi_k$, where the phase process $\Phi_k$ is given by zero-mean i.i.d. Gaussian random variables with variance $\sigma_\Phi^2$. This could model phase jitter from oscillator instability.

Constant frequency offset $\Delta f$. Phase offset equals frequency offset multiplied by time, thus a constant frequency offset results in a linear phase offset of slope $2\pi \Delta f$ rads w.r.t. time. For simulation purposes, we assume the phase offset constant during one symbol interval $T$; the phase increases by a discrete amount of $2\pi \Delta f T$ rads for each consecutive symbol interval as $\phi_k = \phi_{k-1} + 2\pi \Delta f T$. This could model a Doppler scenario.

2.3 Iterative Decoding

Iterative decoding of this serially concatenated system proceeds according to turbo decoding principles [2], [4]. Figure 3 displays a block diagram of the decoding process. The APP channel estimation block is shown in the dashed rectangle, and is discussed in detail in Section 4. Note that differential demodulation is never used at the receiver for detection, with or without phase knowledge, when APP channel estimation is employed.

Assuming detection with known phase for now, received channel symbols $y$ are converted to channel metrics $p(y_k \mid x_k) = \frac{1}{\pi N_0} e^{-|y_k - x_k|^2 / N_0}$, which are fed to the inner APP decoder for the differential code, along with a priori information $P_a(w)$. No a priori information is available for the inner APP decoder during the first iteration, thus uniform a priori probabilities are used; $P_a(w_k = e^{j2l\pi/8}) = 1/8$, $l = 1, \ldots, 8$.

Figure 3 goes here.

The inner APP decoder uses the BCJR [36] or forward-backward algorithm operating on the 8-state trellis of the differential 8-PSK code, depicted in Figure 2, to calculate conditional symbol probabilities on both the 8-PSK symbols $w$ and the transmitted D8-PSK symbols $x$. The APP 8-PSK symbol probabilities $P_{APP}(w)$ are converted first to APP bit probabilities $P_{APP}(v')$ through marginalization, for example,

$$P_e(v'_i = 0) = \sum_{w_k : v'_i = 0} P_e\big(w_k(v'_i = 0, v'_{i+1}, v'_{i+2})\big) \qquad (1)$$

The APP bit probabilities are converted to APP LLRs $\lambda_{APP}(v')$. Extrinsic output LLRs $\lambda_e(v')$ are obtained as usual by subtracting the a priori LLRs $\lambda_a(v')$ from the APP LLRs $\lambda_{APP}(v')$. Deinterleaving the extrinsic LLRs $\lambda_e(v')$ provides a priori LLRs $\lambda_a(v)$ as input to the outer APP decoder.

The outer APP decoder for a simple binary code may be implemented with low complexity; for example, the [3,2,2] parity code examined in this paper is simple enough, with only 3 bits and 4 possible codewords, to implement the 6 APP equations explicitly. The APP probabilities that express the parity constraints of the code are given by

$$P_{APP}(v_1 = 0) \propto P_a(v_1 = 0)\,\big(P_a(v_2 = 0) P_a(v_3 = 0) + P_a(v_2 = 1) P_a(v_3 = 1)\big) \qquad (2)$$

$$P_{APP}(v_1 = 1) \propto P_a(v_1 = 1)\,\big(P_a(v_2 = 0) P_a(v_3 = 1) + P_a(v_2 = 1) P_a(v_3 = 0)\big) \qquad (3)$$

and analogously for $v_2$ and $v_3$. APP LLRs $\lambda_{APP}(v)$ are computed from these, and extrinsic LLRs $\lambda_e(v)$ are obtained by subtracting off the a priori LLRs $\lambda_a(v)$. These extrinsic LLRs $\lambda_e(v)$ are then interleaved to obtain new a priori LLRs $\lambda_a(v')$. These must be converted first to bit probabilities $P_a(v')$, then to a priori symbol probabilities $P_a(w)$ for use in the next iteration of decoding in the inner APP symbol decoder. Symbol probabilities are calculated as the normalized product of their component bit probabilities.

In this fashion, with inner and outer APP decoders exchanging extrinsic information each iteration, iterative decoding continues until convergence or a fixed number of iterations is reached, at which time a hard decision on the APP information bit LLRs $\lambda_{APP}(u)$ from the outer binary APP decoder determines the estimated information sequence $\hat{u}$.
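The explicit outer decoder of Equations (2) and (3) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `parity_app`, the zero-based bit indexing, and the LLR sign convention $\log(P(0)/P(1))$ are assumptions made here, and the a priori probabilities are assumed to lie strictly between 0 and 1.

```python
import math

def parity_app(p0):
    """APP and extrinsic LLRs for one [3,2,2] even-parity codeword.

    p0[i] is the a priori probability P_a(v_i = 0), i = 0, 1, 2.
    LLRs use the (assumed) convention log(P(0) / P(1)).
    Returns (app_llrs, extrinsic_llrs).
    """
    p1 = [1.0 - p for p in p0]
    app, ext = [], []
    for i in range(3):
        j, k = [m for m in range(3) if m != i]
        # Even parity: v_i = 0 requires v_j == v_k, v_i = 1 requires v_j != v_k.
        same = p0[j] * p0[k] + p1[j] * p1[k]   # bracket in Equation (2)
        diff = p0[j] * p1[k] + p1[j] * p0[k]   # bracket in Equation (3)
        llr_app = math.log((p0[i] * same) / (p1[i] * diff))
        llr_a = math.log(p0[i] / p1[i])
        app.append(llr_app)
        ext.append(llr_app - llr_a)  # extrinsic = APP minus a priori
    return app, ext

app_llrs, ext_llrs = parity_app([0.5, 0.9, 0.9])
```

In this example the first bit has no a priori information, so its APP LLR is entirely extrinsic and comes from the agreement of the other two bits.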

3 Code Properties and Performance Analysis

Turbo code performance may be divided into three regions. In the first, the low SNR/high BER region, the turbo code does not perform well and iterative decoding has minimal effect. The turbo cliff or waterfall region follows, where the BER performance drops sharply to low values in only fractions of a dB SNR increase. Finally, at high SNR, there may be an error floor or flare where the BER curve flattens out due to the predominance of low-weight error events.

The latter two regions, the turbo cliff and error floor regions, require separate methodologies to analyze concatenated code performance. Extrinsic mutual information transfer, or EXIT, analysis [27], [28] is used to accurately predict turbo cliff onset SNR. Mutual information serves as a reliability measure of the soft information into and out of each component decoder. The second method is the minimum distance asymptote approximation [40] of the error performance in the high SNR error floor region, where performance of turbo coded systems flattens out by following the error curve of the most likely, minimum-weight sequence error event. Both methods are used to demonstrate the superior behavior of serially concatenated coded modulation and to optimize system performance. We first examine our system via minimum distance analysis.

3.1 Minimum Distance Analysis

The trellis of a differential 8-PSK encoder is fully connected, so that the shortest error event is always only 2 branches long. With respect to the all-zeros sequence, the 7 possible two-branch error events are listed below; the first symbol in each branch is the original 8-PSK symbol and the second symbol is the differential 8-PSK symbol, as shown in Figure 2. Noting that merging branches all carry the same output symbol, only the diverging branch contributes to the minimum squared Euclidean distance (MSED). The two-branch error event MSED is simply that of naturally mapped 8-PSK, 0.586, found in error events (1) and (7).
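The value 0.586 follows directly from unit-energy 8-PSK geometry, since $|1 - e^{j\pi/4}|^2 = 2 - \sqrt{2} \approx 0.586$. The short sketch below (the helper names `psk8`, `sed` and `two_branch_sed` are illustrative only) computes the squared Euclidean distance of each two-branch error event relative to the all-zeros path, counting only the diverging branch because both merging branches carry the symbol 0.

```python
import cmath
import math

def psk8(index):
    """Unit-energy 8-PSK point with phase index * pi/4."""
    return cmath.exp(1j * math.pi / 4 * (index % 8))

def sed(a, b):
    """Squared Euclidean distance between 8-PSK points a and b."""
    return abs(psk8(a) - psk8(b)) ** 2

def two_branch_sed(i):
    """SED of error event (i): the diverging branch transmits symbol i
    instead of 0, and the merging branch transmits 0 on both paths."""
    return sed(i, 0) + sed(0, 0)

event_seds = {i: two_branch_sed(i) for i in range(1, 8)}
msed = min(event_seds.values())  # 2 - sqrt(2), attained by events 1 and 7
```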
Without an outer code, the input sequences to the D8-PSK encoder are unconstrained, and the D8-PSK MSED is 0.586.

Event   Branch 1 (w, x)   Branch 2 (w, x)
(1)     1, 1              7, 0
(2)     2, 2              6, 0
(3)     3, 3              5, 0
(4)     4, 4              4, 0
(5)     5, 5              3, 0
(6)     6, 6              2, 0
(7)     7, 7              1, 0

The 8-PSK mapping is not regular, i.e., all symbols separated by a given squared Euclidean distance (SED) do not have the same Hamming distance between them. For example, rotating 45° from one symbol to the next gives a SED of 0.586, but the Hamming distance $d_H$ varies from 1 (between 000 at 0 rads and 001 at $\pi/4$ rads) to 3 (between 000 at 0 rads and 111 at $7\pi/4$ rads). The distance between the all-zeros sequence and an error sequence is not representative of the distance between all correct and incorrect paths when the symbol mapping is not regular. Thus we must consider all sequences, not just the all-zeros sequence, recognizing that parallel branches of the differential trellis carry identical information symbols due to the rotationally invariant trellis, and thus parallel paths are equivalent. However, consideration of the all-zeros sequence is sufficient for the moment to show the MSED of the differential code.

Calculation of the minimum distance of turbo coded systems with random interleavers is significantly more involved than considering the minimum distance of the component codes [4], [40]. The input sequences to the inner decoder for our serially concatenated system are constrained to be interleaved codewords of the outer [3,2,2] parity-check code, whose nonzero codewords all have Hamming weight $d_H = 2$, and thus the entire interleaved sequence is constrained to even Hamming weight. This system is expected to manifest a rather flat error floor due to the very low MSED of 0.586; simulation results presented later will demonstrate such an error floor.

We now consider the MSED error events, with the goal of better matching the 8-PSK mapping to the parity code sequence constraints to reduce the most likely error events. The minimum length detour of the inner code is 6 bits long for a two-branch error event. If natural 8-PSK mapping is used, the seven possible bit sequences for the two-branch error events with the all-zeros sequence as reference are given in the left side of Table 1.

Table 1 goes here.

Five out of seven of the two-branch detours in Table 1 have even weight, and are permissible sequences. An 8-PSK signal mapping that generates primarily odd-weight two-branch error events would lower the probability of choosing a MSED sequence, and be better matched to the outer parity check code. Such an improved mapping, presented in [29], is given in the center section of Table 1. The input bit sequences for the two-branch error events using the improved mapping with respect to the all-zeros sequence are shown on the right.

The improved mapping has only one even-weight sequence, 010-010, which generates a two-branch error event with a squared Euclidean distance (SED) of 4.0. The sequences 111-101 and 101-111 both have SED of 0.586, but are not eligible as two-branch error events because they are of odd weight. When we consider all other reference sequences besides the all-zeros sequence, there is only one two-branch error event resulting in MSED = 0.586 and minimum $d_H = 2$. This MSED error event occurs with the sequence pair $v, \hat{v}$ = 010-011, 011-010. This improved mapping has the same number of even-weight MSED two-branch error events, but far fewer (1 vs. 16) $d_H = 2$ MSED error events.

It can be shown [30] that a random interleaver is far more likely to contain a permutation allowing a $d_H = 2$ two-branch error event for both these mappings, with probability independent of interleaver length, than one for $d_H = 4$ or 6, which decreases as $O(N^{-1})$. The improved mapping significantly reduces the number of $d_H = 2$ MSED two-branch error events compared to natural mapping. This reduction of MSED multiplicity lowers the error floor, as will be seen in Section 3.3.
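The count of even-weight detours quoted for natural mapping can be checked with a short sketch. It assumes that each two-branch error event relative to the all-zeros reference is labelled by the natural 3-bit labels of its two 8-PSK information symbols, $i$ followed by $8 - i$; Table 1 itself is not reproduced in this text, so the bit strings generated here are a reconstruction under that assumption, and the improved mapping of [29] is not modelled.

```python
def natural_bits(symbol):
    """Natural 8-PSK labelling: symbol index written as 3 bits."""
    return format(symbol % 8, "03b")

events = []
for i in range(1, 8):
    # Diverging branch carries information symbol i; the merging
    # branch carries 8 - i so that the path returns to state 0.
    bits = natural_bits(i) + "-" + natural_bits(8 - i)
    weight = bits.count("1")
    events.append((i, bits, weight))

even_weight = [e for e in events if e[2] % 2 == 0]
for i, bits, weight in events:
    print(i, bits, weight, "even" if weight % 2 == 0 else "odd")
```

Under this assumption the sketch finds five even-weight sequences out of seven, in agreement with the count stated above.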
It does not increase the MSED of the code, which remains at 0.586 with high likelihood.

Use of a spread interleaver lowers the error floor further. An S-random interleaver [31] with spreading $S \geq 6$ will prevent the occurrence of a single two-branch error event, though it cannot prevent the occurrence of two two-branch error events. The MSED of the concatenated code with an S-random spread interleaver of $S \geq 6$ is 1.172.

3.2 EXIT Analysis

EXIT (extrinsic information transfer) analysis [27], [28] is a valuable technique for evaluating concatenated system performance in the turbo cliff, or waterfall, region. The mutual information $I(X; E)$ between symbols $X$ and the extrinsic soft information $E$ with regard to $X$ is used as a measure of the reliability of $E$ generated by each component decoder. Likewise, $I(X; A)$ measures the reliability of the a priori soft information $A$ into the component decoder, with regard to $X$. Input a priori LLRs $A$ are generated assuming a Gaussian distribution for $p(A)$. Given $A$, with an associated $I(X; A) = I_A$, the component APP decoder will produce $E$, with associated $I(X; E) = I_E$. The inner decoder also requires channel metrics on the transmitted symbols, and thus is dependent on the channel SNR. The outer decoder of a serially concatenated system never sees the channel information and is independent of SNR. In this manner, an EXIT chart displaying $I_A$, ranging from 0 to 1, versus $I_E$ for the APP decoder is produced.

These component EXIT charts are used to study the convergence behavior of concatenated iteratively decoded systems. The interleaving process does not alter mutual information; while scrambling the soft information, interleaving does not change its distribution. In addition, interleaving destroys any correlation between successive symbols. This separation and independence between the two decoders allows the component decoder EXIT charts to be combined into a single EXIT graph depicting the iterative behavior of the turbo decoding process. Each component decoder is simulated individually, without the need for implementation and simulation of the actual concatenated system.
EXIT analysis allows the component codes to be chosen for optimization of concatenated system performance, without lengthy simulation of each code combination.
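As a rough illustration of how one point on the a priori axis of an EXIT chart can be produced, the sketch below draws Gaussian a priori LLRs and estimates their mutual information with the transmitted bits by a sample average. Several details are assumptions not stated in the text: the "consistent" Gaussian model with mean $\pm\sigma^2/2$ and variance $\sigma^2$, the LLR convention $\log(P(0)/P(1))$, the sample-average estimator $1 - E[\log_2(1 + e^{-sL})]$ with $s = \pm 1$ the bit sign, and the function names.

```python
import math
import random

def gaussian_llrs(bits, sigma, rng):
    """A priori LLRs under the assumed consistent Gaussian model:
    mean +sigma^2/2 for bit 0, -sigma^2/2 for bit 1, variance sigma^2."""
    mu = sigma * sigma / 2.0
    return [rng.gauss(mu if b == 0 else -mu, sigma) for b in bits]

def mutual_information(bits, llrs):
    """Sample-average estimate (in bits) of I(V; L) for LLRs = log(P(0)/P(1))."""
    total = 0.0
    for b, llr in zip(bits, llrs):
        z = -llr if b == 0 else llr
        # log2(1 + exp(z)), evaluated without overflow for large |z|
        total += (max(z, 0.0) + math.log1p(math.exp(-abs(z)))) / math.log(2.0)
    return 1.0 - total / len(bits)

rng = random.Random(2003)
bits = [rng.randint(0, 1) for _ in range(20000)]
for sigma in (0.5, 1.0, 2.0, 4.0):
    i_a = mutual_information(bits, gaussian_llrs(bits, sigma, rng))
    print(f"sigma = {sigma:3.1f}  I_A = {i_a:.3f}")
```

Feeding such LLRs into a component APP decoder and applying the same estimator to its extrinsic output would give the corresponding $I_E$; the decoder itself is not sketched here.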

For our concatenated system, the outer parity code produces binary soft information which can be processed as LLRs $A_o$ and $E_o$, with associated $I(A_o; V) = I_{A_o}$ and $I(E_o; V) = I_{E_o}$. As the interleaver operates bitwise, the extrinsic interleaved bit LLRs will be passed on to the inner D8-PSK APP decoder. However, as the inner decoder operates on symbols, these interleaved LLRs $A_i$ (or their corresponding bit probabilities $P_a(v')$) must be converted to symbol probabilities $P_a(w)$ as discussed in Section 2. Conversely, the inner APP extrinsic information will be symbol probabilities $P_e(w)$, which must be converted to bit probabilities $P_e(v')$, or their corresponding LLRs $E_i$, for deinterleaving. $I(A_i; V) = I_{A_i}$ and $I(E_i; V) = I_{E_i}$ are calculated for the inner LLR values.

Figure 4 displays the EXIT chart for our serially concatenated system with the [3,2,2] parity check code as outer code ($I_{E_o}$ on the horizontal axis, and $I_{A_o}$ on the vertical axis) and differential 8-PSK as the inner code (with axes swapped). The inner decoder EXIT curves depend on SNR, and are shown for 3.4 and 3.6 dB. The improved 8-PSK mapping discussed in Section 3.1 is used. Natural 8-PSK mapping allows for earlier turbo cliff onset at SNR 3.4 dB, but produces a higher error floor. An EXIT trajectory for the improved mapping at SNR 3.6 dB is also shown. All curves are simulated using a 180,000-bit interleaver.

Figure 4 goes here.

Notice the close fit between the outer [3,2,2] parity check and inner differential 8-PSK EXIT curves at SNR 3.4 dB. The two codes are well-matched, in the sense that the combined codes minimize turbo cliff onset compared to a set of less well-matched codes.

An EXIT curve for an outer rate 2/3 16-state maximal free distance recursive systematic convolutional code is also shown in Figure 4, together with an inner differential 8-PSK EXIT curve for SNR 5 dB. Natural mapping is used.
The free distance of this convolutional code is 5, compared to a free distance of 2 for the [3,2,2] parity check code. The increase in free distance of the outer code increases the minimum distance of the concatenated code and significantly lowers the error floor which exists with the parity check code. However, reduction of the error floor comes at the cost of a large increase in turbo cliff onset SNR. As shown in Figure 4, pinchoff for the convolutional code and differential 8-PSK modulation occurs at SNR 5 dB, 1.7 dB past that of the concatenated system with the outer [3,2,2] parity code. It is clear from the EXIT curve of the outer rate 2/3 convolutional code that the differential EXIT curve must be raised significantly higher, by increasing SNR, to clear the outer EXIT curve and provide a channel for iterative convergence. This increase in turbo cliff onset is due to the poor match, as shown by the EXIT chart, between component decoders.

3.3 Code Performance

Figure 5 shows the performance of our proposed serially concatenated coded modulation system with both natural and improved 8-PSK mapping and random interleaving. Results are shown for interleaver sizes of 15,000 bits and 180,000 bits. Notice the lowered error floor of the improved mapping. Also shown is the improved mapping with a fixed S-random spread interleaver of $S = 9$ and 15,000 bits. The spread interleaver lowers the error floor further. At a rate of 2 bits/symbol, 8-PSK capacity is at $E_b/N_0 = 2.9$ dB [33], [41]. Our concatenated system provides BER performance 0.6-0.8 dB away from capacity for large interleaver size.

The BER performance of our concatenated system underscores the effectiveness of simple design techniques, combined with turbo coding analysis techniques, in designing and optimizing systems for excellent performance. Not only are the two component decoders very simple to implement (an 8-state trellis decoder for the inner code and a lookup table for the outer code) but, taken individually, their error control potential is virtually zero. The [3,2,2] parity check code cannot correct even single errors, and differential 8-PSK modulation alone is a very weak code. Together, however, they unfold the full potential of turbo coding, outperforming even large 8-PSK Ungerböck trellis codes [34] by 1 dB.
A 64-state 8-PSK TCM code achieves an asymptotic coding gain over uncoded QPSK of 5 dB at a rate of 2 bits/symbol. As shown in Figure 5, our SCC system provides performance results 6 dB better than uncoded QPSK at a BER of $10^{-5}$.

Along the turbo cliff, 50 decoding iterations are required for convergence. EXIT analysis predicted that a large number of iterations along the turbo cliff would be required for convergence, due to the well-matched EXIT curves of the component codes, which result in a low SNR turbo cliff onset but provide only a narrow tunnel for iterative improvement. Convergence above 4.5 dB occurs in 10 iterations or less. A minimum of 50 errors per data point were collected.

As predicted by EXIT analysis, the larger interleaver size shows turbo cliff onset at SNR = 3.3 dB for natural mapping and 3.5 dB for the improved mapping. Natural mapping provides a 0.2 dB advantage in turbo cliff onset, at the cost of a higher error floor. For the smaller interleaver size, along the turbo cliff, convergence requires 50 iterations; at SNR 4 dB, 30 iterations are required for convergence, decreasing to 10 iterations at SNR 5 dB and above. EXIT analysis also predicted these rates of convergence.

Figure 5 goes here.

Our coherently decoded SCC achieves a BER of $2 \times 10^{-6}$ at an SNR of 3.9 dB, with an interleaver size of 15,000 bits using the improved 8-PSK mapping. In comparison, the serially concatenated TTCM system presented in [7], which assumes phase synchronization, provides a BER of $2 \times 10^{-5}$ at SNR = 3.7 dB, the largest SNR value simulated, using an interleaver size of 16,385 bits and 8-PSK modulation at a rate of 2 bits/symbol over the AWGN channel.

Bit-interleaved coded modulation using iterative decoding (BICM-ID) has been examined for 8-PSK modulation [32]. At a rate of 2 bits/symbol over an AWGN channel with phase synchronization, BICM-ID achieves a BER of $3 \times 10^{-5}$ at SNR = 4.2 dB with an interleaver of 6000 bits. Our coherently decoded SCC provides comparable performance to the SCTCM system, and superior performance to BICM-ID, and in addition, offers the potential for decoding without CSI.
We now examine system performance in the presence of phase noise, without CSI, making use of our APP channel estimator.
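Before moving to decoding without CSI, the rotational invariance noted in Section 2.1 can be illustrated with phase indices. The sketch below is an illustration of the code property only, not of the receiver, which never uses differential demodulation. It assumes symbols are represented by integers 0 to 7 (phase index times $\pi/4$) so that $x_k = w_k x_{k-1}$ becomes addition modulo 8, and the helper names are hypothetical. Exact invariance holds for rotations by multiples of $\pi/4$, such as the $\pi/2$ rotation of Figure 2.

```python
def diff_encode(w_idx):
    """Differential 8-PSK encoding in phase-index form.

    x_k = w_k * x_{k-1} becomes x_k = (x_{k-1} + w_k) mod 8,
    and starting from index 0 gives x_0 = w_0.
    """
    x_idx, state = [], 0
    for w in w_idx:
        state = (state + w) % 8
        x_idx.append(state)
    return x_idx

def rotate(x_idx, steps):
    """Rotate every transmitted symbol by steps * pi/4."""
    return [(x + steps) % 8 for x in x_idx]

def branch_info_symbols(x_idx):
    """Information symbol on each trellis branch: w_k = x_k / x_{k-1}."""
    return [(b - a) % 8 for a, b in zip(x_idx, x_idx[1:])]

w = [3, 1, 7, 0, 5, 2]
x = diff_encode(w)
x_rot = rotate(x, 2)  # pi/2 rotation, as in Figure 2

# The transmitted symbols differ, but the branch information symbols do not.
same_info = branch_info_symbols(x) == branch_info_symbols(x_rot)
```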

4 Decoding Without Channel Information

Accurate phase acquisition and tracking on physical channels at low SNR to achieve coherent decoding is no easy task. Implementation of the required algorithms often consumes more VLSI area than the decoder itself. Thus, we now consider the case when the received channel phase is unknown; decoding must be performed without a priori CSI, i.e., non-coherently.

A key observation is that the rotationally invariant property of the inner code allows us to extract a posteriori information on the information symbols $w_k$, even without knowledge of the carrier phase. This soft information is provided as symbol probabilities from the inner APP decoder. Figure 6 shows the EXIT chart for the differential 8-PSK decoder with various phase offsets at SNR = 4.5 dB. Neither differential detection nor pilot symbols were used, and external channel information is not used, i.e., phase synchronization is not assumed. The D8-PSK APP decoder provides some extrinsic information in the presence of phase offset, even when no a priori information is available (along the vertical axis, corresponding to the initial iteration, when $I_{A_i} = 0$). Even the worst case offset between two symbols, $\pi/8$ rads, still provides some extrinsic information initially. The presence of extrinsic information without any a priori information is significant; no external method of generating initial phase information, such as a training sequence or non-data-aided (NDA) phase estimator, will be necessary, nor will differential demodulation be needed.

This initial extrinsic information is not sufficient, however, to complete convergence; as seen in Figure 6, the component EXIT curves intersect, and decoding will be stuck there at a high bit error rate for all except $\phi = 0^\circ$. Hence we propose to use this soft information from the inner decoder to estimate and track the channel phase through successive iterations.
As we shall show, this leads to a decoder which achieves convergence even in the presence of significant channel phase offset.

Figure 6 goes here.

The inner APP decoder generates both 8-PSK symbol probabilities $P(w)$ and D8-PSK symbol probabilities $P(x)$; the latter will be used as input to a channel estimator for subsequent iterations. Low complexity is important, given the emphasis on implementation simplicity that led to our choice of component codes. The channel estimator complexity must not overshadow that of the decoding system, and thus an optimal linear estimator such as the minimum mean square error (MMSE) estimator is not considered. We choose a simpler filtering estimator, presented in [43], [29], [38].

The channel model used is AWGN with complex noise variance $N_0$ and a complex time-varying channel gain $h$. In the case of unknown channel phase, $h$ is a unit-length time-varying rotation. The received signal $y$ is given as

$$y_k = h_k x_k + n_k; \quad h_k = e^{j\phi_k}, \quad n_k \sim \mathcal{N}(0, N_0) \qquad (4)$$

From the first moment equation, $E[y_k] = h_k E[x_k]$, where $E[x_k]$ is the expectation over the a posteriori symbol probabilities $P(x_k)$ at time $k$ from the inner APP decoder, an instantaneous channel estimate $\hat{h}_k$ may be found as

$$\hat{h}_k = y_k \hat{x}_k^{*} \qquad (5)$$

where $\hat{x}_k$ is the mean $E[x_k]$ modified to unit modulus, i.e., $\hat{x}_k = E[x_k]/|E[x_k]|$.

At each iteration, the inner APP decoder sends soft probability estimates of the channel symbols $x_k$ to the channel estimator, which calculates the instantaneous channel estimates according to Equation 5. These channel estimates are then filtered through a lowpass digital filter $f_k$, whose bandwidth allows tracking of the phase noise process, to generate the filtered channel estimates $\tilde{h}_k = \hat{h}_k \ast f_k$, where $\ast$ denotes convolution. For a constant phase offset, $\tilde{h} = \frac{1}{N}\sum_{k=1}^{N} \hat{h}_k$, where $N$ is the frame length, is simply the average of the instantaneous channel estimates. For a time-varying Markov phase process such as the random walk or linear phase processes, the channel estimates are filtered through an exponentially decaying moving average filter as $\tilde{h}_k = (1-\alpha)\sum_{j=1}^{k} \alpha^{k-j}\hat{h}_j$, where $\alpha$, the exponential decay parameter, is typically close to 1.
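The estimator described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes a unit-energy 8-PSK constellation indexed by phase, represents the APP output as a list of eight probabilities per symbol, and falls back to a zero soft symbol when the mean vanishes (an assumption made here so that uninformative symbols contribute nothing). All function names are hypothetical.

```python
import cmath
import math

# Unit-energy 8-PSK constellation; index m corresponds to phase 2*pi*m/8.
PSK8 = [cmath.exp(2j * math.pi * m / 8) for m in range(8)]


def soft_symbol(probs):
    """Unit-modulus soft symbol x_hat = E[x] / |E[x]| from probabilities P(x_k)."""
    mean = sum(p * s for p, s in zip(probs, PSK8))
    mag = abs(mean)
    # Assumption: a vanishing mean (e.g. uniform probabilities) maps to 0.
    return mean / mag if mag > 1e-12 else 0j


def instantaneous_estimates(y, app_probs):
    """Equation (5): h_hat_k = y_k * conj(x_hat_k)."""
    return [yk * soft_symbol(pk).conjugate() for yk, pk in zip(y, app_probs)]


def frame_average(h_hat):
    """Constant-phase case: average of the instantaneous estimates over the frame."""
    return sum(h_hat) / len(h_hat)


def exp_filter(h_hat, alpha):
    """(1 - alpha) * sum_{j<=k} alpha^(k-j) * h_hat_j, computed recursively."""
    out, state = [], 0j
    for h in h_hat:
        state = alpha * state + (1 - alpha) * h
        out.append(state)
    return out


# Noise-free usage example with perfectly known symbols (one-hot probabilities).
phi = math.pi / 16
idx = [0, 3, 5, 6, 1, 7, 2, 4]
y = [cmath.exp(1j * phi) * PSK8[m] for m in idx]
probs = [[1.0 if m == i else 0.0 for m in range(8)] for i in idx]

h_hat = instantaneous_estimates(y, probs)
h_const = frame_average(h_hat)
h_track = exp_filter(h_hat, alpha=0.99)
print(cmath.phase(h_const))  # close to pi/16 in this noise-free example
```

The recursive form of the exponential filter needs one complex multiply-add per symbol, which is in keeping with the low-complexity goal stated above. Note that the filtered estimate starts with small magnitude and grows toward unit modulus; whether it is renormalized before use is not specified here and is left out of the sketch.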
These filtered channel estimates are used to generate coherent decoder branch metrics $P(y_k | x_k) \propto \exp\left(-|y_k - \tilde{h}_k x_k|^2\right)$ to be used by the inner APP decoder. No channel information is yet available during the initial iteration, so $\tilde{h}_k = 1$ is chosen for the initial branch metrics. As we will show, each iteration improves the a posteriori values $P(x)$ from the inner decoder, and thus an improved channel estimate $\tilde{h}$ can be recalculated, as long as the SNR is above the turbo cliff of the coherent decoder.

As mentioned previously, the differential trellis is rotationally invariant to phase shifts of integer multiples of $\pi/4$ rads. If, however, in decoding the trellis, the beginning and final trellis states are assumed fixed to state 0, as is commonly done, endpoint errors will occur with a phase shift. The rest of the trellis will shift to a rotated sequence, but the endpoints remain fixed and cannot shift, causing errors. To prevent this, we use a floating decoding trellis for the inner APP decoder, with both beginning and final states assumed unknown and set to uniform probabilities.

Figure 7 illustrates the rotational invariance, with a sample random walk phase process at top and the final APP phase estimate beneath. Twice, the phase estimate slips to a phase rotated by $\pi/4$ rads from the random walk phase. However, no decoding errors occur at these phase slips; the rotationally invariant inner trellis ensures there are no decoding errors if the phase estimator converges to a rotated phase, and the outer decoder cancels the bit errors at the phase jumps, so the phase estimator seamlessly slips between constellation symmetry angles.

5 Channel Estimation Simulation Results

We now examine system performance in the presence of phase noise, without CSI, making use of our APP channel estimator. Three different channel phase processes are simulated:

Constant Phase Offset: $\phi_k = \phi$ for $\phi = \pi/16$ rads.
Random Walk Phase Process: $\phi_k = \phi_{k-1} + \Phi_k$, $\Phi_k \sim \mathcal{N}(0, \sigma_\Phi^2)$, for $\sigma_\Phi^2 = 0.05\ \mathrm{deg}^2$.

Linear Phase Process: $\phi_k = \phi_{k-1} + 2\pi ft$. We consider $ft = 0.001$ and $0.0001$.

All channel models include AWGN. Decoding without channel state information is achieved using our integrated APP channel phase estimation algorithm. All channel phase estimation results use the same S-random interleaver with minimal spreading S = 3, size 15000 bits, and are compared with coherent results for the same interleaver. Fifty decoding iterations are used.

Figure 8 compares the BER performance of decoding with and without CSI over an AWGN channel with a constant channel phase offset of $\pi/16$, with near-coherent results. The constant phase channel estimate $\tilde{h}$ for the entire frame is an average of the instantaneous channel estimates $\hat{h}$ as described in Section 4. Performance degrades somewhat as the phase offset approaches $\pi/8$. This is a metastable point, as $\pi/8$ is halfway between two valid differential sequences. The instantaneous channel estimates $\hat{h}$ will oscillate to either side of the $\pi/8$ boundary, and convergence to the correct phase estimate is very slow for a phase offset of exactly $\pi/8$ rads.

APP channel phase estimation results for a random walk phase process with $\sigma_\Phi^2 = 0.05\ \mathrm{deg}^2$ are compared to coherent decoding results in Figure 9. The channel phase estimation filter is given by $\tilde{h}_k = (1-\alpha)\sum_{j=1}^{k} \alpha^{k-j}\hat{h}_j$ with exponential decay parameter $\alpha = 0.99$. APP channel estimation for the random walk phase process gives results 0.25 dB from coherent decoding along the turbo cliff, where the 50 iterations used are not sufficient for convergence.

A constant frequency/linear phase offset may model oscillator drift or a mobile Doppler scenario. A carrier frequency of 1 GHz and an oscillator drift of 1 ppm from a relatively poor quality oscillator provide $f = 1000$ Hz. A symbol rate of $10^6$ symbols/sec then corresponds to $ft = 10^{-3}$. A higher quality oscillator with drift of 0.1 ppm gives $f = 100$ Hz and $ft = 10^{-4}$. Alternately, a velocity of 110 km/hr, a typical highway speed, corresponds also to a Doppler shift of approximately 100 Hz at this carrier frequency. We consider both these values of constant frequency/linear phase offset, $ft = 10^{-3}$ and $10^{-4}$, with larger $ft$ expected to negatively impact performance.
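The three phase processes and the normalized frequency offsets quoted above can be reproduced with a short sketch. This is an illustration under stated assumptions: the function names are hypothetical, the Doppler case is assumed to use the same 1 GHz carrier as the oscillator example, and the random walk variance is taken in square degrees and converted to radians.

```python
import math
import random


def constant_phase(n, phi):
    """phi_k = phi for all k."""
    return [phi] * n


def random_walk_phase(n, var_deg2, rng, phi0=0.0):
    """phi_k = phi_{k-1} + Phi_k with Phi_k ~ N(0, var_deg2), returned in radians."""
    sigma = math.radians(math.sqrt(var_deg2))
    phases, phi = [], phi0
    for _ in range(n):
        phi += rng.gauss(0.0, sigma)
        phases.append(phi)
    return phases


def linear_phase(n, ft, phi0=0.0):
    """phi_k = phi_{k-1} + 2*pi*ft, written in closed form."""
    return [phi0 + 2 * math.pi * ft * (k + 1) for k in range(n)]


def oscillator_offset_hz(carrier_hz, drift_ppm):
    """Frequency offset produced by an oscillator drift given in parts per million."""
    return carrier_hz * drift_ppm * 1e-6


def doppler_shift_hz(speed_kmh, carrier_hz, c=299_792_458.0):
    """Maximum Doppler shift f_d = v * f_c / c for a speed given in km/h."""
    return (speed_kmh / 3.6) * carrier_hz / c


def normalized_offset(freq_offset_hz, symbol_rate):
    """Normalized frequency offset ft = f / (symbol rate)."""
    return freq_offset_hz / symbol_rate


ft_poor = normalized_offset(oscillator_offset_hz(1e9, 1.0), 1e6)   # about 1e-3
ft_good = normalized_offset(oscillator_offset_hz(1e9, 0.1), 1e6)   # about 1e-4
f_doppler = doppler_shift_hz(110.0, 1e9)                           # roughly 100 Hz

rng = random.Random(1)  # fixed seed so the sample path is repeatable
walk = random_walk_phase(5000, 0.05, rng)
ramp = linear_phase(5000, ft_poor)
```

With $ft = 10^{-3}$ the linear phase completes a full $2\pi$ rotation every 1000 symbols, which gives a sense of why the larger offset is the harder case for a lowpass-filtered estimate.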

Simulation results for a linear phase process with $ft = 10^{-3}$ and $10^{-4}$ are also shown in Figure 9. The APP channel estimation for the linear phase process uses filter parameter $\alpha = 0.995$. Phase estimation results are approximately 0.6 dB from coherent decoding for $ft = 10^{-3}$ and near coherent performance for $ft = 10^{-4}$.

Figures 8 and 9 go here.

6 Conclusions

We have shown that a simple serially concatenated system combining an outer [3,2,2] parity check code with an inner differential 8-PSK modulation code offers very good results with iterative decoding according to turbo principles. The rotationally invariant property of the inner code aids in channel phase estimation, and supplies sufficient soft information from the inner APP decoder to allow initial operation under unknown channel phase rotations.

A simple channel estimation procedure using this soft information from the inner APP decoder achieves near-coherent performance without channel phase information, under both constant and time-varying simulated phase processes. Neither pilot symbols nor differential demodulation are used or needed. Channel estimation for the random walk phase process shifts the turbo cliff about 0.25 dB to the right, while the linear phase process with $ft = 10^{-3}$ shifts the turbo cliff by about 0.6 dB.

Both encoding and decoding portions of this system may be implemented with low complexity, and could be used in conjunction with packet transmission, where short messages increase the need for phase offset immunity. Due to a very low MSED, this simple code has a significant error floor. An improved 8-PSK mapping lowers the error floor at a slight 0.2 dB SNR penalty in the turbo cliff onset region. Use of a spread interleaver lowers the error floor further compared with random interleaving.

References

[1] C. Berrou, A. Glavieux and P. Thitimajshima, Near Shannon limit error-correcting coding and decoding: Turbo codes, Proceedings of the IEEE International Conference on Communications (ICC) 1993, Geneva, Switzerland, 1993, pp. 1064-1070.
[2] C. Berrou, A. Glavieux and P. Thitimajshima, Near optimum error correcting coding and decoding: turbo-codes, IEEE Trans. Commun., vol. COM-44, no. 10, pp. 1261-1271, Oct. 1996.
[3] S. Benedetto and G. Montorsi, Unveiling Turbo-Codes: Some Results on Parallel Concatenated Coding Schemes, IEEE Trans. Inform. Theory, vol. 43, no. 2, March 1996, pp. 409-428.
[4] S. Benedetto, D. Divsalar, G. Montorsi and F. Pollara, Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding, IEEE Trans. Inform. Theory, 44(3), May 1998, pp. 909-926.
[5] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, Parallel concatenated trellis coded modulation, Proc. IEEE International Conference on Communications (ICC) 1996, Dallas, TX, pp. 974-978, June 23-27, 1996.
[6] P. Robertson and T. Wörz, Bandwidth-Efficient Turbo Trellis-Coded Modulation Using Punctured Component Codes, IEEE Journal of Selected Areas in Communications, vol. JSAC-16, no. 2, pp. 206-218, Feb. 1998.
[7] D. Divsalar and F. Pollara, Turbo Trellis Coded Modulation with Iterative Decoding for Mobile Satellite Communications, IMSC 97, June 97.
[8] D. Divsalar and F. Pollara, Serial and Hybrid Concatenated Codes with Applications, Intern. Symposium on Turbo Codes, Brest, France, Sept. 3-5, 1997, pp. 80-87.
[9] J.P. Costas, Synchronous Communications, Proc. IRE, vol. 44, Dec. 1956, pp. 1713-1718.
[10] S.A. Butman, J.R. Lesh, The Effects of Bandpass Limiters on n-phase Tracking Systems, IEEE Trans. Inform. Theory, vol. 25, June 1977, pp. 569-576.
[11] H. Meyr and G. Ascheid, Synchronization in Digital Communications, Vol. 1, Wiley Series in Telecommunications, 1990.
[12] F.M. Gardner, Phaselock Techniques, 2nd edition, John Wiley & Sons, 1979.
[13] J.G. Proakis, Digital Communications, 4th edition, McGraw-Hill, 2000.
[14] E.K. Hall and S.G. Wilson, Turbo codes for noncoherent channels, Proc. GLOBECOM 97, Nov. 1997, pp. 66-70.
[15] D. Divsalar and M.K. Simon, Multiple symbol differential detection of MPSK, IEEE Trans. Commun., vol. 38, pp. 300-308, Mar. 1990.
[16] D. Makrakis and K. Feher, Optimal noncoherent detection of PSK signals, Electron. Lett., vol. 26, pp. 398-400, Mar. 1990.
[17] M. Peleg and S. Shamai, Iterative decoding of coded and interleaved noncoherent multiple symbol detected DPSK, Electron. Lett., vol. 33, no. 12, pp. 1018-1020, June 1997.
[18] P. Hoeher and J. Lodge, Turbo DPSK: Iterative differential PSK demodulation and channel decoding, IEEE Trans. Commun., 47(6), June 1999, pp. 837-843.
[19] M. Peleg, S. Shamai and S. Galán, Iterative decoding for coded noncoherent MPSK communications over phase-noisy AWGN channel, IEE Proc. Commun., vol. 147, Apr. 2000, pp. 87-95.
[20] C. Komninakis and R. Wesel, Joint Iterative Channel Estimation and Decoding in Flat Correlated Rayleigh Fading, IEEE J. Sel. Areas Commun., vol. 19, no. 9, Sept. 2001, pp. 1706-1717.
[21] M.C. Valenti and B.D. Woerner, Iterative channel estimation and decoding of pilot symbol assisted turbo codes over flat-fading channels, IEEE J. Sel. Areas Commun., vol. 9, Sept. 2001, pp. 1691-1706.
[22] S. Cioni, G. E. Corazza, and A. Vanelli-Coralli, Turbo Embedded Estimation with imperfect Phase/Frequency Recovery, Proceedings of IEEE International Conference on Communications (ICC) 2003, Anchorage, AK, May 2003.
[23] S. Cioni, G. E. Corazza, and A. Vanelli-Coralli, Turbo Embedded Estimation for High Order Modulation, Proceedings of the 3rd International Symposium on Turbo Codes & Related Topics, Brest, France, Sept. 1-5, 2003, pp. 447-450.
[24] ESA DVB-S2 contribution, DVB-S2 Phase Jitter synthesis, Jan. 2003.
[25] K. R. Narayanan and G. L. Stüber, A Serial Concatenation Approach to Iterative Demodulation and Decoding, IEEE Trans. Commun., vol. COM-47, no. 7, pp. 956-961, July 1999.
[26] M. Peleg, I. Sason, S. Shamai and A. Elia, On interleaved, differentially encoded convolutional codes, IEEE Trans. Inform. Theory, 45(7), Nov. 1999, pp. 2572-2582.
[27] S. ten Brink, Design of serially concatenated codes based on iterative decoding convergence, in 2nd International Symposium on Turbo Codes and Related Topics, Brest, France, 2000.
[28] S. ten Brink, Convergence behavior of iteratively decoded parallel concatenated codes, IEEE Trans. Commun., 49(10), Oct. 2001, pp. 1627-1737.
[29] S. Howard, C. Schlegel, L. Pérez, F. Jiang, Differential Turbo Coded Modulation over Unsynchronized Channels, Proc. of IASTED 3rd Int. Conf. on Wireless and Optical Communications (WOC) 2002, Banff, Alberta, Canada, 2002, pp. 96-101.
[30] S. Howard, Probability of Random Interleaver Containing $d_H$ = 2,4 Two-Branch Error Patterns for the Differential 8PSK/[3,2,2] Parity SCC, unpublished report, HCDC Laboratory, Dept. of Electrical and Computer Engineering, University of Alberta, December 2003.
[31] D. Divsalar and F. Pollara, Turbo Codes for PCS Applications, Proc. ICC 95, Seattle, WA, June 1995.
[32] X. Li, A. Chindapol and J.A. Ritcey, Bit-Interleaved Coded Modulation With Iterative Decoding and 8PSK Signaling, IEEE Trans. Commun., vol. 50, August 2002, pp. 1250-57.
[33] G. Ungerboeck, Channel Coding with Multilevel/Phase Signals, IEEE Trans. Inform. Theory, vol. IT-28, Jan. 82, pp. 55-67.
[34] G. Ungerboeck, Trellis-coded modulation with redundant signal sets, Part I: Introduction, IEEE Commun. Mag., vol. 25, no. 2, pp. 5-11, February 1987.
[35] G. Ungerboeck, Trellis-coded modulation with redundant signal sets, Part II: State of the art, IEEE Commun. Mag., vol. 25, no. 2, pp. 12-21, February 1987.
[36] L.R. Bahl, J. Cocke, F. Jelinek and J. Raviv, Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate, IEEE Trans. Inform. Theory, vol. 20, Mar. 1974, pp. 284-287.
[37] C. Schlegel, Trellis Coding, IEEE Press, Piscataway, NJ, 1997.
[38] S. Howard, C. Schlegel, Differentially-Encoded Turbo Coded Modulation with APP Channel Estimation, Proceedings of IEEE GlobeCom 2003, San Francisco, CA, December 1-5, 2003.
[39] S. ten Brink, Design of Concatenated Coding Schemes based on Iterative Decoding Convergence, Ph.D. dissertation, Universität Stuttgart, Shaker Verlag, Aachen 2002.
[40] L.C. Pérez, J. Seghers and D.J. Costello, Jr., A distance spectrum interpretation of turbo codes, IEEE Trans. Inform. Theory, vol. 42, pp. 1698-1709, Part I, Nov. 1996.
[41] R.G. Gallager, Information Theory and Reliable Communications, New York, Wiley, 1968.
[42] A. Grant and C. Schlegel, Differential turbo space-time coding, Proc. IEEE Information Theory Workshop 2001, pp. 120-122, Cairns, Sept. 2001.
[43] C. Schlegel and A. Grant, Differential space-time codes, IEEE Trans. Inform. Theory, vol. 49, no. 9, Sept. 2003, pp. 2298-2306.
[44] T.M. Cover and J.A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications, John Wiley & Sons, 1991.

Table 1: Bit sequences of two-branch error events for natural and improved 8-PSK mappings. (Panels: 8PSK Natural Map, even; Improved 8PSK Mapping; 8PSK Improved Map, even.)

Figure 1: Serial Turbo Encoder for Differential Turbo Coded Modulation. (Blocks: [3,2,2] parity encoder, interleaver, 8-PSK bit mapping with labels (010) (011) (001) (100) (000) (101) (111) (110), differential modulator.)

Figure 2: 8-PSK Differential Trellis Section. (States 0 to 7; branch labels $w_k, x_k$; time indices $k-1$, $k$, $k+1$.)

Figure 3: Serial Turbo Decoder, D8PSK Inner Decoder and [3,2,2] Outer Decoder With APP Channel Estimation.

Figure 4: EXIT chart for outer [3,2,2] parity code and inner differential 8-PSK modulation at SNR = 3.4 and 3.6 dB, with improved 8-PSK mapping; decoding trajectory shown for SNR = 3.6 dB. EXIT curve for outer rate 2/3 16-state convolutional code and differential 8-PSK at SNR = 5 dB also shown. (Axes: $I_E$ inner, $I_A$ outer versus $I_A$ inner, $I_E$ outer.)

Figure 5: Performance of the serially concatenated D8-PSK system with outer [3,2,2] parity code. (Axes: bit error rate versus $E_b/N_0$ in dB. Curves: natural mapping, 180k interleaver; new mapping, 180k interleaver; natural mapping, 15k interleaver; new mapping, 15k interleaver; new mapping, 15k S = 9 spread interleaver.)

Figure 6: EXIT chart of the differential 8PSK decoder for various constant phase offsets, SNR = 4.5 dB, and [3,2,2] parity check decoder. (Axes: $I_E$ inner, $I_A$ outer versus $I_A$ inner, $I_E$ outer. Curves: D8PSK with $0^\circ$, $9^\circ$, $11.25^\circ$, $15^\circ$ and $22.5^\circ$ offset; [3,2,2] parity outer code.)

Figure 7: Random walk channel phase model and estimated phase with $\pi/4$ phase slips, decoded without errors. (Axes: phase in rads versus 8PSK symbols, 0 to 5000. Curves: random walk phase, black; estimated phase, red.)

Figure 8: BER results without CSI, phase offset = $\pi/16$ rads, using APP channel estimation, compared with coherent decoding; interleaver size = 15k bits, fixed S = 3 interleaver. (Axes: bit error rate versus $E_b/N_0$ in dB. Curves: coherent decoding; APP channel estimation.)