Approximate Message Passing: Applications to Communications Receivers
Phil Schniter
(With support from NSF grant CCF-1018368, NSF grant CCF-1218754, and DARPA/ONR grant N66001-10-1-4090)
TrellisWare, Feb. 2014
The Generalized Linear Model: Consider an observation y ∈ C^M of an unknown vector x ∈ C^N that is sent through a known linear transform A, generating the hidden z = Ax, which is then observed through a probabilistic measurement channel p_{y|z}(y|z). Our goal is to infer x from y. When p_x and p_{y|z} are both Gaussian, the MMSE/MAP estimator is linear and easy to state in closed form. The more interesting case is when p_x and/or p_{y|z} are non-Gaussian. Equally interesting is when M ≪ N: compressive sensing tells us that a K-sparse x ∈ C^N can be accurately recovered from M = O(K log(N/K)) measurements when A is information-preserving (e.g., satisfies the 2K-RIP). There are many applications of estimation under the generalized linear model in engineering, biology, medicine, finance, etc.
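To make the model concrete, here is a minimal sketch of drawing one problem instance. The AWGN measurement channel and all sizes are illustrative assumptions, chosen only to show the structure y = Ax + noise with a sparse x and M ≪ N:

```python
import numpy as np

# A sketch of one generalized-linear-model instance, assuming an AWGN
# measurement channel for p_{y|z}; sizes N, M, K are illustrative.
rng = np.random.default_rng(0)

N, M, K = 512, 128, 16                          # signal length, measurements, sparsity (M << N)
A = rng.standard_normal((M, N)) / np.sqrt(M)    # iid sub-Gaussian, approx. unit-norm columns

x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)             # K-sparse unknown vector

z = A @ x                                       # hidden linear mixing
y = z + 0.01 * rng.standard_normal(M)           # measurement channel: y = z + small AWGN
```

The goal of the inference algorithms below is to recover x from (y, A) alone.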
Example Applications:
Pilot-aided channel estimation / compressed channel sensing
  x: sparse channel impulse response (length N)
  y: pilot observations (M < N with a sparse channel)
  A: built from pilot symbols and other aspects of the linear modulation
Imaging (medical, radar, etc.)
  x: spatial-domain image (rasterized)
  y: noisy measurements (AWGN, Gaussian, phaseless, etc.)
  A: typically Fourier-based (details are application dependent)
Binary linear classification and feature selection
  x: prediction vector (normal to the class-separating hyperplane, sparse)
  y: binary experimental outcomes (e.g., {sick, healthy})
  A: each row contains per-experiment features (e.g., age, weight, etc.)
Generalized Approximate Message Passing (GAMP): Suppose we are interested in computing the MMSE or MAP estimate of x from y (under known A, p_x, p_{y|z}). For general A, p_x, and p_{y|z}, this is difficult... in fact NP-hard. However, for sufficiently large and dense A, and separable p_x and p_{y|z}, there is a remarkable new iterative algorithm that gets close: GAMP.
S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," arXiv:1010.5141, Oct. 2010.
In the large-system limit (M, N → ∞ with fixed M/N), when A is drawn iid sub-Gaussian and p_x and p_{y|z} are separable (i.e., independent r.v.s), GAMP's performance is characterized by a state evolution whose fixed points, when unique, coincide with the MMSE or MAP optimal estimates.
In practice, A is finite-sized and structured (e.g., Fourier). Still, for any A, the fixed points of the GAMP iterations correspond to the critical points of the MAP optimization objective, max_x { ln p_{y|z}(y|Ax) + ln p_x(x) }.
A Revolution in Loopy Belief Propagation: The GAMP algorithm can be derived as an approximation of the sum-product (in the MMSE case) or max-product (in the MAP case) loopy-BP algorithms.
[Factor graph: measurement factors p_{y_i|z_i}(y_i | a_i^H x), i = 1...M, on one side; variables x_1, ..., x_N in the middle; prior factors p_{x_j}(x_j) on the other side.]
The approximation makes use of the central limit theorem and Taylor-series approximations that hold in the large-system limit. An interesting observation is that, because A is dense, the factor graph is extremely loopy. Loosely speaking, these loops are OK because (for normalized A) they get weaker as the problem gets larger.
Note: Rigorous analyses of GAMP are based on the algorithm itself, not on the loopy-BP approximation.
M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inform. Theory, Feb. 2011.
GAMP Heuristics (Sum-Product Case):
1. Message from the y_i node to the x_j node:
p_{i→j}(x_j) ∝ ∫_{ {x_r}_{r≠j} } p_{y_i|z_i}( y_i | Σ_r a_{ir} x_r ) Π_{r≠j} p_{i←r}(x_r)
          ≈ ∫_{z_i} p_{y_i|z_i}(y_i | z_i) N( z_i ; ẑ_i(x_j), ν^z_i(x_j) )    [via the CLT]
To compute ẑ_i(x_j) and ν^z_i(x_j), the means and variances of {p_{i←r}}_{r≠j} suffice — thus, Gaussian message passing! Remaining problem: we have 2MN messages to compute (too many!).
2. Exploiting the similarity among the messages {p_{i→j}}_{i=1}^M, AMP employs a Taylor-series approximation of their difference whose error vanishes as M → ∞ for dense A (and similarly for {p_{i←j}}_{j=1}^N as N → ∞). Finally, we need to compute only O(M+N) messages!
The resulting algorithm requires two matrix-vector multiplications per iteration and typically converges in about 25 iterations.
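As a concrete illustration of the resulting O(M+N)-message algorithm, below is a minimal sketch of the simplest AMP special case: a soft-threshold denoiser for sparse x under AWGN. This is not the general GAMP recursion, and the threshold policy (alpha times the estimated effective-noise level, with alpha = 1.5) is an illustrative assumption rather than tuned:

```python
import numpy as np

def amp_soft_threshold(y, A, n_iter=25, alpha=1.5):
    """Sketch of AMP with a soft-threshold denoiser: a simplified special
    case of sum-product GAMP for sparse x and AWGN measurements."""
    M, N = A.shape
    x = np.zeros(N)
    r = y.copy()
    for _ in range(n_iter):                        # typically ~25 iterations
        tau = np.linalg.norm(r) / np.sqrt(M)       # effective-noise estimate
        u = x + A.T @ r                            # matvec 1: pseudo-data
        x_new = np.sign(u) * np.maximum(np.abs(u) - alpha * tau, 0.0)
        onsager = (r / M) * np.count_nonzero(x_new)  # Onsager correction term
        r = y - A @ x_new + onsager                # matvec 2: corrected residual
        x = x_new
    return x
```

Note that the two matrix-vector multiplications per iteration (A.T @ r and A @ x_new) are visible directly in the loop; the Onsager term is what distinguishes AMP from plain iterative soft thresholding.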
GAMP Extensions: Standard GAMP assumes known, separable p_x and p_{y|z}. However, in practice...
The densities p_x and p_{y|z} are usually unknown. Often, they are also non-separable (i.e., the elements of x are statistically dependent; likewise for y|z).
We have developed an EM-based methodology to learn p_x and p_{y|z} online and subsequently leverage this information for near-optimal Bayesian inference.
J. P. Vila and P. Schniter, "Expectation-Maximization Gaussian-Mixture Approximate Message Passing," IEEE Trans. Signal Process., Oct. 2013.
We have also developed a turbo methodology that handles probabilistic dependencies among the elements of x and the elements of y|z.
P. Schniter, "Turbo reconstruction of structured sparse signals," Proc. CISS, Princeton, NJ, Mar. 2010.
Some Communications Applications of (EM/turbo) GAMP:
1. Communications over wideband channels: joint channel-estimation/equalization/decoding.
P. Schniter, "A Message-Passing Receiver for BICM-OFDM over Unknown Clustered-Sparse Channels," IEEE J. Sel. Topics Signal Process., Dec. 2011.
P. Schniter, "Belief-propagation-based joint channel estimation and decoding for spectrally efficient communication over unknown sparse channels," Physical Communication, Mar. 2012.
2. Communications over underwater channels: joint channel-tracking/equalization/decoding.
P. Schniter and D. Meng, "A Message-Passing Receiver for BICM-OFDM over Unknown Time-Varying Sparse Channels," Allerton Conf., Sep. 2011.
3. Communications in impulsive noise: joint channel-estimation/equalization/impulse-mitigation/decoding.
M. Nassar, P. Schniter, and B. Evans, "A Factor-Graph Approach to Joint OFDM Channel Estimation and Decoding in Impulsive Noise Environments," IEEE Trans. Signal Process., to appear.
1. Comms over Wideband Channels: At large communication bandwidths, channel impulse responses are sparse. The figure below (left) shows channel taps x = [x_0, ..., x_{L-1}], where x_n = x(nT) for bandwidth T^{-1} = 256 MHz, x(t) = h(t) ∗ p_RC(t), and h(t) is generated randomly using the IEEE 802.15.4a outdoor NLOS specs.
[Figures: left, power-delay profile (dB vs lag) of an IEEE 802.15.4a outdoor-NLOS channel — a few big taps with big variance above the PDP threshold, many small taps with small variance below it; right, real and imaginary parts vs lag of a measured underwater channel.]
Simplified Channel Model: First, let's simplify things to talk concretely about sparse channels... Consider a discrete-time channel that is:
block-fading with block size N,
frequency-selective with L taps (where L < N),
sparse with S non-zero complex-Gaussian taps (where 0 < S ≤ L),
where both the channel coefficients and the support are unknown to the receiver.
Important questions:
1. What is the capacity of this channel?
2. How can we build a practical comm system that operates near this capacity?
Noncoherent Capacity of the Sparse Channel: For the unknown N-block-fading, L-length, S-sparse channel described earlier, we established that [1]:
1. In the high-SNR regime, the ergodic capacity obeys C_sparse(SNR) = ((N−S)/N) log(SNR) + O(1).
2. To achieve the prelog factor R_sparse = (N−S)/N, it suffices to use pilot-aided OFDM (with N subcarriers, of which S are pilots) together with joint channel estimation and data decoding.
Key points:
The effect of the unknown channel support manifests only in the O(1) offset.
[1] uses constructive proofs, but the decoder proposed there is not practical.
[1] A. Pachai-Kannu and P. Schniter, "On communication over unknown sparse frequency-selective block-fading channels," IEEE Trans. Info. Theory, Oct. 2011.
Practical Communication over the Unknown Sparse Channel: We now propose a communication scheme that...
is practical, with decode complexity O(N log2(N) + NS) per block,
(empirically) achieves the optimal prelog factor R_sparse = (N−S)/N,
significantly outperforms compressed channel sensing (CCS) schemes.
Our scheme uses...
a conventional transmitter: pilot-aided BICM-OFDM,
a novel receiver: based on GAMP.
Factor Graph for Pilot-Aided BICM-OFDM:
[Factor graph: uniform priors on info bits b_k → code & interleave → coded bits c_{k,m} (with pilots & training) → symbol mapping M_k → QAM symbols s_k → OFDM observations y_k, coupled to channel taps x_j with sparse priors. Circles denote random variables; squares denote posterior factors. SISO (de)coding handles the coding sub-graph; GAMP handles the linear-mixing sub-graph.]
To jointly infer all random variables, we perform loopy BP via the sum-product algorithm, using GAMP approximations in the GAMP sub-graph.
Numerical Results — Perfectly Sparse Channel:
Transmitter:
LDPC codewords of length 10000 bits.
2^M-QAM with 2^M ∈ {4, 16, 64, 256} and multi-level Gray mapping.
OFDM with N = 1024 subcarriers.
P pilot subcarriers and/or T training MSBs.
Channel:
Length L = 256 = N/4.
Sparsity S = 64 = L/4.
Reference schemes:
Pilot-aided LASSO was implemented using SPGL1 with genie-aided tuning.
Pilot-aided LMMSE, support-aware MMSE, and info-bit+support-aware MMSE channel estimates were also tested.
BER & Outage vs SNR (with P = L pilots and T = 0 MSBs):
[Figures: left, log10(BER) of GAMP vs SNR (dB) and bpcu; right, BER = 0.001 contours (64-QAM) for GAMP, the support genie (SG), the info-bit+support genie (BSG), LASSO, and LMMSE.]
Key points:
GAMP outperforms both LASSO and the support genie (SG).
GAMP performs nearly as well as the info-bit+support-aware genie (BSG).
With P = L, all approaches yield prelog factor R = (N−L)/N = 3/4, which falls short of the optimal R_sparse = (N−S)/N = 15/16.
BER & Outage vs SNR (with P = 0 pilots & T = SM training MSBs):
[Figures: left, log10(BER) (256-QAM, 3.75 bpcu, 20 dB SNR) vs pilot-to-sparsity ratio P/S and training-to-sparsity ratio T/(SM); right, BER = 0.01 contours (256-QAM) for GAMP.]
Key points:
GAMP favors P = 0 pilot subcarriers and T = SM training MSBs — precisely the necessary/sufficient redundancy of the capacity-maximizing system!
GAMP achieves the sparse channel's capacity-prelog factor, R_sparse = (N−S)/N.
In reality, channel taps are not perfectly sparse, nor i.i.d.: For example, consider the channel taps x = [x_0, ..., x_{L−1}], where x_n = x(nT) for bandwidth T^{-1} = 256 MHz, x(t) = h(t) ∗ p_RC(t), and h(t) is generated randomly using the 802.15.4a outdoor NLOS specs.
[Figures: left, a typical realization (dB vs lag) — big clustered taps with big variance above a PDP threshold, small taps with small variance below it; right, tap histograms at lags 5, 23, 128, and 230.]
The tap distribution varies as the lag increases, becoming more heavy-tailed. The big taps are clustered together in lag, as are the small ones.
Proposed channel model: Saleh-Valenzuela (e.g., 802.15.4a) models are accurate but difficult to exploit in receiver design. We propose a structured-sparse channel model based on a 2-state Gaussian-mixture model with a discrete-Markov-chain structure on the state:
p(x_j | d_j) = CN(x_j; 0, µ0_j) if d_j = 0 (small), CN(x_j; 0, µ1_j) if d_j = 1 (big),
Pr{d_{j+1} = 1} = p10_j Pr{d_j = 0} + (1 − p01_j) Pr{d_j = 1}.
Our model is parameterized by the lag-dependent quantities:
{µ1_j}: big-state power-delay profile
{µ0_j}: small-state power-delay profile
{p01_j}: big-to-small transition probabilities
{p10_j}: small-to-big transition probabilities
We can learn these statistical parameters from observed realizations via the EM algorithm.
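To illustrate the model, here is a sketch of drawing one channel realization from this prior. The transition probabilities and power-delay profiles below are illustrative placeholders (in practice they would be learned via EM, as noted above):

```python
import numpy as np

# Sample one realization of the 2-state Gaussian-mixture channel prior
# with a Markov chain on the big/small state d_j across lag j.
# All parameter values are illustrative guesses, not learned ones.
rng = np.random.default_rng(1)

L = 256
p01 = np.full(L, 0.10)                  # big -> small transition probs
p10 = np.full(L, 0.02)                  # small -> big transition probs
mu_big = np.exp(-np.arange(L) / 64.0)   # big-state power-delay profile
mu_small = 1e-4 * mu_big                # small-state power-delay profile

d = np.zeros(L, dtype=int)
d[0] = rng.random() < p10[0]            # initial state
for j in range(L - 1):                  # Markov chain across lag -> clustered taps
    stay_big = (d[j] == 1) and (rng.random() > p01[j])
    go_big = (d[j] == 0) and (rng.random() < p10[j])
    d[j + 1] = 1 if (stay_big or go_big) else 0

var = np.where(d == 1, mu_big, mu_small)                  # per-tap variance
x = np.sqrt(var / 2) * (rng.standard_normal(L)            # circular complex
                        + 1j * rng.standard_normal(L))    # Gaussian taps
```

The Markov chain makes runs of big taps (and runs of small taps) likely, reproducing the clustering seen in the 802.15.4a realizations.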
Factor graph for pilot-aided BICM-OFDM (with tap states):
[Factor graph: as before — info bits b_k, coded bits c_{k,m}, QAM symbols s_k, OFDM observations y_k, channel taps x_j with sparse priors — now augmented with tap states d_j carrying the cluster prior via a Markov chain (MC). Circles denote random variables; squares denote posterior factors. SISO decoding, GAMP, and MC sub-graphs exchange messages.]
To jointly infer all random variables, we perform loopy BP via the sum-product algorithm, using GAMP approximations in the GAMP sub-graph.
Numerical results:
Transmitter:
OFDM with N = 1024 subcarriers.
16-QAM with multi-level Gray mapping.
LDPC codewords of length 10000, yielding a spectral efficiency of 2 bpcu.
P pilot subcarriers and T training MSBs.
Channel:
802.15.4a outdoor-NLOS (not our Gaussian-mixture model!).
Length L = 256 = N/4.
Reference channel-estimation/equalization schemes:
soft-input soft-output (SISO) versions of LMMSE and LASSO.
perfect-CSI genie.
BER versus E_b/N_o for P = 224 pilots and T = 0 training MSBs:
[Figure: BER vs E_b/N_o (7 to 15 dB) for LMMSE, LASSO, GAMP, and GAMP with the Markov-chain prior (GAMP MC), each after 1, 2, and final turbo iterations, plus the perfect-CSI (PCSI) genie.]
Our scheme shows a 4 dB improvement over (turbo) LASSO. Our scheme is only 0.5 dB from the perfect-CSI genie!
BER versus E_b/N_o for P = 0 pilots and T = 448 training MSBs:
[Figure: same comparison as above — LMMSE, LASSO, GAMP, GAMP MC, and the PCSI genie — now with training MSBs in place of pilot subcarriers.]
Use of training MSBs gives a 1 dB improvement over use of pilot subcarriers!
2. Communications over Underwater Channels: SPACE-08 Underwater Experiment 2920156F038 C0 S6. The time-varying channel response was estimated using a WHOI M-sequence:
[Figures: left, absolute magnitude of the impulse response vs lag and time; right, the same in dB vs lag and Doppler (Hz).]
The channel is nearly over-spread: f_d T_s L = 20 × (1/10000) × 400 = 0.8! We can't afford to ignore the structure of the temporal variations!
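The quoted spread factor is easy to check from the numbers on the slide (Doppler spread 20 Hz, symbol duration 1/10000 s, delay spread 400 lags):

```python
# Spread factor f_d * T_s * L for the SPACE-08 channel, using the
# values quoted above: f_d = 20 Hz, T_s = 1/10000 s, L = 400 lags.
f_d, T_s, L = 20.0, 1.0 / 10000, 400
spread = f_d * T_s * L   # ~0.8: close to 1, i.e., nearly over-spread
```

A spread factor at or above 1 would make the channel unidentifiable without further structure, which is why the temporal model below matters.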
BICM-OFDM Factor Graph with Temporal Channel Structure:
[Factor graph: info bits b_k → coded bits c_{k,m} → QAM symbols q_k → OFDM observations y_k, coupled through GAMP to channel taps h_j, whose amplitudes a_j and on/off supports s_j evolve over time t under a Bernoulli-Gaussian (BG) prior. SISO (de)coding handles the coding sub-graph.]
Channel taps are modeled as independent Bernoulli-Gaussian processes:
each tap's amplitude follows a temporal Gauss-Markov chain,
each tap's on/off state follows a temporal discrete-Markov chain.
Performance versus SNR:
Settings: experimentally measured underwater channel, 16-QAM, 1024 total tones, 0 pilot tones, 256 training MSBs, LDPC length 10k, LDPC rate 0.5.
[Figure: BER vs SNR (10 to 16 dB) for receivers with and without the temporal channel model.]
Exploiting the persistence in channel support and channel amplitudes was critical in this difficult underwater application.
3. Communications in Impulsive Noise: In many wireless and power-line communication systems, the (time-domain) noise is not Gaussian but impulsive.
The marginal noise statistics are well captured by a 2-state Gaussian-mixture (i.e., Middleton class-A) model.
Noise burstiness is well captured by a discrete Markov chain on the noise state.
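A short sketch of this noise model, with illustrative (not measured) parameters: a Markov chain on the impulse state makes the impulses arrive in bursts, and the large excess kurtosis of the resulting samples confirms the non-Gaussianity:

```python
import numpy as np

# Bursty two-state Gaussian-mixture noise: a Markov chain on the
# impulsive state clusters the impulses in time.  The transition
# probabilities and variances below are illustrative guesses.
rng = np.random.default_rng(2)

T = 100_000
p_enter, p_exit = 0.005, 0.10           # enter / leave the impulsive state
var_nom, var_imp = 1.0, 100.0           # nominal vs impulsive noise variance

state = np.zeros(T, dtype=bool)
for t in range(T - 1):                  # Markov chain in time -> bursts
    state[t + 1] = (not state[t] and rng.random() < p_enter) or \
                   (state[t] and rng.random() > p_exit)

sigma = np.where(state, np.sqrt(var_imp), np.sqrt(var_nom))
noise = sigma * rng.standard_normal(T)

# Excess kurtosis well above 0 (a Gaussian's value) indicates impulsiveness.
kurtosis = np.mean(noise**4) / np.mean(noise**2)**2 - 3.0
```

With these parameters the chain spends roughly 5% of the time in the impulsive state, in runs of about 10 samples, which is the burstiness an i.i.d. mixture model would miss.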
Factor Graph for Pilot-Aided BICM-OFDM (with impulsive noise):
[Figure: factor graph, not recoverable from the text.]
Numerical Results — Uncoded Case:
Settings: 5 channel taps, GM noise, 256 total tones, 15 pilot tones, 80 null tones, 4-QAM.
The proposed joint channel/impulsive-noise/symbol estimation (JCIS) scheme gives a 15 dB gain over the previous state of the art and performs within 1 dB of the matched-filter bound (MFB)!
Numerical Results — Coded Case:
Settings: 10 channel taps, GM noise, 1024 total tones, 150 pilot tones, 0 null tones, 16-QAM, LDPC rate 0.5, length 60k.
The proposed joint channel/impulsive-noise/symbol/bit estimation (JCISB) scheme gives a 15 dB gain over the traditional DFT-based receiver!
Conclusions: Inference in the generalized linear model yields an important but challenging class of problems. Generalized approximate message passing (GAMP) is an important new tool for solving such problems (under sufficiently large and dense transforms). Problems of this form manifest in BICM-OFDM comms receivers, where one wants to optimally decode bits in the presence of unknown channels, symbols, and noise. Often, the channel and noise processes have interesting statistical structures (e.g., sparsity, clustering, time-variation), and decoding performance can be dramatically improved when these structures are properly exploited. For such problems, GAMP can be plugged into the standard turbo receiver architecture to yield near-optimal performance with manageable complexity.