Correlator Development at Haystack
Roger Cappallo
Haystack-NRAO Technical Mtg.
2006.10.26
History of Correlator Development at Haystack
- ~1973  Mk I     360 Kb/s x 2 stns.
- 1981   Mk III   112 Mb/s x 4 stns.
- 1986   Mk IIIa  256 Mb/s x 8 stns.
- 1999   Mk IV    1 Gb/s x 16 stns. (new correlator ASIC)
- All designs have been XF, albeit with increasing numbers of lags
Recent Mk4 Correlator Work
- Interface to Mk5A & Mk5B playback systems
- Conversion from HP-UX to Linux
- Software maintenance (ugh!)
Current & Future Correlator Work
- Future VLBI correlators likely to be manageable in software (or perhaps hardware-assisted software, a la the Cray XD-1)
- Correlator FoV shaping for SKA
- MWA Correlator
Motivation for FoV Shaping
- SKA specifications extremely challenging
  - High resolution, wide FoV, high D.R.
- Many have argued that solution is large-n
  - Small antennas (small d_a) -> large FoV
  - Superb uv coverage -> high dynamic range
  - and many other advantages
- Concerns about data volume
  - Perley and Clark (2003): data handling problem scales as d_a^-6
  - Cornwell (2004): no, it's worse: d_a^-8
  - (Both studies assume full cross-correlation)
Fields of View
- VLBI: High resolution, small FoV
  - Antenna receives flux from sub-arcmin patch; sensitivity low enough so sky is empty other than target source (not true for SKA!)
  - Correlator restricts imaging FoV to be yet smaller
- Data rates
  - For VLBI resolution and SKA FOV, petabytes/sec needed
  - Can reduce by station beamforming, or big dishes + FPAs
- In practice, high resolution requires restriction of FOV, but antenna size is not the only way to do it!
Resolution vs. FOV
Correlator FoV
- Correlators have a field of view
  - Different names:
    - Time-average and bandwidth smearing
    - Delay-rate beam
  - Caused by coherence loss over time/frequency range
- Integration occurs over (f, t) cell
  - Maps to region of (u, v) plane
  - Sources distant from correlator phase center generate phase slope in (u, v) plane
- Problem: (f, t) to (u, v) mapping is strong function of baseline
  - Correlator FoV is inherently inconsistent
  - Array PSF becomes variable across primary FoV
- Solution: control the correlator FoV?
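The coherence loss named on this slide can be illustrated with a short sketch. It assumes the simplest case: a unit-amplitude source whose phase ramps linearly across one integration cell, averaged with uniform weight. The function names are hypothetical and not part of any correlator software; the closed form is the standard sin(x)/x result for a boxcar average.

```python
import math

def boxcar_coherence(total_phase_rad):
    """Amplitude retained when a unit phasor whose phase ramps linearly by
    total_phase_rad across the cell is averaged with uniform weight."""
    half = total_phase_rad / 2.0
    if half == 0.0:
        return 1.0
    return abs(math.sin(half) / half)

def numeric_coherence(total_phase_rad, n=10000):
    """Same quantity by brute-force averaging of n sample phasors."""
    acc = 0j
    for k in range(n):
        # Phase at the midpoint of sub-interval k, centred on zero.
        phi = total_phase_rad * ((k + 0.5) / n - 0.5)
        acc += complex(math.cos(phi), math.sin(phi))
    return abs(acc) / n
```

A source at the phase center has zero ramp and keeps full amplitude; one far enough away to wind a full turn across the cell averages to nearly zero. Because the ramp for a given sky offset grows with baseline length, the same cell gives a different cutoff on every baseline.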
Matched-FoV correlation
- Windowing in image plane equivalent to convolution in UV plane (with Fourier transform of windowing function)
- Short baselines need linear combination over larger t & f
- Apply a function in (f, t) space, per visibility!
  - Apply weights to samples inside correlator
  - Match FoV for all visibilities
- Cleanly removes sensitivity to distant sources
  - Also removes their sidelobes
- Reduces effective FoV
  - Reduces data rate/volume accordingly
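A minimal sketch of the weighting idea, under stated assumptions: it works in time only (one dimension instead of the full (f, t) cell), it uses a Hann taper purely as an example window, and it assumes the fringe rate of an off-center source scales linearly with baseline length. All names are hypothetical. It shows that a short baseline, given a proportionally longer weighted window, responds to an offset source the same way as a long baseline with a short window.

```python
import cmath
import math

def hann_weights(n):
    # Normalised Hann taper; the choice of Hann is an assumption of this sketch.
    w = [0.5 - 0.5 * math.cos(2 * math.pi * (k + 0.5) / n) for k in range(n)]
    total = sum(w)
    return [x / total for x in w]

def shaped_response(phase_rate, weights):
    """Magnitude of the weighted sum of a unit fringe exp(i * phase_rate * k)."""
    n = len(weights)
    acc = 0j
    for k, w in enumerate(weights):
        acc += w * cmath.exp(1j * phase_rate * (k - (n - 1) / 2.0))
    return abs(acc)

def matched_window_length(ref_len, baseline, ref_baseline):
    # Fringe rate for a given sky offset is assumed proportional to baseline
    # length, so the window length scales inversely to keep the same FoV.
    return max(1, round(ref_len * ref_baseline / baseline))

# Same sky offset seen on a long (reference) and a 10x shorter baseline.
ref_len, ref_baseline, short_baseline = 16, 100.0, 10.0
rate_ref = 0.3                                   # radians per sample, illustrative
rate_short = rate_ref * short_baseline / ref_baseline
short_len = matched_window_length(ref_len, short_baseline, ref_baseline)
r_long = shaped_response(rate_ref, hann_weights(ref_len))
r_short = shaped_response(rate_short, hann_weights(short_len))
```

With these numbers the short baseline needs a window ten times longer, which is why the range of baseline lengths that can be matched is limited by how much (f, t) space is available.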
New correlator elements
[Block diagram. Labels: Station bitstream -> Fourier transform (two inputs); Accumulate in bin; Weight lookup; Accumulate to output shaped visibility]
Limitations
- Baseline length range
  - Short baselines need a lot of (f, t) space
  - Calibration parameters must be stable over this space
  - Simple tests suggest 100:1 baseline length range
- Effects of data editing
  - Non-contiguous time/frequency coverage
  - RFI excision
  - Scan boundaries, finite scan times
  - The smaller the desired FOV, the more severe the effects
- Both these limitations are the subject of detailed simulation
- Also, practical schemes for doing the necessary operations in hardware are being explored
Correlation for Large-N Arrays
- Fundamental problem: have to bring ~N^2/2 signals together for combination
  - e.g. for the MWA/LFD there are 5x10^5 pairs; for an SKA based on 12 m dishes, there are 4x10^8 pairs
- In order of decreasing expense, this can be done by replication and wiring within:
  - External cabling
  - Backplane traces
  - Board traces
  - routing traces
  - local fabric
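The pair counts can be checked with a few lines of arithmetic. This sketch assumes the MWA/LFD figure counts 1000 input signals (500 tiles times 2 polarizations, both numbers taken from elsewhere in this deck); the function names are hypothetical.

```python
def n_pairs(n_signals):
    """Number of distinct cross-correlation pairs among n_signals inputs."""
    return n_signals * (n_signals - 1) // 2

def signals_for_pairs(pairs):
    """Approximate N implied by a pair count, using the ~N^2/2 rule of thumb."""
    return (2 * pairs) ** 0.5

mwa_pairs = n_pairs(500 * 2)             # assumed: 500 tiles x 2 polarizations
ska_signals = signals_for_pairs(4e8)     # signal count implied by 4x10^8 pairs
```

Under that assumption the MWA/LFD count comes out just under 5x10^5, and the SKA figure corresponds to roughly 28,000 signals.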
Correlation for Large-N Arrays cont'd.
- Our early efforts at SKA designs involved large systolic arrays, for example:
[Figure: systolic-array correlator block diagram. Labels: antenna groups A..H and antenna groups a..h; each 2 Gb/s fiber carries 64 antennas; o/e converters on every input; 64; uvw binner; e/o; 1 Gb/s to uvw adder tree]
Correlation for Large-N Arrays cont'd.
- Logical solution, arrived at independently at Haystack and by Bunton et al., involves splitting by frequency and doing as much antenna cross-correlation in one place as possible
- Ideally: use an FX design, and subdivide frequency channels finely enough to do all antenna pairs in one place
- Practical limitations (primarily local RAM) lead to a hierarchical MWA design partitioned with cells of 16x16 antennas per multiplier, with all antennas on one board for a 0.5 MHz slice
MWA-LFD General Properties
- 500 antenna tiles, 80-300 MHz
  - Each a 4x4 crossed dipole array
  - Electronic analog steering of tile beam
  - Total collecting area ~8000 m^2 at 150 MHz
- Direct sampling of RF after amplification and filtering
  - 8 bits/sample fine with low RFI environment in Mileura
- Full cross-correlation architecture
  - Simpler, easier, cheaper, better for wide FOV
  - Leveraging rapid advances in digital electronics
  - 32 MHz processed bandwidth
  - Distributed FX architecture, -based
LFD General Properties (cont'd)
- Tiles scattered across 1.5 km region
- Angular resolution: a few arcmin
- Superb instantaneous PSF characteristics
- Central condensation for sensitivity at large spatial scales
Physical Layout
[Diagram. Labels: Antenna tile (~4 m diam.); Cluster (50-100 m diam.); tile; Array (~1.5 km diam.); node; clusters; Tile beamformer; Coax out; Fiber out; Central Processing]
Configuration and UV Coverage
Tile Design
- 16 dipoles
- ~4m x 4m ground screen
- Dual-polarization
- 80-300 MHz
- Analog beamformer
- 30 min elevation
- Early Demonstrator
- Target cost $2000 each
Digital Receiver
- For each of 16 analog inputs, band from 80-300 MHz is Nyquist sampled
- 1st stage filter bank (running at ~640 MHz) generates ~256 x 1 MHz channels, of which 32 are selected for further processing
- Complex spectral points (5 Re + 5 Im bits) are reordered, aggregated, packetized, and transmitted to the appropriate correlator PFB board
- Total data rate to correlator: 320 Gb/s, via 128 optical fibers
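The 320 Gb/s total can be reproduced from the other numbers in the deck. This is a rough sketch that assumes 500 tiles with 2 polarizations each, critically sampled channels (one complex point per Hz of selected bandwidth per second), and no packet overhead; the function name is hypothetical.

```python
def receiver_output_rate_bps(n_tiles, n_pols, bandwidth_hz, bits_per_complex_point):
    # Critically sampled channels assumed: one complex spectral point per Hz
    # of selected bandwidth per second, for every tile and polarization.
    return n_tiles * n_pols * bandwidth_hz * bits_per_complex_point

total_bps = receiver_output_rate_bps(
    n_tiles=500,                  # from the general-properties slide
    n_pols=2,                     # assumed: dual polarization
    bandwidth_hz=32_000_000,      # 32 x 1 MHz selected channels
    bits_per_complex_point=10,    # 5 Re + 5 Im bits
)
per_fiber_bps = total_bps / 128   # average payload per optical fiber
```

On these assumptions each of the 128 fibers carries about 2.5 Gb/s of payload.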
Correlator PFB Board
- Receives optical inputs from nodes
- Reorders data into desired fat order for 2nd stage PFB (cube rotation)
- Filters 32 x 1 MHz channels into 4K x 8 kHz channels
- Reorders data via another cube rotation for input to correlator boards
- Exports data to correlator in 0.5 MHz slices over electrical high speed serial interconnect
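The channel bookkeeping implied by this slide can be sketched as follows. It assumes "4K" means 4096 and that the coarse channels are exactly 1 MHz wide, in which case the fine channels are 7.8125 kHz wide, which the slide rounds to 8 kHz. The names are hypothetical.

```python
COARSE_CHANNELS = 32
COARSE_WIDTH_HZ = 1_000_000      # assumed exactly 1 MHz
FINE_CHANNELS = 4096             # assumed meaning of "4K"
SLICE_WIDTH_HZ = 500_000         # one slice per correlator board

total_bw_hz = COARSE_CHANNELS * COARSE_WIDTH_HZ
fine_width_hz = total_bw_hz / FINE_CHANNELS          # about 8 kHz
fine_per_coarse = FINE_CHANNELS // COARSE_CHANNELS   # 2nd stage PFB size
n_slices = total_bw_hz // SLICE_WIDTH_HZ
fine_per_slice = FINE_CHANNELS // n_slices

def slice_of(fine_channel):
    """Index of the 0.5 MHz slice that a fine channel is exported in."""
    return fine_channel // fine_per_slice
```

Each 1 MHz channel is split 128 ways, and each 0.5 MHz slice carries 64 fine channels for all antennas.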
Widefield Correlator
- Cross-multiply of complex voltage spectra, without fringe rotation or gain correction
- 125K baselines x 4 pol's x 32 MS/s requires 16 TCMACs (about same as WIDAR)
- Complex (4bit) multiply done in single multiplier
- Correlator is partitioned by frequency slice, into 64 boards, each processing 0.5 MHz of bandwidth
- Local accumulation for 512 pts (64 ms), then LTA w/ RAM accumulates to 0.5 s
- Yields 2x10^9 visibilities per 0.5 s AP dump
- 4x10^9 vis/s transmitted to realtime computer
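The rates on this slide follow from one another. The sketch below uses the rounded figures as quoted (125K baselines, 8 kHz channels); the variable names are illustrative only.

```python
baselines = 125_000
pol_products = 4
sample_rate_hz = 32_000_000          # complex samples per second per signal
fine_width_hz = 8_000                # rounded channel width from the PFB
dump_period_s = 0.5

# One complex multiply-accumulate per baseline, pol product and sample.
cmac_per_s = baselines * pol_products * sample_rate_hz

# One visibility per baseline, pol product and fine channel in every dump.
n_fine = sample_rate_hz // fine_width_hz
vis_per_dump = baselines * pol_products * n_fine
vis_per_s = vis_per_dump / dump_period_s

# Each fine channel produces one sample every 1 / fine_width_hz seconds.
local_accum_s = 512 / fine_width_hz
```

These give 16x10^12 CMAC/s, 2x10^9 visibilities per dump, 4x10^9 visibilities per second, and a 64 ms local accumulation.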
Correlation Cell
- Input 2x16 antenna samples of single time point
- 4bit complex multiply in 18x18-bit multiplier
- Accumulate to block RAM
- Calculate 256 correlations for each of 512 successive time samples
- Data reordered in filterbank
- Applicable to both xntd and LFD
[Block diagram. Labels: Serial input data; 8-bit shift register (two inputs); Dual Port Memory; MUX; Latch; Data loading; Two 4+4 bit outputs; Complex multiplier (Resources: one 18-bit multiplier; Dual 4+4 bit inputs, 9+9 bit output); 18+18 bit outputs; adder (+); Single Dual Port Memory, 2 x 256 36-bit words; 36-bit shift register; Serial output data]
Beamformers
- Used for CME tracking, pulsars, variable source monitoring, etc.
- Form 16 beams
- Done independently for 8 kHz channels
- Arbitrary pointing, though sensitivity may be ~15 dB lower outside of tile beam
- Linear combination of antenna signals and factors including antenna gains, geometric and ionospheric phase factors, weighting
- Total computation ~0.5 Top/s
- Distributed across correlator boards
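A minimal sketch of the linear combination for one beam, one frequency channel, and one time sample. How the gain, phase, and weight factors are combined into a single coefficient is an assumption here (divide out the complex gain, remove the combined geometric and ionospheric phase, then weight); the names are hypothetical. The operation count assumes 1000 input signals and one complex multiply-add per input per sample.

```python
import cmath

def beamform(samples, gains, phases_rad, weights):
    """Weighted, phase-aligned sum of antenna samples for one beam."""
    out = 0j
    for x, g, phi, w in zip(samples, gains, phases_rad, weights):
        # Coefficient form is assumed: undo the complex gain, remove the
        # geometric + ionospheric phase for this pointing, apply the weight.
        out += w * x * cmath.exp(-1j * phi) / g
    return out

# A source of amplitude s seen through known gains and phases adds coherently.
s = 2.0 - 1.0j
gains = [1.0 + 0j, 2.0 + 0j, 0.5 + 0j, 1.0 + 1.0j]
phases = [0.0, 0.7, -1.3, 2.1]
weights = [1.0, 0.5, 0.5, 1.0]
samples = [g * cmath.exp(1j * p) * s for g, p in zip(gains, phases)]
beam = beamform(samples, gains, phases, weights)

# Rough operation count: beams x signals x complex sample rate.
cmac_per_s = 16 * 1000 * 32_000_000
```

The count comes to about 5x10^11 complex multiply-adds per second, in line with the ~0.5 Top/s quoted on the slide.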
Realtime Computer Specs
- Dataflow in:
  - Visibilities: 4x10^9 /sec (128 Gb/s)
  - Beams: 512x10^6 complex samp/s (8 Gb/s)
- Internal dataflow: ~30 Gb/s
- Computation: ~80 Gop/s
- Computing environment:
  - At least 12 major apps with varied requirements
  - Must combine performance with flexibility
- Candidate system: Cray XD-1
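The input rates can be cross-checked against the beamformer and correlator figures. The word sizes below are only what the quoted rates imply, not stated specifications, and the variable names are illustrative.

```python
vis_per_s = 4_000_000_000
vis_bps = 128_000_000_000
beam_samples_per_s = 512_000_000
beam_bps = 8_000_000_000

bits_per_visibility = vis_bps // vis_per_s            # implied word size
bits_per_beam_sample = beam_bps / beam_samples_per_s  # implied, rounded figures

# Beam sample rate is consistent with 16 beams over 32 MHz of bandwidth.
n_beams, bandwidth_hz = 16, 32_000_000
expected_beam_rate = n_beams * bandwidth_hz

total_input_bps = vis_bps + beam_bps
```

The quoted figures imply 32 bits per visibility, a little under 16 bits per complex beam sample, and about 136 Gb/s of total input.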