CASPER Workshop 2010
Low-Power Communications and Neural Spike Sorting
CASPER Tools in Front-to-Back DSP ASIC Development
Henry Chen, henryic@ee.ucla.edu
August 2010
Introduction
- Parallel Data Architectures group @ UCLA EE
- Prof. Dejan Marković + 14 grad students
- http://icslwebs.ee.ucla.edu/dejan/researchwiki/
- Effective handling of power efficiency and design complexity issues in scaled CMOS technologies
- Emphasis on DSP algorithms for communications and neuroscience
Energy-Delay Design Tradeoff
[Figure: energy-delay plot with regions labeled "suboptimal" and "infeasible"]
Energy-Delay Optimization
Design Freedoms
- Algorithm
- Architecture (parallelize, pipeline, time-multiplex)
- Circuit (static/dynamic, DVFS, high/low V_T, etc.)
- Design choices have cross-level implications
Hierarchical Optimization
Multi-Step Development Cycle
- Multiple engineering teams & design descriptions
  - Algorithm (Matlab, C, C++)
  - Architecture (Verilog, VHDL)
  - Circuit (RTL, schematic/layout, standard cells)
  - Test (pattern generator, logic analyzer)
- Each step requires translation & re-verification
  - E.g., floating-point to fixed-point
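The floating-point to fixed-point translation named above can be illustrated with a minimal Python sketch. The 16-bit word, the 12 fractional bits, and the helper names to_fixed and to_float are assumptions chosen for illustration; none of them come from the talk.

```python
def to_fixed(x, word_bits=16, frac_bits=12):
    """Quantize a float to a signed fixed-point integer code.

    Rounds to the nearest code (Python's round() sends exact ties to the
    even code) and saturates at the limits of the word.
    """
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    code = int(round(x * scale))
    return max(lo, min(hi, code))


def to_float(code, frac_bits=12):
    """Convert a fixed-point integer code back to a float."""
    return code / (1 << frac_bits)


def max_quantization_error(samples, word_bits=16, frac_bits=12):
    """Largest absolute round-trip error over the given samples."""
    return max(
        abs(x - to_float(to_fixed(x, word_bits, frac_bits), frac_bits))
        for x in samples
    )
```

For in-range inputs the round-trip error is bounded by half an LSB, 2^-(frac_bits+1); out-of-range inputs saturate. This is the kind of check that has to be repeated each time the description is translated by hand.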
Approach
- Unified Matlab/Simulink description
- Algorithm design and verification
- FPGA hardware emulation
- Optimized ASIC implementation
- ASIC test and verification
- Sounds like a familiar tune
Front-to-Back Solution
- Architectural & algorithmic exploration/optimization
- High-performance/high-throughput computation
- Closed-loop test environment
Neural Spike Sorting
Design Optimization for kHz-rate DSP
Sarah Gibson, Vaibhav Karkare
Spike Sorting
- Neural microelectrodes often pick up signals from multiple neurons surrounding the probe
- Each action potential needs to be resolved to its originating neuron
- Single-unit activity is needed for basic neuroscience and biomedical applications
Spike Sorting Process
- Absolute value threshold
- Nonlinear energy operator
- Stationary wavelet transform product
- Principal component analysis
- Discrete wavelet transform
- Discrete derivatives
- Integral transform
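Two of the detection methods listed above are simple enough to sketch. The nonlinear energy operator is commonly defined as psi[n] = x[n]^2 - x[n-1]*x[n+1]. The threshold rule used here (a scaled mean of psi, with a scale factor of 8) and all function names are assumptions for illustration, not details taken from the slide.

```python
def abs_detect(x, threshold):
    """Absolute value threshold: indices where |x[n]| exceeds the threshold."""
    return [n for n, v in enumerate(x) if abs(v) > threshold]


def neo(x):
    """Nonlinear energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The first and last samples have no two-sided neighbours and are set to 0.
    """
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def neo_detect(x, scale=8.0):
    """Indices where psi exceeds scale * mean(psi).

    The scaled-mean threshold and the default scale are assumed here.
    """
    psi = neo(x)
    threshold = scale * sum(psi) / len(psi)
    return [n for n, v in enumerate(psi) if v > threshold]


# A flat trace with one unit-height spike at sample 20.
trace = [0.0] * 50
trace[20] = 1.0
print(neo_detect(trace))       # [20]
print(abs_detect(trace, 0.5))  # [20]
```

The operator emphasizes samples that are both large and fast-changing, which is why it costs more operations than a plain absolute-value comparison.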
On-chip Spike Sorting
- Conventional method: wired, software post-process
- Proposed: implantable wireless spike sorting
Need for kHz-rate DSPs
- Signal bandwidths ~20 kHz in biomedical apps
- Implantable chips need to be << 800 µW/mm² to prevent tissue damage
- Must have long battery life (~5-10 yrs) or scavenge energy
Algorithm Selection
Spike Detection      MOPS      Area (mm²)
  Absolute Val.      0.4806    0.06104
  NEO                4.224     0.02950
  SWTP               86.75     56.70
Feature Extraction   MOPS      Area (mm²)
  PCA                1.265     0.2862
  DWT                3.125     0.06105
  DD                 0.1064    0.04725
  IT                 0.05440   0.03709
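One way to read the table is to rank each group by a combined complexity score. The sketch below uses the MOPS and area values from the slide; the scoring rule (MOPS and area each normalized to the largest value in the group, then summed) and the function name rank_by_cost are assumptions for illustration. The slide does not state how the final selection was made, and detection or classification accuracy, which is not tabulated here, would also have to be weighed.

```python
# (MOPS, area in mm^2) per algorithm, copied from the slide.
DETECTION = {
    "Absolute Val.": (0.4806, 0.06104),
    "NEO": (4.224, 0.02950),
    "SWTP": (86.75, 56.70),
}
FEATURE_EXTRACTION = {
    "PCA": (1.265, 0.2862),
    "DWT": (3.125, 0.06105),
    "DD": (0.1064, 0.04725),
    "IT": (0.05440, 0.03709),
}


def rank_by_cost(table):
    """Return algorithm names ordered from lowest to highest combined cost.

    Cost = MOPS / max(MOPS) + area / max(area); this equal weighting is an
    assumed rule, not one stated in the talk.
    """
    max_mops = max(mops for mops, _ in table.values())
    max_area = max(area for _, area in table.values())
    cost = {
        name: mops / max_mops + area / max_area
        for name, (mops, area) in table.items()
    }
    return sorted(cost, key=cost.get)


print(rank_by_cost(DETECTION))
print(rank_by_cost(FEATURE_EXTRACTION))
```

Under this complexity-only score the absolute-value detector and the integral transform come out cheapest, which says nothing about how accurately they sort spikes.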
Interleaved Architecture
Multi-mode Spike Sorting Chip
- Power
  - 130 µW for 64 channels
  - 52 µW for 16 channels
- Area
  - Die: 7.07 mm²
  - Core: 4 mm²
- Classification accuracy: over 90% for SNR > 0 dB
- Power density: 30 µW/mm²
- 91% data reduction
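The chip figures above can be put in per-channel terms and compared with the 800 µW/mm² tissue-safety bound quoted earlier in the talk. This is plain arithmetic on the slide's numbers; the function names are hypothetical, and dividing the 64-channel power by the core area is an assumption about how a density figure could be formed (the slide does not state the basis of its 30 µW/mm² value).

```python
def per_channel_uw(total_uw, channels):
    """Average power per channel in microwatts."""
    return total_uw / channels


def power_density_uw_per_mm2(total_uw, area_mm2):
    """Power density in microwatts per square millimetre."""
    return total_uw / area_mm2


print(per_channel_uw(130, 64))            # 2.03125
print(per_channel_uw(52, 16))             # 3.25
print(power_density_uw_per_mm2(130, 4))   # 32.5
print(round(800 / 30, 1))                 # 26.7
```

Per-channel power is lower in 64-channel mode than in 16-channel mode, and the quoted 30 µW/mm² sits roughly 27 times below the 800 µW/mm² bound.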
Flexible Radio Digital Front-Ends
Chip Testing for GHz-rate DSP
Rashmi Nanda
Why Flexible DFEs?
- Multi-radio integration (cellular, WiFi, GPS)
- Cognitive radios
- Evolving wireless standards
  - WiMAX
  - Cellular
  - LTE
Conventional RX Architecture
Downsides:
- No reconfigurability for multi-standard support
- Analog components don't get the same power/area benefits from technology scaling
Replacing the RX Front-End
RX DFE
- Convert from ADC frequency F_S1 to modem frequency F_S2 with negligible SNR degradation
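The rate conversion from F_S1 down to F_S2 can be sketched in its simplest form as filtering followed by downsampling. The example below assumes an integer ratio F_S1/F_S2 and uses a boxcar (moving-average) filter, which is a stand-in chosen for brevity; the talk does not describe the actual filter chain, and the function name decimate is hypothetical.

```python
def decimate(x, factor):
    """Reduce the sample rate of x by an integer factor.

    Each output sample is the mean of one non-overlapping block of
    `factor` input samples, i.e. a length-`factor` boxcar low-pass filter
    followed by keeping every `factor`-th output. Trailing samples that do
    not fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    n_out = len(x) // factor
    return [
        sum(x[k * factor:(k + 1) * factor]) / factor
        for k in range(n_out)
    ]


# Slowly varying content passes through; content alternating at half the
# input sample rate averages out.
print(decimate([1, 1, 1, 1, 3, 3, 3, 3], 4))  # [1.0, 3.0]
print(decimate([1, -1] * 8, 4))               # [0.0, 0.0, 0.0, 0.0]
```

A boxcar has modest stopband rejection, so a front-end aiming for negligible SNR degradation would need a sharper filter; the sketch only shows the structure of the operation.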
ASIC Test @ BWRC, ca. 2006
ASIC Testing w/ ROACH
Test Setup
[Figure: test setup block diagram; labels: Matlab, 1GbE, PowerPC, QDR SRAM, BRAM, FPGA, LVDS IO, ASIC Test Board, ASIC]
Testing Requirements
- High TX clock rate (400 MHz target)
  - Beyond practical limits of IBOB's V2P
- Long test vectors (~4 Mb)
- Asynchronous clock domains for TX and RX
Asynchronous Clock Domains
- Manually merged separate designs for test vector and readback datapaths
- Fixed 60 MHz RX
- 255-315 MHz TX
Results
- Tested to 315 MHz (1.89 GHz RF) w/ loadable vectors in QDR
- Up to 340 MHz (2.04 GHz RF) with pre-compiled vectors in ROMs
Tech. / V_DD                 0.13 µm / 1.5 V          65 nm / 1.0 V
Area                         ???                      0.4 mm²
Maximum input sample rate    104 MHz                  2.52 GHz
Maximum signal bandwidth     5 MHz                    20 MHz
Current consumption          21 mA                    13 mA
Maximum SNR                  84 dB (5 MHz bandwidth)  55 dB (20 MHz bandwidth)
Limitations
- DDR output FF critical path @ 340 MHz (clock out)
- QDR SRAM bus interface critical path @ 315 MHz
- Output clock jitter?
Summary
- Unified description for DSP design allows multi-level iterative optimization
- Can be applied to any power/performance specs
- Matlab + BORPH/KATCP provides a great test environment for DSP applications
Thank you!