Short-course Compressive Sensing of Videos

Venue: CVPR 2012, Providence, RI, USA
Date: June 16, 2012
Richard G. Baraniuk, Mohit Gupta, Aswin C. Sankaranarayanan, Ashok Veeraraghavan

Tutorial Outline
Time         Presenter                                  Topic
1:30 - 2:00  Mohit Gupta (Columbia University)          Introduction and Motivation
2:00 - 3:00  Aswin Sankaranarayanan (Rice University)   Compressive Sensing Theory and Sparse Representations
3:30 - 4:30  Ashok Veeraraghavan (Rice University)      Compressive Video Sensing Systems
4:30 - 5:00  Mohit Gupta (Columbia University)          Discussion of CS in Other Domains and Related Problems

Space Shuttle Discovery Flight Deck Gigapan: 2.74 Gigapixels http://www.gigapan.com/gigapans/102753

Still Life Gigapan: 0.88 Gigapixels http://www.gigapan.com/gigapans/105851

Playing Drums Frame Rate: 50 fps

Playing Drums Frame Rate: 500 fps

Splashing Marbles Frame Rate: 50 fps

Splashing Marbles Frame Rate: 500 fps

Detail Fascinates

Automotive Testing Frame Rate: 2000 fps

Biomechanical Analysis Frame Rate: 2000 fps

Military Testing Frame Rate: 2000 fps

Selling Insurance Frame Rate: 4000 fps

Promoting HDTV Frame Rate: 1000 fps

Microscopy Frame Rate: 500 fps

Golf Swing Test Frame Rate: 10000 fps

Nature Frame Rate: 1000 fps Images captured for the BBC production "Life"

Nature Frame Rate: 1000 fps

High-Speed Schlieren Imaging Frame Rate: 500 fps

Capturing Photons Frame Rate: Trillion fps http://cameraculture.media.mit.edu/femtotransientimaging

Fun Frame Rate: 2000 fps

Cost: High-Performance Video Cameras
Product Name                       Cost for Demo Unit   Cost for New Unit
SA5 775K M1 (MONO 8 GIGS)          $68,500              $90,000
SA5 775K M2 w/ mech. shutter       $77,000              $103,120
SA5 1000K C2 RV COLOR - 16 GIGS    $80,000              $113,120
BC2 HD with Keypad                 $90,000              $132,400
SA2 M2 (MONO 16 GIG HIGH DEF)      $55,500              $100,000
No consumer high-performance sensors (>1MP, >1000fps).
Photron cameras. Quotation source: email from techimaging.com representative.

Why are these sensors so expensive?
1. Light Limitation
[Figure: incident illumination strikes the scene and reflected illumination reaches the sensor; the captured data form a space-time volume with axes x, y, t]

Light Limitation
Signal level (electrons) depends on: exposure time, incident illumination, pixel size, F-number, scene reflectivity, and quantum efficiency. [Cossairt et al., 2012]

Light Limitation
1000 FPS, 10MP camera: exposure time t = 1/1000 seconds, pixel size = 4 µm. [Cossairt et al., 2012]
I_src (lux)    Number of electrons
2 x 10^-3      6.2 x 10^-4
1 x 10^-2      3.1 x 10^-3
2              0.62
10             3.08
10^2           30.4
10^3           304
10^4           3040
Highly sensitive sensors required.
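To get a feel for what these electron counts imply, the sketch below computes the photon-noise-limited SNR at each illumination level. It assumes only Poisson photon (shot) noise, so that SNR = sqrt(N); dark noise and read noise are ignored, which makes the numbers optimistic. The names ELECTRONS, shot_noise_snr and snr_db are illustrative and not from the slides.

```python
import math

# Electron counts per pixel per exposure, copied from the table
# (1000 fps, t = 1/1000 s, 4 um pixels), keyed by illumination in lux.
ELECTRONS = {
    2e-3: 6.2e-4,
    1e-2: 3.1e-3,
    2.0: 0.62,
    10.0: 3.08,
    1e2: 30.4,
    1e3: 304.0,
    1e4: 3040.0,
}


def shot_noise_snr(n_electrons):
    """SNR with Poisson photon noise only: mean / std = N / sqrt(N) = sqrt(N)."""
    return math.sqrt(n_electrons)


def snr_db(n_electrons):
    """The same SNR expressed in decibels (20 * log10 of the amplitude ratio)."""
    return 20.0 * math.log10(shot_noise_snr(n_electrons))


if __name__ == "__main__":
    for lux, n in ELECTRONS.items():
        print(f"{lux:>8g} lux  {n:>9g} e-  SNR {shot_noise_snr(n):7.3f}  ({snr_db(n):6.1f} dB)")
```

Under this assumption, every illumination level in the table with fewer than one electron per pixel has an SNR below 1, which is one way to read the "highly sensitive sensors required" conclusion.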

Why are these sensors so expensive?
2. Noise
[Figure: signal versus photon noise, dark noise, and read noise]
Slide: Courtesy Marc Levoy

SNR Over The Years
http://www.dxomark.com/index.php/publications/DxOMark-Insights/SNR-evolution-over-time
Sensor technology has improved significantly over the years, but the total number of voxels per unit volume has risen to offset these improvements, so SNR has remained static.
For higher-performance cameras of the future, sensor technology has to keep up with the rising number of voxels.
Slide: Courtesy Marc Levoy

Why are these sensors so expensive?
3. Bandwidth
Frame rate is limited by the sensor readout rate:
- Analog-to-digital conversion.
- Time required to clear charge from the parallel register.
- Shutter opening delay in CCDs employing mechanical shutters.
1 MP x 1000 fps x 16-bit pixels = 16 Gbit/s = 2 GB/s. Expensive!
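A quick arithmetic check of the raw readout rate can be sketched as follows. It assumes 1 MP means 10^6 pixels, 1 GB means 10^9 bytes, and that no compression or packing overhead is involved; the function name raw_data_rate_bytes is illustrative.

```python
def raw_data_rate_bytes(pixels, fps, bits_per_pixel):
    """Uncompressed sensor readout rate in bytes per second."""
    return pixels * fps * bits_per_pixel / 8


if __name__ == "__main__":
    rate = raw_data_rate_bytes(pixels=1_000_000, fps=1000, bits_per_pixel=16)
    print(f"{rate * 8 / 1e9:.0f} Gbit/s = {rate / 1e9:.0f} GB/s")
```

With these figures the product is 16 Gbit/s, which is 2 GB/s of sustained readout before any storage or transfer overhead.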

Spatio-Temporal Resolution Tradeoff
Single image: Spatial Resolution = 1X, Temporal Resolution = 1X

Spatio-Temporal Resolution Tradeoff
Captured thin-out movie (row-wise sub-sampling) and interpolated movie: Spatial Resolution = 1/4X, Temporal Resolution = 4X

Spatio-Temporal Resolution Tradeoff
Captured thin-out movie (row-wise sub-sampling) and interpolated movie: Spatial Resolution = 1/36X, Temporal Resolution = 36X
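The tradeoff can be sketched numerically under one simplifying assumption: the sensor reads a fixed number of pixels per second, so reading 1/k of the rows frees up time for k times as many frames. The frame size (720 x 1280), the 30 fps base rate, and the function names thin_out and tradeoff are hypothetical values chosen for illustration.

```python
def thin_out(frame, factor):
    """Row-wise sub-sampling: keep every `factor`-th row of a frame (list of rows)."""
    return frame[::factor]


def tradeoff(rows, cols, base_fps, factor):
    """Rows read per frame and achievable frame rate under a fixed pixel readout budget."""
    budget = rows * cols * base_fps      # pixels the sensor can read per second
    rows_kept = rows // factor           # spatial resolution drops to 1/factor
    fps = budget / (rows_kept * cols)    # temporal resolution rises by factor
    return rows_kept, fps


if __name__ == "__main__":
    for factor in (1, 4, 36):
        rows_kept, fps = tradeoff(rows=720, cols=1280, base_fps=30, factor=factor)
        print(f"1/{factor}X spatial: {rows_kept} rows per frame at {fps:.0f} fps")
```

The total number of pixels read per second is the same in all three cases; only its split between space and time changes.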

Spatio-Temporal Resolution Tradeoff High-speed, High-res Video

Why are these sensors so expensive? 4. Non-visible wavelength sensors Infrared image Infrared camera (FLIR T620) Resolution: 640x480. Cost: $26,000. Expensive!

Do we need to capture all this data?

Redundancy in Visual Data Raw Captured Image (1.2MB) JPEG Compressed Image (40 KB) 30X Compression without significant loss of visual quality

Redundancy in Visual Data Raw Captured Video (270MB) H.264 Compressed Video (1.8 MB) 150X Compression without significant loss of visual quality

Redundancy in Visual Data Raw Captured Video (270MB) H.264 Compressed Video (1.8 MB) Massive data acquisition. Most of the data is redundant and can be thrown away.

The Sparseland Model for Images (Videos) Each image (video) patch = Sparse linear combination of dictionary atoms Slide courtesy: Guillermo Sapiro

The Sparseland Model for Images (Videos) Examples of dictionary: Wavelets, DCT, learned dictionaries Slide courtesy: Guillermo Sapiro
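As a minimal sketch of the model, the example below builds a one-dimensional orthonormal DCT dictionary, synthesizes a "patch" from two of its atoms, and confirms that analysing the patch yields only those two nonzero coefficients. The 1D setting, the patch length of 16, the chosen atom indices and weights, and the function names are all illustrative assumptions; real image patches are 2D and only approximately sparse.

```python
import math


def dct_dictionary(n):
    """Return n orthonormal DCT-II atoms, each a list of length n."""
    atoms = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        atoms.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return atoms


def synthesize(atoms, coeffs):
    """Patch = sum_k coeffs[k] * atoms[k]; coeffs is an {atom index: weight} dict."""
    n = len(atoms[0])
    return [sum(w * atoms[k][i] for k, w in coeffs.items()) for i in range(n)]


def analyze(atoms, patch):
    """Coefficients of the patch; inner products suffice because the atoms are orthonormal."""
    return [sum(a * p for a, p in zip(atom, patch)) for atom in atoms]


if __name__ == "__main__":
    atoms = dct_dictionary(16)
    patch = synthesize(atoms, {0: 4.0, 3: -1.5})   # 2 active atoms out of 16
    coeffs = analyze(atoms, patch)
    support = [k for k, c in enumerate(coeffs) if abs(c) > 1e-9]
    print("nonzero coefficients at atoms:", support)
```

All 16 samples of the patch are generally nonzero, yet 2 numbers describe it completely once the dictionary is known.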

Capturing Relevant Data
"One can regard the possibility of digital compression as a failure of sensor design. If it is possible to compress measured data, one might argue that too many measurements were taken." (David Brady)

Capturing Relevant Data Can we design sensing systems that capture only the relevant data?

What is Compressive Sensing?
Compressive sensing refers to data acquisition protocols that directly acquire just the important information.
Compressive sensing is about acquiring and recovering a signal in the most efficient way possible.

Compressive Sensing
Acquisition: time-domain measurements versus frequency-domain measurements.
Important to take "good" measurements.

Compressive Sensing: Challenges What are good measurements? How to take measurements in sparse domain?

Compressive Sensing: Enablers
Incoherent Measurements for Sparse Recovery
[Figure: a signal and its measurements]
Signal is local, measurements are global.
Each measurement picks up a little information about each component.
See papers by Candes, Romberg, Tao, and Donoho for details.
Slide courtesy: Emmanuel Candes
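The idea that global measurements let a sparse signal be recovered from few samples can be illustrated with a deliberately small sketch. It assumes a Gaussian random sensing matrix and a signal with exactly one nonzero entry, and recovers that entry by picking the column best correlated with the measurements (a single greedy matching-pursuit step). This is not the l1-minimization approach of the cited papers, and signals with several nonzeros need an iterative or convex solver; all names and sizes here are illustrative.

```python
import math
import random


def gaussian_matrix(m, n, seed=0):
    """m x n matrix of i.i.d. standard normal entries (an assumed sensing matrix)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def measure(phi, x):
    """y = Phi x: each measurement is a weighted sum over all signal entries."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def recover_one_sparse(phi, y):
    """Recover a 1-sparse x from y by choosing the best-correlated column of Phi."""
    m, n = len(phi), len(phi[0])
    best_j, best_score, best_amp = 0, -1.0, 0.0
    for j in range(n):
        col = [phi[i][j] for i in range(m)]
        dot = sum(c * v for c, v in zip(col, y))
        energy = sum(c * c for c in col)
        score = abs(dot) / math.sqrt(energy)   # normalized correlation with y
        if score > best_score:
            best_j, best_score, best_amp = j, score, dot / energy
    x_hat = [0.0] * n
    x_hat[best_j] = best_amp                   # least-squares amplitude on that column
    return x_hat


if __name__ == "__main__":
    n, m = 64, 8                               # 8 measurements of a length-64 signal
    x = [0.0] * n
    x[37] = 2.5                                # the signal is "local": one spike
    phi = gaussian_matrix(m, n)
    x_hat = recover_one_sparse(phi, measure(phi, x))
    print("recovered index:", max(range(n), key=lambda j: abs(x_hat[j])))
```

Sampling the same spike at 8 random positions would almost always miss it, whereas each of the 8 random projections carries some information about entry 37.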

Compressive Sensing: Challenges What are good measurements? How to take coded measurements?

Compressive Sensing: Enablers
Computational Imaging and Optical Devices
[Figure: conventional camera versus computational camera]
A computational camera uses novel optics to map rays to pixels in some unconventional fashion.
The captured image is optically coded and may not be meaningful in its raw form.
The computational module decodes the captured image.
See papers by Nayar, Levoy, Raskar, Freeman, and Durand.

Novel Optical Devices
- Digital Micro-mirror Device (DMD) [10 kHz]: Single Pixel Camera [Rice]
- Liquid Crystal on Silicon (LCoS) [5 kHz]: Compressive Video Acquisition System [MERL, Rice, Columbia]
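A rough sketch of how a DMD-based single-pixel camera takes measurements: the mirror array displays a binary pattern, and a single photodetector records the total light from the "on" mirrors, giving one measurement per pattern. The code assumes idealized 0/1 patterns, no noise, and one measurement per pattern at the 10 kHz rate quoted above; the scene size, number of patterns, and function names are hypothetical.

```python
import random


def random_patterns(num_patterns, num_pixels, seed=0):
    """Pseudo-random binary mirror patterns (1 = mirror directs light to the detector)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(num_pixels)] for _ in range(num_patterns)]


def single_pixel_measurements(patterns, scene):
    """One detector reading per pattern: the sum of scene intensities under 'on' mirrors."""
    return [sum(s for on, s in zip(pattern, scene) if on) for pattern in patterns]


def acquisition_time(num_patterns, pattern_rate_hz=10_000):
    """Seconds needed to display all patterns, one measurement per pattern."""
    return num_patterns / pattern_rate_hz


if __name__ == "__main__":
    rng = random.Random(1)
    num_pixels = 64 * 64                       # hypothetical 64 x 64 scene
    scene = [rng.random() for _ in range(num_pixels)]
    patterns = random_patterns(num_patterns=1024, num_pixels=num_pixels)
    y = single_pixel_measurements(patterns, scene)
    print(f"{len(y)} measurements of a {num_pixels}-pixel scene "
          f"in {acquisition_time(len(patterns)):.4f} s")
```

Because the measurements are taken one after another, the scene has to stay roughly static for the whole acquisition window, which hints at why extending this design to video is not straightforward.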

Why is Video Compressive Sensing Hard?

Specifications of the Human Eye Spatial resolution: Approximately 500 MegaPixels Temporal resolution: Approximately 15-20 Frames Per Second http://www.clarkvision.com/articles/eye-resolution.html

Eye as a Jitter Camera Eye FOV "Single-view" Resolution

Eye as a Jitter Camera Eye FOV Higher Resolution Jitter Camera [Ben Ezra and Nayar]

Jittering in Time?
Time is ephemeral in nature: it is hard to take multiple measurements of the same duration.
Multiple cameras can be used, but this is an expensive solution and raises other practical issues such as registration.
High-speed video using a camera array [Levoy et al.]

Tutorial Outline
Time         Presenter                Topic
1:30 - 2:00  Mohit Gupta              Introduction and Motivation
2:00 - 3:00  Aswin Sankaranarayanan   Compressive Sensing Theory and Sparse Representations
3:30 - 4:30  Ashok Veeraraghavan      Compressive Video Sensing Systems
4:30 - 5:00  Mohit Gupta              Discussion of CS in Other Domains and Related Problems