Lecture 19: Depth Cameras Kayvon Fatahalian CMU 15-869: Graphics and Imaging Architectures (Fall 2011)

Continuing theme: computational photography
- Cheap cameras capture light; extensive processing produces the desired image
Today:
- Capturing depth in addition to light intensity

Why might we want to know the depth of scene objects?
- Scene understanding
- Navigation
- Tracking
- Mapping
- Segmentation

Depth from time-of-flight
Conventional LIDAR
- Laser beam scans the scene (rotating mirror)
- Low frame rate to capture the entire scene
Time-of-flight cameras
- No moving beam: capture an image of the scene with each light pulse
- Special CMOS sensor records a depth image
- High frame rate
- Today: still low resolution and expensive (but dropping fast)
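The arithmetic behind a pulsed time-of-flight pixel is just the round-trip travel time of light; a minimal sketch (the function name and example values are illustrative, not from the slide):

    # Pulsed time-of-flight: depth is half the round-trip distance of the light pulse.
    C = 299_792_458.0  # speed of light, m/s

    def tof_depth_m(round_trip_time_s):
        # Each sensor pixel measures how long the emitted pulse took to return.
        return C * round_trip_time_s / 2.0

    # Example: a 20 ns round trip corresponds to about 3 m of depth.
    print(tof_depth_m(20e-9))  # ~3.0

(Many time-of-flight cameras actually measure the phase shift of modulated illumination rather than timing a single pulse, but the reported depth follows the same half-round-trip relation.)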

Computing depth from images: binocular stereo
3D reconstruction of P: depth from disparity. With focal length f, baseline b, and disparity d = x - x' (the difference between P's image coordinates in the two views), triangulation gives z = f b / d.
Simple reconstruction example: cameras aligned (coplanar sensors), separated by a known distance, same focal length.
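A minimal sketch of this disparity-to-depth conversion for the aligned-camera case (function name and example numbers are illustrative; f is expressed in pixels and b in meters):

    import numpy as np

    def depth_from_disparity(d_px, f_px, baseline_m):
        # z = f * b / d; a disparity of 0 means the point is effectively at infinity.
        d = np.asarray(d_px, dtype=np.float32)
        z = np.full_like(d, np.inf)
        np.divide(f_px * baseline_m, d, out=z, where=d > 0)
        return z

    # Example: f = 600 px, b = 0.075 m, disparity of 30 px -> 1.5 m.
    print(depth_from_disparity(30, 600.0, 0.075))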

Correspondence problem How to determine which pairs of pixels in image 1 and image 2 correspond to the same scene point?

Epipolar constraint
Determining pixel correspondence: a scene point and the two camera centers define an epipolar plane, and its intersections with the two image planes are conjugate epipolar lines. Pairs of points that correspond to the same scene point must lie on conjugate epipolar lines.
The epipolar constraint reduces the correspondence problem to a 1D search along conjugate epipolar lines.
Slide credit: S. Narasimhan

Solving correspondence (basic algorithm)
For each epipolar line:
  For each pixel in the left image:
    - Compare with every pixel on the same epipolar line in the right image
    - Pick the pixel with the minimum match cost
Improvement: match windows rather than single pixels.
This should look familiar... correlation, Sum of Squared Differences (SSD), etc.
Assumptions?
Slide credit: S. Narasimhan
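A brute-force sketch of this basic algorithm, assuming a rectified pair (so conjugate epipolar lines are corresponding image rows) and an SSD match cost over a square window; the window size and disparity range are illustrative:

    import numpy as np

    def block_match(left, right, max_disp=64, win=5):
        # left, right: rectified grayscale images of the same shape.
        left = left.astype(np.float32)
        right = right.astype(np.float32)
        H, W = left.shape
        r = win // 2
        disp = np.zeros((H, W), np.int32)
        for y in range(r, H - r):                             # each epipolar line (row)
            for x in range(r, W - r):                         # each pixel in the left image
                ref = left[y - r:y + r + 1, x - r:x + r + 1]
                best_cost, best_d = np.inf, 0
                for d in range(0, min(max_disp, x - r) + 1):  # candidates on the same row
                    cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                    cost = np.sum((ref - cand) ** 2)          # SSD match cost
                    if cost < best_cost:
                        best_cost, best_d = cost, d
                disp[y, x] = best_d                           # keep the minimum-cost disparity
        return disp

Even this naive version makes the cost of dense window correlation obvious, which is why the Kinect (discussed below) dedicates an ASIC to it.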

Correspondence: robustness challenges
- Scenes with no texture (many parts of the scene look the same)
- Non-Lambertian surfaces (scene appearance depends on the view)
- Pixel pairs may not be present (occlusion from one view)

Depth from defocus
A scene point P at depth z, imaged through an aperture of diameter a, produces a circle of confusion of diameter c on the sensor. Under the thin-lens approximation, c is determined by the aperture a, the focal length f, and the depth z, so measuring the blur gives information about depth.
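A small thin-lens sketch of this relationship (the focus distance s is an extra quantity assumed here for concreteness; all lengths in meters):

    def circle_of_confusion(a, f, s, z):
        # Thin lens: an object at distance D comes to focus at image distance f*D / (D - f).
        v_focus = f * s / (s - f)    # sensor sits here so that objects at distance s are sharp
        v_obj = f * z / (z - f)      # where an object at depth z actually comes to focus
        # Similar triangles through the aperture give the blur-circle diameter at the sensor.
        return a * abs(v_focus - v_obj) / v_obj

    # Example: 5 mm aperture, 50 mm lens focused at 2 m, object at 3 m -> blur of ~43 um.
    print(circle_of_confusion(0.005, 0.05, 2.0, 3.0))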

Structured light
One light source emitting a known beam, one camera. If the scene is at the reference plane (depth z_ref), the image recorded by the camera is known; when the scene sits at a different depth z, the projected spot shifts by a disparity d that depends on the baseline b and the focal length f.
A single-spot illuminant is inefficient! (The spot must be scanned across the scene to get depth everywhere.)
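A hedged sketch of this reference-plane triangulation, using a commonly cited model that is not spelled out on the slide (sign conventions and units vary): the spot's shift d relative to its reference-plane position is proportional to the change in inverse depth, d = f*b*(1/z - 1/z_ref).

    def depth_from_reference_shift(d_px, f_px, baseline_m, z_ref_m):
        # Invert d = f*b*(1/z - 1/z_ref) to recover z from the measured shift.
        inv_z = d_px / (f_px * baseline_m) + 1.0 / z_ref_m
        return 1.0 / inv_z

    # Example: f = 580 px, b = 0.075 m, z_ref = 2 m; a 10 px shift -> ~1.37 m.
    print(depth_from_reference_shift(10.0, 580.0, 0.075, 2.0))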

Structured light
Simplify the correspondence problem by encoding spatial position in the illuminant.
[Figure: projected light pattern and the resulting camera image. Image credit: Zhang et al.]

Microsoft Kinect
- Illuminant: infrared laser + diffuser
- RGB CMOS sensor: 640x480 (with Bayer mosaic)
- Monochrome infrared CMOS sensor (Aptina MT9M001): 1280x1024 **
** Kinect returns a 640x480 disparity image; teardowns suspect the sensor is configured for 2x2 binning down to 640x512, then cropped.

Infrared image of Kinect illuminant output Credit: www.futurepicture.org

Computing disparity for the scene
Region-growing algorithm for compute efficiency ** (assumption: spatial locality likely implies depth locality)
1. Classify output pixels in the infrared image as UNKNOWN or SHADOW (based on whether a speckle is found).
2. While a significantly large percentage of output pixels are UNKNOWN:
   - Choose an UNKNOWN pixel. Correlate the surrounding NxM pixel window with the reference image to compute a disparity D = (dx, dy). (Note: the search window is a horizontal swath of the image, plus some vertical slack.)
   - If a sufficiently good correlation is found:
     - Mark the pixel as a region anchor.
     - Attempt to grow a region around the anchor:
       - Place the region anchor in a FIFO and mark it ACTIVE.
       - While the FIFO is not empty:
         - Extract a pixel P from the FIFO (its known disparity is D).
         - Attempt to establish correlations for the UNKNOWN neighboring pixels of P (left, right, top, bottom) by searching the region D + (±1, ±1).
         - If a correlation is found, mark the pixel ACTIVE, set its parent to P, and add it to the FIFO.
         - Else, mark the pixel EDGE and set its depth to the depth of P.
** Source: PrimeSense patent WO 2007/043036 A1. (Likely not the actual algorithm used by Kinect.)
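Below is a compact Python sketch of this region-growing search, following the patent's description rather than any verified Kinect implementation; the window size, shadow/correlation thresholds, search ranges, and the use of SSD as the correlation measure are all assumptions.

    import numpy as np
    from collections import deque

    UNKNOWN, SHADOW, ACTIVE, EDGE = 0, 1, 2, 3

    def ssd(a, b):
        # Sum-of-squared-differences match cost between two equal-sized windows.
        d = a.astype(np.float32) - b.astype(np.float32)
        return float((d * d).sum())

    def window(img, y, x, h=5, w=5):
        # h x w window centered at (y, x), or None if it would leave the image.
        r, c = h // 2, w // 2
        if y - r < 0 or x - c < 0 or y + r + 1 > img.shape[0] or x + c + 1 > img.shape[1]:
            return None
        return img[y - r:y + r + 1, x - c:x + c + 1]

    def grow_disparities(ir, ref, max_dx=64, slack_dy=1, good_cost=500.0):
        # ir: captured infrared image; ref: stored reference-plane image (same shape).
        H, W = ir.shape
        state = np.full((H, W), UNKNOWN, np.uint8)
        state[ir < 10] = SHADOW              # step 1: no speckle energy -> SHADOW (threshold made up)
        disp = np.zeros((H, W, 2), np.int16) # per-pixel (dy, dx) disparity

        def correlate(y, x, dys, dxs):
            # Best (cost, dy, dx) over the given search offsets into the reference image.
            win = window(ir, y, x)
            if win is None:
                return None
            best = None
            for dy in dys:
                for dx in dxs:
                    ref_win = window(ref, y + dy, x + dx)
                    if ref_win is None:
                        continue
                    cost = ssd(win, ref_win)
                    if best is None or cost < best[0]:
                        best = (cost, dy, dx)
            return best

        # Step 2: pick UNKNOWN seeds; a full search is a horizontal swath plus vertical slack.
        for y, x in zip(*np.nonzero(state == UNKNOWN)):
            if state[y, x] != UNKNOWN:       # may already have been claimed by an earlier region
                continue
            seed = correlate(y, x, range(-slack_dy, slack_dy + 1), range(max_dx))
            if seed is None or seed[0] > good_cost:
                continue
            state[y, x] = ACTIVE             # region anchor
            disp[y, x] = seed[1:]
            fifo = deque([(y, x)])
            while fifo:                      # grow the region around the anchor
                py, px = fifo.popleft()
                pdy, pdx = disp[py, px]
                for ny, nx in ((py - 1, px), (py + 1, px), (py, px - 1), (py, px + 1)):
                    if not (0 <= ny < H and 0 <= nx < W) or state[ny, nx] != UNKNOWN:
                        continue
                    # Neighbors only search the parent's disparity +/- 1 in each direction.
                    b = correlate(ny, nx, range(pdy - 1, pdy + 2), range(pdx - 1, pdx + 2))
                    if b is not None and b[0] <= good_cost:
                        state[ny, nx] = ACTIVE
                        disp[ny, nx] = b[1:]
                        fifo.append((ny, nx))
                    else:
                        state[ny, nx] = EDGE
                        disp[ny, nx] = (pdy, pdx)   # inherit the parent's disparity
        return disp, state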

Kinect block diagram
Disparity calculations are performed by the PrimeSense image-processing ASIC in the Kinect, not by the XBox 360 (or PC) CPU.
[Diagram: the illuminant, infrared sensor, and RGB sensor feed the Kinect's image-processing ASIC, which sends a 640x480 x 30 fps RGB image and a 640x480 x 30 fps disparity image over the USB bus to the XBox 360 CPU.]
- Cheap sensors: ~1 MPixel
- Cheap illuminant: laser + diffuser makes a random dot pattern (not a traditional projector)
- Custom image-processing ASIC to compute the disparity image (scale-invariant region correlation involves non-trivial compute cost)

Extracting the player's skeleton [Shotton et al. 2011] (enabling full-body game input)
Challenge: how to determine the player's position/motion from depth images... without consuming a large fraction of the XBox 360's compute capability.
[Pipeline: depth image -> character joint angles]

Key idea: segment pixels into body regions [Shotton et al. 2011] Published description represents body with 31 regions

Pixel classification [Shotton et al. 2011]
For each pixel: compute features from the depth image. The pixel classifier is learned from a large database of motion capture data.
Result: the probability that pixel x in depth image I is part of body part c.
[Figure: two example depth features]
Per-pixel probabilities are aggregated to compute a 3D spatial density function for each body part; joint angles are inferred from this density.
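Per the Shotton et al. 2011 paper, the example features are depth-comparison features: the difference between two depth probes at offsets scaled by the inverse of the center pixel's depth, which makes the response roughly invariant to how far away the player stands. A small sketch (names and the off-image background value are illustrative):

    import numpy as np

    def depth_feature(depth, x, u, v, background=1e6):
        # depth: HxW depth image (e.g. in mm); x: (row, col); u, v: probe offsets (dy, dx).
        def probe(offset):
            z = float(depth[x[0], x[1]])
            py = int(round(x[0] + offset[0] / z))   # offsets shrink for far-away pixels
            px = int(round(x[1] + offset[1] / z))
            if 0 <= py < depth.shape[0] and 0 <= px < depth.shape[1]:
                return float(depth[py, px])
            return background                        # off-image probes read as very far away
        return probe(u) - probe(v)

In the paper, each split node of a randomized decision forest thresholds one such feature; the forest's per-pixel class distributions are what get aggregated into the per-body-part density mentioned above.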

Performance result
- Real-time skeleton estimation from the depth image requires < 10% of the XBox 360 CPU.
- An XBox GPU-based implementation runs at 200 Hz (research implementation, not used in the product).

XBox 360 + Kinect system
[Diagram: in the Kinect, the illuminant, infrared sensor, and RGB sensor feed the image-processing ASIC, which performs the disparity computations (creating the depth image) and sends a 640x480 x 30 fps RGB image and a 640x480 x 30 fps disparity image over the USB bus to the XBox 360. Skeleton inference runs on the XBox 360's three CPU cores (1 MB shared L2), alongside the GPU and 10 MB of embedded DRAM.]

Summary
Kinect hardware = cheap depth sensor
- Structured light pattern generated by scattering an infrared laser
- Depth obtained from triangulation, not time-of-flight
- Custom ASIC to convert the infrared image into depth values
Interpretation of the depth values is performed on the CPU
- Player skeleton estimation made computationally feasible by a machine-learning approach
Future
- Calls for wider field of view, higher-resolution depth