Lecture 24: Special Topic: Virtual Reality Computer Graphics and Imaging UC Berkeley CS184/284A, Spring 2016 Credit: Kayvon Fatahalian created the majority of these lecture slides
Virtual Reality (VR) vs Augmented Reality (AR) VR = virtual reality: user is completely immersed in virtual world (sees only light emitted by display). AR = augmented reality: display is an overlay that augments user's normal view of the real world (e.g., Terminator). Image credit: Terminator 2 (naturally)
VR Head-Mounted Displays (HMDs) Oculus Rift, Sony Morpheus, HTC Vive, Google Cardboard
AR Headsets Microsoft Hololens Meta
Field of View Regular 2D panel displays have windowed FOV User orients themselves to the physical window of the display VR/AR displays provide 360 degree FOV Displays attached to head Head orientation is tracked physically Rendered view synchronized to head orientation in realtime (much more on this later)
3D Visual Cues Panel displays give 3D cues from monocular rendering: occlusion, perspective, shading, focus blur, ... (uses z-buffer, 4x4 matrices, lighting calculation, lens calculations). VR/AR displays add further 3D cues. Stereo: different perspective view in left/right eyes (physically send different images into each eye). Parallax (user-motion): different views as user moves (uses head-tracking technology coupled to perspective rendering).
VR Gaming Bullet Train Demo (Epic)
VR Video Jaunt VR (Paul McCartney concert)
VR Video
VR Teleconference / Video Chat http://vrchat.com/
Today: Tracking and Rendering Challenges of VR Since you are now all experts in rendering, today we will talk about the unique challenges of rendering in the context of modern VR headsets. We will also touch on head tracking. VR presents many other difficult technical challenges: display technologies; accurate tracking of face, head, and body position; haptics (simulation of touch); sound synthesis; user interface challenges (inability of user to walk around environment, how to manipulate objects in virtual world); content creation challenges; and on and on.
VR Headset Components
Google Cardboard Use mobile phone display inside inexpensive headset with lenses. Use phone's camera and gyro for tracking view direction. Stereo 360 degree experience, no head-motion parallax. Figure labels: mobile phone display; phone camera used for tracking head position; lenses in cardboard holder. Image credits: slashgear.com, Google
Oculus Rift Oculus Rift headset has most documentation of current systems, so will use for this explanation. CS184/284A, Lecture 24 Ren Ng, Spring 2016
Oculus Rift Image credit: ifixit.com https://www.ifixit.com/teardown/oculus+rift+cv1+teardown/60612
Oculus Rift Interpupillary distance (IPD) adjustment Image credit: ifixit.com
Oculus Rift Image credit: ifixit.com
Oculus Rift Fresnel eyepiece lens 1080x1200 display, 90 Hz Image credit: ifixit.com
Oculus Rift Lenses Fresnel eyepiece lens Image credit: ifixit.com
Role of Optics 1. Create wide field of view. 2. Place focal plane several meters away from eye (close to infinity). Note: parallel lines reaching eye converge to a single point on display (eye accommodates to plane near infinity). Figure labels: field of view, eye, OLED display. Lens diagram from Open Source VR Project (OSVR) (not the lens system from the Oculus Rift) http://www.osvr.org/
Accommodation and Vergence Accommodation: changing the optical power of the eye (lens) to focus at different distances Eye accommodated to focus on a distant object Eye accommodated to focus on a nearby object Vergence: rotation of the eye in its socket to ensure projection of object is centered on the retina
Accommodation-Vergence Conflict Given the design of current VR displays, consider what happens when objects are up close to the eye in the virtual scene. Eyes must remain accommodated to near infinity (otherwise image on screen won't be in focus). But eyes must converge in an attempt to fuse stereoscopic images of the object up close. Brain receives conflicting depth cues (discomfort, fatigue, nausea). This problem stems from the nature of the display design. If you could just make a display that emits the light field that would be produced by a virtual scene, then you could avoid the accommodation-vergence conflict.
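To put rough numbers on the conflict: the vergence angle for a point straight ahead at distance d is approximately 2·atan(IPD / 2d), while accommodation stays pinned to the display's focal plane. The sketch below assumes a 64 mm interpupillary distance and a 2 m focal plane; both values are illustrative assumptions, not figures from the lecture.

```python
import math

FOCAL_PLANE_M = 2.0  # assumed distance of the headset's virtual focal plane


def vergence_angle_deg(distance_m, ipd_m=0.064):
    """Angle between the two eyes' lines of sight when both fixate a point
    straight ahead at distance_m. ipd_m = 0.064 is an assumed typical value."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * distance_m)))


if __name__ == "__main__":
    for d in (0.25, 0.5, 2.0, 10.0):
        print(f"virtual object at {d:5.2f} m: vergence {vergence_angle_deg(d):5.2f} deg, "
              f"accommodation stays at {FOCAL_PLANE_M} m")
```

The closer the virtual object, the larger the gap between where the eyes converge and where they must focus.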
Aside: Near-Eye Light Field Displays Goal: recreate light field in front of eye Lanman and Luebke, SIGGRAPH Asia 2013.
Head Tracking
Head Tracking Need to track 3D position and orientation of head and eyes to render left/right viewpoints correctly High positional accuracy needed (e.g. 1 mm), because user can move very close to objects and very precisely relative to them Rendering needs to reflect this view Ideas on how to track position and orientation of a VR headset?
Google Cardboard: Tracking Using Headset Camera Tracking uses rear-facing camera / gyro to estimate user's viewpoint. 2D rotation tracking generally works well. 3D positional tracking is a challenge in general environments.
Environment-Supported Vision-Based Tracking? Image credit: gizmodo.com Early VR test room at Valve, with markers positioned throughout environment
Oculus Rift IR LED Tracking System Oculus Rift + IR LED sensor
Oculus Rift LED Tracking System (DK2) Headset contains: 40 IR LEDs Gyro + accelerometer (1000Hz) External 60Hz IR Camera Image credit: ifixit.com Photo taken with IR-sensitive camera (IR LEDs not visible in real life)
Oculus Rift IR LED Tracking Hardware Photo taken with IR-sensitive camera https://www.ifixit.com/teardown/oculus+rift+constellation+teardown/61128
Oculus Rift IR Camera IR filter (blocks visible spectrum) Camera lens CMOS sensor Note: silicon is sensitive to visible and IR wavelengths https://www.ifixit.com/teardown/oculus+rift+constellation+teardown/61128
Recall: Passive Optical Motion Capture Retroreflective markers attached to subject. IR illumination and cameras. Positions found by triangulation from multiple cameras. 8+ cameras, 240 Hz, occlusions are difficult. Figure label: markers on subject. Slide credit: Steve Marschner
Active Optical Motion Capture Each LED marker emits unique blinking pattern (ID) Reduce marker ambiguities / unintended swapping Have some lag to acquire marker IDs Phoenix Technology Phase Space
Oculus Rift Uses Active Marker Motion Capture Credit: Oliver Kreylos, https://www.youtube.com/watch?v=o7dt9im34oi Motion capture: unknown shape, multiple cameras VR head tracking: known shape, single camera
6 DOF Head Pose Estimation Head pose: 6 degrees of freedom (unknowns): 3D position and 3D rotation of headset (e.g. can represent as 4x4 matrix). Inputs: (fixed) relative 3D position of markers on headset (e.g. can represent each marker offset as 4x4 matrix); (fixed) camera viewpoint (ignoring distortion, also a 4x4 projective mapping of 3D scene to 2D image); (each frame) 2D position of each headset marker in image. Pose calculation: write down equations mapping each marker to image pixel location as a function of the 6 degrees of freedom, then solve for the 6 degrees of freedom (e.g. least squares).
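One way to write the pose calculation down is as a reprojection-error minimization. The notation here is an assumption introduced for illustration and does not appear on the slide: $\omega \in \mathbb{R}^3$ parameterizes the headset rotation $R(\omega)$, $t \in \mathbb{R}^3$ is the headset position, $m_i$ is the fixed position of marker $i$ in the headset frame, $K$ is the fixed camera matrix, and $u_i$ is the 2D image position of marker $i$ observed in the current frame.

```latex
% Perspective division onto the image plane
\pi\!\left(\begin{bmatrix} x \\ y \\ z \end{bmatrix}\right)
  = \begin{bmatrix} x/z \\ y/z \end{bmatrix}

% Predicted image position of marker i for pose (\omega, t)
\hat{u}_i(\omega, t) = \pi\!\big( K \,( R(\omega)\, m_i + t ) \big)

% Least-squares pose estimate over the N visible markers
(\omega^\ast, t^\ast) = \arg\min_{\omega,\, t}
  \sum_{i=1}^{N} \big\lVert u_i - \hat{u}_i(\omega, t) \big\rVert^2
```

Each visible marker contributes two equations (its image x and y), so at least three markers are needed to constrain the six unknowns, and additional markers make the problem overdetermined. Because the objective is nonlinear in $\omega$, it would typically be solved iteratively, for example with Gauss-Newton starting from the previous frame's pose.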
HTC Vive Tracking System ("Lighthouse") Structured light transmitter. Photodiode arrays on headset and hand-held controllers.
Vive Headset & Controllers Have Array of IR Photodiodes (Prototype) Headset and controller are covered with IR photodiodes Image credit: uploadvr.com IR photodiode
HTC Vive Structured Light Emitter ("Lighthouse") Light emitter contains array of LEDs (white) and two spinning wheels with lasers. Sequence of LED flashes and laser sweeps provides structured lighting throughout room. Credit: Gizmodo: http://gizmodo.com/this-is-how-valve-s-amazing-lighthouse-tracking-technol-1705356768
HTC Vive Tracking System For each frame, the lighthouse does the following: LED pulse, followed by horizontal laser sweep; LED pulse, followed by vertical laser sweep. Each photodiode on headset measures the time offset between pulse and laser arrival, which determines its x and y offset in the lighthouse's field of view. In effect, we obtain an image containing the 2D location of each photodiode in the world (can think of the lighthouse as a virtual "camera").
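The timing-to-angle conversion described above can be sketched as follows. The rotor speed `SWEEP_HZ` is a hypothetical value chosen for illustration (the slide does not give one), and the sketch assumes the laser plane rotates at constant speed starting from the moment of the LED pulse.

```python
SWEEP_HZ = 60.0  # assumed rotor speed in rotations per second (hypothetical)


def sweep_angle_deg(dt_seconds, sweep_hz=SWEEP_HZ):
    """Angle the laser plane has rotated between the LED sync pulse and the
    moment the laser hits the photodiode, assuming constant rotor speed."""
    return 360.0 * sweep_hz * dt_seconds


def photodiode_direction(dt_horizontal, dt_vertical):
    """Return (x_angle, y_angle) in degrees: the photodiode's direction as seen
    from the emitter, i.e. its position in the 'virtual camera' image."""
    return sweep_angle_deg(dt_horizontal), sweep_angle_deg(dt_vertical)


if __name__ == "__main__":
    # Time offsets (seconds) measured by one photodiode for the two sweeps.
    x_deg, y_deg = photodiode_direction(4.0e-3, 4.5e-3)
    print(f"x angle: {x_deg:.1f} deg, y angle: {y_deg:.1f} deg")
```

Collecting these angle pairs for every photodiode gives the same kind of 2D marker observations that the camera-based system uses for pose estimation.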
HTC Vive Tracking System ("Lighthouse") Credit: rvdm88 / youtube. https://www.youtube.com/watch?v=j54dottt7k0
Tracking Summary Looked at three tracking methods: camera on headset + computer vision + gyro; external camera + marker array on headset; external structured light + sensor array on headset. 3D tracking + depth sensing is an active research area: SLAM, PTAM, DTAM; Microsoft Hololens, Google Tango, Intel Realsense, ...
Rendering Challenges in VR
Name of the Game, Part 1: Low Latency The goal of a VR graphics system is to achieve presence, tricking the brain into thinking what it is seeing is real. Achieving presence requires an exceptionally low-latency system: what you see must change when you move your head! End-to-end latency: time from moving your head to the time new photons hit your eyes. Steps: (1) measure user's head movement; (2) update scene/camera position; (3) render new image; (4) transfer image to headset, then transfer to display in headset; (5) actually emit light from display (photons hit user's eyes). Latency goal of VR: 10-25 ms. Requires exceptionally low-latency head tracking. Requires exceptionally low-latency rendering and display.
Thought Experiment: Effect of Latency Consider a 1,000 x 1,000 display spanning a 100° field of view: 10 pixels per degree. Assume: you move your head 90° in 1 second (only modest speed); end-to-end latency of system is a slow 50 ms (1/20 sec). Result: displayed pixels are off by 4.5° (~45 pixels) from where they would be in an ideal system with 0 latency. Example credit: Michael Abrash
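The arithmetic in this thought experiment is short enough to check directly. The function name below is a hypothetical helper; the inputs are the numbers from the slide.

```python
def latency_error(head_speed_deg_per_s, latency_s, display_pixels, fov_deg):
    """Return (angular error in degrees, error in pixels) caused by showing an
    image that is latency_s seconds stale while the head is rotating."""
    pixels_per_degree = display_pixels / fov_deg
    error_deg = head_speed_deg_per_s * latency_s
    return error_deg, error_deg * pixels_per_degree


if __name__ == "__main__":
    err_deg, err_px = latency_error(90.0, 0.050, 1000, 100.0)
    print(f"error: {err_deg:.1f} deg = {err_px:.0f} pixels")
    # At the 10-25 ms latency goal the same head motion gives a smaller error.
    for latency_ms in (10, 25):
        deg, px = latency_error(90.0, latency_ms / 1000.0, 1000, 100.0)
        print(f"{latency_ms} ms: {deg:.2f} deg = {px:.1f} pixels")
```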
Name of the Game, Part 2: High Resolution Human: ~160° field of view per eye (~200° overall), with a ~5° region of high acuity (note: does not account for eye's ability to rotate in socket). Future retina VR display: 57 ppd covering 200° = 11K x 11K display per eye = 220 MPixel. iPhone 6: 4.7 in retina display: 1.3 MPixel, 326 ppi, 57 ppd. Strongly suggests need for eye tracking and foveated rendering (eye can only perceive detail in a 5° region about the gaze point). Icon credit: Eyes designed by SuperAtic LABS from the thenounproject.com
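The step from 326 ppi to 57 ppd depends on viewing distance, which the slide does not state. The sketch below assumes a 10 inch viewing distance (an assumption chosen because it reproduces the slide's figure) and then computes how many pixels a 57 ppd display needs to span 200°.

```python
import math


def pixels_per_degree(ppi, viewing_distance_in):
    """Pixels subtended by one degree of visual angle for a flat panel with the
    given pixel density, viewed head-on from viewing_distance_in inches."""
    inches_per_degree = 2.0 * viewing_distance_in * math.tan(math.radians(0.5))
    return ppi * inches_per_degree


if __name__ == "__main__":
    ppd = pixels_per_degree(326, 10.0)  # 10 in viewing distance is an assumption
    print(f"326 ppi at 10 in: {ppd:.1f} pixels per degree")
    pixels_across = 57 * 200  # 57 ppd over a 200 degree field of view
    print(f"57 ppd x 200 deg = {pixels_across} pixels across (roughly 11K)")
```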
Foveated Rendering Idea: track user's gaze, render with increasingly lower resolution farther away from gaze point. Three images (high-res, med-res, low-res) blended into one for display.
Requirement: Wide Field of View View of checkerboard through Oculus Rift (DK2) lens (100° field of view). Lens introduces distortion: pincushion distortion; chromatic aberration (different wavelengths of light refract by different amounts). Icon credit: Eyes designed by SuperAtic LABS from the thenounproject.com Image credit: Cass Everitt
Software Compensation for Lens Distortion Step 1: Render scene using traditional graphics pipeline at full resolution for each eye. Step 2: Warp images in a manner such that the scene appears correct after physical lens distortion (can apply separate distortions to R, G, B to approximately correct chromatic aberration). Figure 4: Screenshot of the OculusWorldDemo application. Image credit: Oculus VR developer guide
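A minimal sketch of the Step 2 warp, assuming a simple radial polynomial distortion model with per-channel coefficients. The model and the coefficient values in `COEFFS` are hypothetical and are not the Oculus SDK's actual distortion function. For each output pixel, the warp computes where in the undistorted rendering to sample; sampling from a larger radius pulls content toward the center (barrel distortion), which the lens's pincushion distortion then undoes.

```python
def source_coord(x, y, k1, k2):
    """For an output pixel at normalized (x, y), with the lens center at the
    origin, return the location in the undistorted rendering to sample."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale


# Hypothetical per-channel coefficients (k1, k2). Slightly different values per
# channel approximate a correction for chromatic aberration.
COEFFS = {"R": (0.20, 0.05), "G": (0.22, 0.05), "B": (0.24, 0.05)}


def warp_lookup(x, y):
    """Return the sample location for each color channel at output pixel (x, y)."""
    return {ch: source_coord(x, y, k1, k2) for ch, (k1, k2) in COEFFS.items()}


if __name__ == "__main__":
    for point in [(0.0, 0.0), (0.5, 0.0), (0.7, 0.7)]:
        print(point, warp_lookup(*point))
```

The center pixel is unchanged, and the three channels diverge more as the radius grows, which is where chromatic aberration is most visible.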
Challenge: Rendering via Planar Projection Recall: rasterization-based graphics is based on perspective projection to plane Distorts image under high FOV, as needed in VR rendering Recall: VR rendering spans wide FOV Image credit: Cass Everitt Pixels span larger angle in center of image (lowest angular resolution in center) Future investigations may consider: curved displays, ray casting to achieve uniform angular resolution, rendering with piecewise linear projection plane (different plane per tile of screen)
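The uneven angular resolution follows from the projection x = f·tan(θ): the number of pixels per degree at viewing angle θ is proportional to 1/cos²(θ). The sketch below evaluates this for a 1,000 pixel wide image spanning 100°, reusing the numbers from the earlier latency thought experiment as an assumed configuration.

```python
import math


def planar_pixels_per_degree(theta_deg, width_px, fov_deg):
    """Pixels per degree at angle theta_deg from the view axis for a planar
    perspective projection of width_px pixels spanning fov_deg degrees."""
    focal_px = (width_px / 2.0) / math.tan(math.radians(fov_deg / 2.0))
    # Derivative of x = f * tan(theta) with respect to theta, in pixels per radian.
    per_radian = focal_px / math.cos(math.radians(theta_deg)) ** 2
    return per_radian * math.pi / 180.0


if __name__ == "__main__":
    for theta in (0, 25, 50):
        ppd = planar_pixels_per_degree(theta, 1000, 100.0)
        print(f"{theta:2d} deg off axis: {ppd:5.2f} pixels per degree")
```

The center of the image, where the user is most likely to look, gets the fewest pixels per degree, and the periphery gets the most.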
Consider Object Position Relative to Eye [Spacetime diagrams: X (position of object relative to eye) vs. time] Case 1: object stationary relative to eye (eye still and red object still, OR red object moving left-to-right and eye moving to track object, OR red object stationary in world but head moving and eye moving to track object). Case 2: object moving relative to eye (red object moving from left to right but eye stationary, i.e., it's looking at a different stationary point in world). Spacetime diagrams adapted from presentations by Michael Abrash. Icon credit: Eyes designed by SuperAtic LABS from the thenounproject.com
Effect of Finite Frame Rate and Latency: Judder [Spacetime diagrams: X vs. time, frames 0-3] Case 2: object moving from left to right, eye stationary (eye stationary with respect to display): continuous representation, compared with light from display (image is updated each frame). Case 1: object moving from left to right, eye moving continuously to track object (eye moving relative to display!): light from display (image is updated each frame). Explanation: since the eye is moving, the object's position should be constant relative to the eye (the eye is tracking it). But due to the discrete frame rate, the object falls behind the eye, causing a smearing/strobing effect ("choppy" motion blur). Recall from earlier slide: 90 degree/second motion with 50 ms latency results in a 4.5 degree smear. Spacetime diagrams adapted from presentations by Michael Abrash
Reducing Judder: Increase Frame Rate [Spacetime diagrams: X vs. time] Case 1, continuous ground truth: red object moving left-to-right and eye moving to track object, OR red object stationary but head moving and eye moving to track object. Light from display (image is updated each frame), shown for frames 0-3 and for frames 0-7 at twice the frame rate. Higher frame rate results in closer approximation to ground truth. Spacetime diagrams adapted from presentations by Michael Abrash
Reducing Judder: Low Persistence Display [Spacetime diagrams: X vs. time, frames 0-3, for continuous ground truth, light from full-persistence display, and light from low-persistence display] Case 1: red object moving left-to-right and eye moving to track object, OR red object stationary but head moving and eye moving to track object. Full-persistence display: pixels emit light for entire frame. Low-persistence display: pixels emit light for small fraction of frame. Oculus DK2 OLED low-persistence display: 75 Hz frame rate (~13 ms per frame), pixel persistence = 2-3 ms. Spacetime diagrams adapted from presentations by Michael Abrash
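The benefit of low persistence can be estimated with the same arithmetic as the latency thought experiment: while a pixel is lit, a tracking eye keeps moving, so the image smears across the retina by (eye speed) × (persistence time). The sketch assumes the 90 degree/second motion and 10 pixels per degree from the earlier slide, and picks 2 ms from the DK2's quoted 2-3 ms range.

```python
def smear_pixels(eye_speed_deg_per_s, persistence_s, pixels_per_degree):
    """Approximate smear, in pixels, seen by an eye moving relative to the
    display while a pixel stays lit for persistence_s seconds."""
    return eye_speed_deg_per_s * persistence_s * pixels_per_degree


if __name__ == "__main__":
    speed, ppd = 90.0, 10.0  # assumed, reused from the latency thought experiment
    full = smear_pixels(speed, 1.0 / 75.0, ppd)  # lit for the whole 75 Hz frame
    low = smear_pixels(speed, 0.002, ppd)        # lit for 2 ms
    print(f"full persistence: {full:.1f} px smear")
    print(f"low persistence:  {low:.1f} px smear")
```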
Artifacts Due to Rolling OLED Backlight Image rendered based on scene state at time t_0. Image sent to display, ready for output at time t_0 + Δt. Rolling backlight OLED display lights up rows of pixels in sequence. Let r be the amount of time to scan out a row: row 0 photons hit eye at t_0 + Δt; row 1 photons hit eye at t_0 + Δt + r; row 2 photons hit eye at t_0 + Δt + 2r. Implication: photons emitted from bottom rows of display are more stale than photons from the top! Consider eye moving horizontally relative to display (e.g., due to head movement while tracking square object that is stationary in world). Result: perceived shear! Recall rolling electronic shutter effects on digital cameras. [Diagram axes: X (position of object relative to eye) vs. Y (display pixel row)]
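The per-row timing above translates directly into a horizontal offset that grows linearly with row index, which is the perceived shear. In the sketch below, the 1200 row count comes from the Rift display mentioned earlier, while the 11 ms total scan-out time, the eye speed, and the pixels-per-degree value are assumptions chosen for illustration.

```python
def row_emit_time(t0, dt, r, row):
    """Time at which photons from the given row reach the eye: t0 + dt + row * r."""
    return t0 + dt + row * r


def shear_offset_px(row, r, eye_speed_deg_per_s, pixels_per_degree):
    """Horizontal displacement (pixels) of the given row relative to row 0 for
    an eye moving horizontally at constant speed relative to the display."""
    return eye_speed_deg_per_s * pixels_per_degree * row * r


if __name__ == "__main__":
    rows = 1200
    r = 0.011 / rows  # assumed: the whole panel scans out in 11 ms
    for row in (0, 600, 1199):
        t = row_emit_time(0.0, 0.005, r, row)
        dx = shear_offset_px(row, r, 90.0, 10.0)
        print(f"row {row:4d}: emitted at {t * 1000:6.2f} ms, offset {dx:5.2f} px")
```

A post-process correction would shift each row by the opposite of this offset, using predicted rather than measured head motion.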
Compensating for Rolling Backlight Perform post-process shear on rendered image Similar to previously discussed barrel distortion and chromatic warps Predict head motion, assume fixation on static object in scene Only compensates for shear due to head motion, not object motion Render each row of image at a different time (the predicted time photons will hit eye) Suggests exploration of different rendering algorithms that are more amenable to fine-grained temporal sampling, e.g., ray caster? (each row of camera rays samples scene at a different time)
Increasing Frame Rate Using Re-Projection Goal: maintain as high a frame rate as possible under challenging rendering conditions: stereo rendering (both left and right eye views); high-resolution outputs; must render extra pixels due to barrel distortion warp; many rendering hacks (bump mapping, etc.) are less effective in VR, so rendering must use more expensive techniques. Researchers are experimenting with reprojection-based approaches to improve frame rate (e.g., Oculus "Time Warp"): render using conventional techniques at 30 fps, then reproject (warp) the image to synthesize new frames based on predicted head movement at 75 fps. Potential for image processing hardware on future VR headsets to perform high frame-rate reprojection based on gyro/accelerometer.
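As a rough illustration of rotation-only reprojection, the sketch below computes where an image column from the last rendered frame should move after the head yaws by a small angle. It is a simplified one-dimensional model under stated assumptions (pure yaw, planar projection, scene treated as infinitely far away), not the Oculus Time Warp implementation; the 500 pixel focal length is a hypothetical value.

```python
import math


def reproject_column(x_px, focal_px, yaw_delta_deg):
    """Map a column of the rendered image (x_px measured from the image center)
    to its column after the head has yawed by yaw_delta_deg toward +x.
    Rotation-only warp: valid when translation is negligible."""
    angle = math.atan2(x_px, focal_px) - math.radians(yaw_delta_deg)
    return focal_px * math.tan(angle)


if __name__ == "__main__":
    focal_px = 500.0  # hypothetical focal length in pixels
    # Displaying at 75 Hz from a 30 fps renderer means 2.5 displayed frames per
    # rendered frame, so most displayed frames are warped versions of an older image.
    print(f"displayed frames per rendered frame: {75 / 30}")
    for x in (-200.0, 0.0, 200.0):
        new_x = reproject_column(x, focal_px, 1.0)
        print(f"column {x:7.1f} -> {new_x:7.2f} after 1 deg of yaw")
```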
Near-Future VR Rendering System Components Low-latency image processing for subject tracking High-resolution, high-frame rate, wide-field of view display Massive parallel computation for high-resolution rendering In headset motion/accel sensors + eye tracker Exceptionally high bandwidth connection between renderer and display: e.g., 4K x 4K per eye at 90 fps! On headset graphics processor for sensor processing and reprojection
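To see why the renderer-to-display link is called exceptionally high bandwidth, the raw data rate can be estimated as below. The sketch assumes "4K" means 4096 pixels and that frames are sent uncompressed at 24 bits per pixel; both are assumptions, not figures from the slide.

```python
def display_bandwidth_gbps(width, height, eyes, fps, bits_per_pixel=24):
    """Uncompressed video bandwidth in gigabits per second."""
    return width * height * eyes * fps * bits_per_pixel / 1e9


if __name__ == "__main__":
    bw = display_bandwidth_gbps(4096, 4096, eyes=2, fps=90)
    print(f"4K x 4K per eye at 90 fps: {bw:.1f} Gbit/s uncompressed")
```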
Activity in Image Capture of VR Content Google's JumpVR video: 16 4K GoPro cameras. Consider the challenge of: registering/3D aligning the video streams (on site); broadcasting the encoded video stream across the country to 50 million viewers. Lytro Immerge: a dense light field camera array pursuing 6 degree-of-freedom video for VR. Many, many others: Jaunt ONE, Vuze, Samsung Gear 360, Nokia Ozo, ...
Summary VR presents many new graphics challenges! Tracking: head-pose tracking with high accuracy and low latency. Rendering: low-latency, high resolution & frame-rate, wide field of view, ... Displays: going beyond 2D panel displays: HMDs, curved displays, ... Capture: how to capture video for VR displays?
Acknowledgments Thanks to Kayvon Fatahalian for this lecture!