Insights into High-level Visual Perception or Where You Look is What You Get Jeff B. Pelz Visual Perception Laboratory Carlson Center for Imaging Science Rochester Institute of Technology
Students Roxanne Canosa (Ph.D. Imaging Science), Jason Babcock (MS Color Science), Eric Knappenberger (MS Imaging Science), Dan Lerner (BS Imaging Science), Marianne Lipps (BS Imaging Science)
Optical Illusions Reveal the shortcomings of the visual system, and our best effort to make sense from incomplete information
Outline 1. What are the fundamental limitations of the visual system?
Outline 1. Fundamental limitations 2. What strategies are employed to compensate for those limitations?
Outline 1. Fundamental limitations 2. Strategies to compensate for limitations 3. Can we build tools that take advantage of those strategies to inform the design and evaluation of imaging systems?
Outline 1. Fundamental limitations 2. Strategies to compensate for limitations 3. Build design and evaluation tools 4. Can we use our understanding of the human visual system to aid design of next-generation computer vision systems?
Introduction Visual perception is a complex process that unfolds over time, typically occurring at a level below conscious awareness. People are often unaware of the details of how they perform many tasks, including gathering visual information from the environment. By monitoring the eye movement patterns of observers as they perform a task, we can learn about task strategy and performance.
Fundamental Limitations 1. What are the fundamental limitations of the visual system?
The Design of the Visual System There were evolutionary pressures for high-acuity vision (human as predator), and a wide field-of-view (human as prey).
The Design of the Visual System There were evolutionary pressures for high-acuity vision (human as predator), and a wide field-of-view (human as prey). Even if the entire cortex were devoted to vision, there are not sufficient resources to represent a large visual field at high acuity.
The Foveal Compromise The solution favored by nature represented a compromise between the two demands. The foveal compromise makes use of: A. Anisotropic sampling of the scene B. Serial execution (task switching) C. Limited internal representations D. Focused attention
A. Anisotropic Sampling of the Visual Field The foveal compromise: a high-acuity central fovea and a limited-acuity periphery. [Figure: photoreceptor density across the retina, from periphery through center to periphery]
Anisotropic Sampling of the Visual Field + If you can read this you must be cheating.
Anisotropic Sampling of the Visual Field The visual field must be sampled by the high-acuity fovea: If you can read this you must be cheating. The foveal compromise requires a mechanism for moving the eyes about the scene.
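The falloff in acuity away from the fovea can be illustrated with a rough sketch. The linear model below, in which the minimum resolvable angle grows in proportion to eccentricity, is a common textbook approximation and is an assumption here, not something taken from these slides. The function names and the parameter values (`foveal_mar_arcmin`, `e2_deg`) are illustrative only.

```python
def min_resolvable_angle(eccentricity_deg, foveal_mar_arcmin=1.0, e2_deg=2.0):
    """Approximate minimum angle of resolution (arcmin) at a given eccentricity.

    Assumed linear model: MAR(E) = MAR_0 * (1 + E / E2), where E2 is the
    eccentricity at which the resolvable angle doubles. Values are illustrative.
    """
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return foveal_mar_arcmin * (1.0 + eccentricity_deg / e2_deg)


def relative_acuity(eccentricity_deg, e2_deg=2.0):
    """Acuity relative to the fovea (1.0 at the center of gaze)."""
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return 1.0 / (1.0 + eccentricity_deg / e2_deg)


if __name__ == "__main__":
    # Print how quickly relative acuity drops with distance from the fovea.
    for ecc in (0, 2, 10, 20, 40):
        print(f"{ecc:>2} deg: relative acuity {relative_acuity(ecc):.2f}")
```

Under these assumed values, text a few degrees away from the fixation cross is resolved at only a fraction of foveal acuity, which is why small print in the periphery cannot be read without moving the eyes.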
Outline 1. Fundamental limitations 2. What strategies are employed to compensate for those limitations?
Foveal Compromise: Eye Movements Each eye has three agonist-antagonist muscle pairs to rotate the eye horizontally, vertically, and about the optical axis.
Types of Eye Movements Smooth pursuit: match object motion Vestibular-ocular response: compensate for self-motion Vergence: merge images at different distances Saccades: move fovea to new location
Background: Eye Movement Types Smooth pursuit Vestibular-ocular response Image stabilization Vergence Saccades - Image destabilization: shifts fovea to new image region
Destabilizing Eye Movements Saccades Amplitude: < 1° to > 45° of visual angle Velocity: > 600°/second Frequency: ~3-4/second (> 150,000/day) Saccades are made to targets requiring high spatial resolution and to the locus of attention.
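As a quick check on the daily figure, the arithmetic can be sketched as follows. The 16 waking hours per day is an assumption made for this example; the slide gives only the per-second rate and the daily lower bound.

```python
def saccades_per_day(rate_per_second, waking_hours=16):
    """Estimate the daily saccade count from a mean rate while awake.

    waking_hours is an assumed value, not taken from the slides.
    """
    return rate_per_second * waking_hours * 3600


low = saccades_per_day(3)   # 3 saccades/second
high = saccades_per_day(4)  # 4 saccades/second
print(low, high)  # 172800 230400
```

Both estimates are consistent with the stated figure of more than 150,000 saccades per day.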
B. Serial Execution: Sequential Sampling
Serial Execution: Foveations With each eye movement, the fovea slides under a new portion of the retinal image. A new portion of the image is sampled, but each new sample is centered on the fovea
C. Internal Representation
Internal Representation If a high-acuity internal representation is built up over multiple fixations, it should be easy to detect even small differences between images. A B
Internal Representation Following are two versions of the school children, separated by a blank slide. There is a difference between the two; your task is to identify the difference. View them in alternation, trying to find the difference. The difference is clearly visible in the slide at the end.
View ~3 sec, then advance A
View ~1/2 sec, then continue
View ~3 sec, then REVERSE B
Compare to previous slide A
Limited Neural Resources Something beyond variable acuity is responsible. Deploying attention to different areas in sequence conserves limited resources. Changes to the scene can be made to unattended regions without affecting conscious perception. In nature, such changes usually induce apparent motion, drawing attention to the region.
Serial Execution: Eye Movements The limited-acuity periphery must be sampled by the high-acuity fovea, resulting in serial data acquisition. The eye movements guiding that acquisition are externally observable markers of acuity demands, deployment of attention, and perceptual strategies.
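To use eye movements as markers, a gaze record first has to be split into saccades and fixations. The sketch below uses a simple velocity-threshold rule (often called I-VT); this technique is swapped in for illustration and is not described in these slides. The function names, the 100 deg/second threshold, and the sample data are all assumptions.

```python
import math


def classify_samples(gaze_deg, sample_rate_hz, velocity_threshold=100.0):
    """Label each inter-sample interval as 'saccade' or 'fixation'.

    gaze_deg: list of (x, y) gaze positions in degrees of visual angle.
    velocity_threshold: deg/second; an assumed, illustrative value.
    """
    dt = 1.0 / sample_rate_hz
    labels = []
    for (x0, y0), (x1, y1) in zip(gaze_deg, gaze_deg[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) / dt
        labels.append("saccade" if velocity > velocity_threshold else "fixation")
    return labels


def count_fixations(labels):
    """Count runs of consecutive 'fixation' labels."""
    count = 0
    previous = None
    for label in labels:
        if label == "fixation" and previous != "fixation":
            count += 1
        previous = label
    return count


# Made-up 60 Hz record: a fixation near (0, 0), a jump, a fixation near (10, 0.2).
samples = [(0, 0), (0.1, 0), (0.1, 0.1), (5, 0.1), (10, 0.2), (10.1, 0.2), (10.0, 0.2)]
labels = classify_samples(samples, sample_rate_hz=60)
print(labels)
print(count_fixations(labels))  # 2
```

The resulting fixation sequence is the kind of record from which acuity demands and attentional deployment can then be inferred.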
Serial Execution; Image Preference 3 sec viewing
Outline 1. Fundamental limitations 2. Strategies to compensate for limitations 3. Can we build tools that take advantage of those strategies to inform the design and evaluation of imaging systems?
Measuring eye movements The Problem: After all, the eye is sitting in a bag of fat in a hole in your head, and there are six big muscles pulling on it. (Cornsweet, 1976)
Measuring eye movements The Solution: Barlow photographed a droplet of mercury placed on the limbus. Translations of the head were minimized by having subjects lie on a stone slab with their heads wedged tightly inside a rigid iron frame. (Kowler, 1990)
Measuring eye movements
Measuring eye movements Limbus eyetracker Video-based eyetracker
Measuring eye movements Scleral eye-coils Dual Purkinje eyetracker
Head-mounted eyetracker Infrared / Video Headband-mounted eyetracker
Infrared, Video-based Eyetrackers Bright Pupil; On-axis Illumination IR camera IRED
Remote eyetracker Infrared / Video Remote-head eyetracker
Change Blindness
Human Computer Interface
Visualization = 250 ms
Image & Subject Dependence
Radiographic Search: Scanpath
Radiographic Search: Fixation Density
Measuring eye movements These commercially available eyetrackers are restricted to laboratory use. The ability to monitor perception as people perform real tasks in the real world would allow us to ask new kinds of questions.
RIT Wearable Eyetracker
Components: IR illuminator/optics module, monochrome CMOS eye camera, calibration laser, folding mirror, hot mirror, color CMOS scene camera
RIT Wearable Eyetracker
Fixation Sequence Before Image Capture
Complex, Familiar Tasks
Outline 1. Fundamental limitations 2. Strategies to compensate for limitations 3. Build design and evaluation tools 4. Can we use our understanding of the human visual system to aid design of next-generation computer vision systems?
Motivation Because vision is effortless for humans, computer vision was chosen as an early research domain. Early attempts at computer vision systems attacked the problem by brute force with limited success: Tried Image Understanding on static 2D images ("From Pixels to Predicates")
Limited Computational Resources Even in the face of Moore's Law, computers will not have sufficient power in the foreseeable future to solve vision by brute force.
Limited Computational Resources Even in the face of Moore's Law, computers will not have sufficient power in the foreseeable future to solve vision by brute force. Computer-based perception faces the same fundamental challenge that human perception did during evolution: limited computational resources.
The Foveal Compromise The solution favored by nature: A. Anisotropic sampling of the scene B. Serial execution (task switching) C. Limited internal representations D. Focused attention
Motivation: Cognitive Science Human Cognition Sensorial Experience High-level Visual Perception Attentional Mechanisms Eye Movements
Motivation: Cognitive Science Human Cognition Sensorial Experience High-level Visual Perception Artificial Intelligence Computer Vision Active Vision Attentional Mechanisms Eye Movements
Inspiration - Active Vision Active vision was the first step. Unlike traditional approaches to computer vision, active vision systems focused on extracting information from dynamic, 3D scenes. (Aloimonos, 1987; Bajcsy, 1988; Ballard, 1989; Brooks, 1991) [Images: Vision & robotics @ UR; CS @ U Penn]
Active Vision Inspired by anisotropic, binocular vision in humans, researchers built neuromorphic vision systems that took advantage of active cameras. [Images: Vision & robotics @ UR; Humanoid robotics @ MIT]
Inspiration - Active Vision Visual routines were an important component of the Active Vision approach. Pre-defined routines are scheduled and run to extract information when and where it is needed.
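The idea of scheduling pre-defined routines on demand can be sketched in a few lines. This is a toy illustration under stated assumptions, not any particular active vision system: the scene is a plain dictionary standing in for an image, and the routine names (`locate`, `is_present`) and the `run_routines` helper are hypothetical.

```python
from collections import deque


def locate(scene, target):
    """Hypothetical routine: return the position of a named object, or None."""
    return scene.get(target)


def is_present(scene, target):
    """Hypothetical routine: report whether a named object is in the scene."""
    return target in scene


# Registry of pre-defined routines, looked up by name when requested.
ROUTINES = {"locate": locate, "is_present": is_present}


def run_routines(scene, requests):
    """Run queued (routine_name, target) requests in order, one at a time."""
    queue = deque(requests)
    results = []
    while queue:
        name, target = queue.popleft()
        results.append((name, target, ROUTINES[name](scene, target)))
    return results


# Toy scene: object names mapped to made-up image coordinates.
scene = {"cup": (120, 80), "kettle": (300, 95)}
results = run_routines(
    scene,
    [("is_present", "cup"), ("locate", "kettle"), ("locate", "spoon")],
)
print(results)
```

Only the information a request asks for is extracted, and only when it is requested; nothing resembling a full scene description is ever built.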
Perceptual Strategies Limited representation + task-switching The deployment of attention and eye movements is controlled below conscious awareness; there must be mechanisms (strategies) that protect us from the constraints of visual perception in the real world - that help us make sense from the incomplete data available.
Perceptual strategies Beyond the mechanics of how the eyes move during real tasks, we are interested in strategies that may support a conscious perception that is continuous in time as well as in space.
Goal - Strategic Vision Strategic Vision can use high-level, top-down strategies for extracting information from complex environments.
Goal - Strategic Vision Strategic Vision can use high-level, top-down strategies for extracting information from complex environments. One goal of our research is to study human behavior in natural, complex tasks to search for visual routines that emerge under real-world constraints.
Perceptual Strategies
Limited representations: Successive Foveations
Limited representations: Successive Foveations 0 msec
Limited representations: Successive Foveations 770 msec
Limited representations: Successive Foveations 1400 msec
Limited representations: Successive Foveations 2000 msec
Limited representations: Successive Foveations 2700 msec
Limited representations: Successive Foveations 2800 msec
Perceptual Strategies: Look-ahead fixations 2000 msec 800 msec... Intervening tasks look-ahead fixation guiding fixation interaction
Perceptual Strategies: Look-ahead fixations [Figure: two timelines of sub-tasks and fixations. Sequenced look-ahead fixations, time axis 2000 to 7000 milliseconds; interposed look-ahead fixations, time axis 0 to 5000 milliseconds]
Perceptual Strategies: Look-ahead fixations Humans employ strategies to ease the computational and memory loads inherent in complex tasks. Look-ahead fixations represent one such strategy: opportunistic execution of information-gathering visual routines to pre-fetch information needed for future subtasks.
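One way to pick look-ahead fixations out of a gaze record is sketched below. The labeling rule is an assumption made for this example, not the laboratory's analysis method: a fixation on an object before the interaction with it counts as "guiding" if no fixation on a different object intervenes, and as "look-ahead" otherwise. The function name, object names, and times are hypothetical.

```python
def label_fixations(fixations, interactions):
    """Label fixations relative to the later interaction with the same object.

    fixations: list of (time_ms, object_name), sorted by time.
    interactions: dict mapping object_name -> time_ms when the hand reaches it.
    Assumed rule: a fixation on an object before its interaction is 'guiding'
    if no fixation on a different object intervenes, otherwise 'look-ahead'.
    Fixations with no later interaction on that object are labeled 'other'.
    """
    labels = []
    for i, (t, obj) in enumerate(fixations):
        t_interact = interactions.get(obj)
        if t_interact is None or t >= t_interact:
            labels.append("other")
            continue
        intervening = any(
            other != obj and t < t_other < t_interact
            for t_other, other in fixations[i + 1:]
        )
        labels.append("look-ahead" if intervening else "guiding")
    return labels


# Made-up record: the soap is glanced at early, then used after the faucet.
fixations = [(0, "soap"), (600, "faucet"), (1400, "faucet"), (2200, "soap"), (3400, "towel")]
interactions = {"faucet": 1800, "soap": 2800}
labels = label_fixations(fixations, interactions)
print(list(zip(fixations, labels)))
```

In this made-up record the early glance at the soap is labeled a look-ahead fixation, because the faucet sub-task intervenes before the soap is actually used.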
Conclusions Monitoring eye movements gives us a window into perception and cognition that can reveal details not available even to the observer. The visual strategies observed can help us understand how people use vision in their interaction with the world, and perhaps aid in the design of artificial systems that take advantage of this knowledge.
Conclusions Tools that monitor subjects' eye movements can aid in the design and evaluation of imaging systems. The design of next-generation computer vision systems may be aided by implementing algorithms derived from an understanding of the strategies the human visual system employs to compensate for limited computational resources.
Questions?