Vision III: From Early Processing to Object Perception
Chapter 10 in Chaudhuri

Overview of Topics
- Beyond the retina: two pathways to V1
- Subcortical structures (LGN & SC)
- Primary visual cortex: lines, direction, colour
- M vs. P pathways: movement vs. particulars
- Object & face recognition

How We See Things (short version)
- RGCs: dot detectors
- V1: line orientation, motion, colour
- V4: shapes
- Temporal lobe: objects & faces
Fundamental Concept: Brain Organization Schemes
Dorsal vs. Ventral Streams
- The dorsal stream is mainly involved in motion processing (the "where" pathway)
- The ventral stream is mainly involved in processing the identity of objects (the "what" pathway)
- We will focus on the ventral stream here, which takes most of its input from the P pathway (more later)

From Eye to Brain
Contralaterality in Vision
- One might think information from the left eye goes to the right brain and vice versa, but no. Instead...
- Information from the left half of the visual field goes (first) to the right half of the brain, and vice versa
- That is, everything to the left of what you're fixating goes to the right hemisphere (first), and vice versa
Contralaterality in Vision
- Nasal halves of the retinas (close to the nose):
  - Capture light from the temporal half of the visual field
  - Send signals across to the contralateral side of the brain
- Temporal halves of the retinas (close to the temples):
  - Capture light from the nasal half of the visual field
  - Send signals along to the ipsilateral side of the brain

[Figure: the visual pathway from each eye through the optic tract and optic radiations to primary visual cortex (V1)]
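The crossing rule above can be written out as a tiny sketch (illustrative only; the function name and string labels are invented for this example). Nasal fibres cross at the optic chiasm, temporal fibres stay on their own side, so both eyes' views of one hemifield converge on the opposite hemisphere:

```python
def route(eye, retinal_half):
    """Which hemisphere first receives signals from a given retinal half.
    Nasal retina crosses at the chiasm (contralateral); temporal retina
    stays on the same side (ipsilateral)."""
    if retinal_half == "nasal":
        return "right" if eye == "left" else "left"  # crosses over
    return eye                                       # stays ipsilateral

# Left visual field: left eye's image lands on its NASAL retina,
# right eye's image lands on its TEMPORAL retina. Both routes end
# in the RIGHT hemisphere, as the slide describes.
assert route("left", "nasal") == "right"
assert route("right", "temporal") == "right"
```

The point of the sketch is that contralaterality is organized by visual field half, not by eye: each hemisphere receives the opposite hemifield from both eyes.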
[Figure: nasal visual field falls on the temporal retina, which projects to the ipsilateral hemisphere; temporal visual field falls on the nasal retina, which projects to the contralateral hemisphere, via the optic tract and optic radiations to V1]
Foveal Representation
- Is the fovea split, with information from each half carried to separate hemispheres?
- No; instead it is represented in both hemispheres
- Evidence for this is seen in foveal sparing, i.e., continued visual function in the fovea after loss of a visual hemifield due to stroke

Questions
- Light from the temporal visual field of the right eye falls on the ___ half of the retina, which sends information to the ___ side of the brain
- What are the two large streams of visual information that exit V1 and go to the rest of the brain?

Two Pathways From Eye to Cortex
- Geniculocortical pathway: Lateral Geniculate Nucleus (LGN) of the thalamus to V1; carries 90% of RGC outputs
- Tectopulvinar pathway: superior colliculus (aka "tectum") to the pulvinar nucleus to the visual cortex (many parts); carries 10% of RGC outputs
The LGN
- As we've seen, the thalamus has nuclei for early sensory processing in all sensory modalities (except smell). The LGN is the one for vision
- A knee-shaped part of the thalamus ("geniculate" = knee-like)
- Has a six-layered structure
- Does early visual processing (centre-surround RFs)
- A good example of a module that is organized in layers and columns

The LGN
- The LGN has left and right halves. Each half receives signals from both the right and left eyes
  - Layers 2, 3, & 5 receive input from the ipsilateral eye
  - Layers 1, 4, & 6 receive input from the contralateral eye
  - Mnemonic: 1 + 4 = 6? Not true, so contra; 2 + 3 = 5? True, so ipsi
- LGN layers 1 & 2 are magnocellular, with large neurones
  - Part of the M-pathway, responsible for motion
- LGN layers 3-6 are parvocellular, with small neurones
  - Part of the P-pathway, responsible for colour and detail
The LGN
- Laterality: red = layers that receive signals from the ipsilateral eye; blue = layers that receive signals from the contralateral eye
- Pathways (aka channels): solid = parvocellular layers; dotted = magnocellular layers

Visual Processing in the LGN
- LGN neurones have centre-surround receptive fields, just like retinal ganglion cells
- However, LGN cells are more selective, possibly reflecting signal processing that reduces noise and produces sharper tuning than in RGCs
- These cleaned-up signals are sent on to V1

Visual Processing in the LGN
- The LGN's connection to V1 goes both ways, with a descending fibre tract
- This has been found to play a role in attentional modulation
- Example: after I ask "Are there any questions?", some of my magno LGN cells are likely more active (those for upward motion)

The LGN's Retinotopic Map
- Each LGN layer is organized according to a retinotopic map
- Map: each place on the retina corresponds to a place on the LGN in a systematic fashion
- Retinotopic: neurones that are adjacent in the LGN have RFs that are adjacent on the retina
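A centre-surround receptive field of the kind found in RGCs and LGN cells is conventionally modelled as a difference of Gaussians (DoG). Here is a minimal sketch (parameter values are arbitrary, not from the text) showing the signature behaviour: a small spot filling the excitatory centre drives the cell strongly, while uniform full-field light does not, because the inhibitory surround cancels the centre:

```python
import numpy as np

def dog_response(stimulus, sigma_c=1.0, sigma_s=3.0, size=15):
    """Response of an ON-centre DoG receptive field to a 2-D stimulus."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = lambda s: np.exp(-(xx**2 + yy**2) / (2 * s**2)) / (2 * np.pi * s**2)
    rf = g(sigma_c) - g(sigma_s)        # excitatory centre minus inhibitory surround
    return float(np.sum(rf * stimulus))

size = 15
small_spot = np.zeros((size, size))
small_spot[6:9, 6:9] = 1.0              # a spot covering just the centre
full_field = np.ones((size, size))      # uniform light over centre AND surround

# Centre-only stimulation beats full-field stimulation.
assert dog_response(small_spot) > dog_response(full_field)
```

The same filter shape is why these cells act as "dot detectors": they signal local contrast, not absolute light level.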
Adjacent spots in the visual field correspond to adjacent spots on the retina, which in turn correspond to adjacent spots in the LGN.
How do we know this?
- Single-cell recording experiments in monkeys
- Responses are recorded from individual neurones with a very fine electrode
- For example, our electrode might penetrate the LGN parallel to its surface, thus staying in the same layer
- Measuring the responses of LGN neurones while moving systematically across the surface of the LGN, we find that the receptive fields move systematically across the retina
LGN Location Columns
- Single-cell recording experiments in monkeys
- If we instead move the electrode down through the LGN layers, we find the RFs are all in the same place
- That is, the LGN is organized into location columns: columns of neurones that all process information from the same location on the retina
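The two electrode tracks described above can be captured in a toy model (illustrative, not from the text): treat the LGN as six stacked layers in which a unit's receptive-field location depends only on its surface position, not on its layer:

```python
LAYERS, ROWS = 6, 10

def rf_centre(layer, i, j):
    """Retinal RF centre of the LGN unit at (layer, i, j).
    In this toy model, location depends only on the surface
    position (i, j), never on the layer -- a location column."""
    return (i, j)

# Electrode track PARALLEL to the surface (fixed layer):
# the RF marches systematically across the retina.
parallel_track = [rf_centre(2, i, 5) for i in range(ROWS)]
assert len(set(parallel_track)) == ROWS   # a different RF at every step

# Electrode track DOWN through the layers (fixed surface position):
# every unit encountered has its RF in the same retinal location.
down_track = [rf_centre(layer, 4, 4) for layer in range(LAYERS)]
assert len(set(down_track)) == 1          # one shared RF location
```

The model makes the distinction concrete: "retinotopic map" is a statement about movement along the surface, "location column" is a statement about movement through the depth.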
Questions
- Which layers of the right LGN receive inputs from the left eye?
- What do we mean when we say the LGN has location columns?
- What do we mean when we say the LGN has a retinotopic layout?

Superior Colliculus
- A small branch from the optic tract goes to the SC
- Has a retinotopic map of the contralateral visual field
- Signals go from the SC to another thalamic nucleus, the pulvinar
- From there, they go to many parts of the visual cortex
Superior Colliculus
- The SC receives descending signals from the visual, auditory, and somatosensory cortices
- The SC integrates these to coordinate eye and body movements toward stimuli
- Example: you hear a loud sound or feel a tap on your shoulder and look automatically in that direction

V1
- Primary visual cortex = striate cortex = Brodmann Area 17 = visual receiving area = V1
- The first cortical area for visual processing
- The best-understood part of the cortex, thanks to work by such luminaries as Hubel & Wiesel, who won a Nobel Prize for research on V1

V1: Organizational aspects
- Six-layered structure
- Retinotopic map
- Ocular dominance columns
- Orientation selectivity columns
- Cytochrome oxidase blobs
- Location hypercolumns

Layers of V1
- Layer 1: no neurones, just fibres from neurones below
- Layers 2-3: communicate horizontally with other visual cortical areas
- Layer 4: receives inputs from the LGN; subdivided into 4A, 4B, 4Cα (receives magno inputs), & 4Cβ (receives parvo inputs). Signals are then sent up/down from here to other layers
- Layers 5-6: send descending communications back to subcortical areas (LGN and SC)
Retinotopic Layout of V1
- RFs of adjacent V1 neurones are adjacent on the retina
- But this retinotopic mapping is distorted relative to the surface area of the retina
- Foveal magnification: far more V1 neurones have RFs in the fovea than in the periphery
- i.e., foveal RGCs innervate a far larger area of V1 than one would predict based on the area of the fovea

[Figure: foveal magnification — a small foveal region on the retina maps onto a disproportionately large region of V1]
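Foveal magnification can be put in rough numbers with the cortical magnification factor, commonly approximated as M(E) ≈ k / (E + E0) millimetres of V1 per degree of visual angle at eccentricity E. The constants below (k ≈ 17.3 mm, E0 ≈ 0.75 deg) are published human estimates, not values from this chapter, so treat this as an order-of-magnitude sketch:

```python
def cortical_mag(eccentricity_deg, k=17.3, e0=0.75):
    """Approximate human cortical magnification, in mm of V1 per
    degree of visual angle, at a given eccentricity."""
    return k / (eccentricity_deg + e0)

fovea  = cortical_mag(0.0)    # roughly 23 mm of V1 per degree at the fovea
periph = cortical_mag(30.0)   # roughly 0.56 mm per degree at 30 deg out

# A foveal degree claims tens of times more cortical territory
# than a degree in the far periphery.
assert fovea / periph > 30
```

This is the quantitative face of the qualitative claim on the slide: cortical area is allocated by neuron count, not by retinal area.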
Foveal Magnification
- Why does V1 exhibit foveal magnification?
- One, there are simply more RGCs per unit area in the fovea than in the peripheral retina
- Two, each foveal RGC innervates more cortical neurones
- Presumably this allows for more complex and precise processing of visual information from the fovea

Questions
- What is the role of the SC in vision?
- Which layer of V1 receives signals from the LGN?
- What is foveal magnification? Why does it occur?

Binocularity & Ocular Dominance
- In the LGN, all neurones are monocular
- In V1, however, the majority are binocular, taking inputs from both eyes
- This is the beginning of stereoscopic depth perception

Ocular Dominance
- Most V1 neurones are binocular
- But most show some preference for one eye or the other
- The preference varies systematically across the surface of the cortex
Ocular Dominance
[Figure]

How Ocular Dominance Comes About
[Figure]

Orientation Selectivity
- Most V1 neurones have elongated receptive fields
- These are ON/OFF or OFF/ON, like RGCs
- Each neurone responds best to a line of light (ON/OFF) or dark (OFF/ON) of a given orientation
Orientation Selectivity
- How does orientation selectivity in V1 neurones arise?
- Hubel & Wiesel proposed the model below, in which several LGN cells, whose RFs are lined up on the retina, all feed into one V1 cell

[Figure: how V1 simple cells are wired, via LGN cells, to retinal ganglion cells with aligned centre-surround RFs, producing an oriented receptive field]
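The Hubel & Wiesel wiring idea can be simulated directly (an illustrative sketch; all sizes and parameters are invented): sum three ON-centre difference-of-Gaussians subunits whose centres are aligned vertically, and the resulting "simple cell" prefers a vertical bar, which excites all three centres at once, over a horizontal bar, which hits only one:

```python
import numpy as np

size = 15
ax = np.arange(size) - size // 2
xx, yy = np.meshgrid(ax, ax)

def dog(cx, cy, sc=1.0, ss=2.5):
    """ON-centre DoG subunit (an LGN-like RF) centred at (cx, cy)."""
    g = lambda s: np.exp(-((xx - cx)**2 + (yy - cy)**2) / (2 * s**2)) / (2 * np.pi * s**2)
    return g(sc) - g(ss)

# Three subunits stacked along y: an elongated, vertically oriented RF.
simple_rf = dog(0, -4) + dog(0, 0) + dog(0, 4)

vertical_bar   = (np.abs(xx) <= 1).astype(float)  # covers all three centres
horizontal_bar = (np.abs(yy) <= 1).astype(float)  # covers only the middle one

resp = lambda stim: float(np.sum(simple_rf * stim))

# The summed cell responds far more to its preferred orientation.
assert resp(vertical_bar) > resp(horizontal_bar)
```

Nothing in any individual subunit is orientation-selective; the selectivity comes purely from the geometry of the convergence, which is the model's point.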
Directional Motion Selectivity
- Hubel & Wiesel also found V1 cells sensitive to the direction of motion of the stimulus
- These are the first stage in our ability to process moving stimuli

[Figure: schematic of a Reichardt motion detector — two striate cortex simple cells, a delaying interneurone, and a striate cortex motion-sensitive cell]
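The Reichardt scheme in the schematic can be sketched in a few lines (illustrative toy; a one-time-step delay stands in for the delaying interneurone). Each side multiplies one receptor's signal by a delayed copy of its neighbour's, and the detector reports the difference, so one direction of motion yields a positive output and the other a negative one:

```python
def reichardt(a, b):
    """Opponent Reichardt detector output for two receptor signal
    traces a and b (lists of samples), with a one-sample delay."""
    out = 0.0
    for t in range(1, len(a)):
        # (delayed A) * B  minus  (delayed B) * A
        out += b[t] * a[t - 1] - a[t] * b[t - 1]
    return out

rightward = reichardt([1, 0, 0], [0, 1, 0])  # stimulus hits A first, then B
leftward  = reichardt([0, 1, 0], [1, 0, 0])  # stimulus hits B first, then A

assert rightward > 0 > leftward              # signed, direction-selective output
```

The delay is what turns a mere coincidence detector into a motion detector: only when the stimulus arrives at the second receptor just as the delayed signal from the first arrives do the two multiplied inputs coincide.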
Questions
- What are three stimulus characteristics that V1 neurones are tuned to?
- True or false: orientation-selective cells are all ON/OFF (i.e., excitatory centre / inhibitory surround)?

Organization of V1
- How are the various cells in V1 organized?
- Hubel & Wiesel proposed the "ice cube" model, whereby orientation and ocular dominance columns vary independently
- They also proposed the location hypercolumn, which is a set of all orientation columns plus two ocular dominance columns
Ice Cube Model of V1
[Figure]
- While it's accurate as far as it goes, the ice cube model is not complete
- Motion direction selectivity is not incorporated, nor are colour processing, spatial frequency, or the M vs. P channels
Cytochrome Oxidase Blobs
- Interlaced with the location hypercolumns is another set of columns that show high neural activity
- These were once thought to be involved in colour processing and were called "colour blobs"
- But instead they seem to integrate information from M & P cells

M vs. P Channels
- As noted earlier, two channels start in the retina: magno = movement, parvo = particulars
- These continue on through V1 to higher visual areas

M vs. P: Anatomical Separation

  Stage          | M channel         | P channel         | Separation
  Retina         | Parasol RGCs      | Midget RGCs       | Complete
  LGN            | Layers 1-2        | Layers 3-6        | Complete
  V1, layer 4    | 4Cα               | 4Cβ               | Complete
  V1, blobs      | Blob & interblob  | Interblob only    | Partial
  Extrastriate   | MT                | V4                | Partial
  Pathways       | Dorsal ("where")  | Ventral ("what")  | Partial

M vs. P: Functional Differences
Ultimately...
- The M channel projects more to MT (a motion-processing area) and then to the dorsal stream (the "where" pathway)
- The P channel projects more to V4 (a form-processing area) and then to the ventral stream (the "what" pathway)
- We will now take a closer look at the latter

Questions
- What is a location hypercolumn in V1?
- The M & P pathways are completely segregated up to what point in the visual system?

The Ventral Stream
- Consists of a network of areas, mostly in the inferior temporal (IT) cortex, that engage in high-level vision
- IT cortex can be divided into three zones:
  - Posterior IT (PIT): complex form processing
  - Central IT (CIT): view-invariant processing
  - Anterior IT (AIT): individuation / configuration / shape-invariant processing

Complex Form Processing Tasks (PIT/CIT)
- Differentiating illumination edges from reflectance edges
- Inverse projection: determining 3-D shape from 2-D information
- Segmentation: differentiating objects from the background and from each other
- Viewpoint invariance: objects look different from different viewpoints
- Shape invariance: some objects, especially living things, change shape but are nonetheless recognized as the same object
- Completion: objects are often partially occluded; how do we complete the view of a partially-viewed object?
[Figure: a scene containing both reflectance edges and illumination edges]
It is difficult for a computer program (but easy for us) to determine which changes in lightness in this scene are due to properties of different parts of the scene, and which are due to changes in illumination.
Light Comes From Above
- One assumption the visual system makes in differentiating shadow from reflectance change is that light comes from above
- This has been true over our evolutionary history

The Light-from-above Assumption
[Figures]
Shadows Have Fuzzy Edges
- Another assumption the visual system makes is that shadows have fuzzy edges (penumbras)

Inverse Projection Problem
- An infinite number of objects can create the same image on the retina. How do we know which one is out there?
Inverse Projection Rules
- The brain uses heuristics (rules of thumb that aren't always true) to solve the otherwise impossible inverse projection problem
- E.g., a straight line in the 2-D image on the retina is assumed to be a straight line in 3-D reality
- If the tips of two lines meet in 2-D, assume they meet at their tips in 3-D reality

Heuristics
- Both of these assumptions hold true in most cases, but not all
- Both are part of a more general rule: the visual system interprets the 3-D world in a stable way, meaning the interpretation will not change with slight changes in point of view
Inverse Projection Fail
- http://tinyurl.com/7duz9zb
- Your brain makes assumptions about how 2-D projections arise from 3-D objects
- In the case of the Devil's Triangle, they not only fail to give you the correct interpretation, they actively prevent you from getting it!

Inverse Projection Epic Fail!

Segmentation
- Which parts of a scene belong to which objects? What is object vs. background?
- Part of the solution involves gestalt rules such as smoothness heuristics, but part is simple experience
- Here Magritte messes with our segmentation heuristics by violating both gestalt rules and our experience
Segmentation
- Gestalt heuristics play a role in segmentation:
  - Smoothness: take the interpretation with the fewest sharp turns
  - Pragnanz (simplicity): take the interpretation with the fewest objects and types of objects

Smoothness Heuristic Fail
[Figure]

Perceptual Segregation: Figure and Ground
Figure-Ground Segmentation
Heuristics used to determine which area is figure:
- Figures are located in the lower part of the scene
- Figures are symmetrical
- Figures are small; backgrounds are large
- Figures are vertical
- Elements that are meaningful (i.e., have been seen as figures before) are figures
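One way to think about how such cues combine is as a simple weighted vote over candidate regions. This toy sketch is purely illustrative (the features, weights, and example regions are invented; the visual system does not literally run this code), but it captures the idea that no single heuristic decides and the region with the most cue support wins figure status:

```python
def figure_score(region):
    """Score a candidate region on the figure-ground heuristics above.
    region: dict of booleans for 'lower', 'symmetric', 'small',
    'vertical', and 'familiar' (previously seen as a figure)."""
    weights = {"lower": 1, "symmetric": 1, "small": 1,
               "vertical": 1, "familiar": 2}   # experience weighted a bit more (an assumption)
    return sum(w for cue, w in weights.items() if region.get(cue))

# A vase-like region vs. the expanse behind it.
vase     = {"small": True, "symmetric": True, "lower": True, "familiar": True}
backdrop = {"small": False, "symmetric": False}

assert figure_score(vase) > figure_score(backdrop)  # the vase is seen as figure
```

Ambiguous displays such as Rubin's vase are exactly the cases where the two candidate regions score nearly equally, so the assignment can flip.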
Figure-Ground Segmentation in V1
- Recordings from V1 in the monkey cortex show:
  - A response to an area that is figure
  - No response to an area that is ground
- This result is important because:
  - V1 neurones are early in the nervous system
  - It reveals both feedforward and feedback processing in the system

How a neurone in V1 responds to stimuli presented to its receptive field (green rectangle): (a) the neurone responds when the stimulus on the receptive field is figure; (b) no response when the same pattern on the receptive field is not figure!

Questions
- What are some heuristics the brain uses...
  - ...to distinguish luminance changes from lightness changes?
  - ...to solve the inverse projection problem?
  - ...to solve the segmentation problem?
Viewpoint Invariance
- "Ceci ne sont pas des pipes" ("These are not pipes")
- We recognize objects from different viewpoints, even though the pattern of light on the retina changes
- i.e., there is a many-to-one relation between light patterns and objects (and vice versa)
- Biederman's RBC and Tarr et al.'s view-based recognition models both tried to account for this
- We will see that this is another example of thesis-antithesis-synthesis

Human Viewpoint Invariance is Imperfect
- Quick: which of the pairs A, B, C show two views of the same object?
Two Viewpoints on Viewpoint Invariance
- Structural-description models: 3-D object representations are based on combinations of 3-D volumetric primitives
- Image-description models: the ability to identify 3-D objects comes from sets of stored 2-D images from different perspectives

Structural-Description Models
- Marr's model proposed a sequence of stages using simple geometrical features:
  - Edges (detected via V1 neurones)
  - View-invariant features such as parallel lines, curve polarity, and angle type (PIT, maybe?)
  - Geometrical shapes (again, PIT?)
  - Relations between geometrical shapes (CIT?)
Structural-Description Models: Geons & Objects
- Recognition-by-components (RBC) theory, by Biederman (developed from Marr's ideas)
- The volumetric primitives are called geons
- The theory proposes there are 36 geons that combine to make all 3-D objects
- Geons include cylinders, rectangular solids, pyramids, etc.

Properties of geons
- View-invariance: they can be recognized from almost any viewpoint (except rare "accidental" viewpoints)
- Discriminability: they can be easily distinguished from one another

Principle of componential recovery: we can recognize an object if we can identify its geons
- It is difficult to identify an object behind a mask when the corners and curves that allow extraction of its geons have been obscured
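Componential recovery can be caricatured as set matching (an illustrative sketch only: the object names and geon labels are invented, and real RBC also encodes the spatial relations between geons, which this toy omits). Recognition succeeds when the extracted geons match a stored structural description, and fails when occlusion hides a geon:

```python
# Stored structural descriptions: object -> its set of geons (invented examples).
KNOWN_OBJECTS = {
    "mug":          frozenset({"cylinder", "curved_handle"}),
    "suitcase":     frozenset({"rectangular_solid", "curved_handle"}),
    "traffic_cone": frozenset({"cone"}),
}

def recognize(extracted_geons):
    """Return the stored object whose geon set matches exactly, else None."""
    observed = frozenset(extracted_geons)
    for name, geons in KNOWN_OBJECTS.items():
        if geons == observed:
            return name
    return None

# Both geons recovered -> the object is identified.
assert recognize({"cylinder", "curved_handle"}) == "mug"
# A mask hides the handle's corners and curves -> recovery fails.
assert recognize({"cylinder"}) is None
```

This is why the masking demonstration works: the mask is damaging precisely when it removes the contour features needed to extract geons, not when it merely removes surface area.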
Once it is possible to identify the geons, the object can be identified.

Image-Description Models
- In contrast to structural-description models, image-description models claim that the ability to identify 3-D objects comes from stored 2-D views from different perspectives
- Evidence for this comes from novel-object studies:
  - For a familiar object, view invariance occurs
  - For a novel object, view invariance does not occur
- This shows that an observer must have the different viewpoints encoded before recognition can occur from all viewpoints

[Figure: psychophysical curve showing that a monkey is better at identifying the view of an object that was presented during training (arrow); no view invariance]

Synthesis
- Tjan & Legge (1998): view-invariant performance is found for simple objects (e.g., geometrical shapes), but not for complex objects (e.g., amoeboids, bent paper-clip objects, etc.)
- Complexity was defined quantitatively via an ideal-observer algorithm
- Recent models incorporate both image-description and structural-description aspects
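The contrasting image-description account can be caricatured just as simply (illustrative sketch; the trained viewpoints and generalization tolerance are invented numbers). An object is stored as a set of trained 2-D views, here reduced to viewpoint angles, and a new view is recognized only if it falls near some stored view, which is exactly the pattern seen in the monkey psychophysics:

```python
TRAINED_VIEWS = [0, 45, 90]   # viewpoints (degrees) shown during training
TOLERANCE = 15                # degrees of generalization around each stored view

def recognized(view_deg):
    """A view is recognized iff it is close enough to a trained view."""
    return any(abs(view_deg - v) <= TOLERANCE for v in TRAINED_VIEWS)

assert recognized(50)         # near the 45-degree trained view: recognized
assert not recognized(160)    # far from every trained view: recognition fails
```

For a familiar object, enough views have been stored that nearly every test view lands near one of them, so performance looks view-invariant; for a novel object it does not, which is the behavioural dissociation the slide describes.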
Completion
- Completion of partially-viewed objects is based on gestalt heuristics such as smoothness and Pragnanz
- But, as with all heuristics, these sometimes fail; sometimes we complete things that aren't there...
- Experience obviously plays an important role

You Complete Me
[Figures]
Shape Invariance
- We have little trouble recognizing this bird as such, despite its many changes in shape
- A similar problem arises with facial expression

Questions
- Which theory best explains our ability to recognize objects from many views: structural-description models or image-description models?
- What does "completion" refer to in vision?

Face Perception
- Face processing is a highly complex visual task
- Faces are quite uniform, but we individuate them with ease
- Facial expressions are subtle variations in face shape, but we decode them with ease
- Thought to be subserved by a network of brain areas, some of which are in AIT
My Clones?

"Your face is the same as everybody has: the two eyes... nose in the middle, mouth under. It's always the same. Now if you had the two eyes on the same side of the nose, for instance, or the mouth at the top, that would be some help." - Humpty Dumpty, Through the Looking Glass

Prosopagnosia
- An inability to recognize faces
- Often arises after damage to the AIT, specifically the fusiform face area (FFA)
- Specific to faces; recognition of other objects is unimpaired
Face-Selective Neurones
- In areas of monkey cortex homologous to the FFA, we find cells that respond specifically to faces

Perceptual Differences Between Faces & Objects
- A number of phenomena suggest that faces are processed in a qualitatively different way than other objects:
  - Inversion has a disproportionate effect on face recognition
  - Inversion seems to disrupt configural processing in faces but not objects
  - Composite effects exist for faces but not objects

Face Inversion
[Figures: inverted objects are "a little harder" to recognize; inverted faces are "a lot harder"]

Inversion Disrupts Configural Processing
Schwaninger, A., Carbon, C.C., & Leder, H. (2003). Expert face processing: Specialization and constraints. In G. Schwarzer & H. Leder (Eds.), Development of Face Processing, pp. 81-97. Göttingen: Hogrefe.
Composite Face Effect
[Figures]
Questions
- What is prosopagnosia? When does it occur?
- What is the composite face effect?