Understanding Visual Attention: Combining Object Identity and Location (Study Notes)

Notes on how the visual system combines information about the identity and location of objects through visual attention. They cover a Bayesian theory of attention, the readout of IT population activity, and the role of universal features, with references to the relevant studies and theories.


Bottom-up and top-down processing in visual perception
Thomas Serre, Brown University
Department of Cognitive & Linguistic Sciences, Brown Institute for Brain Science, Center for Vision Research

The ‘what’ problem: monkey electrophysiology
(Hung, Kreiman et al ’05; Willmore, Prenger & Gallant ’10; Yamane, Carlson, Bowman, Wang & Connor ’08)

From Yamane et al ’08 (Nature Neuroscience), on characterizing IT shape tuning: “… shape factors (if any) are uniquely and consistently associated with neural responses. This has not been attempted before because of the intractable size of three-dimensional shape space. In this virtually infinite domain, a conventional random or systematic (grid-based) stimulus approach can never produce sufficiently dense combinatorial sampling.”

[The slide reproduced Figure 2 of Yamane et al ’08: response plots (spikes/s) as a function of relative x/y/z position, surface curvature, xy/yz angle, rotation angle, depth, size, lighting angle, and depth-cue condition; example model fit: Response = 0.4A + 0.0B + 49.0AB + 0.0.] Caption: Neural tuning for three-dimensional configuration of surface fragments. (a) Top 50 stimuli across eight generations (400 stimuli) for a single IT neuron recorded from the ventral bank of the superior temporal sulcus (17.5 mm anterior to the interaural line). (b) Bottom 50 stimuli for the same cell. (c) Responses to highly effective (top), moderately effective (middle) and ineffective (bottom) example stimuli as a function of depth cues (shading, disparity and texture gradients, exemplified in Supplementary Fig. 10). Responses remained strong as long as disparity (black, green and blue) or shading (gray) cues were present. The cell did not respond to stimuli with only texture cues (pale green) or silhouettes with no depth cues (pale blue). (d) Response consistency across lighting direction. The implicit direction of a point source at infinity was varied across 180° in the horizontal (left to right, black curve) and vertical (below to above, green curve) directions, creating very different two-dimensional shading patterns (Supplementary Fig. 11). (e) Response consistency across stereoscopic depth. In the main experiment, the depth of each stimulus was adjusted so that the disparity of the surface point at fixation was 0 (that is, the animal was fixating in depth on the object surface). In this test, the disparity of this surface point was varied from −4.5° (near) to 5.6° (far). (f) Response consistency across xy position. Position was varied in increments of 4.5° of visual angle across a range of 13.5° in both directions. (g) Sensitivity to stimulus orientation. As with all neurons in the sample, this cell was highly sensitive to stimulus orientation, although it showed broad tolerance (about 90°) to rotation about the z axis (rotation in the image plane, blue curve); this rotation tolerance is also apparent among the top 50 stimuli in a. Rotation out of the image plane, about the x axis (black) or y axis (green), strongly suppressed responses. (h) Response consistency across object size over a range from half to twice that of the original stimulus. (i) Linear/nonlinear response model based on two Gaussian tuning functions. (j) The tuning functions are projected onto the surface of a high-response stimulus, seen from the observer’s viewpoint (left) and from above (right). Error bars indicate s.e.m. in all panels.
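The linear/nonlinear model in panel (i) is easy to state concretely. Below is a minimal sketch, not the authors' code: it assumes A and B are values of two Gaussian tuning functions evaluated on hypothetical shape parameters (the tuning peaks and widths here are made up), with the fitted weights taken from the example Response = 0.4A + 0.0B + 49.0AB + 0.0 shown in the figure, so the response is dominated by the multiplicative interaction term.

```python
import numpy as np

def gaussian_tuning(x, mu, sigma):
    """Gaussian tuning function over one shape dimension (e.g., curvature)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def fragment_response(x1, x2, w=(0.4, 0.0, 49.0, 0.0)):
    """Linear/nonlinear model: weighted sum of two Gaussian tuning values
    plus their product (the dominant term in the example fit)."""
    A = gaussian_tuning(x1, mu=0.3, sigma=0.2)    # hypothetical tuning peak
    B = gaussian_tuning(x2, mu=120.0, sigma=40.0) # hypothetical tuning peak
    wA, wB, wAB, w0 = w
    return wA * A + wB * B + wAB * A * B + w0

# A stimulus matching both tuning peaks drives the cell strongly;
# matching only one dimension yields almost no response.
print(fragment_response(0.3, 120.0))  # ~49.4 spikes/s
print(fragment_response(0.3, 300.0))  # ~0.4 spikes/s
```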
The ‘what’ problem: human psychophysics
(Thorpe et al ’96; VanRullen & Thorpe ’01; Fei-Fei et al ’02 ’05; Evans & Treisman ’05; Serre, Oliva & Poggio ’07)

Vision as ‘knowing what is where’
(Aristotle; Marr ’82)

‘What’ and ‘where’ cortical pathways
(Ungerleider & Mishkin ’84)
- The ventral ‘what’ stream carries object identity; the dorsal ‘where’ stream carries object location.
- How does the visual system combine information about the identity and location of objects?
- Central thesis: through visual attention (see also Van Der Velde & De Kamps ’01; Deco & Rolls ’04).

Perception as Bayesian inference
(Mumford ’92; Knill & Richards ’96; Dayan & Zemel ’99; Rao ’02 ’04; Kersten & Yuille ’03; Kersten et al ’04; Lee & Mumford ’03; Dean ’05; George & Hawkins ’05 ’09; Hinton ’07; Epshtein et al ’08; Murray & Kreutz-Delgado ’07)
- S: visual scene description; I: image measurements.
- $P(S \mid I) \propto P(I \mid S)\,P(S)$

Assumption #1: attentional spotlight
(Broadbent ’52 ’54; Treisman ’60; Treisman & Gelade ’80; Duncan & Desimone ’95; Wolfe ’97; and many others)
- To recognize and localize objects in the scene, the visual system selects objects one at a time; the scene description becomes a joint distribution P(O, L, I) over object identity O, location L, and image measurements I.

Assumption #2: ‘what’ and ‘where’ pathways
- Object location L and identity O are independent, mirroring the dorsal/ventral split:
  $P(O, L, I) = P(O)\,P(L)\,P(I \mid L, O)$
  (a toy numerical sketch of this factorization appears after Assumption #3).

Assumption #3: universal features
(see Riesenhuber & Poggio ’99; Serre et al ’05 ’07; physiological support for V2/V4: Serre et al ’05; David et al ’06; Cadieu et al ’07; Willmore et al ’10)
- The object pathway is expanded through a shared feature dictionary: retinotopic feature maps X^i and position- and scale-tolerant features F^i (i = 1…N), linking image measurements I to object O and location L.
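To make the factorization concrete, here is a minimal numerical sketch (our illustration, not code from the talk): tiny discrete spaces, a hypothetical likelihood table P(I|L,O), and Bayes' rule recovering a joint posterior over identity and location from one observed image. Note that O and L are independent in the prior but generally not in the posterior.

```python
import numpy as np

# Toy discrete spaces: 2 object identities, 3 locations.
objects = ["car", "pedestrian"]

P_O = np.array([0.5, 0.5])        # prior over identity ('what' pathway)
P_L = np.array([1/3, 1/3, 1/3])   # prior over location ('where' pathway)

# Hypothetical likelihood P(I | L, O) for one fixed observed image I:
# rows = objects, columns = locations.
P_I_given_LO = np.array([
    [0.8, 0.1, 0.1],   # the image looks like a car at location 0
    [0.1, 0.1, 0.2],
])

# Joint under Assumption #2: P(O, L, I) = P(O) P(L) P(I | L, O)
joint = P_O[:, None] * P_L[None, :] * P_I_given_LO

# Posterior over (O, L) given the image: normalize the joint.
posterior = joint / joint.sum()

# Marginals answer 'what' and 'where' separately:
print("P(O|I):", posterior.sum(axis=1).round(3))
print("P(L|I):", posterior.sum(axis=0).round(3))
```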
Bayesian inference and attention
- The full model links image measurements I, retinotopic feature maps X^i, position- and scale-tolerant features F^i (i = 1…N), object identity O, and location L; the O–F^i branch corresponds to the ventral ‘what’ pathway and the L branch to the dorsal ‘where’ pathway (behavioral evidence: Serre, Oliva & Poggio ’07).
- In the feedforward (bottom-up) sweep, each feature map combines its feedforward input P(I | X^i) with the top-down priors:

$$P(X^i \mid I) = \frac{P(I \mid X^i)\,\sum_{F^i,\,L} P(X^i \mid F^i, L)\,P(L)\,P(F^i)}{\sum_{X^i} P(I \mid X^i)\,\sum_{F^i,\,L} P(X^i \mid F^i, L)\,P(L)\,P(F^i)}$$

A toy implementation of this computation, with the location prior playing the role of spatial attention, follows below.
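The denominator is just a normalization over the feature map, so the posterior is a small sum-product exercise. The sketch below uses strong simplifying assumptions that are ours, not the model's actual parameterization: binary presence of feature type i, a deterministic mapping P(X^i | F^i, L) that places the feature at the attended location, and made-up feedforward likelihoods. Concentrating P(L) acts like spatial attention: it multiplicatively reweights the feedforward evidence at the attended position.

```python
import numpy as np

n_loc = 5  # discrete retinotopic positions
# Hypothetical feedforward likelihood P(I | X^i = position l):
feedforward = np.array([0.2, 0.9, 0.4, 0.8, 0.1])

P_F = 0.5  # hypothetical prior that feature type i is present at all

def posterior_Xi(P_L):
    """P(X^i | I) over positions l. Assumes P(X^i = l | F^i present, L = l) = 1
    and 0 otherwise, so sum_{F,L} P(X^i=l | F^i, L) P(L) P(F^i) = P_F * P_L[l]."""
    prior_Xi = P_F * P_L                 # top-down prior on each map position
    unnormalized = feedforward * prior_Xi
    return unnormalized / unnormalized.sum()

uniform = np.full(n_loc, 1 / n_loc)  # P(L = x) = 1/|L|: no spatial attention
focused = np.array([0.02, 0.02, 0.02, 0.92, 0.02])  # attention near position 3

print("no attention:  ", posterior_Xi(uniform).round(3))
print("attend pos. 3: ", posterior_Xi(focused).round(3))
```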
The slides then reproduce the opening of a closely related review: Reynolds, J.H. & Heeger, D.J., “The Normalization Model of Attention,” Neuron 61, January 29, 2009 (DOI 10.1016/j.neuron.2009.01.002).

Summary: “Attention has been found to have a wide variety of effects on the responses of neurons in visual cortex. We describe a model of attention that exhibits each of these different forms of attentional modulation, depending on the stimulus conditions and the spread (or selectivity) of the attention field in the model. The model helps reconcile proposals that have been taken to represent alternative theories of attention. We argue that the variety and complexity of the results reported in the literature emerge from the variety of empirical protocols that were used, such that the results observed in any one experiment depended on the stimulus conditions and the subject’s attentional strategy, a notion that we define precisely in terms of the attention field in the model, but that has not typically been completely under experimental control.”

From the introduction: “Attention has been known to play a central role in perception since the dawn of experimental psychology (James, 1890). Over the past 30 years, the neurophysiological basis of visual attention has become an active area of research, yielding an explosion of findings. Neuroscientists have utilized a variety of techniques (single-unit electrophysiology, electrical microstimulation, functional imaging, and visual-evoked potentials) to map the network of brain areas that mediate the allocation of attention (Corbetta and Shulman, 2002; Yantis and Serences, 2003) and to examine how attention modulates neuronal activity in visual cortex (Desimone and Duncan, 1995; Kastner and Ungerleider, 2000; Reynolds and Chelazzi, 2004). During the same period of time, the field of visual psychophysics has developed rigorous methods for measuring and characterizing the effects of attention on visual performance (Braun, 1998; Carrasco, 2006; Cavanagh and Alvarez, 2005; Sperling and Melchner, 1978; Verghese, 2001; Lu and Dosher, 2008).

We review the single-unit electrophysiology literature documenting the effects of attention on the responses of neurons in visual cortex, and we propose a computational model to unify the seemingly disparate variety of such effects. Some results are consistent with the appealingly simple proposal that attention increases neuronal responses multiplicatively by applying a fixed response gain factor (McAdams and Maunsell, 1999; Treue and Martinez-Trujillo, 1999), while others are more in keeping with a change in contrast gain (Li and Basso, 2008; Martinez-Trujillo and Treue, 2002; Reynolds et al., 2000), or with effects that are intermediate between response gain and contrast gain changes (Williford and Maunsell, 2006). Other studies have shown attention-dependent sharpening of neuronal tuning at the level of the individual neuron (Spitzer et al., 1988) or the neural population (Martinez-Trujillo and Treue, 2004). Still others have shown reductions in firing rate when attention was directed to a nonpreferred stimulus that was paired with a preferred stimulus also inside the receptive field (Moran and Desimone, 1985; Recanzone and Wurtz, 2000; Reynolds et al., 1999; Reynolds and Desimone, 2003). These different effects of attentional modulation have not previously been explained within the framework of a single computational model. We demonstrate here that a model of attention that incorporates divisive normalization (Heeger, 1992b) exhibits each of these different forms of attentional modulation, depending on the stimulus conditions and the spread (or selectivity) of the attentional feedback in the model.

In addition to unifying a range of experimental data within a common computational framework, the proposed model helps reconcile alternative theories of attention. Moran and Desimone (1985) proposed that attention operates by shrinking neuronal receptive fields around the attended stimulus. Desimone and Duncan (1995) proposed an alternative model, in which neurons representing different stimulus components compete and attention operates by biasing the competition in favor of neurons that encode the attended stimulus. It was later suggested that attention instead operates simply by scaling neuronal responses by a fixed gain factor (McAdams and Maunsell, 1999; Treue and Martinez-Trujillo, 1999). Treue and colleagues advanced the “feature-similarity gain principle,” that the gain factor depends on the match between a neuron’s stimulus selectivity and the features or locations being attended (Treue and Martinez-Trujillo, 1999; Martinez-Trujillo and Treue, 2004). Spitzer et al. (1988) proposed that attention sharpens neuronal tuning curves, and Martinez-Trujillo and Treue (2004) explained that sharpening is predicted by their “feature-similarity gain principle.” Finally, Reynolds et al. (2000) proposed that attention increases contrast gain. Indeed, the initial motivation for the model proposed here derived from the reported similarities between the effects of attention and contrast elevation on neuronal responses (Reynolds and Chelazzi, 2004; Reynolds et al., 1999, 2000; Reynolds and Desimone, 2003). The proposed normalization model of attention combines aspects of each of these proposals and exhibits all of these forms of attentional modulation. Thus, the various models outlined above are not mutually exclusive. Rather, they can all be expressed by a single, unifying computational principle. We propose that this computational principle endows the brain with the capacity to increase sensitivity to faint stimuli presented alone and to reduce the impact of task-irrelevant distracters.”

The model’s core computation, as shown on the slide:

$$R(x, \theta) = \frac{A(x, \theta)\,E(x, \theta)}{S(x, \theta) + \sigma}$$

where E(x, θ) is the excitatory stimulus drive at position x and orientation θ, A(x, θ) is the attention field, S(x, θ) is the suppressive drive, and σ is a constant.
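A minimal one-dimensional sketch of that equation follows; the Gaussian widths and constants are made up by us, not taken from the paper (which defines E and S via convolution with stimulation and suppressive fields over space and orientation). In the full model, the form of modulation depends on the spread of the attention field relative to the stimulus and the suppressive pool; with the particular parameters below, the printed attended/unattended comparison comes out contrast-gain-like, with a larger attentional effect at low contrast.

```python
import numpy as np

theta = np.linspace(-90.0, 90.0, 181)   # orientation axis (deg)

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def response(contrast, attn_gain=0.0, attn_width=30.0, sigma=0.1):
    """1-D sketch of the normalization model: R = (A*E) / (S + sigma),
    where S pools the attention-weighted stimulus drive over orientation."""
    E = contrast * gauss(theta, 0.0, 20.0)               # stimulus drive
    A = 1.0 + attn_gain * gauss(theta, 0.0, attn_width)  # attention field
    drive = A * E
    pool = gauss(theta, 0.0, 60.0)
    pool /= pool.sum()                                   # normalized suppressive field
    S = np.convolve(drive, pool, mode="same")            # suppressive drive
    return drive / (S + sigma)

for c in (0.05, 0.2, 0.8):
    r0 = response(c)[90]                  # unattended, preferred orientation
    r1 = response(c, attn_gain=1.0)[90]   # attended
    print(f"contrast {c:.2f}: unattended {r0:.2f}  attended {r1:.2f}")
```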
Multiplicative scaling of tuning curves by spatial attention
(McAdams & Maunsell ’99; see also Reynolds & Heeger ’09)
- In the Bayesian model, spatial attention enters through the location prior: without attention the prior is uniform, $P(L = x) = 1/|L|$; attending to location x concentrates it, $P(L = x) \approx 1$.
- The posterior over the feature maps,
  $$P(X^i = x \mid I) \propto \sum_{F^i,\,L} P(X^i = x \mid F^i, L)\,P(I \mid X^i)\,P(F^i)\,P(L),$$
  then scales tuning curves multiplicatively, as observed by McAdams & Maunsell ’99.

Contrast vs. response gain
(Martinez-Trujillo & Treue ’02; McAdams & Maunsell ’99; see also Reynolds & Heeger ’09)
- Depending on stimulus conditions, the same posterior computation also captures contrast-gain-like modulation.

Learning to localize cars and pedestrians
- Object priors (over the features F^i) are learned for each target category.
- Location priors are learned from global contextual cues (see Torralba, Oliva et al.), adding scene context to the model.

The experiment
- Eye movements serve as a proxy for attention.
- Dataset: 100 street-scene images containing cars and pedestrians, plus 20 images without them.
- Procedure: 8 participants were asked to count the number of cars or pedestrians (block design for the two targets); eye movements were recorded with an infrared eye tracker.
- Model variants were scored against human fixations (ROC area, separately for cars and pedestrians, plotted on a 0.5–1 scale): prediction accuracy increases from uniform priors (bottom-up) to feature priors to feature + contextual (spatial) priors, with the humans’ first three fixations providing the inter-subject ceiling. The full model explains 92% of the inter-subject agreement. (*Similar, independent results: Ehinger, Hidalgo, Torralba & Oliva ’10.) A sketch of the ROC-area scoring follows below.
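For reference, here is a minimal sketch of how a predicted attention map can be scored against human fixations with ROC area, the metric reported above. This is a standard recipe on hypothetical arrays, not the authors' evaluation code: fixated pixels are treated as positives, all other pixels as negatives, and the map value is the detection score.

```python
import numpy as np

def roc_area(saliency_map, fixation_mask, n_thresh=100):
    """ROC area for a saliency/posterior map against a binary fixation mask:
    sweep a threshold over map values; TPR = fraction of fixated pixels above
    threshold, FPR = fraction of non-fixated pixels above threshold."""
    s = saliency_map.ravel()
    f = fixation_mask.ravel().astype(bool)
    thresholds = np.quantile(s, np.linspace(1, 0, n_thresh))
    tpr = [(s[f] >= t).mean() for t in thresholds]
    fpr = [(s[~f] >= t).mean() for t in thresholds]
    return np.trapz(tpr, fpr)   # area under the ROC curve

# Hypothetical example: a map that peaks where the fixations are scores near 1.
rng = np.random.default_rng(0)
sal = rng.random((60, 80))
fix = np.zeros((60, 80), bool)
fix[30:35, 40:45] = True
sal[fix] += 1.0                  # make the map predictive of fixations
print(f"ROC area: {roc_area(sal, fix):.2f}")
```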
Bottom-up saliency and free viewing
- With the object-level priors removed (the slide shows only L, X^i, F^i, and I, for feature types i = 1…N), the model functions as a bottom-up saliency model.
- Evaluated on the human free-viewing eye data of Bruce & Tsotsos:

  Method                 ROC area
  Bruce & Tsotsos ’06    72.8%
  Itti et al ’01         72.7%
  Proposed model         77.9%
Summary I
- Goal of vision: to solve the problem of ‘what is where’.
- Key assumptions:
  - ‘Attentional spotlight’ → recognition is done sequentially, one object at a time.
  - ‘What’ and ‘where’ form independent pathways.
- Attention is the inference process implemented by the interaction between ventral and dorsal areas:
  - It integrates bottom-up and top-down (feature-based and context-based) attentional mechanisms.
  - It seems consistent with known physiology.
- Main attentional effect in the presence of clutter:
  - Spatial attention reduces uncertainty over location and improves object recognition performance over the first bottom-up (feedforward) pass.

The ‘readout’ approach
- The activity of a population of IT neurons (neuron 1, neuron 2, neuron 3, …, neuron n) is fed to a pattern classifier trained to decode object identity.

IT readout improves with attention
(Zhang, Meyers, Bichot, Serre, Poggio & Desimone, unpublished data)
- Train the readout classifier on responses to an isolated object.
- Test generalization in clutter, with the object attended or unattended: readout performance in clutter improves when the object is attended. A sketch of this decoding analysis follows below.
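To make the readout logic concrete, here is a minimal decoding sketch on synthetic data; everything about it is hypothetical (the actual analysis used recorded IT populations, and how attention reshapes the population response is modeled here simply as a partial restoration of the isolated-object pattern). A nearest-centroid classifier stands in for the pattern classifier: it is trained on simulated responses to isolated objects and tested on cluttered trials with and without attention.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_trials = 64, 200

# Hypothetical 'template' population responses for two objects.
templates = rng.normal(0.0, 1.0, size=(2, n_neurons))

def trials(obj, clutter=0.0, attention=0.0):
    """Simulated IT population responses. Clutter mixes in the other object's
    pattern; attention pushes the response back toward the attended object."""
    w_other = clutter * (1.0 - attention)
    mix = (1.0 - w_other) * templates[obj] + w_other * templates[1 - obj]
    return mix + rng.normal(0.0, 1.0, size=(n_trials, n_neurons))

def nearest_centroid_acc(train_a, train_b, test, labels):
    """Stand-in pattern classifier: assign each trial to the nearest centroid."""
    centroids = np.stack([train_a.mean(0), train_b.mean(0)])
    d = ((test[:, None, :] - centroids[None]) ** 2).sum(-1)
    return (d.argmin(1) == labels).mean()

# Train on isolated objects (no clutter), as in the experiment.
tr_a, tr_b = trials(0), trials(1)

for attn in (0.0, 0.8):  # unattended vs. attended test conditions
    te = np.vstack([trials(0, clutter=0.5, attention=attn),
                    trials(1, clutter=0.5, attention=attn)])
    y = np.array([0] * n_trials + [1] * n_trials)
    acc = nearest_centroid_acc(tr_a, tr_b, te, y)
    print(f"clutter, attention={attn}: accuracy {acc:.2f}")
```

With attention = 0 the cluttered response is an even mixture of the two templates and decoding falls to chance; raising attention restores the trained pattern and accuracy recovers, mirroring the slide's claim that IT readout in clutter improves with attention.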