Saliency and Task-Based Eye Movement Prediction and Guidance


Saliency and Task-Based Eye Movement Prediction and Guidance

by

Srinivas Sridharan

A dissertation proposal submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the B. Thomas Golisano College of Computing and Information Sciences, Rochester Institute of Technology

February 2015

B. THOMAS GOLISANO COLLEGE OF COMPUTING AND INFORMATION SCIENCES
ROCHESTER INSTITUTE OF TECHNOLOGY
ROCHESTER, NEW YORK

CERTIFICATE OF APPROVAL

Ph.D. DEGREE PROPOSAL

The Ph.D. Degree Proposal of Srinivas Sridharan has been examined and approved by the dissertation committee as satisfactory for the dissertation required for the Ph.D. degree in Computing and Information Sciences.

Dr. Reynold J. Bailey, Dissertation Advisor
Coordinator, Ph.D. Degree Program
Dr. Joe M. Geigel
Dr. Anne Haake
Dr. Linwei Wang

Saliency and Task-Based Eye Movement Prediction and Guidance

by Srinivas Sridharan

Submitted to the B. Thomas Golisano College of Computing and Information Sciences in partial fulfillment of the requirements for the Doctor of Philosophy Degree at the Rochester Institute of Technology

Abstract

The ability to predict and guide viewer attention has important applications in computer graphics, image and scene understanding, object detection, visual search and training. Human eye movements have long interested researchers as they provide insight into the cognitive processes involved in task performance. Researchers are also interested in understanding what guides viewer attention in a scene. It has been shown that saliency in the image, scene context, and the task at hand play a significant role in guiding attention. Many computational models have been proposed to predict regions in the scene that are most likely to attract human attention. These models primarily deal with bottom-up visual attention and typically involve free viewing of the scene. In this proposal we would like to develop a more comprehensive computational model for visual attention that uses scene context, scene saliency, the task at hand, and eye movement data to predict future eye movements of the viewer. We would also like to explore the possibility of guiding viewer attention about the scene in a subtle manner based on the predicted gaze obtained from the model. Finally, we would like to tackle the challenging inverse problem: to infer the task being performed by the viewer based on scene information and eye movement data.

Contents

List of Tables
List of Figures

1 Introduction

2 Saliency and Task Based Eye Movement Prediction
  2.1 Problem Definition
  2.2 Research Objectives and Contributions
  2.3 Background and Related Work
    2.3.1 Bottom-up Saliency Based Visual Attention
    2.3.2 Top-Down Cognition Based Visual Attention
  2.4 Proposed Approach
    2.4.1 Scene Context Extraction
    2.4.2 Saliency Map Generation
  2.5 Comprehensive Model
    2.5.1 Training Phase
    2.5.2 Testing Phase
  2.6 Evaluation Measurement
    2.6.1 Kullback-Leibler (KL) Divergence
    2.6.2 Normalized Scanpath Saliency (NSS)
    2.6.3 Linear Correlation Coefficient (LCC)

3 Adaptive Subtle Gaze Guidance Using Estimated Gaze
  3.1 Problem Definition
  3.2 Research Objectives and Contributions
  3.3 Background and Related Work
    3.3.1 Subtle Gaze Direction
  3.4 Proposed Approach
    3.4.1 Adaptive Subtle Gaze Direction Using Estimated Gaze
  3.5 Evaluation
    3.5.1 User Study

4 Task Inference Problem
  4.1 Problem Definition
  4.2 Research Objectives and Contributions
  4.3 Background and Related Work
  4.4 Approach
  4.5 Evaluation
    4.5.1 User Study

5 Timeline

6 Conclusion

A Eye Tracking Datasets

Bibliography

List of Tables

A.1 Eye tracking datasets over still images. The D, T and d columns stand for viewing distance in centimeters, stimulus presentation time in seconds and screen size in inches, respectively. Reproduced from [1]

List of Figures

2.1 (A) This figure illustrates the information preserved by the global features for two images. (B) The average of the output magnitude of the multiscale-oriented filters on a polar plot. (C) The coefficients (global features) obtained by projecting the averaged output filters onto the first 20 principal components. (D) Noise images with filtered outputs at 1, 2, 4 and 8 cycles per image, representing the gist of the scene and maintaining the spatial organization and texture characteristics of the original image. The texture contained in this representation is still relevant for scene categorization (e.g., open, closed, indoor, outdoor, natural or urban scenes). Reproduced from [2]

2.2 (a) Schematic representation of the Koch and Ullman model to compute saliency using primitive feature maps and the center-surround neurophysiological properties of the human eye

2.3 (b) Flowchart of the model developed by Itti to compute a saliency map based on the Koch and Ullman model. This flowchart shows the filtering process involved, the extraction of feature maps, center-surround normalization and also the methods to combine feature maps to obtain the saliency map. Reproduced from [3]

2.4 Schematic diagram of the model for predicting task-based eye movements. The training phase shows the selected training images, eye tracking data on the training images, task-based feature extraction, the saliency map for each image and training fixations extracted from the eye tracking data. We combine all the features to obtain the training feature set and then perform PCA/ICA reduction if necessary to reduce the feature size. A Trainer is implemented to train on this feature set. The second half of the image shows the testing phase, where similar testing features are extracted as in the training phase and then the Trainer is used to predict the eye position. This new predicted eye position can then be compared to testing fixations which serve as ground-truth data

3.1 Figure shows a mammogram image. The large red circle shows the area marked by the expert as an irregularity

3.2 Hypothetical image with current fixation region F and predetermined region of interest A. Inset illustrates the geometric dot product used to compute θ

3.3 Gaze distributions for an image under static and modulated conditions. Input image (top). Gaze distribution for static image (bottom left). Gaze distribution for modulated image (bottom right). White crosses indicate locations preselected by researchers for modulation

3.4 Image on left is the image viewed by the subject when assigned the task of counting the number of deer in the scene. The red circles in the image indicate the viewer's fixation data. The image on right shows the corresponding task-based saliency map, highlighting task-relevant regions to direct the viewer's attention

4.1 Figure showing Image A, a street view image with eye tracking data where the task provided to the viewer was to locate the cars in the scene. Image B shows a similar image that can be classified as a street image and also has cars in the image to be task relevant

4.2 Figure shows the experiment conducted by Yarbus in 1967. The image on the top-left shows the picture of Family and an unexpected visitor, and the scanpaths of a subject for each task in the experiment while viewing the stimulus image

4.3 Image (A) shows the eye movements of a subject when given the task to check the rear view mirrors. Image (B) shows the eye movements of the subject when given the task to check the gauges on the dashboard. The circles in red/green indicate the fixations made by the subject when performing the task. The number inside each circle shows the order of fixations. Note that the subject also gathers information about the road and the GPS when performing the task at hand.

Chapter 1

Introduction

Predicting human gaze behavior and guiding viewer attention in a given scene are very challenging tasks. Humans perform a wide variety of tasks such as navigation, reading, playing sports, and interacting with objects in the environment. Each task performed depends on input from the environment and from memory about the task. Attention research has been concerned with understanding the input stimuli, neural mechanisms, information processing, memory, and motor signals involved in task performance. Eye movements provide information about the regions attended in an image and give insight into the underlying cognitive processes [4]. Saliency in the image has been shown to guide attention, e.g. regions with high local contrast, high edge density, or bright colors (bottom-up effect) [9, 3, 10]. Humans are also immediately drawn to faces or regions with contextual information (top-down effect) [8]. Finally, the pattern of eye movements depends not only on the scene being viewed but also on the viewer's intent or the task assigned [5, 6, 7]. Researchers continue to debate whether it is salient features, contextual information or both that ultimately drive attention during free viewing (no task specified) of static images [11, 12, 13, 14]. There are many computational models that predict regions that are most likely to attract viewer attention in a scene. These computational models are designed based on bottom-up attention, top-down attention or a combination of both. However, many of these models only consider free viewing and as such do not

take into account the impact of any specific task on eye movements. In the proposed work we plan to:

1. Develop and evaluate a comprehensive model of human visual attention prediction that incorporates:
   - Scene context (GIST, SIFT, SURF, Bag of Words, etc.)
   - Bottom-up scene saliency
   - Task at hand
   - Eye movement data across multiple subjects

2. Develop and evaluate a novel adaptive approach to guide viewer attention about a scene that requires no permanent or overt changes to the scene being viewed and has minimal impact on the viewing experience.

3. Develop a framework for task inference based on scene information and eye movement data. This framework attempts to differentiate eye movements made for task performance from eye movements made to gather information in the scene.

In the next three chapters each of these research objectives is explained in more detail; specifically, we provide the problem definition, background and related work, proposed approach, and evaluation measures. Saliency and Task Based Eye Movement Prediction is presented in chapter 2, Adaptive Subtle Gaze Guidance Using Estimated Gaze is presented in chapter 3, and the Task Inference Problem is presented in chapter 4. The timeline for the proposed work is presented in chapter 5. Chapter 6 presents the conclusion and highlights potential future work that is beyond the scope of this proposal. An appendix is provided listing several research datasets (images and corresponding eye movements) that will be utilized over the course of this work.

Chapter 2

Saliency and Task Based Eye Movement Prediction

2.1 Problem Definition

Predicting the gaze behavior of a human in a given scene is a very challenging task. There are multiple factors that influence human gaze behavior: the salient features in the scene, the task at hand and prior knowledge of the scene are some of the factors that strongly influence it. Visual saliency based models predict regions of interest that attract the gaze of a subject based on image features such as contrast, color, orientation, etc. [15, 16, 17, 18]. There are other top-down computational models that combine saliency maps and scene context; some top-down models use face detection, object detection, and image blobs with visual saliency to gather visual attention details in the scene [19, 20, 21]. The task being undertaken has a very strong influence on the deployment of attention [5]. It has been shown that humans process visual information in a need-based manner: we look for things that are relevant to the current task and pay less attention to irrelevant objects in the scene. Researchers have shown that there is a high correlation between visual cognition and eye movements when dealing with complex tasks [22]. When subjects are asked to perform a visually guided task, their fixations were found to be on

task-relevant locations. This finding was established using the block-copying task, where subjects were asked to assemble building blocks, and it was shown that the subjects' eye movements reveal the algorithm used for completing the task [23]. Others have studied gaze behavior while performing tasks in natural environments such as driving, sports, walking, etc. [6, 24, 22, 25]. The view of many is that both bottom-up and top-down factors are combined to direct our attention, and there have been many computational models using Bayesian approaches to integrate top-down and bottom-up salient cues [26]. Eye-tracking technology helps to estimate the visual attention of the subject while performing a task. Eye trackers provide fixation and saccade information in real time that can give insight into top-down, task-based visual attention, while the scene features provide the bottom-up saliency map. Many gaze prediction algorithms have been proposed based on image scene features and visual saliency maps [27, 28]. These computational models lack two key factors in gaze prediction: 1) they seldom account for the top-down visual attention that can be obtained by considering the scene's context, and 2) the training data used to develop these models were obtained during free viewing and so do not take into account the impact of specific tasks on eye movements. Hence, there is a need for a comprehensive computational model of human visual attention prediction that can identify regions in the scene most likely to be attended for a given task at hand.

2.2 Research Objectives and Contributions

The goal of this aspect of the proposed work is to develop a comprehensive model of human visual attention prediction that incorporates scene context, bottom-up scene saliency, the task at hand and eye movement data obtained across multiple subjects to build a task-based saliency map. The task-based saliency map will predict regions in the scene that attract the viewer's attention while performing an assigned task. The model is further trained to predict viewer gaze on new (related) stimuli images. Such a model will allow researchers to gain more insight into task solving behavior

and also to predict the task solving approach under different input conditions. The model can then be used to understand a subject's gaze behavior for a given task and compare it with other tasks on similar stimuli. The proposed model will also help to address the time-consuming burden of creating manual annotations of the regions of images used in perception-related experiments. Our proposed model will be able to aid people performing repeated image search tasks by suggesting regions of interest based on the predicted gaze. While gaze target prediction techniques are prone to false positives, they can still be very valuable in providing additional suggestions for viewing. For example, consider a radiologist searching for abnormal regions in a mammogram. At the end of the task, our prediction system can suggest other regions to look at which he/she might have missed. In this manner, the technology is seen as providing assistance rather than attempting to replace the expert. This model can also be used to study differences in visual attention between subjects. Hence, for a given task and gaze behavior it could be possible to differentiate experts from non-experts.

2.3 Background and Related Work

When we look around us, we perceive some objects in the scene to be more interesting than others. Certain objects in the scene pop out and grab our attention over others. The drawing of our attention in this fashion is termed bottom-up or saliency-based visual attention. Our focused attention can be thought of as a rapidly shifting spotlight, and the areas focused on are the salient regions in the scene. These salient regions can be represented as a 2-dimensional saliency map that captures regions of high attention. However, human visual attention is not simply a feed-forward, spatially selective filtering process. There is also cognitive feedback to the visual system to focus attention in a top-down manner. For example, there may be contextually relevant areas of the image (such as faces) that also draw our attention. Several computational models have been proposed to model bottom-up attention, top-down attention, or both, to understand visual attention.

2.3.1 Bottom-up Saliency Based Visual Attention

Saliency-based attention models are classified based on the saliency computation mechanism. Most saliency-based models intend to highlight regions of interest that attract attention in the scene. Bottom-up visual saliency models can be broadly classified in several ways [29]:

Cognitive Models: Models that closely relate the psychological and neurophysiological attributes of the human visual system to compute saliency. These models account for contrast sensitivity functions, perceptual decomposition, visual masking, and center-surround interactions [15, 30].

Bayesian Models: Models using Bayes' rule to detect objects of interest or salient regions by probabilistically combining extracted scene features with known prior knowledge of the scene or scene context [17, 31].

Decision Theoretic Models: Visual attention is believed to produce decisions on the state of the scene being viewed such that there is an optimal decision based on minimizing the probability of error. Hence salient features can be defined as the classes best recognized over all other visual classes available [32, 33].

Information Theoretic Models: Information theoretic models define saliency as the regions that maximize the information sampled from a given scene. The most informative regions are selected from all possible regions available in the scene [34, 35].

Graphical Models: Visual attention models are computed based on eye movements. These eye movements can be treated as a time series, and the hidden variables influencing them can be modeled using Hidden Markov Models, Dynamic Bayesian Networks and Conditional Random Fields [36, 18].

Spectral Analysis Models: A digitized scene viewed in the spatial domain can be converted to the frequency domain, and saliency models are derived based on the premise that similar regions in the frequency domain imply redundancy. Such models are simpler to explain and compute but do not necessarily explain psychological and neurophysiological attributes of the human visual system [37, 38, 39].

2.3.2 Top-Down Cognition Based Visual Attention

Top-down models, on the other hand, are goal-driven or task-driven, compared to bottom-up cues which are mainly based on characteristics of the visual scene [40]. Top-down visual attention models are determined by cognitive factors such as knowledge of the scene, expectations, rewards, tasks at hand and goals. Bottom-up attention, being feed-forward, tends to be both involuntary and fast. Top-down attention, in contrast, is slow, task driven and voluntary; it is also referred to as closed-loop [41, 42]. Two major sources that influence top-down or cognition based visual attention have been explored. The first is scene context: the layout of a scene has been shown to influence eye movements and visual attention; for example, people are naturally attracted to faces and regions relevant to the scene. The second is task-based: certain complex tasks such as driving or reading strongly influence when and where humans fixate in the scene. It has been proposed that humans are able to exhibit interest towards targets by relatively changing the gains on different basic features which attract attention [43]. For example, when asked to look for a specific colored object, a higher gain will be assigned to searching for that particular color among the other available colors in the scene. A computational model was developed that integrates

the visual cues for target detection by maximizing the signal-to-noise ratio of target vs. background [44]. An evolutionary algorithm was also developed to search the basic saliency model parameter-space for the target objects [20]. In comparison to gain adjustment for fixed feature detection, other top-down attention models were suggested in which preferred features were obtained by tuning the width of feature detectors [45]. These models study the role of object features in visual search and are similar to the techniques of object detection in computer vision. However, these models are based on human visual attention, as compared to computer vision models which use predefined feature templates for detecting/tracking cars, humans or faces [46, 47].

2.4 Proposed Approach

Both top-down and bottom-up computational models provide a single saliency map which indicates regions in the image that are most likely to be attended. These saliency maps are generated for free viewing, and feature detectors are tuned for specific image attributes. A major drawback of such a saliency map is that the location with the highest saliency value does not necessarily translate to the region most attended; it has been shown that the majority of fixations are towards task-relevant locations [22]. It is also very difficult to predict the order of attention from a single saliency map. The saliency map does, however, provide a mask that eliminates locations least likely to be attended and highlights locations most likely to attract the viewer's attention. In this aspect of the proposed work we aim to develop a comprehensive model of human visual attention that uses scene context, the saliency map, the task at hand and eye movement data to obtain a task-based saliency map. The model is further trained to test if it is capable of predicting human fixations in other similar images for the same task.
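Before detailing each component in the following subsections, the sketch below illustrates, under simplifying assumptions, how the pieces might fit together: scene context (gist and local features), the bottom-up saliency map, a task label and the n initial fixations are concatenated into a feature vector, reduced with PCA (or ICA), and used to train one of the candidate learners mentioned in Section 2.5. The helper name build_feature_vector, the array sizes and the placeholder data are illustrative assumptions rather than the final design.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVR

rng = np.random.default_rng(0)

def build_feature_vector(gist, local_feats, saliency, task_id, init_fixations):
    """Concatenate scene context, bottom-up saliency, task encoding and the
    n initial fixations into one feature vector (see Sec. 2.5.1)."""
    return np.concatenate([gist.ravel(), local_feats.ravel(), saliency.ravel(),
                           [float(task_id)], init_fixations.ravel()])

# Placeholder data standing in for precomputed gist descriptors, local feature
# histograms, saliency maps, task labels and recorded eye tracking data.
n_samples = 50
X = np.stack([build_feature_vector(rng.random(512),       # gist descriptor
                                   rng.random(200),       # local feature histogram
                                   rng.random((32, 32)),  # downsampled saliency map
                                   rng.integers(0, 3),    # task id
                                   rng.random((5, 2)))    # 5 initial fixations (x, y)
              for _ in range(n_samples)])
y = rng.random(n_samples)  # stand-in for fixation density at a candidate region

reducer = PCA(n_components=20)   # PCA/ICA step to reduce dimensionality
model = SVR().fit(reducer.fit_transform(X), y)  # one of several candidate learners

# Prediction on a new image/subject reuses the same feature construction:
x_new = build_feature_vector(rng.random(512), rng.random(200),
                             rng.random((32, 32)), 1, rng.random((5, 2)))
print(model.predict(reducer.transform(x_new[None, :])))
```

The same scaffolding would accept any of the learners listed later (linear model, neural network, Gaussian mixture, SVM); only the estimator object changes.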

2.4.1 Scene Context Extraction

Scene context plays a vital role in attracting visual attention to specific regions in the scene. Humans have a high degree of accuracy in describing a scene or image even with viewing times as low as 80 ms. This ability enables humans to capture enough information to obtain a rough representation or gist of the scene [48], and to quickly classify the scene, e.g. as indoor vs. outdoor, urban vs. rural, or natural vs. man-made. It has been shown that semantic associations play a vital role in guiding visual attention. When searching for shoes, for example, humans are more likely to look for them on the floor than on top of a table or on the ceiling [49, 50]. Several models utilizing low-level features have been presented to obtain the gist of the scene. A computational model was proposed that is based on the spatial envelope, using a low-dimensional representation of the scene. The model generates a multidimensional space, reduced by applying principal component analysis and independent component analysis, in which scenes sharing membership in semantic categories are projected [2]. Gabor filters were used on input images to extract a selected number of universal textons (from the training set using K-means clustering) [51]. Researchers have also used the biological center-surround features (receptive fields) from the orientation, color and intensity channels for modeling gist [52]. Gist representation is a well-known topic in computer vision as it provides global scene information, which is especially useful for searching scene databases with many images. It has also been used to limit the region for object search in a scene rather than processing it in its entirety. The most important use of gist representation is in the modeling of top-down attention [53, 54]. In this proposal we use the gist of the scene [2] to obtain a low-dimensional representation of the scene that does not require explicit segmentation of image regions and objects. Gist refers to the meaningful information that an observer can identify from a glimpse of the scene. We use the gist description to include the semantic label of the scene with a few objects and their surface characteristics and layout. It represents the global properties of the space that the scene subtends and does not necessarily

include the individual objects that the scene contains. Every scene is defined by eight categories, namely naturalness, openness, expansion, depth, roughness, complexity, ruggedness and symmetry. Each scene is described as a vector of meaningful values indicating the image's degree of naturalness, openness, roughness, expansion, mean depth, etc. The gist of the scene will help classify similar images and also provide a global low-dimensional representation for image groups. Figure 2.1 shows two representative sample images, a polar plot showing the average responses of multiscale-oriented filters on these images obtained by applying principal component analysis (global feature templates), the global features projected onto the first 20 principal components, and a low-frequency representation (noise image) representing the gist, maintaining the spatial organization and texture characteristics of the original image.

Figure 2.1: (A) This figure illustrates the information preserved by the global features for two images. (B) The average of the output magnitude of the multiscale-oriented filters on a polar plot. (C) The coefficients (global features) obtained by projecting the averaged output filters onto the first 20 principal components. (D) Noise images with filtered outputs at 1, 2, 4 and 8 cycles per image, representing the gist of the scene and maintaining the spatial organization and texture characteristics of the original image. The texture contained in this representation is still relevant for scene categorization (e.g., open, closed, indoor, outdoor, natural or urban scenes). Reproduced from [2]

Within-image local features and between-image features can be obtained after computing the gist for each image. Feature detection algorithms such as the Scale-Invariant

Feature Transform (SIFT) [55], Speeded Up Robust Features (SURF) [56], Maximally Stable Extremal Regions (MSER) [57], the Histogram of Oriented Gradients (HOG), etc. can be used to identify key local features in the scene. Local feature detection and matching algorithms help identify regions that are similar within an image and also regions that are similar between classified images. This will further enable us to build scene context (region based) with similar features and group them into a labeled category. A list of such categories (e.g. Bag-of-Words) can be used to associate regions of the scene with a task at hand.

2.4.2 Saliency Map Generation

Most attention models are directly or indirectly inspired by the physiological or neurophysiological properties of the human eye. The basic model proposed by Itti et al. [3] uses four assumptions. First, visual input is represented in the form of a topographic feature map. The feature maps are constructed based on the idea of center-surround representation of features at different spatial scales and competition among features for visual attention. The second assumption is that these feature maps are combined to give a single local saliency map of any location with respect to its neighborhood. Third, the maximum of the saliency map is the most salient location at a given time, and it also helps determine the next location for an attention shift. Fourth, attention is shifted to different parts of the stimuli based on the saliency map, and the order of attention shifts is represented by the decreasing order of saliency in the map. Figure 2.2 shows the schematic representation proposed by Koch and Ullman and Figure 2.3 shows the model proposed by Itti et al. In the early model proposed by Koch and Ullman [58], low-level features of the visual system such as color, intensity and orientation were computed to obtain a set of pre-attentive feature maps based on the retinal input to the eye. The activity of all these feature maps was combined for a given location. This combination of feature maps provides the topographic saliency map. A simple winner-take-all network was designed to detect the most salient location. The second part of the image shows the schematic

Figure 2.2: (a) Schematic representation of the Koch and Ullman model to compute saliency using primitive feature maps and the center-surround neurophysiological properties of the human eye.

Figure 2.3: (b) Flowchart of the model developed by Itti to compute a saliency map based on the Koch and Ullman model. This flowchart shows the filtering process involved, the extraction of feature maps, center-surround normalization and also the methods to combine feature maps to obtain the saliency map. Reproduced from [3]

diagram used for the study, which was built on the Koch and Ullman architecture and provides a complete implementation of all stages. Multi-scale spatial images (eight spatial scales per channel) are computed and the center-surround differences for each feature (3 features) are computed to obtain the local spatial feature maps (42 feature maps). A lateral inhibition scheme is used to initiate competition for saliency within each feature map. These individual feature maps are then combined to form a single conspicuity map for each feature type. The conspicuity maps are then combined to obtain a single topographic saliency map.

2.5 Comprehensive Model

We propose a comprehensive computational model of human visual attention prediction that can identify the regions in the scene most likely to be attended from the scene context, saliency map, task at hand and eye movement data of the subject. A set of images from publicly available image databases is chosen and the gist and saliency maps for these images are pre-computed. The task to be performed when viewing this set of images is determined ahead of time. Subjects' eye movements are recorded for these images while performing the given task for a specified period of time. The images are then randomly divided into training and testing datasets. Figure 2.4 shows the schematic representation of the proposed model. The model is divided into a training phase and a testing phase, and each phase is explained in detail below.

2.5.1 Training Phase

The stimuli images are randomly separated into training and testing images. Fixation data is gathered from subjects viewing the images in the training set using a remote eye tracker. A fixation map (averaged across all subjects) is then created. The eye tracking data is then split into two groups: the n initial fixations, which serve as a feature vector for the model, and the remaining fixations, which are used as data to train the model. The

Figure 2.4: Schematic diagram of the model for predicting task-based eye movements. The training phase shows the selected training images, eye tracking data on the training images, task-based feature extraction, the saliency map for each image and training fixations extracted from the eye tracking data. We combine all the features to obtain the training feature set and then perform PCA/ICA reduction if necessary to reduce the feature size. A Trainer is implemented to train on this feature set. The second half of the image shows the testing phase, where similar testing features are extracted as in the training phase and then the Trainer is used to predict the eye position. This new predicted eye position can then be compared to testing fixations which serve as ground-truth data.

saliency map obtained from the saliency-based visual attention model is also used as a feature vector. The gist and local image-based features are also extracted and provided as input to the model. The task at hand is encoded as an independent variable to the model. Using the gist, local image features, saliency map, task at hand and eye tracking data, the final feature vector is generated. This feature vector will be of very high dimensionality, hence the feature space is reduced using techniques such as principal component analysis (PCA) or independent component analysis (ICA). A model is then trained (linear model, neural network,

Gaussian mixture, support vector machines) on the reduced features. The learning algorithm is thus trained on the saliency, scene context (gist plus additional local features) and the n initial eye movements. The final learning model (Trainer) will assign weights to features based on the training fixations. To counter over-learning bias, the training images are split randomly using an 80/20 rule: the model will learn on 80% of the images in the training dataset and will be tested on the remaining 20% of the images to re-parameterize the model.

2.5.2 Testing Phase

The stimuli images which have not been used for training are used as testing images. Fixation data from several subjects is also gathered for these images using a remote eye tracker. The eye tracking data is preprocessed and split into two groups: the n initial fixations, which serve as a feature to the model (similar to the training phase), and the remaining fixations, which act as ground-truth data. Similar to the training phase, saliency-based features are extracted; the saliency map is obtained and used as a feature to the model. The gist and local image-based features are also extracted as inputs. The final features are reduced in dimension using PCA/ICA and provided as input to the learned model. The output of the model is the predicted gaze position (point-based or region-based). This predicted gaze position is then compared to the ground-truth data (remaining fixations).

2.6 Evaluation Measurement

Many attention and gaze prediction models are validated against eye tracking data of human observers. Eye movements provide an easy mechanism to understand the cognitive process involved in image perception and how eye movements vary with task. We can compare the predicted gaze obtained from the model to the eye movement data obtained from a human observer viewing the scene. The evaluation can be classified as 1) point-based, 2) region-based, and 3) subjective evaluation. In the point-based approach the predicted gaze points are compared with

the ground-truth eye tracking gaze points and a distance measure can be obtained. In the region-based approach, instead of evaluating a single gaze point, we compare the estimated gaze region to the region of fixations from multiple subjects. Subjective scores can also be obtained from experts to evaluate the estimated gaze on a Likert scale. However, subjective evaluation is time-consuming, error-prone and not quantitative compared to methods 1 and 2. In the literature the following are the most widely used evaluation techniques.

2.6.1 Kullback-Leibler (KL) Divergence

KL divergence, also known as information divergence, is a measure of the difference between two probability distributions P and Q and is denoted D_KL(P || Q). In the context of saliency and gaze prediction it is used as a distance metric between distributions: P is the discrete probability distribution of the predicted gaze and Q is the ground-truth distribution. Models that can predict human fixations exhibit higher KL divergence, since human subjects will fixate on a few regions (those with maximum response) and will avoid most of the regions with lower response from the model [59]. KL divergence is sensitive to any differences between distributions and is invariant to reparameterizations, thereby not affecting the scoring.

2.6.2 Normalized Scanpath Saliency (NSS)

The normalized scanpath saliency is defined as the response value at a given position in the predicted gaze map, normalized to have zero mean and unit standard deviation:

NSS = (S(x, y) − µ_S) / σ_S

NSS is computed once for each fixation and subsequently the mean and standard error are computed across the set of NSS scores. When the value of NSS is 1, it indicates that the subject's eye fixations fall in a region where the predicted gaze is one standard deviation above average. We can also say that an NSS value of 1 shows that the predicted region is more probable to be fixated than any other region in the image [60], whereas an NSS value of 0 shows that the model does not perform any better than randomly picking a fixation

location in the scene.

2.6.3 Linear Correlation Coefficient (LCC)

The linear correlation coefficient measures the strength of the linear relationship between two measured variables. The LCC measure is widely used to compare two images for registration, feature matching, object recognition and disparity measurement.

LCC(Q, P) = Σ_{x,y} (Q(x, y) − µ_Q)(P(x, y) − µ_P) / sqrt(σ²_Q · σ²_P)    (2.1)

In equation 2.1, P and Q represent the predicted fixation region and the ground-truth subject fixations in the region around (x, y), respectively, and µ and σ² represent the mean and variance of the pixel values in that region. The advantage of using LCC is that it is bounded, in comparison to KL divergence, and it is easier to compute than NSS or AUC. A correlation value of +1 or −1 indicates that there is a perfect linear relationship between the two variables and a value of 0 indicates no correlation.
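As a concrete illustration of these three measures, the sketch below implements them with NumPy for a predicted saliency map, a ground-truth fixation-density map, and a list of recorded fixation coordinates. The toy inputs, the whole-map (rather than region-based) application of LCC, and the small epsilon used to stabilize the KL computation are assumptions made for the example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) between two discrete distributions; here P is the
    predicted gaze map and Q the ground-truth map, each normalized to sum to 1."""
    p = p.ravel() / p.sum()
    q = q.ravel() / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def nss(saliency_map, fixations):
    """Mean normalized scanpath saliency: response of the zero-mean,
    unit-variance prediction map at each recorded fixation (x, y)."""
    s = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(np.mean([s[y, x] for x, y in fixations]))

def lcc(p, q):
    """Linear correlation coefficient between predicted and ground-truth maps."""
    p = p.ravel() - p.mean()
    q = q.ravel() - q.mean()
    return float(np.sum(p * q) / np.sqrt(np.sum(p**2) * np.sum(q**2)))

# Toy example: a 64x64 predicted map evaluated against three fixations and a
# fixation-density map (both would normally come from the eye tracker).
pred = np.random.rand(64, 64)
gt_map = np.random.rand(64, 64)
fix = [(10, 20), (32, 32), (50, 12)]
print(kl_divergence(pred, gt_map), nss(pred, fix), lcc(pred, gt_map))
```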

Chapter 3

Adaptive Subtle Gaze Guidance Using Estimated Gaze

3.1 Problem Definition

The previous chapter focused on the problem of gaze prediction. In this chapter we focus on the related problem of gaze guidance. When viewing traditional static images, the viewer's gaze pattern is guided by a variety of influences (bottom-up and top-down). For example, the pattern of eye movements may depend on the viewer's intent or task [5, 6]. Image content also plays a significant role: it is natural for humans to be immediately drawn to faces or other informative regions of an image [8]. Additionally, research has shown that our gaze is drawn to regions of high local contrast or high edge density [9, 10]. Although traditional images are limited to these passive modes of influencing gaze patterns, digital media offers the opportunity for active gaze control. The ability to direct a viewer's attention has important applications in computer graphics, data visualization, image analysis, and training. Existing computer-based gaze manipulation techniques, which direct a viewer's attention about a display, have been shown to be effective for spatial learning, search task completion, and medical training applications. We propose a novel mechanism for guiding visual attention

about a scene. Our proposed approach guides the viewer in a manner that has minimal impact on the viewing experience. It also requires no permanent alterations to the scene to highlight areas of interest. Previous work on guiding visual attention typically involved having the researchers manually select the relevant regions of the scene. This process is slow and tedious. We propose to overcome this issue by combining our gaze guidance technique with our gaze prediction framework. While gaze prediction techniques are prone to false positives, they can still be very valuable in providing additional suggestions for viewing.

3.2 Research Objectives and Contributions

Our proposed gaze guidance mechanism will be developed with the following goals in mind:

- It should perform in real time.
- It should adapt to image/scene content as well as the viewing configuration.
- It should adapt to the task assigned to the viewer.
- The technique should be subtle and have minimal impact on the viewing experience.

The proposed model is adaptive (real-time) in selecting task-relevant regions in the image based on the regions not previously fixated by the user and the task-based saliency map. These predicted regions from the model are used to actively map regions in the scene to guide the viewer's attention. The adaptive model can highlight task-relevant regions that have not been viewed, or other salient regions in the image, to assist the viewer in task completion. An adaptive gaze guidance technique will enable researchers to quickly and accurately direct the viewer's attention to unattended relevant regions in the image. Such a model is novel as it selects regions of interest in the image to guide a viewer based

on the current viewing pattern. The location and order of fixations of no two viewers are the same, hence manually pre-selecting regions to guide attention (as done in previous work) is not ideal. A gaze guidance model of this nature eliminates the need for manual intervention and also adapts in real time for each image being viewed. The model can learn over time and also provide assistance to the viewer in real time while the task at hand is being performed. Our adaptive subtle gaze guidance technique can also be deployed in psychophysical experiments involving short-term information recall, learning, visual search and problem solving tasks.

3.3 Background and Related Work

Jonides [62] explored the differences between voluntary and involuntary attention shifts and referred to cues which trigger involuntary eye movements as pull cues. Computer-based techniques for providing these pull cues are often overt. These include simulating the depth-of-field effect from traditional photography to bring different areas of an image in or out of focus, or directly marking up the image to highlight areas of interest [63, 64]. The issue with these types of approaches is that they require permanent, overt changes to the image, which impacts the overall viewing experience and may even hide or obscure important information in the image. Figure 3.1, for example, shows a mammogram with a red circle highlighted to visually identify an abnormal region in the image. Actively guiding the viewer's attention to relevant information has been shown to improve problem solving [64, 65]. Guiding attention has been shown to enhance spatial learning by improving the recollection of the location, size and shape of objects in images [66, 67, 68]. It has also been shown to improve training, learning and education [71, 72, 73, 74]. Gaze manipulation strategies have also been used to improve performance on visual search tasks by either guiding attention to previously unattended regions [69] or guiding attention directly to the relevant regions in a scene [70]. Subtle techniques have been proposed to guide the viewer's attention effectively to regions of interest in a scene using remote eye trackers [61].

Figure 3.1: Figure shows a mammogram image. The large red circle shows the area marked by the expert as an irregularity.

Our proposed approach is based on the Subtle Gaze Direction (SGD) technique, which works by briefly introducing motion cues (image-space modulations) to the peripheral region of the field of view [61]. Since the human visual system is highly sensitive to motion, these brief modulations serve as excellent pull cues. To achieve subtlety, these modulations are presented only to the peripheral regions of the field of view. This is determined using a real-time eye tracking device: the eye tracker provides the current gaze position, thereby giving an accurate location of where the subject is foveated. These peripheral modulations are terminated before the viewer can scrutinize them with their high-acuity foveal vision.

3.3.1 Subtle Gaze Direction

Figure 3.2 shows a hypothetical image; suppose the goal is to direct the viewer's gaze to some predetermined area of interest A. Let F be the position of the last recorded

Figure 3.2: Hypothetical image with current fixation region F and predetermined region of interest A. Inset illustrates the geometric dot product used to compute θ.

fixation, let V be the velocity of the current saccade, let W be the vector from F to A, and let θ be the angle between V and W. Modulations are performed on the pixel region A. Once the modulation commences, the saccadic velocity is monitored using feedback from an eye tracker and the angle θ is continually updated using the geometric interpretation of the dot product. A small value of θ indicates that the center of gaze is moving towards the modulated region. In such cases, the modulation is terminated immediately. This contributes to the overall subtlety of the technique. By repeating this process for other predetermined areas of interest, the viewer's gaze is directed about the scene. A user study conducted with 10 participants showed that the activation time (from the start of the modulation to the detection of movement towards the modulation) was within 0.5 seconds for nearly 75% of the target regions, indicating that participants responded to the majority of modulations. Nearly 70% of the fixations were within one perceptual span of the modulation and 93% were within two perceptual spans. Finally, figure 3.3 shows that it is possible to guide the viewer's attention to regions of interest in a subtle manner. The user study shows that it is possible to guide a subject's attention to relevant regions of the scene. However, while these observations show that the SGD

Figure 3.3: Gaze distributions for an image under static and modulated conditions. Input image (top). Gaze distribution for static image (bottom left). Gaze distribution for modulated image (bottom right). White crosses indicate locations preselected by researchers for modulation.

technique is successful in directing gaze, it does not necessarily mean that the viewer fully processed the visual details of the modulated regions or remembered them. To better understand the impact of Subtle Gaze Direction on short-term spatial information recall and its applicability to training scenarios, we have already conducted several studies. See [68, 71, 72, 75] for more details.
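The termination test at the heart of SGD, checking whether the current saccade is heading toward the modulated region by monitoring the angle θ between the saccade velocity V and the vector W from the last fixation F to the target A, can be sketched as follows. The angular threshold and the coordinate conventions are illustrative assumptions; the actual implementation in [61] may differ.

```python
import numpy as np

def should_terminate_modulation(fixation, gaze_velocity, target, angle_thresh_deg=20.0):
    """Return True when the saccade is heading toward the modulated region A.

    fixation         -- last recorded fixation F, (x, y) in screen pixels
    gaze_velocity    -- current saccade velocity vector V from the eye tracker
    target           -- center of the modulated region A, (x, y)
    angle_thresh_deg -- illustrative threshold on the angle theta between
                        V and W = A - F (the actual value is an assumption)
    """
    v = np.asarray(gaze_velocity, dtype=float)
    w = np.asarray(target, dtype=float) - np.asarray(fixation, dtype=float)
    denom = np.linalg.norm(v) * np.linalg.norm(w)
    if denom == 0.0:
        return False
    cos_theta = np.clip(np.dot(v, w) / denom, -1.0, 1.0)
    theta = np.degrees(np.arccos(cos_theta))
    return theta < angle_thresh_deg  # small theta: gaze moving toward A, stop modulating

# Example: gaze at (400, 300) saccading up and to the right toward a target at (600, 200).
print(should_terminate_modulation((400, 300), (180, -95), (600, 200)))
```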

3.4 Proposed Approach

Our approach combines the subtle gaze direction technique with the saliency and task-based eye movement prediction model (chapter 2) to actively and adaptively guide the viewer's attention to task-relevant regions in the scene. By combining the two methods we can guide the viewer's attention in real time based on the predicted gaze obtained from the comprehensive model, and also achieve subtlety to ensure that there is minimal impact on the overall viewing experience.

3.4.1 Adaptive Subtle Gaze Direction Using Estimated Gaze

The biggest challenge for gaze guidance is that the next fixation of the viewer is not available ahead of time; it has to be estimated based on the direction of eye movement (saccade velocity) with the help of an eye tracker. In previous work, the regions to which the subject's gaze is to be guided were pre-computed manually and the sequence of regions was fixed ahead of time. This approach is both time consuming and cumbersome, since each viewer's scanpath is unique and changes based on the task at hand. Our saliency and task-based eye movement prediction model can be used to automatically generate task-relevant regions for the gaze guidance technique.

Figure 3.4: Image on left is the image viewed by the subject when assigned the task of counting the number of deer in the scene. The red circles in the image indicate the viewer's fixation data. The image on right shows the corresponding task-based saliency map, highlighting task-relevant regions to direct the viewer's attention.
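A minimal sketch of the adaptive selection step described in the next paragraph is given below: task-relevant regions taken from the predicted task-based saliency map are ordered by saliency value, regions the viewer has already scrutinized long enough are retired, and the highest-priority remaining region becomes the next SGD modulation target. The region representation, dwell-time threshold and radius are assumptions made for illustration.

```python
import numpy as np

def next_guidance_target(regions, fixations, dwell_ms=250, radius_px=60):
    """regions: list of (x, y, saliency) from the task-based saliency map.
    fixations: list of (x, y, duration_ms) recorded so far.
    Returns the highest-saliency region not yet sufficiently scrutinized."""
    def attended(region):
        rx, ry, _ = region
        dwell = sum(d for (fx, fy, d) in fixations
                    if np.hypot(fx - rx, fy - ry) < radius_px)
        return dwell >= dwell_ms

    pending = [r for r in regions if not attended(r)]
    pending.sort(key=lambda r: r[2], reverse=True)   # highest saliency first
    return pending[0][:2] if pending else None       # (x, y) to modulate, or done

# Example: two task-relevant regions; only the first has been fixated long enough,
# so the second becomes the next SGD modulation target.
regions = [(120, 80, 0.9), (420, 310, 0.7)]
fixations = [(118, 83, 400)]
print(next_guidance_target(regions, fixations))      # -> (420, 310)
```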

Figure 3.4 shows an image viewed by the subject when the task assigned was to count the number of deer in the scene. The corresponding task-based saliency map is shown on the right, highlighting the regions in the image that are task relevant. The subject is eye tracked during the task and our proposed model predicts the gaze of the user based on the series of fixations recorded. The intensity map (right image) highlights the priority and task relevance of regions in the scene. Task-relevant regions are placed in a queue based on their saliency value and are moved to the end or popped once the subject has scrutinized the region for a desired duration of time. The model will then be able to guide the subject using SGD to these task-relevant regions if they were previously unattended. The viewer's gaze is directed to task-relevant regions by presenting a brief luminance modulation to the peripheral region of the field of view. The modulation is terminated as soon as the direction of the saccade is towards the region of interest. This approach makes sure that our model is able to subtly guide viewer attention to task-relevant regions that were previously unattended by the subject, and ensures that maximum visual coverage is achieved for successful completion of the task.

3.5 Evaluation

3.5.1 User Study

The goal of the user study will be to test the effectiveness of adaptive subtle gaze guidance using the estimated gaze from the proposed model. Participants are chosen randomly and eye tracked while viewing a collection of static images. All participants are chosen to have normal or corrected-to-normal vision with no cases of color blindness. Each participant will undergo a brief calibration procedure to ensure proper eye tracking. The images are pre-processed and the saliency map, gist and local image features are computed along with the previously recorded eye movement data, as mentioned in chapter 2. After viewing the scene for a short period of time, the model gathers eye movement data of the subject in real time and attempts to guide their attention to task-relevant regions that are

unattended. This ensures that all task-relevant regions are attended and the image is sufficiently scrutinized to successfully complete the task at hand. The relevant regions are highlighted by briefly projecting motion cues (image-space modulations) to the peripheral region of the field of view. Eye tracking data and scene stimuli from each subject are recorded and the accuracy of performance will be computed against a control group that is not guided using the adaptive subtle gaze guidance technique. The following methods will be used to evaluate performance:

Activation Time. Activation time is defined as the time elapsed between the start of the modulation and the detection of movement in the direction of the modulation. As shown for the subtle gaze direction technique [61], the criteria for terminating the modulation were met within 0.5 seconds for approximately 75 percent of the target regions and within 1 second for approximately 90 percent of the target regions, indicating that the participants responded to the majority of the modulations. The adaptive subtle gaze guidance should be tested to ensure that its activation time is similar to or better than that of the SGD technique. In the SGD technique, modulated regions were manually pre-selected to ensure the fast onset and termination of visual cues. In the case of adaptive subtle gaze guidance, the model has to predict the next possible fixation and also keep an account of the sequence of fixations previously made by the viewer. The model has to run in real time and accurately predict the next task-relevant gaze location to decide whether the viewer's attention is to be guided to this new location.

Accuracy Measurement. For tasks involving problem solving, training or visual search it is important to measure the accuracy of performance. The adaptive gaze guided group is compared with a static group to see if they performed significantly better on the given task at hand. The accuracy of the groups is evaluated using the following:

Binary Classification Statistics.

Binary classification statistics [76] can be used to establish measures of accuracy as well as sensitivity and specificity. To calculate these properties it is necessary to categorize the test outcomes as true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Sensitivity is computed as follows:

Sensitivity = #TP / (#TP + #FN) × 100    (3.1)

Specificity is defined as follows:

Specificity = #TN / (#TN + #FP) × 100    (3.2)

The sensitivity and specificity values can then be combined to produce a binary classification based measure of accuracy as follows:

Accuracy = (#TP + #TN) / (#TP + #TN + #FN + #FP) × 100    (3.3)

The accuracy value can be compared between the adaptive subtle gaze guided and control groups; higher accuracy would indicate better performance on the task at hand.

Area Under Curve (AUC). The Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) curve can be used with a binary classifier system with a variable threshold. The AUC, or area under the ROC curve, is used to assess the performance of the adaptive gaze guidance technique. A value of 1 indicates perfect classification of task-relevant regions. The ROC curve is effectively used to test whether the regions selected by the viewers are classified better than random. This measure will ensure that the performance of the control group or the adaptive gaze guided group is not due to random chance. The AUC or ROC, along with the accuracy measure from

binary classification, will provide a complete picture of group performance.

Levenshtein Distance. Levenshtein distance [77, 78] is a string metric, developed in the fields of information theory and computer science, used to compute differences between sequences. Levenshtein distance provides an appropriate measure for comparing sequences in tasks that require an ordered viewing sequence. To accurately compare sequences using Levenshtein distance, the correct (intended) viewing order of each image is converted into a string sequence. All responses from each participant are also converted to an appropriate string sequence in order to facilitate comparison to the correct sequence. Since the number of relevant regions varies across the images, we normalize the distance measure computed for each image by dividing by the number of correct regions for the task. Each correct region is assigned a label. Suppose, for the eight relevant regions in the scene, the correct viewing order is [ABCDEFGH]. A Levenshtein distance of 0 ([ABCDEFGH]) would indicate no difference, whereas a distance of 8 would indicate the maximum difference.
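The normalized sequence comparison described above can be sketched as follows; the helper names and the example orderings are illustrative, and the normalization by the number of correct regions follows the description above.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_viewing_distance(correct_order, observed_order):
    """Levenshtein distance between the intended and observed viewing order,
    normalized by the number of correct regions for the task."""
    return levenshtein(correct_order, observed_order) / len(correct_order)

# Example: eight labeled regions with intended viewing order ABCDEFGH.
print(normalized_viewing_distance("ABCDEFGH", "ABCDEFGH"))  # 0.0: perfect match
print(normalized_viewing_distance("ABCDEFGH", "ABDCEFGH"))  # 0.25: regions C and D swapped
```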

Chapter 4

Task Inference Problem

4.1 Problem Definition

It has been shown that the task at hand greatly influences visual attention. The best-known example of task-based top-down attention was provided by Yarbus in 1967 [5, 79]. Eye movements convey vital information about the cognitive processes involved when performing tasks such as driving, reading, visual search and scene understanding. Eye movements reveal the shift in attention, and a sequence of eye movements relates strongly to the task at hand. The difficulty and complexity of the task also significantly influence eye movements. This is based on the assumption that eye movements and visual cognition are highly correlated [22]. Eye movements can be used both as data to understand the underlying cognitive process and to validate computational models of visual attention. Thus, eye movements are used to better represent the task at hand, and the fixations extracted are used as features for the computational model described in chapter 2. However, the inverse of this process, determining the task at hand from eye movement data, is very difficult. Eye movements are made to perform the task at hand, but also to gather additional information in the scene while performing the task. Salient regions in the image that are task irrelevant also attract visual attention. The process of differentiating eye movements as task-based versus information-based is the holy grail of eye tracking.

Hence the task inference problem can be defined as identifying the task performed by the user while viewing the scene, with the help of image features and real-time eye movement data. A generic model to predict the task at hand from arbitrary eye movement data is far from reach. The problem therefore needs to be simplified: for a given set of stimuli images and relevant tasks, is it possible to identify the task based on eye movement data? It is also important to extend the idea to any new image which can be classified into an existing image group in the dataset and has relevance to the task defined for that image group. For example, if image A in figure 4.1 belongs to the street image group and the task is to locate the cars in the image, then another image B can be a new stimulus image that can be classified as a street image and to which the provided task is applicable.

Figure 4.1: Figure showing Image A, a street view image with eye tracking data where the task provided to the viewer was to locate the cars in the scene. Image B shows a similar image that can be classified as a street image and also has cars in the image to be task relevant.

The problem can now be defined as follows: given a set of p images in a group (i_1, ..., i_p ∈ I) and eye movement data e_i for each image in the group under n assigned tasks (t_1, ..., t_n ∈ T), the model should be able to identify the task performed by the viewer for a new stimulus image i_new which can be classified into I and to which the tasks t_1, ..., t_n are relevant.
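A minimal sketch of this formulation is shown below: scanpaths recorded under the known tasks t_1, ..., t_n are converted into fixed-length feature vectors and a simple classifier predicts the task for eye movement data recorded on a new image from the same group. The scanpath featurization, the toy data and the choice of a nearest-neighbor classifier are illustrative assumptions, not the proposed model.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def scanpath_features(fixations):
    """fixations: array of (x, y, duration) rows. Returns a simple summary
    vector: spatial mean and spread, mean dwell time, and scanpath length."""
    f = np.asarray(fixations, dtype=float)
    step = np.linalg.norm(np.diff(f[:, :2], axis=0), axis=1).sum()
    return np.array([f[:, 0].mean(), f[:, 1].mean(),
                     f[:, 0].std(), f[:, 1].std(),
                     f[:, 2].mean(), step])

# Toy training data: scanpaths labeled with the task id under which they were recorded.
rng = np.random.default_rng(1)
scanpaths = [rng.random((8, 3)) * [800, 600, 400] for _ in range(30)]
tasks = rng.integers(0, 3, size=30)            # n = 3 assigned tasks

X = np.stack([scanpath_features(sp) for sp in scanpaths])
clf = KNeighborsClassifier(n_neighbors=3).fit(X, tasks)

# Eye movement data e_new recorded on a new image i_new classified into the same group:
new_scanpath = rng.random((8, 3)) * [800, 600, 400]
print(clf.predict(scanpath_features(new_scanpath)[None, :]))
```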

4.2 Research Objectives and Contributions

The objective of this chapter is to develop a framework for predicting the task being undertaken based on scene context, bottom-up scene saliency, and eye movement data. The model is initially trained on a set of training stimulus images to compute task-based saliency maps for the task at hand across multiple human subjects. A human observer (not part of the training phase) is then presented with a new image that is similar to the training images (by scene classification) and is relevant to the task at hand. The model must then accurately predict, in real time, the task performed by the user on the stimulus image based on the gathered eye movement data. Such a model can provide vital information about the viewer's intent in repeated image search tasks. For example, TSA experts look for specific (hazardous or harmful) objects in an image, and this search process is highly repetitive. The model can be tuned to specific image groups, enabling assistance in the visual search process, and the idea extends to many other image search tasks. It can also be used in training and learning environments to better understand a viewer's eye movements. Finally, this model can provide a rich dataset of stimulus images and corresponding task-dependent eye movements, which can serve as ground truth for other visual attention models. This dataset can also be used for validation and for empirical and performance studies of different computational saliency models.

4.3 Background and Related Work

Yarbus showed that eye movements depend not only on the scene presented but also on the current task at hand. Subjects were asked to view a picture (a room with a family and an unexpected visitor entering the room) under different task conditions, which included guessing the ages of the people, the material circumstances of the family, and the family's reaction to the visitor, as well as free viewing of the image. Figure 4.2 shows the scanpaths of a subject for the various tasks while viewing the same stimulus image. Attention in humans has also been broadly differentiated by its attributes, namely covert and overt attention.

Figure 4.2: The experiment conducted by Yarbus in 1967. The image at the top left shows the picture of a family and an unexpected visitor; the remaining images show the scanpaths of a subject for each task in the experiment while viewing the same stimulus image.

Overt attention is the process of directing the fovea towards a desired object or stimulus in order to fixate on it and gather information. Covert attention, on the other hand, is the process of gathering information about surrounding objects while focusing on an object, without necessarily making an eye movement. An example of covert attention occurs while driving: the driver, while focusing on the road, covertly keeps track of the gauges, road signs, and traffic lights. The purpose of covert attention is to quickly gather information about interesting objects or features in the scene other than the one currently fixated. Covert attention is attributed to the physiology of the eye, which

maps slow saccades to other locations in the scene to gather information for the next fixation [80]. However, researchers are still trying to understand the complex interactions between overt and covert attention. Many computational models try to find the regions that attract eye fixations and thereby explain the process of overt attention. However, there are no computational frameworks that explain the reasons for and mechanisms of covert attention, and there is also no known measure of covert attention. Thus, visual saliency models capture the likelihood that a region in the scene will be attended to, but cannot explain whether the information gathered there is obtained through covert or overt attention. Most models target very specific tasks, such as locating humans, which requires them to detect human faces [81, 47], skin color [82], or skeletal structure and posture. There have also been approaches that detect specific features such as skin, faces, horizontal and vertical lines, curvature, corners and crosses, shape, texture, and depth. These features make it possible to differentiate salient regions and to group similar regions together. Other approaches use object detection and scene classification techniques to identify images of interest, and other models predict the gaze of subjects for a very specific task within a controlled setup [53, 83]. However, there are no known models that use eye movement data to predict the task being performed by the user on similar images.

4.4 Approach

Task inference using only eye movements is a very complex problem, and it has been shown that it is extremely difficult to differentiate task-related fixations from other fixations in the scene. The comprehensive computational model of human visual attention proposed in chapter 2 uses saliency, gist, local image-based features, and eye movement data, and also encodes the task at hand, to predict the viewer's gaze. The proposed model narrows down the regions of interest in the scene obtained from the saliency map or from eye movement data alone. When the task-based saliency map of the model is generated from the eye movement data of multiple subjects, the task-relevant regions are isolated, as sketched below.
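A minimal sketch of that aggregation step is given below, assuming fixations from all training subjects for one task have already been pooled into a list of (x, y) positions. The smoothing width and the use of scipy's Gaussian filter are assumptions, not details specified in this proposal.

import numpy as np
from scipy.ndimage import gaussian_filter

def task_saliency_map(pooled_fixations, image_shape, sigma=30):
    """Build a task-based saliency (fixation density) map from pooled fixations."""
    h, w = image_shape
    density = np.zeros((h, w), dtype=np.float64)
    for x, y in pooled_fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:
            density[yi, xi] += 1.0                     # accumulate discrete fixation counts
    density = gaussian_filter(density, sigma=sigma)    # smooth into a continuous map
    peak = density.max()
    return density / peak if peak > 0 else density     # normalize to [0, 1]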

Thus, combining the predicted task-relevant gaze positions with real-time eye movement data enables the model to run as a controlled feedback loop that predicts the task being performed by the user. For example, in a driving scenario where the task is to locate speed signs, the fixations will fall on a speed sign (if one is present) in the scene; if there are multiple speed signs, attention will shift from one sign to another. Figure 4.3 shows eye-tracked images of a person driving a virtual truck. In image A the subject is given the task of monitoring traffic using the rear-view mirrors, whereas in image B the task is to monitor the gauges and other instruments while driving. In both images it can be clearly seen that there are fixations on the task-relevant regions as well as other fixations made to gather additional information. The proposed model will predict the task-relevant regions for all of the tasks specified for the image, and will infer the task performed by the user based on real-time eye movement data.
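One plausible way to realize this feedback loop, sketched under the assumption that one task-based saliency map per candidate task is available from the training phase, is to score each task by how well the incoming fixations fall on its map and to report the running best guess. The mean z-scored map value used here is an illustrative scoring rule, not the proposal's final formulation.

import numpy as np

def score_task(task_map, fixations):
    """Mean z-scored map value at the fixated pixels (higher means a better fit)."""
    z = (task_map - task_map.mean()) / (task_map.std() + 1e-9)
    h, w = z.shape
    values = [z[int(y), int(x)] for x, y in fixations
              if 0 <= int(y) < h and 0 <= int(x) < w]
    return float(np.mean(values)) if values else float("-inf")

def infer_task(task_maps, fixation_stream):
    """task_maps: {task name: 2-D task-based saliency map}.
    fixation_stream yields (x, y) fixations as they arrive from the eye tracker;
    after each new fixation the current best task hypothesis is emitted."""
    seen = []
    for fixation in fixation_stream:
        seen.append(fixation)
        scores = {task: score_task(m, seen) for task, m in task_maps.items()}
        yield max(scores, key=scores.get)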

4.5 Evaluation

Eye tracking has been used extensively to validate visual attention and gaze guidance models, and researchers manually record and inspect eye movement data to understand the cognitive processes involved in performing a task. The proposed model will predict the task being performed by the user from the image saliency map, gist, local image-based features, and pre-computed eye movement data, so its task-inference performance must be evaluated. The accuracy of the model on a trial is 100% if it predicts the task performed correctly and 0% if it fails to do so.

User Study

The goal of the user study is to test the accuracy of the model in predicting the task performed by the user, given the image saliency map, gist, local image-based features, and pre-computed eye movement data. The model will already have been evaluated on gaze prediction as described in chapter 2. A user study will be conducted to evaluate how quickly and accurately the model performs task inference. Participants will be chosen randomly and eye tracked while viewing a collection of static images on which the model has not been trained. All participants will have normal or corrected-to-normal vision with no color blindness. Each participant will be assigned a specific task while viewing an image for a specified period of time, and the model will attempt to infer the task being performed. At the end of the study, the task assigned to the subject will be compared with the task inferred by the model, and the speed and accuracy of the model will be assessed for each image and for each image group overall. The subject's eye movement data will be recorded, and the resulting fixation map will be compared to the task-based saliency map computed by the model. The evaluation measures discussed in chapter 2, section 2.6 can be used to compare the fixation distribution to the task-based saliency map, and a binary classification test can also be performed as described in chapter 3, section 3.5.
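As a rough illustration, the per-trial accuracy and two of the chapter 2 measures could be computed as follows. This is a sketch for exposition; the exact preprocessing (map normalization, fixation filtering) is an assumption rather than the procedure fixed by this proposal, and fixations are assumed to lie within the map bounds.

import numpy as np

def task_inference_accuracy(predicted_tasks, assigned_tasks):
    """Fraction of trials on which the inferred task matches the assigned task."""
    correct = sum(p == a for p, a in zip(predicted_tasks, assigned_tasks))
    return correct / len(assigned_tasks)

def kl_divergence(fixation_map, task_saliency_map, eps=1e-12):
    """KL divergence between the fixation distribution and the task-based map."""
    p = fixation_map / (fixation_map.sum() + eps)
    q = task_saliency_map / (task_saliency_map.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def nss(task_saliency_map, fixations):
    """Normalized scanpath saliency: mean z-scored map value at fixated pixels."""
    z = (task_saliency_map - task_saliency_map.mean()) / (task_saliency_map.std() + 1e-9)
    return float(np.mean([z[int(y), int(x)] for x, y in fixations]))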

Figure 4.3: Image (A) shows the eye movements of a subject given the task of checking the rear-view mirrors. Image (B) shows the eye movements of the subject given the task of checking the gauges on the dashboard. The red/green circles indicate the fixations made by the subject while performing the task, and the number inside each circle shows the order of the fixations. Note that the subject also gathers information about the road and the GPS while performing the task at hand.
