Matching Words and Pictures

Dan Harvey & Sean Moran (DME)
27th February 2009

Outline
1. Introduction
2. Preprocessing
   - Segmentation
   - Feature extraction
3. Multi-Modal Hierarchical Aspect Model
   - Getting technical
   - Annotating Images
   - Searching Images
   - Model Applications
4. Evaluation
   - Methods
   - Experiments
   - Results

Motivation
- Images are a core part of the modern world
- Recent explosion in the number of images being captured and shared:
  - Number of images on the internet estimated to be in excess of 15 x 10^10
  - Global annual sales: 1 x 10^8 digital cameras and 3 x 10^8 camera phones
  - Newspaper archives, picture libraries, etc. maintain huge private collections
- Great interest in how we can analyse images to ensure ease of search and browsing
- Automatic matching of words to pictures is a potentially huge growth area

Matching words to pictures
- Interesting application of multi-modal data mining
- Two main types:
  - Auto-annotation: predict the annotation of images using all information present
  - Correspondence: associate particular words with particular image substructures
- Focus on auto-annotation in this presentation

Automatic Image Annotation
Two main philosophies [9],[10]:
- Block-based: segment images and apply statistical models to those segmented regions
  - Most common approach in the literature, e.g.:
    - CRM model of Lavrenko et al. [11]
    - Machine translation model of Duygulu et al. [12]
- Global-feature based: bypass the segmentation stage and model global image statistics directly, e.g.:
  - Robust non-parametric model of Yavlinsky et al. [10]
Core issues for any approach:
1. Representation: how to represent image features?
2. Learning: how to form the classifier from training data?
3. Annotation: how to use the classifier for novel image annotation?

[Figure: Statistical Machinery]

Key Challenges
- Semantic gap: "Lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation" [4]
- Nature of images: image understanding is one of the most complex challenges in AI [5]

[Figure: Scale]

[Figure: Occlusion]

Auto-Annotation Applications
Three core applications:
1. Content Based Image Retrieval (CBIR): retrieve images based on actual image content
2. Browsing support: provide the user with an easy way of browsing similar items
3. Auto-illustration: suggest pictures that might go well with surrounding text
Large disparity between user needs and what technology supplies, e.g.:
- Query: "Feature is about deodorant so person should look active, not sweaty but happy, carefree - nothing too posed or set up - nice and natural looking" [6]
- Response: "I'm sorry, Dave. I'm afraid I can't do that" :-)

Google Image Search
- Google uses filenames and surrounding text and ignores the contents of the images, hence the poor retrieval results for queries such as "purple flowers with green leaves".

Imense.com PictureSearch
- The Imense CBIR engine (www.imense.com) takes into account the actual image content.

Preprocessing: How to represent an image?
- Native dimension of images is too high:
  - Resolution 481 x 321 = 154,401 pixels
  - Each pixel has 3 attributes (R, G, B), each with 256 possible values (0 to 255)
  - That's 463,203 attributes, almost half a million!
- Find different regions by segmentation
- Extract features to describe each region
- A region and its features together are known as a blob

Segmentation into regions
- Normalised Cuts (Shi and Malik, 2000)
  - Complete graph with pixels as vertices
  - Weights on edges based on feature similarity, e.g. intensity, colour value
  - Recursively apply a minimum cut, normalised by each segment's total edge weight to the whole graph
- Segmentation occasionally produces small unstable regions
- Pick the 8 largest regions for feature extraction
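
As a rough illustration of the normalised-cut criterion, the sketch below scores one candidate two-way partition of a tiny graph using the Shi and Malik definition Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). The toy weight matrix and the function name `ncut_value` are assumptions made for illustration, and the sketch only evaluates a given partition; it does not perform the minimisation that a real segmenter needs.

```python
def ncut_value(weights, in_a):
    """Normalised cut value of a two-way partition.

    weights: symmetric matrix (list of lists) of edge weights.
    in_a:    list of booleans, True if vertex i belongs to segment A.
    """
    n = len(weights)
    cut = 0.0       # total weight of edges going from A to B
    assoc_a = 0.0   # total weight from vertices in A to every vertex
    assoc_b = 0.0   # total weight from vertices in B to every vertex
    for i in range(n):
        for j in range(n):
            w = weights[i][j]
            if in_a[i]:
                assoc_a += w
                if not in_a[j]:
                    cut += w
            else:
                assoc_b += w
    return cut / assoc_a + cut / assoc_b


# Toy graph (made up): pixels 0-1 are similar, pixels 2-3 are similar,
# and only weak links connect the two pairs.
W = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.8],
    [0.0, 0.1, 0.8, 0.0],
]
good = ncut_value(W, [True, True, False, False])   # splits along the weak links
bad = ncut_value(W, [True, False, True, False])    # splits the similar pairs
print(good, bad)
```

The partition that separates the two similar pairs gets the lower score, which is the behaviour the criterion is meant to reward.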

Geometric Features
- Size: proportion of region area to image area
- Position: normalised coordinates of the centre of mass
- Shape:
  1. Ratio of region area to perimeter squared
  2. Moment of inertia about the centre of mass
  3. Ratio of region area to convex hull area
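
The geometric features listed above can be sketched for a region given as a binary mask. This is a minimal sketch under stated assumptions: the function name `geometric_features` is hypothetical, the region is assumed non-empty, the image is assumed to be at least 2 x 2, the perimeter is approximated by counting exposed pixel edges, and the convex hull ratio is left out.

```python
def geometric_features(mask):
    """Size, position and two shape features of a region given as a 0/1 mask."""
    height, width = len(mask), len(mask[0])
    pixels = [(x, y) for y in range(height) for x in range(width) if mask[y][x]]
    area = len(pixels)

    # Size: proportion of region area to image area.
    size = area / (width * height)

    # Position: centre of mass, normalised to [0, 1] along each axis.
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area
    position = (cx / (width - 1), cy / (height - 1))

    # Perimeter (assumption): number of pixel edges that face background
    # or the image border, using 4-connectivity.
    perimeter = 0
    for x, y in pixels:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            outside = not (0 <= nx < width and 0 <= ny < height)
            if outside or not mask[ny][nx]:
                perimeter += 1

    # Shape 1: ratio of region area to perimeter squared.
    compactness = area / perimeter ** 2

    # Shape 2: moment of inertia about the centre of mass (unnormalised).
    inertia = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pixels)

    return {"size": size, "position": position,
            "compactness": compactness, "inertia": inertia}


# A 2x2 square region in the middle of a 4x4 image.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
features = geometric_features(mask)
print(features)
```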

Other Features
Colour: represented by the average and standard deviation of
1. (R, G, B): representing physical colour
2. (L, a, b): lightness and colour-opponent space, representing human vision
3. Chromaticity coordinates, which measure the quality of a colour:

$$r = \frac{R}{R + G + B}, \qquad g = \frac{G}{R + G + B} \qquad (1)$$

Texture:
1. 4 difference-of-Gaussian filters
2. 12 oriented filters at 30 degree increments

Not the only features, but a good selection!
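
To make the colour features concrete, here is a small sketch that computes the mean and standard deviation of the RGB channels and of the chromaticity coordinates r = R/(R+G+B) and g = G/(R+G+B) over a region. The function names, the sample pixels and the handling of pure black pixels are assumptions for illustration; the (L, a, b) conversion and the texture filters are not covered.

```python
from statistics import mean, pstdev


def chromaticity(r, g, b):
    """Chromaticity coordinates (r, g) of one RGB pixel."""
    total = r + g + b
    if total == 0:
        # Black pixel: chromaticity is undefined, so fall back to neutral
        # grey (an assumption, not something the slides specify).
        return (1 / 3, 1 / 3)
    return (r / total, g / total)


def colour_features(region_pixels):
    """Mean and standard deviation of RGB and chromaticity over a region."""
    features = {}
    channels = list(zip(*region_pixels))  # ([R...], [G...], [B...])
    for name, values in zip(("R", "G", "B"), channels):
        features[name] = (mean(values), pstdev(values))
    chroma = [chromaticity(*pixel) for pixel in region_pixels]
    for name, values in zip(("r", "g"), zip(*chroma)):
        features[name] = (mean(values), pstdev(values))
    return features


# Made-up region: two reddish pixels of different brightness and one bluish pixel.
region = [(200, 50, 50), (100, 25, 25), (50, 50, 100)]
region_features = colour_features(region)
print(region_features)
```

The two reddish pixels differ in brightness but share the same chromaticity, which is why these coordinates are useful alongside raw RGB.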

Multi-Modal Hierarchical Aspect Model
Generative hierarchical model, combining an aspect model with a soft clustering model (Barnard & Forsyth 2001) [6][7][8]:
- Aspect model: models the joint distribution of documents (sequences of words and image blobs) and features
- Soft clustering model: maps documents into clusters
Images and words are generated by a fixed hierarchy of nodes:
- Leaves of the hierarchy correspond to clusters
- Each node has some probability of generating each word (modelled as a multinomial distribution)
- Each node also has some probability of generating an image segment (modelled as a Gaussian distribution)
- Images belonging to a cluster are generated by the nodes along the path from the leaf to the root

Generative nature of the Model
- Data is modelled as being generated by the nodes along a path
- For example, if the sunset image is in the 3rd cluster, its words and blobs are modelled by the nodes along the path from that cluster's leaf to the root

Generative nature of the Model
- Nodes close to the root are shared by many clusters and emit items shared by a large number of data elements
- Nodes closer to the leaves are shared by few clusters and emit items specific to a small number of data elements

Getting technical
- A document (blobs, words) is modelled by a sum over the clusters, weighted by the probability that the document is in the cluster
- Generating a set of observations D (blobs, words) for a document d:

$$P(D \mid d) = \sum_{c} P(c) \prod_{i \in D} \left( \sum_{l} P(i \mid l, c)\, P(l \mid c, d) \right)$$

Where c indexes clusters, i indexes items, and l indexes levels:
- $P(i \mid l, c)$ = probability of item (segment or word) in node
- $P(l \mid c, d) = \dfrac{\#\ \text{of items from node in document}}{\#\ \text{of document items}}$
- $P(c, d) = \dfrac{\#\ \text{of document items in cluster}}{\#\ \text{of document items}}$
- $P(c) = \dfrac{\sum_{d} P(c, d)}{\#\ \text{of total documents}}$
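
The document likelihood above can be evaluated directly on a toy model. The following is a minimal sketch, assuming a hypothetical hierarchy with two clusters and a two-level path (shared root, then leaf), discrete items only, and made-up probability tables; in the model described here blobs are continuous and scored by Gaussians, and P(l | c, d) varies per document rather than being fixed.

```python
import math

# Hypothetical toy model: 2 clusters, each with a path [root, leaf].
# p_item[c][l][item] plays the role of P(i | l, c); the root is shared.
ROOT = {"sky": 0.5, "water": 0.3, "tiger": 0.1, "grass": 0.1}
p_item = [
    [ROOT, {"sky": 0.1, "water": 0.7, "tiger": 0.1, "grass": 0.1}],  # cluster 0
    [ROOT, {"sky": 0.1, "water": 0.1, "tiger": 0.5, "grass": 0.3}],  # cluster 1
]
p_level = [[0.4, 0.6], [0.4, 0.6]]  # P(l | c, d), held fixed here (assumption)
p_cluster = [0.5, 0.5]              # P(c)


def document_likelihood(items):
    """P(D | d) = sum_c P(c) * prod_{i in D} sum_l P(i | l, c) P(l | c, d)."""
    total = 0.0
    for c, prior in enumerate(p_cluster):
        per_item = [
            sum(p_item[c][l][item] * p_level[c][l] for l in range(2))
            for item in items
        ]
        total += prior * math.prod(per_item)
    return total


print(document_likelihood(["tiger", "grass"]))
print(document_likelihood(["sky"]))
```

Because every node's table sums to one, the likelihoods of all single-item documents also sum to one, which is a quick sanity check on the tables.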

Applying the model to annotate images
- Need to calculate the probability that an image emits a proposed word w, given the observed blobs B, i.e. P(w | B)
- Way to think about this conceptually:
  - Consider the probability of the items belonging to the current cluster
  - Consider the probability of the items being generated by the nodes at various levels in the path associated with the current cluster
  - Work the above out for all clusters
- Mathematically:

$$P(w \mid B) = \sum_{c} \left( \sum_{l} P(w \mid c, l)\, P(l \mid c, B) \right) \left( \prod_{b \in B} \sum_{l} P(b \mid l, c)\, P(l \mid c) \right) P(c)$$
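
A small sketch of the annotation formula follows. Everything concrete in it is an assumption for illustration: a two-cluster, two-level hierarchy, a single one-dimensional blob feature scored by a Gaussian per node, one shared table standing in for both P(l | c) and P(l | c, B), and made-up parameter values. The scores are normalised over the vocabulary at the end, since the product of blob densities is not itself normalised.

```python
import math


def gaussian(x, mean, std):
    """Density of a 1-D Gaussian, standing in for P(b | l, c)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


# Hypothetical toy model: 2 clusters, path = [root, leaf], 1-D blob feature.
VOCAB = ["sky", "water", "tiger"]
SHARED_ROOT = {"sky": 0.6, "water": 0.3, "tiger": 0.1}
word_prob = [  # P(w | c, l)
    [SHARED_ROOT, {"sky": 0.2, "water": 0.7, "tiger": 0.1}],
    [SHARED_ROOT, {"sky": 0.1, "water": 0.1, "tiger": 0.8}],
]
blob_node = [  # (mean, std) of the Gaussian at each node
    [(0.5, 1.0), (0.0, 0.3)],
    [(0.5, 1.0), (1.0, 0.3)],
]
p_level = [[0.3, 0.7], [0.3, 0.7]]  # used for P(l | c) and P(l | c, B) (assumption)
p_cluster = [0.5, 0.5]              # P(c)


def annotate(blobs):
    """Return P(w | B) for every word, normalised over the vocabulary."""
    scores = dict.fromkeys(VOCAB, 0.0)
    for c, prior in enumerate(p_cluster):
        # prod_b sum_l P(b | l, c) P(l | c): how well cluster c explains the blobs.
        blob_fit = math.prod(
            sum(gaussian(b, *blob_node[c][l]) * p_level[c][l] for l in range(2))
            for b in blobs
        )
        for w in VOCAB:
            # sum_l P(w | c, l) P(l | c, B): word probability along the path.
            word_given_c = sum(word_prob[c][l][w] * p_level[c][l] for l in range(2))
            scores[w] += word_given_c * blob_fit * prior
    norm = sum(scores.values())
    return {w: s / norm for w, s in scores.items()}


prediction = annotate([0.9, 1.1])
print(max(prediction, key=prediction.get), prediction)
```

The two blobs sit close to the second cluster's leaf, so that cluster dominates the sum and its leaf word receives the highest probability.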

Applying the model to search images
- Need to calculate the probability that a document generates a query Q, i.e. P(Q | d):

$$P(Q \mid d) = \sum_{c} \left( \prod_{q \in Q} \sum_{l} P(q \mid l, c)\, P(l \mid c, d) \right) P(c)$$

- Documents with a high score for P(Q | d) are returned to the user
- Soft query system: not every query word has to occur in each image returned
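
Ranking by P(Q | d) can be sketched in a few lines. The toy word tables, the two example documents and their P(l | c, d) values are all made up for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical toy model: P(q | l, c) for a 2-level path in each of 2 clusters.
SHARED_ROOT = {"sky": 0.6, "water": 0.3, "tiger": 0.1}
word_prob = [
    [SHARED_ROOT, {"sky": 0.2, "water": 0.7, "tiger": 0.1}],
    [SHARED_ROOT, {"sky": 0.1, "water": 0.1, "tiger": 0.8}],
]
p_cluster = [0.5, 0.5]  # P(c)

# P(l | c, d) for each document (made-up values): one row per cluster.
documents = {
    "beach.jpg": [[0.2, 0.8], [0.9, 0.1]],
    "safari.jpg": [[0.9, 0.1], [0.2, 0.8]],
}


def query_score(query, p_level):
    """P(Q | d) = sum_c [prod_{q in Q} sum_l P(q | l, c) P(l | c, d)] P(c)."""
    return sum(
        prior * math.prod(
            sum(word_prob[c][l][q] * p_level[c][l] for l in range(2))
            for q in query
        )
        for c, prior in enumerate(p_cluster)
    )


def search(query):
    """Rank documents by P(Q | d), highest first."""
    return sorted(
        documents,
        key=lambda name: query_score(query, documents[name]),
        reverse=True,
    )


print(search(["tiger"]))
print(search(["water", "sky"]))
```

Every document receives a non-zero score for every query, which is what makes the query "soft": ranking replaces exact keyword matching.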

Applying the model to browse images
Browsing from coarse to fine granularity using the tree structure:
- Ocean > Dolphins, Whales, Corals, and so on
- Ocean > Dolphins > Tail, Head, and so on

How to evaluate annotation performance?
- Compare to annotated images not used for training
- Show non-trivial learning: common words (sky, water) versus uncommon words (tiger)
- Performance is measured relative to the empirical word frequency
- Quality of words predicted (negative is worse, positive is better):

$$E^{\text{model}}_{KL} = \frac{1}{K} \sum_{w \in \text{observed}} \log \frac{p(w)}{p(w \mid B)}$$

$$E_{KL} = \frac{1}{N} \sum_{\text{data}} \left( E^{\text{empirical}}_{KL} - E^{\text{model}}_{KL} \right)$$
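
The KL-based measure can be sketched as below. Two readings are assumptions rather than something stated on the slide: K is taken to be the number of observed words for an image with the target p(w) uniform over them (p(w) = 1/K), and N is taken to be the number of test images. The function names and the toy distributions are hypothetical.

```python
import math


def kl_error(observed_words, predicted):
    """(1/K) * sum over observed words of log(p(w) / p(w | B)), with p(w) = 1/K."""
    k = len(observed_words)
    return sum(math.log((1 / k) / predicted[w]) for w in observed_words) / k


def kl_score(test_items, empirical):
    """Mean of (empirical error - model error) over the test items.

    test_items: list of (observed_words, model_distribution) pairs.
    empirical:  word distribution given by training-set word frequencies.
    """
    diffs = [
        kl_error(words, empirical) - kl_error(words, model)
        for words, model in test_items
    ]
    return sum(diffs) / len(diffs)


# Made-up baseline and model predictions for two test images.
empirical = {"sky": 0.5, "water": 0.3, "tiger": 0.2}
test_items = [
    (["tiger"], {"sky": 0.1, "water": 0.1, "tiger": 0.8}),
    (["sky", "water"], {"sky": 0.45, "water": 0.45, "tiger": 0.1}),
]
score = kl_score(test_items, empirical)
print(score)
```

A model that simply reproduces the empirical word frequencies scores exactly zero, so a positive value indicates learning beyond the frequency baseline.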

Performance Measurements
- Word prediction measure
  - Loss function: 0 for predicting everything or nothing, 1 for predicting exactly the correct words, -1 for predicting their complement

$$E^{\text{model}}_{NS} = \frac{r}{n} - \frac{w}{N - n}$$

$$E_{NS} = E^{\text{model}}_{NS} - E^{\text{empirical}}_{NS}$$

- Simpler word prediction measure
  - 0 is bad, 1 is good

$$E^{\text{model}}_{PR} = \frac{r}{n}$$
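
The two word prediction measures are easy to check in code. The sketch reads r as the number of correctly predicted words, w as the number of wrongly predicted words, n as the number of actual words for the image and N as the vocabulary size; the slide does not define these symbols, so this reading is an assumption, chosen because it reproduces the stated 0 / 1 / -1 behaviour. The function names are hypothetical.

```python
def normalised_score(predicted, actual, vocab_size):
    """E_NS = r/n - w/(N - n) for one image."""
    predicted, actual = set(predicted), set(actual)
    r = len(predicted & actual)   # correctly predicted words
    w = len(predicted - actual)   # wrongly predicted words
    n = len(actual)               # actual words for the image
    return r / n - w / (vocab_size - n)


def simple_score(predicted, actual):
    """E_PR = r/n for one image."""
    predicted, actual = set(predicted), set(actual)
    return len(predicted & actual) / len(actual)


vocab = {"sky", "water", "tiger", "grass", "sun"}
actual = {"sky", "water"}

print(normalised_score(actual, actual, len(vocab)))          # exact match
print(normalised_score(vocab - actual, actual, len(vocab)))  # complement
print(normalised_score(vocab, actual, len(vocab)))           # everything
print(normalised_score([], actual, len(vocab)))              # nothing
print(simple_score(["sky", "tiger"], actual))
```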

Experiments
Data set:
- Corel image data set: 160 CDs, each on a specific topic, e.g. aircraft
- Sample of 80 CDs, split into a 75% training set and a 25% test set
- Remaining images formed a more difficult held-out set
- Words with a frequency of less than 20 were excluded, giving a vocabulary of 155 words
- 10 iterations of the training algorithm
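
The data preparation steps above (frequency threshold on words, 75/25 split) might look roughly like this. The function names and the toy annotations are hypothetical, and the deterministic split is an assumption made so the example is reproducible; the slides do not say how the images were assigned to the two sets.

```python
from collections import Counter


def build_vocabulary(annotations, min_count=20):
    """Keep words that occur at least min_count times across all images."""
    counts = Counter(word for words in annotations for word in words)
    return {word for word, count in counts.items() if count >= min_count}


def split_train_test(items, train_fraction=0.75):
    """Deterministic split: the first 75% of items train, the rest test."""
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


# Made-up annotations: 25 images of sky and water, 5 images of a tiger on grass.
annotations = [["sky", "water"]] * 25 + [["tiger", "grass"]] * 5
vocab = build_vocabulary(annotations)
train, test = split_train_test(annotations)
print(sorted(vocab), len(train), len(test))
```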

[Figure: Clustering performance]

[Figure: Precision-Recall comparison]

Results
- Methods which use image clustering rely heavily on having images that are close to the training data
- The test set performed better than the novel held-out set
- The model performs well at clustering similar images
- Less frequent and unseen blobs have lower performance

Conclusions
- Matching words to pictures is a form of multi-modal data mining
- Pre-process by segmenting images into regions, each described by a feature vector
- Predict words for novel images by calculating P(word | image)
- The Multi-Modal Hierarchical Aspect Model can annotate, search and browse image collections
- The model showed good performance on the test set, but did less well on the held-out set
- Exciting progress has been made, but there is much more work to be done!

References
[1] J. Jeon, V. Lavrenko and R. Manmatha. Automatic Image Annotation and Retrieval using Cross-Media Relevance Models. In Proceedings of the 26th Intl. ACM SIGIR Conf., pages 119-126, 2003.
[2] K. Barnard and D. Forsyth. Learning the Semantics of Words and Pictures. Proc. International Conference on Computer Vision, pp. II:408-415, 2001.
[3] T. Hofmann. Learning and representing topic. A hierarchical mixture model for word occurrence in document databases. Proc. Workshop on learning from text and the web, CMU, 1998.
[4] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta and R. Jain. Content based image retrieval at the end of the early years. IEEE Trans. PAMI, 22:1349-1380, 2000.
[5] M. Sonka, V. Hlavac and R. Boyle. Image Processing, Analysis, and Machine Vision. Brooks/Cole Publishing, Pacific Grove, CA, 2nd Edition, 1999.
[6] K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei and M. I. Jordan. Matching words and pictures. Journal of Machine Learning Research, 3:1107-1135, 2003.
[7] K. Barnard, P. Duygulu and D. A. Forsyth. Clustering art. In IEEE Conf. on Computer Vision and Pattern Recognition, II:434-441, 2001.
[8] K. Barnard and D. A. Forsyth. Learning the semantics of words and pictures. In Int. Conf. on Computer Vision, pages 408-415, 2001.
[9] X. Qi and Y. Han. Incorporating multiple SVMs for automatic image annotation. Pattern Recognition, vol. 40, pp. 728-741, 2007.
[10] A. Yavlinsky, E. Schofield and S. Rüger. Automated image annotation using global features and robust nonparametric density estimation. Int'l Conference on Image and Video Retrieval, Singapore, 2005.
[11] V. Lavrenko, R. Manmatha and J. Jeon. A model for learning the semantics of pictures. In Proceedings of the 16th Conference on Advances in Neural Information Processing Systems (NIPS), 2003.
[12] P. Duygulu, K. Barnard, N. de Freitas and D. Forsyth. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary. In Seventh European Conference on Computer Vision, volume 4, pages 97-112, 2002.