CENG 595 Selected Topics in Computer Engineering Computer Vision. Zafer ARICAN, PhD

Today Administrivia What is Computer Vision? Why is it a difficult problem? State of the art Brief course syllabus

Instructor Zafer ARICAN, PhD zafer.arican@turktelekom.com.tr PhD, Ecole Polytechnique Federale de Lausanne, Switzerland Team Leader at Turk Telekom R&D Thesis: Analysis and Processing of Omnidirectional Images in a Spherical Framework

My Research at TT R&D Augmented Reality SDK

Course Prerequisites Data structures Familiarity with linear algebra Familiarity with probability A good working knowledge of C++ programming (or willingness and time to pick it up quickly) Knowledge of machine learning and image processing is a plus

Textbook
Primary textbook
o Richard Szeliski, Computer Vision: Algorithms & Applications. Online at: http://www.szeliski.org/book
Secondary textbooks
o Forsyth, David A., and Ponce, J. Computer Vision: A Modern Approach, Prentice Hall, 2003.
o Hartley, R. and Zisserman, A. Multiple View Geometry in Computer Vision, Academic Press, 2002.

Grading
o Programming problem sets (30%): 4-5 homeworks, announced during each related lecture
o Final project (70%): programming, a 15-minute presentation, and a technical report
o Strong class participation: not mandatory, but can offset negative performance

OpenCV Strong Computer Vision Library Mainly in C++ Interfaces for C, C++, Python, Java Supports Windows, Linux, Mac OS, iOS and Android Will be used for programming homeworks and final project Web site: http://opencv.org/ Reference version: 2.4.3

What is Computer Vision? What do you see? Which city? What is there in the photo? Which one in the front? Which one behind? People standing or walking?

What we see What a computer sees
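To make "what a computer sees" concrete, here is a minimal sketch in plain Python. The 4x4 grid of brightness values and both helper functions are hypothetical and are not taken from the slide's image; they only illustrate that the input to a vision algorithm is an array of numbers, and that even a simple fact such as "there is a dark-to-bright boundary" has to be computed from those numbers.

```python
# A hypothetical 4x4 grayscale image: each entry is a brightness value in 0..255.
# This grid of numbers is all the computer receives.
image = [
    [12, 15, 200, 210],
    [10, 18, 205, 215],
    [11, 14, 198, 220],
    [9, 16, 202, 212],
]


def column_means(img):
    """Return the average brightness of each column."""
    rows = len(img)
    return [sum(row[c] for row in img) / rows for c in range(len(img[0]))]


def strongest_vertical_edge(img):
    """Return index c where the jump between columns c and c+1 is largest."""
    means = column_means(img)
    jumps = [abs(means[c + 1] - means[c]) for c in range(len(means) - 1)]
    return jumps.index(max(jumps))


if __name__ == "__main__":
    for row in image:
        print(" ".join(f"{value:3d}" for value in row))
    print("strongest edge after column", strongest_vertical_edge(image))
```

A person looking at the rendered image would notice the boundary immediately; the program has to derive it from column statistics, and this toy heuristic would fail on anything less tidy.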

What is Computer Vision? Computer vision is the automatic understanding of what is present, where things are, and what actions happened, from a picture, a series of pictures, or videos. Mainly: measurement (3D shape and pose); perception and interpretation (recognition of people, objects, scenes and activities)

What is Computer Vision NOT?
o Image processing (images to images): image enhancement, image restoration, image compression. Take an image and process it to produce a new image which is, in some way, more desirable.
o Computational photography (images to images): extending the capabilities of digital cameras through the use of computation to enable the capture of enhanced or entirely novel images of the world.
o Computer graphics (models to images): rendering of 2D images from models and scenes in a 3D virtual world.

Related Disciplines

Why is computer vision difficult? Viewpoint variation Illumination Scale

Why is computer vision difficult? Intra-class variation Motion (Source: S. Lazebnik) Background clutter Occlusion

Why is computer vision difficult? An image is a projection of the world An under-constrained problem
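The under-constrained nature of projection can be illustrated with a small sketch. It assumes an ideal pinhole camera at the origin looking along the Z axis, with image coordinates x = f X / Z and y = f Y / Z. The function name `project`, the focal length, and the sample points are hypothetical choices for illustration, not something given in the slides.

```python
def project(point, focal_length=1.0):
    """Ideal pinhole projection of a 3D point (X, Y, Z), Z > 0, to 2D (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


# Two different 3D points on the same viewing ray through the camera center.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # twice as far away, and twice as far off-axis

if __name__ == "__main__":
    print(project(near))
    print(project(far))
```

Both points land on the same image location, so a single image cannot tell them apart: depth is lost in projection, and recovering it requires extra constraints such as a second view or prior knowledge about the scene.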

Why is computer vision difficult? Vision is an amazing feat of natural intelligence Visual cortex occupies about 50% of the macaque brain More of the human brain is devoted to vision than to anything else

Source: S. Lazebnik Why study computer vision? Millions of images being captured all the time Lots of useful applications The next slides show the current state of the art

[Chart: Flickr; vertical axis 0 to 6 billion, horizontal axis 12.15.2003 to 12.15.2009]

[Chart: Other photo sharing sites; vertical axis 10 billion to 50 billion]

and growing Flickr: > 1.7 million photos / day Facebook: > 100 million photos / day (as of February 2010) YouTube: > 35 hours of video every minute (as of November 2010) ~ 57 billion photos will be taken (US) in 2010 http://windowsteamblog.com/windows_live/b/windowslive/archive/2010/04/09/what-to-do-with-57-billion-photos.aspx (compare with ~17 billion negatives exposed in 1996)

Optical character recognition (OCR) If you have a scanner, it probably came with OCR software Digit recognition, AT&T labs http://www.research.att.com/~yann/ License plate readers http://en.wikipedia.org/wiki/automatic_number_plate_reco Sudoku grabber http://sudokugrab.blogspot.com Automatic check processing Source: S. Seitz

Face detection Many new digital cameras now detect faces Canon, Sony, Fuji, Source: S. Seitz

Smile detection? Sony Cyber-shot T70 Digital Still Camera Source: S. Seitz

Face recognition

Face recognition Who is she? Source: S. Seitz

Vision-based biometrics How the Afghan Girl was Identified by Her Iris Patterns Read the story Source: S. Seitz

Login without a password Fingerprint scanners on many new laptops, other devices Face recognition systems now beginning to appear more widely http://www.sensiblevision.com/ Source: S. Seitz

Object recognition (in supermarkets) LaneHawk by EvolutionRobotics A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it. Source: S. Seitz

Object recognition (in mobile phones) This is becoming real: Microsoft Research Point & Find Source: S. Seitz

iPhone apps: (www.kooaba.com) Source: S. Lazebnik

Google Goggles

Special effects: shape capture The Matrix movies, ESC Entertainment, XYZRGB, NRC Source: S. Seitz

Special effects: motion capture Pirates of the Caribbean, Industrial Light and Magic Source: S. Seitz

Special effects: camera tracking Boujou, 2d3

Sports Sportvision first down line Nice explanation on www.howstuffworks.com Source: S. Seitz

Smart cars Mobileye Vision systems currently in high-end BMW, GM, Volvo models By 2010: 70% of car manufacturers. Sources: A. Shashua, S. Sei

Vision-based interaction (and games) Sony EyeToy Assistive technologies Nintendo Wii has camera-based IR tracking built in. See Lee's work at CMU on clever tricks on using it to create a multi-touch display! Xbox Kinect

Vision in space NASA's Mars Exploration Rover Spirit captured this westward view from atop a low plateau where Spirit spent the closing months of 2007. Vision systems (JPL) used for several tasks: panorama stitching, 3D terrain modeling, obstacle detection, position tracking. For more, read "Computer Vision on Mars" by Matthies et al. Source: S. Seitz

Robotics NASA's Mars Spirit Rover http://en.wikipedia.org/wiki/spirit_rover Autonomous RC Car http://www.cs.cornell.edu/~asaxena/rccar/

Medical imaging 3D imaging MRI, CT Image guided surgery Grimson et al., MIT Source: S. Seitz

Augmented Reality Google AR Glasses Lego AR Booth artt SDK

Syllabus (Tentative)
2. Image formation
   o Camera & Optics
   o Light & Color
3. Image Filtering
   o Linear Filters
   o Image Sampling
4. Grouping and fitting
   o Segmentation
   o Hough transform
   o Contour detection
   o Image transformations
5. Multiple view geometry
   o Camera model
   o Homography and image warping
   o Epipolar geometry
   o Camera calibration
6. 3D reconstruction
   o Stereo
   o Sparse depth estimation
   o Dense depth estimation
7. Feature Detection & Matching
   o Corners & Blobs
   o Descriptors
   o Matching
8. Recognition
   o Introduction to recognition
   o Face detection
   o Bag-of-words models
9. Motion
   o Optical Flow
   o Tracking
   o Video processing