CSE 473/573 Computer Vision and Image Processing (CVIP) Ifeoma Nwogu inwogu@buffalo.edu
Today
- Logistics
- Schedule
- Introductions
- What is computer vision?
- Why is vision so hard?
Prerequisites
This course is appropriate for students with these essential prerequisites:
- A good working knowledge of MATLAB programming (or the willingness and time to pick it up quickly!)
- Linear algebra
- Vector calculus
The course does not assume prior experience in imaging, computer vision, image processing, or graphics.
Text Optional Strongly recommended
MATLAB
- Problem sets and projects will involve MATLAB programming.
- MATLAB runs on all the CSE lab Windows and UNIX systems.
- CSE 473/573 students can use their existing accounts in the CSE labs; non-CSE majors can request new CSE accounts.
- Any issues with CSE lab machines and accounts should be forwarded to cse-consult@cse.buffalo.edu.
Grading
There will be four components to the course grade:
- Approximately ten short online quizzes
- Three programming assignments
- Comprehensive mid-term exam
- Final project (including evaluation of the final report)
Class participation is strongly encouraged and can offset weak performance in any of the above components.
Final project
- Significant implementation of a technique related to the course content
  - Teams of 2 encouraged (document each role!)
- CVPR-type review article (no teams)
- Two components (if you implement a project different from what is assigned):
  - Proposal document (no more than 2 pages)
  - Final write-up with results (no more than 8 pages, CVPR style)
Course goals
- Values of computer vision to society
- Principles of image formation
- Convolution and image pyramids
- Feature detection, matching and alignment
- Motion estimation
- Visual recognition
- Machine learning models in vision
- Temporal models
What is computer vision?
- What does it mean, to see? "To know what is where by looking." (from Marr, 1982)
- How to discover from images what is present in the world, where things are, and what actions are taking place.
- Computing properties of the 3D world from visual data (measurement)
- Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation)
The problem
- We want to make a computer understand images.
- We know it is possible: we do it effortlessly!
Real-world scene -> Sensing device -> Interpreting device -> Interpretation (A person / A smiling person / Dr. Ifeoma Nwogu / etc.)
NSF Frontiers in computer vision workshop, 2011
Current state of the art
The next slides show some examples of what current vision systems can do.
Earth viewers (3D modeling)
Image from Microsoft's Virtual Earth (see also: Google Earth)
Photosynth
http://labs.live.com/photosynth/
Based on Photo Tourism technology developed by Noah Snavely, Steve Seitz, and Rick Szeliski
Photo Tourism overview
- Pipeline: Input photographs -> Scene reconstruction -> Photo Explorer
- Scene reconstruction produces: relative camera positions and orientations, point cloud, sparse correspondence
- A system for interactively browsing and exploring large collections of photos of a scene.
- Computes the viewpoint of each photo as well as a sparse 3D model of the scene.
Optical character recognition (OCR)
- Technology to convert scanned documents to text
- If you have a scanner, it probably came with OCR software
- Digit recognition, AT&T Labs: http://www.research.att.com/~yann/
- License plate readers: http://en.wikipedia.org/wiki/automatic_number_plate_recognition
Face detection
Many new digital cameras now detect faces (Canon, Sony, Fuji, ...)
Smile detection? Sony Cyber-shot T70 Digital Still Camera
Object recognition (in supermarkets)
LaneHawk by Evolution Robotics: A smart camera is flush-mounted in the checkout lane, continuously watching for items. When an item is detected and recognized, the cashier verifies the quantity of items that were found under the basket, and continues to close the transaction. The item can remain under the basket, and with LaneHawk, you are assured to get paid for it.
Face recognition Who is she?
Vision-based biometrics
"How the Afghan Girl was Identified by Her Iris Patterns" (Read the story)
Login without a password
- Fingerprint scanners on many new laptops and other devices
- Face recognition systems now beginning to appear more widely: http://www.sensiblevision.com/
Object recognition (in mobile phones)
- Microsoft Research
- Point & Find, Nokia
- SnapTell.com (now Amazon)
SnapTell
Amazon acquires SnapTell (WSJ, 2009)
Sports
- Sportvision first-down line
- Nice explanation on www.howstuffworks.com
Sports
A brief explanation of how Hawk-Eye works can be found here
Smart cars
- Mobileye: vision systems currently in high-end BMW, GM, Volvo models
- Back-up camera requirement for all new cars and light trucks
- Video demo
Vision-based interaction (and games)
- Digimask: put your face on a 3D avatar.
- Nintendo Wii has camera-based IR tracking built in. See Lee's work at CMU on clever tricks for using it to create a multi-touch display!
- "Game turns moviegoers into Human Joysticks", CNET. Camera tracking a crowd, based on this work.
Medical imaging
- 3D imaging: MRI, CT
- Image-guided surgery: Grimson et al., MIT
Structure of light
Left: scene illuminated with a ceiling lamp. Right: the two images were obtained by illuminating the scene with a laser pointer. In each image, the red arrow indicates the approximate direction of the light beam produced by the pointer.
Why is vision so hard?
The structure of ambient light
The Plenoptic Function (Adelson & Bergen, 1991)
The intensity P can be parameterized as: P(θ, φ, t, λ, X, Y, Z)
"The complete set of all convergence points constitutes the permanent possibilities of vision." (Gibson)
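One way to read the parameterization above is that an ordinary photograph is a 2D slice of P: the viewing position (X, Y, Z), the time t, and the wavelength λ are held fixed while the direction (θ, φ) varies over the pixels. The Python sketch below illustrates this under assumptions that are not in the slides: an ideal pinhole camera looking along +Z, θ taken as azimuth and φ as elevation, and a made-up stand-in function `toy_plenoptic` in place of real light measurements. The helper names are hypothetical.

```python
import math

def pixel_to_angles(u, v, width, height, focal_px):
    """Map pixel (u, v) to viewing angles (theta, phi) in radians.

    Assumed convention (not from the slides): theta is the azimuth and
    phi the elevation of the ray through the pixel, for an ideal pinhole
    camera looking along +Z with focal length focal_px (in pixels).
    """
    x = u - (width - 1) / 2.0      # horizontal offset from the image center
    y = (height - 1) / 2.0 - v     # vertical offset, with "up" positive
    theta = math.atan2(x, focal_px)
    phi = math.atan2(y, math.hypot(x, focal_px))
    return theta, phi

def render_slice(P, width, height, focal_px, t, lam, eye):
    """Sample P over (theta, phi) with t, lambda, and eye position held fixed."""
    X, Y, Z = eye
    return [[P(*pixel_to_angles(u, v, width, height, focal_px), t, lam, X, Y, Z)
             for u in range(width)]
            for v in range(height)]

def toy_plenoptic(theta, phi, t, lam, X, Y, Z):
    """Hypothetical stand-in for P: bright above the horizon, dark below."""
    return 1.0 if phi > 0 else 0.2

# A 5x4 "photograph": one time instant, one wavelength, one viewpoint.
image = render_slice(toy_plenoptic, width=5, height=4, focal_px=4.0,
                     t=0.0, lam=550e-9, eye=(0.0, 0.0, 0.0))
```

Varying t as well would give a video, and varying (X, Y, Z) would give the views seen by a moving observer; each is a higher-dimensional slice of the same function.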
Measuring light vs. measuring scene properties
By Roger Shepard ("Turning the Tables"). Depth processing is automatic, and we cannot shut it down.
Copyright A.Kitaoka 2003
Measuring light vs. measuring scene properties
Some things have strong variations in appearance
Related disciplines
Computer vision is related to:
- Graphics
- Image processing
- Artificial intelligence
- Algorithms
- Machine learning
- Cognitive science
Again, what is computer vision?
- Mathematics of geometry of image formation?
- Statistics of the natural world?
- Models for neuroscience?
- Engineering methods for matching images?
- Science fiction?
Answer: All of the above, and more.
The goal of computer vision is to write computer programs that can interpret images.
What to interpret: objects, activities, scenes, locations, text / writing, faces, gestures, motions, emotions.
[Figure: amusement park photo annotated with labels including sky, The Wicked Twister, Cedar Point, Ferris wheel, Lake Erie, ride, water ride, carousel, maxair, tree, deck, bench, umbrellas, pedestrians, people waiting in line, people sitting on ride, and the text "12 E".]
Slide Credits
- Trevor Darrell, UC Berkeley
- Antonio Torralba, MIT Vision Group
- Rob Fergus, NYU Vision, Learning and Graphics group
Next class
- Readings for today: Szeliski, Ch. 1
- Overview of linear algebra in the context of optimization techniques
Questions
Physical parameters of image formation
- Geometric
  - Type of projection
  - Camera pose
- Optical
  - Sensor's lens type
  - Focal length, field of view, aperture
- Photometric
  - Type, direction, intensity of light reaching sensor
  - Surfaces' reflectance properties
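To make the geometric and optical parameters above concrete, here is a minimal Python sketch. It assumes an ideal pinhole camera with perspective projection, which is only one possible "type of projection" and is not specified on the slide. The function names, the pose convention (camera coordinates = R * world + t), and the example numbers are illustrative assumptions.

```python
import math

def field_of_view_deg(focal_mm, sensor_width_mm):
    """Horizontal field of view of an ideal pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))

def project_point(point_world, R, t, focal):
    """Perspective-project a 3D world point onto the image plane.

    R (3x3 nested lists) and t (length 3) describe the camera pose and
    map world coordinates to camera coordinates: Xc = R * Xw + t.
    This pose convention is an assumption made for the sketch.
    """
    Xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
          for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division: image coordinates shrink with depth.
    return (focal * Xc[0] / Xc[2], focal * Xc[1] / Xc[2])

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
no_shift = [0.0, 0.0, 0.0]

# Camera at the world origin, looking along +Z, focal length 2 units.
x, y = project_point([1.0, 2.0, 4.0], identity, no_shift, focal=2.0)

# An 18 mm lens on a 36 mm wide sensor.
fov = field_of_view_deg(focal_mm=18.0, sensor_width_mm=36.0)
```

A shorter focal length widens the field of view, and a point twice as far away projects to coordinates half as large. Aperture and the photometric parameters are not modeled here.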
Next class
- More on image formation
- Readings for today: Szeliski, Ch. 1
- Readings for next lecture: Szeliski 2.1-2.3.1; Forsyth and Ponce 1.1, 1.4 (optional)