CMPSC 390 Visual Computing, Spring 2014
Bob Roos
http://cs.allegheny.edu/~rroos/cs390s2014
Handout 10, handed out on 12 February 2014

Review Notes: Introduction and PixelMath

Major Concepts: raster image, pixels, grayscale, byte, color (RGB), human vision (rods and cones), subtractive vs. additive color, raster image file types, pixel mapping, linear and affine transformations (special cases: rotation; shear or skew; shift/translate), compositing, special effects using mod and floor, nonlinear transformations

Basic Definitions

A raster image, sometimes called a bitmap image, is an image defined by an array of dots, or pixels, each of which is assigned a grayscale value or a color. A pixel is the basic dot in a raster graphics image. Usually a certain number of bits is reserved for each pixel: typically 8 bits per pixel for a grayscale image, or 24 or 32 bits for a color image. With 8 bits, a value of 0 indicates the lowest intensity (in grayscale, this is black) and a value of 255 indicates the highest intensity (in grayscale, this is white). (Some image-manipulation systems instead use a scale of real numbers from 0 to 1.) Special case: a black-and-white image (no grays) requires only a single bit per pixel.

Color

Colors are usually specified by RGB values (red, green, and blue), each of which usually requires 8 bits, for a total of 24 bits per pixel. Some images use an additional 8 bits to represent opacity, bringing the total to 32 bits per pixel.

Note: 8 bits is usually referred to as a byte. Thus, listing the RGB values of a two-by-two color image would require 12 bytes (no opacity information) or 16 bytes (with opacity information).

In computer graphics we use an additive color system: red, green, and blue light combine (add together) to form colors. Maximum amounts of all three give full-spectrum white light, while R, G, B values of zero result in the absence of any light, i.e., black. Figure 1 shows a few of the basic colors in the RGB system.

  RGB Values        Color
  (0, 0, 0)         black
  (255, 0, 0)       red
  (0, 255, 0)       green
  (0, 0, 255)       blue
  (255, 255, 0)     yellow (red + green)
  (0, 255, 255)     cyan (green + blue)
  (255, 0, 255)     magenta (red + blue)
  (255, 255, 255)   white

Figure 1: Additive (RGB) Colors

Why red, green, and blue? Because that is (roughly) how the human eye sorts out colors. Inside the eye, there are structures called cones that are sensitive to color. Although it is a vast oversimplification, it is common to think of the cones as being divided into three types: one sensitive to red light, one to green, and one to blue. All of the ten million or so distinct colors that the human eye can distinguish are combinations of the red, green, and blue values detected by these cones. (Other structures in the eye, rods, are sensitive to differences in light intensity, but not to color.)

This is not the same as the color system that is used in, e.g., paint pigments, dyes, etc. These are subtractive systems. Think of starting with an all-white canvas. Applying paint to the canvas causes the painted areas to absorb light of certain frequencies, subtracting it from the light that gets reflected back to our eyes. Laying down more paint subtracts more wavelengths from the reflected light that meets our eyes. The rules for additive and subtractive color mixing are completely different, as are the so-called primary colors in each system. In your art classes in elementary and secondary school you may have learned that the primary colors were red, yellow, and blue, and that red plus yellow equals orange, blue plus yellow equals green, etc. As the table in Figure 1 shows, this is not the case for additive color.

Raster Image File Formats

There are many, many different file formats for raster images. The most primitive is just a plain listing of every pixel's color. Some file extensions related to this type include .ppm, .pgm, and .pbm ("portable pixmap," "portable graymap," "portable bitmap"), .bmp (Microsoft's bitmap format), and sometimes .tiff ("tagged image file format"). For example, Figure 2 shows a color image and the file sizes for various representations. The image is 512 × 512 pixels, so using 3 bytes per pixel to hold the RGB values, we would need a minimum of 512 × 512 × 3 = 786,432 bytes.
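The byte counts above, and the additive mixing shown in Figure 1, can be checked with a short Python sketch. It assumes one byte (0 to 255) per channel and, when light sources are added, caps each channel at 255. The helper names add_colors and raw_size_bytes are illustrative only, not part of PixelMath or any library.

```python
# A minimal sketch (illustrative helper names, not a library API).
# Assumptions: 8 bits (one byte) per channel, channel values 0..255.

def add_colors(*colors):
    """Additively mix RGB lights: add channel by channel, capping at 255."""
    return tuple(min(255, sum(c[i] for c in colors)) for i in range(3))

def raw_size_bytes(width, height, channels=3):
    """Bytes needed to list every pixel's channel values, one byte each,
    with no header and no compression."""
    return width * height * channels

RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)

print(add_colors(RED, GREEN))         # (255, 255, 0): yellow
print(add_colors(GREEN, BLUE))        # (0, 255, 255): cyan
print(add_colors(RED, BLUE))          # (255, 0, 255): magenta
print(add_colors(RED, GREEN, BLUE))   # (255, 255, 255): white
print(raw_size_bytes(2, 2))              # 12 (RGB only)
print(raw_size_bytes(2, 2, channels=4))  # 16 (RGB plus opacity)
print(raw_size_bytes(512, 512))          # 786432 (the peppers image)
```

Capping at 255 is simply one convention for "more light than the maximum"; it is a choice made for this sketch.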
  Bytes     File
  786570    peppers.bmp
  786447    peppers.ppm
  786572    peppers.tiff
  786432    512 × 512 × 3 (raw RGB values)

Figure 2: Formats that are lists of RGB values

Another common file format is .gif ("graphics interchange format"), which creates a palette of 256 colors (the palette size can vary, however) and then matches each pixel in the image to the nearest color appearing in the palette. This is ideal for images with few colors, such as logos, clip art, drawings, etc. It is not as good for, e.g., photographs, since these usually have a much larger range of colors than just 256 values. Gif files are usually very compact, making them ideal for transmission over the Internet. Gif files can also include transparency information, animation (several images stored in the same file and displayed in a continuous loop), and interlacing, a process that permits a file to be downloaded in stages at increasing resolutions. Interlaced files usually show up first as highly pixelated images, but then more details slowly fill in as higher-resolution versions replace the low-resolution ones.

The Joint Photographic Experts Group created a file format known as .jpeg (also .jpg). This file format uses compression techniques to shrink the file size of the image. However, the compression is lossy: some colors are changed during compression and cannot be recovered. Nevertheless, it is extremely popular and is often used for photographic files in digital cameras, as well as for nearly every other type of graphics file that does not require absolutely precise recording of every pixel's color. Jpeg files do not support transparency.

The Portable Network Graphics format (.png) is one of the most versatile file formats. It permits lossless compression (unlike .jpg, where the compression is lossy), allows transparency, and permits interlacing. However, it does not support animation. Figure 3 shows how much space is required for the peppers image of Figure 2 in some of these formats.

  Bytes     File
  57508     peppers-85.jpg
  213423    peppers.gif
  505942    peppers.png

  (peppers-85.jpg was created using a quality value of 85%.)

Figure 3: More Compact Formats

Figure 4 shows one more example. The image contains exactly four colors and has size 256 × 256 = 65,536 pixels. An enumeration of the RGB values would require 256 × 256 × 3 = 196,608 bytes. We see that .ppm is pretty close to this.
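The palette idea behind .gif, replacing each pixel with the nearest palette color and storing only that color's index, can be sketched in Python. This is a simplified illustration, not the actual .gif encoder. It assumes squared RGB distance as the meaning of "nearest," packs indices into the fewest whole bits, and ignores the palette itself, the file header, and any further compression. The function names are hypothetical.

```python
import math

# Simplified palette ("indexed color") sketch; hypothetical names.
# Assumptions: "nearest" = smallest squared RGB distance; indices are
# packed into the fewest whole bits; no header, palette, or compression.

def nearest_index(pixel, palette):
    """Index of the palette entry closest to pixel (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])))

def quantize(pixels, palette):
    """Replace every pixel with the index of its nearest palette color."""
    return [nearest_index(px, palette) for px in pixels]

def bits_per_index(palette_size):
    """Fewest whole bits that can number palette_size entries (at least 1)."""
    return max(1, math.ceil(math.log2(palette_size)))

def packed_size_bytes(num_pixels, palette_size):
    """Bytes for the packed indices alone."""
    return math.ceil(num_pixels * bits_per_index(palette_size) / 8)

palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]
print(quantize([(10, 20, 5), (240, 30, 20), (200, 210, 220)], palette))  # [0, 1, 3]
print(bits_per_index(4), bits_per_index(256))                            # 2 8
print(packed_size_bytes(256 * 256, 4))                                   # 16384
```

For the four-color, 256 × 256 image of Figure 4, 2-bit indices take 16,384 bytes before any compression, compared with the 196,608 bytes needed to list every RGB value.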
However, the conversions to .bmp, .gif, and .tiff all took into account the fact that there are only four colors. This, plus a bit of simple compression, accounts for the much smaller file sizes. The big surprises, however, are in the sizes of the .jpg and .png files. Through clever compression techniques, the file size (compared with the .ppm file) has been reduced by a factor of almost 250 for the .jpg file and nearly 575 for the .png file. (It is worth mentioning that, as simple as the image is, the .jpg compression is still lossy: the JPEG image blurs the colors at the boundaries between the four regions in the image.)

PixelMath

PixelMath is a teaching system developed by Steven Tanimoto at the University of Washington. It uses images and image transformations to teach students about the mathematics of transformations (as well as to introduce them to basic concepts of graphics).
  Bytes     File
  196623    fourcolor.ppm
  32970     fourcolor.bmp
  16721     fourcolor.tiff
  1198      fourcolor.gif
  790       fourcolor-85.jpg
  343       fourcolor.png

  (fourcolor-85.jpg was created using a quality value of 85%.)

Figure 4: A More Extreme Example

A typical use of the PixelMath Calculator Tool consists of:

  1. opening one (or two) windows containing images to serve as source image(s);
  2. opening another window to serve as the destination image;
  3. typing a formula into the calculator window that represents a mapping from destination pixels back to source pixels.

The third step may seem backwards: the natural way to think of an image transformation is as a mapping from the source to the destination (the words "source" and "destination" even suggest it). However, if (x, y) is a pixel location in the destination image, Source1(1.25*x, 1.75*y) means that destination pixel (x, y) gets its value from source pixel (1.25x, 1.75y). Working backward like this ensures that every destination pixel is assigned a value. Figure 5 illustrates this. For example, destination pixel (168, 240) gets mapped to source pixel (1.25 × 168, 1.75 × 240) = (210, 420).

[Figure 5: A Typical PixelMath Transformation. Destination points (168,240), (176,168), and (339,0) correspond to source points (210,420), (220,294), and (424,0).]

Affine and Linear Transformations

The simplest kinds of transformations are:

Scale: resize the length or width (or both). General form:
Source1(s_x * x, s_y * y)

where s_x and s_y are scaling factors (the source image is s_x times as wide as the destination and s_y times as high as the destination). Figure 5 illustrates a scaling operation.

Translate (also sometimes called "shift"): slide the image a certain number of pixels in the x or y direction (or both). General form:

Source1(x + t_x, y + t_y)

where t_x and t_y are the shift amounts (the destination image is shifted t_x pixels horizontally and t_y pixels vertically to obtain the source image). Figure 6 illustrates this.

[Figure 6: A Translation Operation (source and destination images, with points labeled (0,0) and (50,75)).]

Shear (also called skew): tilt the image in the x or y direction (or both). General form:

Source1(x + r_x * y, y + r_y * x)

The constants r_x and r_y control the amount of the shear. Figure 7 illustrates a shear in both the x and y directions; the horizontal dimension is more skewed than the vertical dimension.

[Figure 7: Shear (Skew) Operations (source and destination images).]

All of the examples above are special cases of an affine transformation, which is a mapping of the form:

Source1(A * x + B * y + C, D * x + E * y + F)

where A, B, ..., F are constants. The first lab assignment showed several other examples of affine transformations, in particular, rotations by 90, 180, and 270 degrees, and horizontal and vertical reflections. (Rotations by arbitrary angles are also affine transformations.) A general affine transformation combines elements of scaling, translation, shear, and rotation, so that with a single affine transformation we can rotate an image, shift it, resize it, and tilt it into various parallelogram shapes. We need only three matching points (not all on one line) in the source and destination images to completely determine the constants in the transformation: each matching pair supplies two equations, one for x and one for y, so three pairs give the six equations needed for A through F. (The solution to lab 1 illustrates this for rotations.)

When the constants C and F are both zero, we get a special case of an affine transformation called a linear transformation. Later this semester we will see why this distinction is important.

Compositing

When we use two source images to create a destination image, we often want to achieve an overlay effect, a sort of semi-transparency in which both images can be seen. This is called compositing. The basic idea is to take a weighted average of pixels in the two images. Figure 8 shows a composite of the Mona Lisa and mandrill images that is 60% Mona Lisa and 40% mandrill. (Note that since the two source images are of different sizes, the composite doesn't quite line up.)

[Figure 8: Compositing: Source1(x,y)*0.6 + Source2(x,y)*0.4]

A weighted average (nonnegative weights that add up to 1) is used so that the resulting RGB values will still be within the allowed range of 0 to 255. Other weights could be used, even if they don't add up to 1, but then some color values may be negative or larger than 255. PixelMath automatically truncates values to the correct range, but the image may not be as desired.

Special Operators: if, floor, and mod

In lab 1 we explored special effects using operators such as if ... then ... else, floor, and mod. We most often use floor when we want to group pixels into blocks of a given size. For instance, the expression floor(x/50) results in an integer value that is zero for all x values between 0 and 49, one for all x values between 50 and 99, etc. This enables us to group x values into blocks of size 50.

The mod operator allows us to cycle among different formulas based upon an integer value (often a position in the image). For example, the expression

  if floor(x/50) mod 2 = 0 then ...formula 1... else ...formula 2...

switches between formula 1 and formula 2 every 50 horizontal pixels. We can use other divisors to switch among more than two formulas, e.g., use mod 5 to select among five options.

Other Transformations

In PixelMath, the destination image needn't be based upon a source image: pixel values may be computed from other factors, such as the location of the pixel. Colors from the source image may also be modified, exchanged, transformed, etc. Examples:

  RGB(x,0,0)
  RGB(2*x-y,3*(x+y) mod 256,x*x - y*y)
  Green1(x,y)
  RGB(Red1(y,x), Green1(x+y,y), x)
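All of the formulas in this section share PixelMath's backward-mapping idea: the formula is evaluated once per destination pixel and says where in the source to read. A rough Python sketch of that idea follows. It is not PixelMath. It assumes grayscale images stored as lists of rows, destination images the same size as the source, source coordinates rounded down to whole pixels, black (0) for coordinates outside the source, and clamping to 0 to 255 in place of PixelMath's own range handling. The names sample, render, and affine are hypothetical.

```python
import math

# Backward-mapping sketch in plain Python (NOT PixelMath; hypothetical names).
# Assumptions: grayscale images as lists of rows, img[y][x] in 0..255;
# destination same size as source; source coordinates floored to whole
# pixels; out-of-range coordinates read as 0; results clamped to 0..255.

def sample(img, x, y):
    """Value of source pixel (x, y); 0 if (x, y) lies outside the image."""
    xi, yi = math.floor(x), math.floor(y)
    if 0 <= yi < len(img) and 0 <= xi < len(img[0]):
        return img[yi][xi]
    return 0

def render(width, height, formula):
    """Evaluate formula(x, y) once per DESTINATION pixel.

    Results are rounded (Python rounds halves to even) and clamped to 0..255.
    """
    return [[max(0, min(255, round(formula(x, y)))) for x in range(width)]
            for y in range(height)]

def affine(img, A, B, C, D, E, F):
    """Destination (x, y) = Source1(A*x + B*y + C, D*x + E*y + F)."""
    h, w = len(img), len(img[0])
    return render(w, h, lambda x, y: sample(img, A*x + B*y + C, D*x + E*y + F))

# Hypothetical 4 x 4 test images: pixel (x, y) of src has value 10*y + x.
src = [[10 * y + x for x in range(4)] for y in range(4)]
other = [[100] * 4 for _ in range(4)]

scaled = affine(src, 2, 0, 0, 0, 2, 0)    # Source1(2*x, 2*y)
shifted = affine(src, 1, 0, 1, 0, 1, 0)   # Source1(x + 1, y)
sheared = affine(src, 1, 1, 0, 0, 1, 0)   # Source1(x + 1*y, y)
swapped = affine(src, 0, 1, 0, 1, 0, 0)   # Source1(y, x)

# Compositing: Source1(x,y)*0.6 + Source2(x,y)*0.4
blend = render(4, 4, lambda x, y: 0.6 * sample(src, x, y) + 0.4 * sample(other, x, y))

# floor/mod: switch formulas every 2 columns (the text's example uses 50).
stripes = render(4, 4, lambda x, y: sample(src, x, y)
                 if math.floor(x / 2) % 2 == 0 else 255 - sample(src, x, y))

print(shifted[0])    # [1, 2, 3, 0]
print(swapped[0])    # [0, 10, 20, 30]
print(blend[0][:2])  # [40, 41]
print(stripes[0])    # [0, 1, 253, 252]
```

Each geometric transformation here is just a different choice of A through F (the text's Source1(1.25*x, 1.75*y) would be affine(img, 1.25, 0, 0, 0, 1.75, 0)), while compositing and floor/mod effects are simply different formulas passed to render.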
Sample Exam Questions

This is not meant to be an exhaustive list of possible questions! Instead, it is intended to give you an idea of what kinds of questions might appear on an exam.

1. A pixel has an RGB value of (230, 5, 230). Which of the eight basic colors (consisting only of the values 0 or 255) does this most resemble? (Give the color name, e.g., green or black, not the RGB values.)

2. True or false: the size of an image file is always directly proportional to the number of pixels in the image.

3. Which of the following PixelMath formulas causes the left and right halves of a source image to switch places?
   (a) S1(xmax-x,y)
   (b) if floor(xmax/x) mod 2 = 0 then S1(x+xmax/2,y) else S1(xmax/2-x,y)
   (c) S1(x-xmax/2,y) * 0.5 + S1(x+xmax/2,y) * 0.5
   (d) if x < xmax/2 then S1(x+xmax/2,y) else S1(x-xmax/2,y)

4. A pure green pixel is composited with a pure red pixel, with the green pixel weighted by 1/3 and the red pixel weighted by 2/3. What are the RGB values of the resulting pixel?

5. What is the difference between additive and subtractive color systems? Use examples to explain.

6. Which of the following images would be produced by the formula Source1(y,x)? (Answer choices (a) through (d) are images.)

7. Assuming no compression and a range of 0 to 255 for color values, how many bytes would be needed to store the RGB values of a 10 × 10-pixel image, assuming:
   (a) no information about opacity is stored?
   (b) an opacity value between 0 and 255 is stored for each pixel?

8. Give an example of an image representation that uses lossy compression. Give an example of a representation that uses lossless compression.