Anti-aliasing and Graphics Formats
Eric C. McCreath
School of Computer Science, The Australian National University, ACT 0200, Australia
ericm@cs.anu.edu.au
Overview
Nyquist sampling frequency, supersampling, area sampling, filtering, raw image formats, PNG, JPEG
Nyquist sampling frequency
The Nyquist-Shannon sampling theorem states: "If a function x(t) contains no frequencies higher than B hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart." (C. E. Shannon, "Communication in the presence of noise", Proc. Institute of Radio Engineers, 37:1, pages 10-21, 1949.)
Patterns which have 'high'-frequency changes in colour are problematic in computer graphics. Note that the sharp changes at object boundaries are also 'high'-frequency parts of an image.
The problem caused by undersampling is called aliasing.
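A minimal sketch of aliasing in one dimension, assuming a pure sine wave as the signal. The sampling rate of 10 Hz and the helper name `sample_sine` are illustrative choices, not part of the slides. A 9 Hz sine sampled at 10 Hz produces exactly the same samples as an inverted 1 Hz sine, so the two cannot be told apart after sampling.

```python
import math

def sample_sine(freq_hz, rate_hz, count):
    """Sample sin(2*pi*f*t) at t = n / rate_hz for n = 0 .. count-1."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE_HZ = 10                          # sampling rate: safe only for signals below 5 Hz
high = sample_sine(9, RATE_HZ, 20)    # 9 Hz is well above that limit
low = sample_sine(1, RATE_HZ, 20)     # 1 Hz is safely below it

# The undersampled 9 Hz samples match the negated 1 Hz samples.
aliased = all(abs(h + l) < 1e-9 for h, l in zip(high, low))
print(aliased)
```

The same effect in two dimensions is what turns fine patterns and sharp edges into moire and jagged steps.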
Super Sampling
Supersampling is an anti-aliasing approach that calculates intensities at sub-pixel grid positions and uses the average of these intensities to determine the pixel intensity.
Supersampling is very costly in terms of both memory and processing. One approach to reducing this cost is adaptive supersampling, which only supersamples pixels that lie on boundaries.
There are a number of variations on the sub-pixel positions used for obtaining samples. These include: grid, random, Poisson disk, jitter, and rotated grid.
Area Sampling
Rather than supersampling, it is possible to calculate a pixel's intensity by working out the area of the pixel that each part of the scene overlaps. The proportion of the pixel's area that each part covers is used as its weight in a weighted average of the contributing colours.
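A small sketch of the idea for the simplest possible case: one axis-aligned rectangle over a uniform background, with unit-square pixels. The function names and the example colours are assumptions made for illustration; general polygons need proper clipping.

```python
def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))

def coverage(rect, px, py):
    """Fraction of the unit pixel at (px, py) covered by rect = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    return overlap_1d(x0, x1, px, px + 1) * overlap_1d(y0, y1, py, py + 1)

def area_sample(rect, foreground, background, px, py):
    """Weighted average of the two colours, weighted by covered area."""
    a = coverage(rect, px, py)
    return tuple(a * f + (1 - a) * b for f, b in zip(foreground, background))

# A red rectangle covering three quarters of pixel (0, 0) over a blue background.
colour = area_sample((0.25, 0.0, 2.0, 1.0), (255, 0, 0), (0, 0, 255), 0, 0)
print(colour)
```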
Filtering
Aliasing problems can be addressed by applying some form of filter to the image. A common choice is a Gaussian blur, which is achieved by convolving the image with a 2D Gaussian function. This has the effect of replacing each pixel's intensity with a weighted average of the intensities of the surrounding pixels.
Gaussian function:
$$g(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
Convolution operator:
$$f_{new}(x, y) = f_{old}(x, y) * g(x, y) = \int\!\!\int f_{old}(u, v)\, g(x - u, y - v)\, du\, dv$$
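A discrete sketch of the Gaussian blur in plain Python. It makes two assumptions that are not in the slides: the kernel is truncated to a finite radius and renormalised to sum to 1 (which replaces the 1/(2*pi*sigma^2) constant), and pixels outside the image are taken from the nearest edge pixel.

```python
import math

def gaussian_kernel(radius, sigma):
    """Sampled 2D Gaussian, normalised so the weights sum to 1."""
    values = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
              for y in range(-radius, radius + 1)]
    total = sum(sum(row) for row in values)
    return [[v / total for v in row] for row in values]

def blur(image, kernel):
    """Discrete convolution f_new(x, y) = sum f_old(x - kx, y - ky) * g(kx, ky)."""
    height, width = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(-r, r + 1):
                for kx in range(-r, r + 1):
                    sy = min(max(y - ky, 0), height - 1)  # clamp to the image edge
                    sx = min(max(x - kx, 0), width - 1)
                    acc += image[sy][sx] * kernel[ky + r][kx + r]
            out[y][x] = acc
    return out

kernel = gaussian_kernel(radius=2, sigma=1.0)
step = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(4)]  # a hard vertical edge
blurred = blur(step, kernel)
```

After blurring, the pixels either side of the edge take intermediate values, which is the softening that hides the staircase effect.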
Raw Image Formats
Raw image formats are lossless formats that store data that closely maps to the sensors/pixels of the camera/display device. These will often be in either RGB or YUV colour spaces.
The metadata will include information such as: resolution, byte/bit ordering, the number of bits per intensity, the colour space used, the palette (if one is used), etc.
Raw images will often be 2-6 times larger than compressed formats like JPEG.
Below is an example of a 3x3 image with 8-bit RGB intensity values.
Pixel layout:
  p0 p1 p2
  p3 p4 p5
  p6 p7 p8
Byte layout (three bytes per pixel, one per colour channel):
  p0 p0 p0 p1 p1 p1 p2 p2 p2 ...
The data will be 3*3*3 = 27 bytes long.
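The layout above can be sketched in a few lines. This assumes row-major pixel order, R then G then B within each pixel, and no padding at the end of rows; a real raw format states these choices in its metadata. The names `pack_rgb` and `byte_offset` are illustrative.

```python
WIDTH, HEIGHT, CHANNELS = 3, 3, 3

def pack_rgb(pixels):
    """Flatten rows of (r, g, b) tuples into row-major, interleaved bytes."""
    return bytes(value for row in pixels for pixel in row for value in pixel)

def byte_offset(row, col, channel):
    """Index of one channel of one pixel within the packed data."""
    return (row * WIDTH + col) * CHANNELS + channel

# Pixel k (p0 .. p8) gets red = 10 * k, green = 0, blue = 255.
pixels = [[(10 * (r * WIDTH + c), 0, 255) for c in range(WIDTH)]
          for r in range(HEIGHT)]
raw = pack_rgb(pixels)

print(len(raw))                     # total size in bytes
print(raw[byte_offset(1, 1, 0)])    # red channel of the centre pixel p4
```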
PNG and JPEG
PNG is a lossless data compression format for storing bitmap images.
JPEG is a lossy data compression format, also for storing bitmap images.
Storage size and quality of a 200x200 image:
  Raw (RGB, 8 bit/channel)   117k
  PNG                         18k
  JPEG (q=2)                   1k
  JPEG (q=10)                  2k
  JPEG (q=90)                 10k
JPG
JPEG is great for photos with textures and smoothly changing colour; however, it is not as good for text, icons, or line drawings with sharply changing colours.
JPEG files are made up of a sequence of segments divided by markers; each marker indicates the type of the segment that follows.
ericm@ericm-desktop:~/courses/cg/notes/formats$ od -c SimpleImage2.jpeg
0000000 377 330 377 340 \0 020 J F I F \0 001 001 001 \0 H
0000020 \0 H \0 \0 377 376 \0 023 C r e a t e d
0000040 w i t h G I M P 377 333 \0 C \0 377 377
0000060 377 377 377 372 377 377 377 377 377 377 377 377 377 377 377 377
0000100 377 377 377 377 377 377 377 377 377 377 377 377 377 377 377 377
*
The bytes 377 376 (octal) in the second line of the dump are the marker that indicates the start of a comment. In hex they are the bytes FF FE.
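A rough sketch of walking the marker segments at the start of a JPEG file. It assumes each segment after the start-of-image marker (FF D8) is a marker followed by a 2-byte big-endian length that counts itself, and it simply stops at the start-of-scan (FF DA) or end-of-image (FF D9) marker. The function name `list_segments` is illustrative, and the sample bytes are rebuilt from the first 44 bytes of the dump above.

```python
import struct

def list_segments(data):
    """Return (marker, payload) pairs for the header segments of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing start-of-image marker FF D8")
    segments = []
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xD9, 0xDA):      # end of image, or compressed scan data follows
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]   # length includes its own 2 bytes
        segments.append((marker, payload))
        pos += 2 + length
    return segments

header = (bytes([0xFF, 0xD8])                              # start of image
          + bytes([0xFF, 0xE0, 0x00, 0x10]) + b"JFIF\x00"   # APP0 segment, length 16
          + bytes([1, 1, 1, 0, 0x48, 0, 0x48, 0, 0])
          + bytes([0xFF, 0xFE, 0x00, 0x13])                  # comment segment, length 19
          + b"Created with GIMP")

for marker, payload in list_segments(header):
    print(f"FF {marker:02X}", payload)
```

A complete reader would also need to handle markers that carry no length field, which this sketch ignores.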
JPG
Images would normally be converted to YCbCr and the chroma components downsampled.
  Y  =  0.299 R + 0.587 G + 0.114 B
  Cb = -0.1687 R - 0.3313 G + 0.5 B + 128
  Cr =  0.5 R - 0.4187 G - 0.0813 B + 128
From JPEG File Interchange Format V1.02, Eric Hamilton, C-Cube Microsystems, 1992.
[Figure: Y, Cb, and Cr components]
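The conversion and a simple form of chroma downsampling can be sketched as below. The coefficients are the JFIF ones quoted on the slide; averaging each 2x2 block is one common way to downsample and is an assumption here, as are the function names.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB values to Y, Cb, Cr using the JFIF coefficients."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.1687 * r - 0.3313 * g + 0.5 * b + 128
    cr = 0.5 * r - 0.4187 * g - 0.0813 * b + 128
    return y, cb, cr

def downsample_2x2(channel):
    """Halve the resolution in both directions by averaging 2x2 blocks.

    Assumes the channel has an even width and height.
    """
    return [[(channel[r][c] + channel[r][c + 1]
              + channel[r + 1][c] + channel[r + 1][c + 1]) / 4
             for c in range(0, len(channel[0]), 2)]
            for r in range(0, len(channel), 2)]

white = rgb_to_ycbcr(255, 255, 255)
small = downsample_2x2([[100, 102], [104, 106]])
print(white, small)
```

Greys such as white carry no colour information, so both chroma values sit at the neutral level of 128. Downsampling only Cb and Cr cuts the stored data while leaving the luma detail untouched.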
JPG
Y, Cb, and Cr are each divided into 8x8 blocks, and each block is transformed into the 'frequency domain' using a discrete cosine transform (DCT). This produces an 8x8 block of numbers giving the weights of a linear combination of the different frequencies.
These are quantised by dividing by constant values and rounding (this is the step which governs compression and quality).
The quantised values are read out in a zigzag sequence, and lossless Huffman encoding is used for storing the bit lengths and runs of zeros of this sequence.
Figures: from http://en.wikipedia.org/wiki/file:dctjpeg.png, public domain; from http://en.wikipedia.org/wiki/file:jpeg_zigzag.svg, Alex Khristov, public domain.
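A compact sketch of the three steps on one block: a direct (slow) 8x8 DCT, quantisation, and zigzag read-out. The flat quantisation table of 16s is a hypothetical stand-in for a real table, and the level shift and Huffman stage of an actual encoder are left out.

```python
import math

def dct_8x8(block):
    """2D DCT-II of an 8x8 block, computed directly from the definition."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out

def quantise(coefficients, table):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(coefficients[u][v] / table[u][v]) for v in range(8)]
            for u in range(8)]

def zigzag_order(n=8):
    """(row, col) positions along the anti-diagonals, alternating direction."""
    order = []
    for s in range(2 * n - 1):
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order

block = [[8] * 8 for _ in range(8)]     # a flat block: only the lowest frequency is present
table = [[16] * 8 for _ in range(8)]    # hypothetical flat quantisation table
quantised = quantise(dct_8x8(block), table)
sequence = [quantised[r][c] for r, c in zigzag_order()]
print(sequence[:8])
```

The zigzag puts low frequencies first, so for typical blocks the sequence ends in a long run of zeros that the entropy coder can store very cheaply.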
PNG
PNG uses lossless data compression to store bitmap information.
PNG supports RGB, RGBA, and greyscale colour spaces. Either a palette or an intensity-channel approach is used.
Image data first passes through a pre-compression filter and is then compressed with the DEFLATE algorithm (which combines LZ77 and Huffman coding).
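A sketch of why filtering before DEFLATE helps, using only the 'Sub' filter (each byte is replaced by its difference from the byte one pixel to the left) and Python's `zlib` module for DEFLATE. The gradient scanline is made-up example data, and a real encoder chooses among several filter types for each scanline.

```python
import zlib

def sub_filter(scanline, bpp):
    """PNG 'Sub' filter: each byte minus the byte bpp positions to its left, mod 256."""
    return bytes((scanline[i] - (scanline[i - bpp] if i >= bpp else 0)) % 256
                 for i in range(len(scanline)))

def sub_unfilter(filtered, bpp):
    """Invert the 'Sub' filter by adding back the already-restored left neighbour."""
    out = bytearray()
    for i, value in enumerate(filtered):
        left = out[i - bpp] if i >= bpp else 0
        out.append((value + left) % 256)
    return bytes(out)

scanline = bytes(range(0, 200, 2))        # smooth greyscale gradient, 1 byte per pixel
filtered = sub_filter(scanline, bpp=1)    # a 0 followed by a long run of 2s
compressed = zlib.compress(filtered)
restored = sub_unfilter(zlib.decompress(compressed), bpp=1)

print(len(scanline), len(compressed), restored == scanline)
```

The gradient itself contains no repeated bytes, but its differences are constant, which is exactly the kind of redundancy LZ77 can exploit.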
PNG
PNG files start with a unique 8-byte file header. After the header comes a number of 'chunks' of data. Each chunk has four parts: length (4 bytes), type (4 bytes), data (variable length), and CRC (4 bytes).
Key chunks in PNG are: IHDR, header info; PLTE, palette information; IDAT, image data; IEND, end marker.
ericm@ericm-desktop:~/courses/cg/notes/formats$ od -c SimpleImage.png | head
0000000 211 P N G \r \n 032 \n \0 \0 \0 \r I H D R
0000020 \0 \0 \0 310 \0 \0 \0 310 \b 006 \0 \0 \0 255 X 256
0000040 236 \0 \0 \0 001 s R G B \0 256 316 034 351 \0 \0
0000060 \0 006 b K G D \0 377 \0 251 \0 = 275 346 V \r
0000100 \0 \0 \0 \t p H Y s \0 \0 \v 023 \0 \0 \v 023
0000120 001 \0 232 234 030 \0 \0 \0 \a t I M E \a 333 \n
0000140 033 \0 6 226 025 A 005 \0 \0 \0 031 t E X t
0000160 C o m m e n t \0 C r e a t e d
0000200 w i t h G I M P W 201 016 027 \0 \0
0000220 \0 I D A T x 332 354 275 y 220 $ 347 Y 356 373
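The chunk layout can be sketched with `struct` and `zlib.crc32`. The helper names are illustrative, and the sample bytes built here contain only an IHDR and an IEND chunk, so they demonstrate the structure but are not a viewable image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # the 8 header bytes: 211 P N G \r \n 032 \n

def make_chunk(chunk_type, data):
    """Build one chunk: length, type, data, then a CRC over type + data."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def read_chunks(png_bytes):
    """Return (type, data, crc_ok) for each chunk after the signature."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks = []
    pos = 8
    while pos + 12 <= len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        chunk_type = png_bytes[pos + 4:pos + 8]
        data = png_bytes[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", png_bytes[pos + 8 + length:pos + 12 + length])
        chunks.append((chunk_type, data, crc == zlib.crc32(chunk_type + data)))
        pos += 12 + length
    return chunks

ihdr = struct.pack(">IIBBBBB", 200, 200, 8, 6, 0, 0, 0)
sample = PNG_SIGNATURE + make_chunk(b"IHDR", ihdr) + make_chunk(b"IEND", b"")
for chunk_type, data, crc_ok in read_chunks(sample):
    print(chunk_type, len(data), crc_ok)
```

Because every chunk carries its own length, a reader can skip chunk types it does not understand, such as the tEXt comment in the dump.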
PNG
The IHDR chunk appears first and contains (from RFC 2083):
  Width: 4 bytes
  Height: 4 bytes
  Bit depth: 1 byte
  Color type: 1 byte
  Compression method: 1 byte
  Filter method: 1 byte
  Interlace method: 1 byte
ericm@ericm-desktop:~/courses/cg/notes/formats$ od -t x1 SimpleImage.png | head
0000000 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52
0000020 00 00 00 c8 00 00 00 c8 08 06 00 00 00 ad 58 ae
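As a worked example, the 13 data bytes that follow the IHDR type (49 48 44 52) in the hex dump can be decoded with `struct`. The variable names are illustrative; all fields are big-endian.

```python
import struct

# The 13 IHDR data bytes copied from the hex dump.
ihdr_data = bytes.fromhex("000000c8 000000c8 08 06 00 00 00")

(width, height, bit_depth, colour_type,
 compression, filter_method, interlace) = struct.unpack(">IIBBBBB", ihdr_data)

print(width, height)       # image size in pixels
print(bit_depth)           # bits per sample
print(colour_type)         # 6 means RGB with an alpha channel
print(compression, filter_method, interlace)
```

So SimpleImage.png is a 200x200 RGBA image with 8 bits per sample and no interlacing.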
Video Formats
There are a large number of video file formats. Each format is governed by one or more standards, which enable programs to know how to decode and encode the video/audio stream.
Most formats can be broken up into 3 main parts, which are somewhat independent of each other: the container format, the video stream format, and the audio stream format.
Raw video is big, really big. This means compression is important; fortunately, because of the spatial and temporal characteristics of video, it compresses very well.
Take a 5 min video at 800x600 resolution and 25 fps, RGB with 8 bits per channel, with 16-bit audio sampled at 44000 Hz. If we stored the raw data, the video part would take:
  5 * 60 * 25 * 3 * 800 * 600 bytes ≈ 10 GiB
The audio would take:
  5 * 60 * 44000 * 2 bytes ≈ 25 MiB
However, we should be able to compress this down to ~50 MiB while maintaining reasonable quality.
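The arithmetic can be checked directly. Like the slide's formula, this assumes a single audio channel at 2 bytes per sample; the constant names are illustrative.

```python
SECONDS = 5 * 60
video_bytes = SECONDS * 25 * 800 * 600 * 3   # frames * pixels per frame * bytes per pixel
audio_bytes = SECONDS * 44000 * 2            # samples * bytes per sample (one channel)

GIB = 2 ** 30
MIB = 2 ** 20
print(f"video: {video_bytes / GIB:.2f} GiB")
print(f"audio: {audio_bytes / MIB:.2f} MiB")
print(f"compression ratio for a 50 MiB file: "
      f"{(video_bytes + audio_bytes) / (50 * MIB):.0f}:1")
```

Reaching ~50 MiB therefore means discarding or predicting away all but roughly half a percent of the raw data.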
Video Formats
The two most prominent video standards are MPEG-2, which includes the DVD format, and MPEG-4, which dominates video on the web. These standards come from the Moving Picture Experts Group (MPEG). The difficulty with these standards is that they are encumbered with patents. So, in countries that have patents on software, when you buy software that encodes or decodes these formats the vendor should pay a licence fee (a cost the vendor will pass on).
The alternatives are WebM, which is a royalty-free video file format, and Ogg video, which has avoided approaches that make use of known patents. The difficulty with these as standards is the threat that patents will pop up, leaving the format at the mercy of the patent holder.
Video Formats
Container formats include: AVI, FLV, MPEG-4 Part 12, Ogg, Matroska
Video stream formats include: Theora, VP8, VP9, H.262/MPEG-2 Part 2, H.264
Audio formats include: MP3, Vorbis, FLAC, AAC
Theora
Theora uses the following compression approaches:
Y'CbCr colour spaces are used for representing image data; this gives 1 luma channel and 2 chroma channels. Subsampling of 4:4:4, 4:2:2, or 4:2:0 is used.
Channels are broken up into 8x8 blocks of values, and a DCT with quantization is used. These are stored using a zigzag ordering and Huffman encoding.
Frames are either Intra, which can be decoded without reference to other frames, or Inter, which use motion vectors of macroblocks on the previous frame and the last Intra frame to predict the current frame. The difference between the predicted and the actual image is the residual, which is stored using the quantized DCT approach.
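The prediction-plus-residual idea for Inter frames can be illustrated on a toy 4x4 frame. This is only a sketch of the general technique, not Theora's bitstream: the 2x2 block size, the frame contents, and the function names are all assumptions, and the motion vector is given rather than searched for.

```python
def predict_block(reference, top, left, motion, size):
    """Copy the size x size block that the motion vector points at in the reference frame."""
    dy, dx = motion
    return [[reference[top + dy + r][left + dx + c] for c in range(size)]
            for r in range(size)]

def block_of(frame, top, left, size):
    """Extract the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]

def subtract(actual, predicted):
    return [[a - p for a, p in zip(ar, pr)] for ar, pr in zip(actual, predicted)]

def add(predicted, residual):
    return [[p + d for p, d in zip(pr, dr)] for pr, dr in zip(predicted, residual)]

previous = [[r * 4 + c for c in range(4)] for r in range(4)]
# Current frame: the previous frame shifted one pixel right, with one pixel brightened.
current = [[previous[r][max(c - 1, 0)] for c in range(4)] for r in range(4)]
current[1][2] += 3

prediction = predict_block(previous, top=0, left=1, motion=(0, -1), size=2)
residual = subtract(block_of(current, 0, 1, 2), prediction)
decoded = add(prediction, residual)
print(residual)
```

When the motion vector is a good match the residual is mostly zeros, so it survives the quantised DCT stage in very few bits.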