Spherical K-Means Color Image Compression
Tim Pavlik

Features/Functionality

This project takes an input image in RGB colorspace and performs K-means clustering, where the number of clusters (N) is specified by the user. Each pixel's RGB offset from its assigned cluster center is then converted into a spherical coordinate offset. These offsets are quantized according to the number of partitions specified by the user, creating a new image representation that uses fewer bits than the original 24-bit RGB image. The image is then decompressed/reconstructed to RGB and displayed to show the user how well the compression has performed. The user is also shown RG, RB and GB planar representations of the original and reconstructed pixels as well as the cluster centers.

Algorithm Details

First, the RGB values taken from the input image are passed to the Matlab K-means clustering algorithm. K-means iteratively places each pixel (with an R, G and B value) into one of N clusters (N specified by the user) and outputs the final cluster centers along with the cluster center each pixel was closest to.

Fig 1: Exemplary pixel values (black dots) and clusters (red Xs and circles) shown in RGB space

Next, the pixel offsets are calculated by subtracting the RGB values of the assigned cluster center from each pixel's RGB values. The Cartesian RGB offsets are then converted to spherical coordinates, giving each pixel a Radius, Theta, and Phi offset from its assigned cluster center.

The angles (Theta and Phi) are quantized uniformly over their respective ranges into the number of partitions specified by the user (Theta Part. and Phi Part.): Theta ranges from -π to π while Phi ranges from -π/2 to π/2.

The radius is quantized into one of Radius Part. partitions specified by the user, over the range MaxR^(1/Radius Part.) to MaxR, where MaxR is the maximum (3D) distance from the cluster center over all pixels in the cluster. MaxR is different for each cluster, since it is set by the distance of that cluster's greatest outlier. The radius intervals increase exponentially, as opposed to the uniformly (linearly) spaced values used for the angle offsets Phi and Theta. For example, the radii may be assigned to one of 2, 4, 8, 16, 32, etc. as opposed to 1, 2, 3, 4, 5, etc. (details below).

The image is now represented using fixed header information, consisting of the cluster centers in RGB space and the MaxR for each cluster, plus a Cluster number, Radius offset, Theta offset, and Phi offset for each pixel.

Finally, the RGB values are recomputed and shown to the user to see how well the compressed image corresponds to the original. This is achieved by converting the quantized Radius, Theta, and Phi offsets back into RGB offsets and adding those offsets to the cluster center for the given pixel. The user is also given the number of bits per pixel used in the compressed image, compared with the 24 bits per pixel that the original image requires in 8-bit RGB space. The RG, RB and GB planes are also displayed with the original RGB pixel values (black dots), the new quantized RGB pixel values (green dots) and each RGB cluster center (red Xs).

Design Choices

K-means was chosen because images tend to fall naturally into shades of a palette of dominant colors; the clustering algorithm iteratively tries to find those dominant colors and exploits them to compress the colorspace.

The K-means algorithm was limited to a maximum of 50 iterations, since it can be quite time consuming even for medium-sized images. The emptyaction parameter was set to singleton, which instructs the algorithm to start a new cluster from the farthest outlier of the other clusters if an empty cluster forms during any iteration. This reduces the number of far outliers, which leads to smaller MaxR values for each cluster and less error in the Radius offset for each pixel in the cluster.

The choice to represent the radii as exponentially increasing from MaxR^(1/Radius Part.) to MaxR was made in order to 1) retain even the greatest outliers in each cluster at MaxR, while 2) recognizing that most pixels are probably within a small distance of the cluster center, so that a number of smaller Radius offsets are available for them.

Instructions

Step 1: Select an input image. These images were chosen to show how well the compression works with different types of images. You can see this by comparing the performance on the relatively monochromatic sunset.jpg (mostly shades of red) with the more difficult rainbow.jpg. The seattle.jpg image is slightly more typical, with a more balanced blend of colors.
Step 2: Select compression parameters. At first, try using the default parameters (16 clusters, 8 Radius partitions, 4 Theta partitions, and 8 Phi partitions). For better quality, try increasing the values (especially the number of clusters for the rainbow.jpg image). For better compression, use smaller values.
Step 3: Press the Compress Colors button to perform the Spherical K-Means Clustering compression using the selected parameters. The script takes a few seconds, so a Loading... message is displayed showing that the script is still working.
Step 4: Observe the outputs. The reconstructed image is shown at the bottom right (Yellow arrow below). The RG, RB, and GB planes are also shown (Blue arrow below) with the original RGB pixel values as black dots, the new reconstructed RGB pixel values as green dots and the computed RGB cluster centers as red Xs. Finally, the number of bits needed to represent the image using the compressed scheme is shown (Red oval below) compared to the original 24 bits.
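
The per-pixel encoding and reconstruction described under Algorithm Details, and the bit count reported in Step 4, can be sketched in Python as follows. This is an illustrative sketch only, not the project's Matlab code. The function names are hypothetical, and several details are assumptions rather than statements from the report: the radius levels are taken to be MaxR^(k/P) for k = 1..P (with MaxR > 1), a radius is assigned to the nearest level, angles are reconstructed at bin midpoints, and each field is stored with a whole number of bits.

```python
import math

THETA_RANGE = (-math.pi, math.pi)          # Theta range given in the text
PHI_RANGE = (-math.pi / 2, math.pi / 2)    # Phi range given in the text


def radius_levels(max_r, parts):
    # Assumed levels: MaxR^(1/P), MaxR^(2/P), ..., MaxR (requires MaxR > 1).
    return [max_r ** (k / parts) for k in range(1, parts + 1)]


def quantize_angle(angle, lo, hi, parts):
    # Uniform bins over [lo, hi]; the top edge is clamped into the last bin.
    idx = int((angle - lo) / (hi - lo) * parts)
    return min(max(idx, 0), parts - 1)


def dequantize_angle(idx, lo, hi, parts):
    # Assumption: reconstruct each angle at the midpoint of its bin.
    return lo + (idx + 0.5) * (hi - lo) / parts


def encode_pixel(pixel, center, max_r, r_parts, t_parts, p_parts):
    # Offset = pixel - center, so decoding can add it back to the center.
    dr, dg, db = (p - c for p, c in zip(pixel, center))
    radius = math.sqrt(dr * dr + dg * dg + db * db)
    theta = math.atan2(dg, dr)                  # in [-pi, pi]
    phi = math.atan2(db, math.hypot(dr, dg))    # in [-pi/2, pi/2]
    levels = radius_levels(max_r, r_parts)
    # Assumption: pick the level closest to the true radius.
    r_idx = min(range(r_parts), key=lambda k: abs(levels[k] - radius))
    t_idx = quantize_angle(theta, *THETA_RANGE, t_parts)
    p_idx = quantize_angle(phi, *PHI_RANGE, p_parts)
    return r_idx, t_idx, p_idx


def decode_pixel(code, center, max_r, r_parts, t_parts, p_parts):
    r_idx, t_idx, p_idx = code
    radius = radius_levels(max_r, r_parts)[r_idx]
    theta = dequantize_angle(t_idx, *THETA_RANGE, t_parts)
    phi = dequantize_angle(p_idx, *PHI_RANGE, p_parts)
    offset = (radius * math.cos(phi) * math.cos(theta),
              radius * math.cos(phi) * math.sin(theta),
              radius * math.sin(phi))
    return tuple(c + d for c, d in zip(center, offset))


def bits_per_pixel(clusters, r_parts, t_parts, p_parts):
    # Whole bits needed to index each field: ceil(log2(n)) computed exactly.
    return sum((n - 1).bit_length() for n in (clusters, r_parts, t_parts, p_parts))
```

With the default parameters, bits_per_pixel(16, 8, 4, 8) gives 4 + 3 + 2 + 3 = 12 bits per pixel. The per-cluster header (cluster centers and MaxR values) is not counted in this figure.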

Final Comments

The results are impressive: using half the number of bits with the default values (12 bits/pixel as opposed to 24 bits/pixel), the reconstructed sunset.jpg and seattle.jpg images were very close to their originals. Understandably, the compressed rainbow.jpg image had more visible artifacts, since it contains so many subtle changes in color. When the number of clusters is increased, these artifacts become less visible.

Moving forward, some changes could be attempted to improve the results even further:

Align Phi/Theta with major/minor axes. Instead of using a uniform distribution of quantized angles Phi and Theta, the covariance/ellipsoid containing the given cluster could be calculated so that the Phi and Theta values can be aligned with the major and minor axes of the calculated ellipsoid. This could better represent the RGB clusters, which tend to form narrow ellipsoids rather than perfect spheres.

Use a smaller range of radius offsets. Instead of using the max radius of each cluster as the upper bound, a smaller value (say, two standard deviations from the mean distance) could be used to shrink the range of radii while still representing roughly 95-97% of the values and leaving out the extreme outliers.

Compress each parameter in DCT. This would be the next logical step, allowing the performance of this compression to be compared with JPEG compression in RGB.

Explore different colorspaces. K-means clustering is often performed in normalized RGB space in computer vision techniques. Perhaps this compression scheme would perform better in this space or in any of the other common colorspaces such as CMY, HSV, YUV, YUL, etc.
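
As an illustration of the smaller-radius-range idea listed above, the following Python sketch computes a capped upper bound for one cluster. It is a hypothetical example, not part of the project: the function names are invented here, the population standard deviation is an assumed choice, and clipping outliers to the cap is only one possible way to handle the pixels that fall outside it.

```python
import statistics


def capped_max_radius(distances, num_std=2.0):
    # Upper bound for radius quantization: mean + num_std * std of the
    # pixel-to-center distances, never larger than the true maximum.
    mean = statistics.mean(distances)
    std = statistics.pstdev(distances)   # assumption: population std
    return min(max(distances), mean + num_std * std)


def clip_radius(radius, cap):
    # Assumption: pixels beyond the cap are pulled in to the cap.
    return min(radius, cap)
```

The cap would then stand in for MaxR when the radius partitions for that cluster are built, trading larger error on a few outliers for finer radius steps on the remaining pixels.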