Wavelet-Based Multiresolution Matching for Content-Based Image Retrieval
Te-Wei Chiang 1, Tienwei Tsai 2, Yo-Ping Huang 2
1 Department of Information Networking Technology, Chihlee Institute of Technology, No. 313, Sec. 1, Wunhua Rd., Banciao City, Taipei County 220, Taiwan, R.O.C.
2 Department of Computer Science and Engineering, Tatung University, No. 40, Sec. 3, Zhongshan N. Rd., Taipei City 104, Taiwan, R.O.C.
Abstract
In this paper, a content-based image retrieval method based on the wavelet transform is proposed. In the database-establishing phase, each image is first transformed from the standard RGB color space to the YUV space; the Y component of the image is then further transformed to the wavelet domain. After the k-th level of wavelet decomposition, a series of low-pass subimages with different resolution levels is obtained. In the retrieving phase, the system compares these subimages of the query image with those of the images in the database to find good matches. By combining features at different resolution levels, we obtain not only a better precision rate but also a good reduction rate. Experiments on a database of one thousand images support our approach.
Keywords: Content-based image retrieval, query-by-example, wavelet transform, color space.
1. Introduction
With the enormous increase in recent years in the number of image databases available online, and the consequent need for better techniques to access this information, there has been a strong resurgence of interest in research on image retrieval. Generally, image retrieval procedures can be roughly divided into two approaches: query-by-text (QbT) and query-by-example (QbE). In QbT, queries are texts and targets are images; in QbE, queries are images and targets are images. For practicality, images in QbT retrieval are often annotated with words, such as time, place, or photographer. To access the desired image data, the seeker can construct queries using homogeneous descriptions, such as keywords, to match these annotations. Such retrieval is known as annotation-based image retrieval (ABIR). ABIR has the following drawbacks. First, manual image annotation is time-consuming and therefore costly. Second, human annotation is subjective. Furthermore, some images cannot be annotated because it is difficult to describe their content with words. On the other hand, annotations are not necessary in a QbE setting, although they can be used; the retrieval is carried out according to the image contents. Such retrieval is known as content-based image retrieval (CBIR) [1]. CBIR has become popular for retrieving the desired images automatically; Smeulders et al. [2] reviewed more than 200 references in this field. In QbE, the retrieval of images is basically done via the similarity between the query image and all candidates in the image database. To evaluate the similarity between two images, one of the simplest ways is to calculate the Euclidean distance between the feature vectors representing the two images. To obtain the feature vector of an image, transform-type feature extraction techniques can be applied, such as the wavelet [3], Walsh, Fourier, 2-D moment, DCT [4-6], and Karhunen-Loeve transforms.
In our image retrieval scheme, the wavelet transform is used to extract low-level texture features owing to its superiority in multiresolution analysis and spatial-frequency localization. We expect that feature sets derived from the wavelet transform can reduce the processing time without sacrificing retrieval accuracy. In this paper, we propose a content-based image retrieval method based on the wavelet transform. In the database-establishing phase, each image is first transformed from the standard RGB color space to the YUV space; the Y component of the image is then further transformed to a series of low-pass subimages with different resolution levels. In the retrieving phase, the system compares these wavelet subimages of the query image with those of the images in the database to find good matches. The remainder of this paper is organized as follows. The next section covers related works. Section 3 presents the proposed image retrieval system. Experimental results are shown in Section 4. Finally, conclusions are drawn in Section 5.
2. Related Works
Content-based image retrieval is a technology for searching for images similar to a query based only on the image pixel representation. However, querying on raw pixel information is quite time-consuming, because it requires describing the location of each pixel together with its intensity. Therefore, choosing a suitable color space and reducing the data to be computed is a critical problem in image retrieval. Some systems employ color histograms. Histogram measures depend only on summations of identical pixel values and do not incorporate orientation or position. In other words, the histogram is only a statistical distribution of the colors and loses the local information of the image. Therefore, we propose an image retrieval scheme that retrieves images
Figure 1. The proposed system architecture.
from their transform domain, which reduces the data while still retaining local information. In this paper, we focus on the QbE approach: the user gives an example image similar to the one he/she is looking for, and the images in the database with the smallest distance to the query image are returned, ranked according to their similarity. We can define the QbE problem as follows. Given a query image Q and a database of images X_1, X_2, …, X_n, find the image X_i closest to Q. The closeness is computed using a distance measuring function D(Q, X_n), which will be defined in Section 3.3. In the next section we introduce our image retrieval scheme.
3. The Proposed Image Retrieval System
3.1 System Architecture
Figure 1 shows the system architecture of our wavelet-based QbE system. This system contains two major modules: the feature extraction module and the distance measuring module. The details of each module are introduced in the following subsections.
3.2 Feature Extraction
Classifying an unknown input is a fundamental problem in pattern recognition. A common method is to define a distance function between the features of patterns and find the most similar pattern in the reference set. That is, features are functions of the measurements performed on a class of patterns that enable that class to be distinguished from other classes in
the same general category [7]. For effective retrieval, we have to extract distinguishable and reliable features from the images. During the feature extraction process, the images have to be converted to the desired color space. There exist many models through which to define the valid colors in image data. Each of the following models is specified by a vector of values, each component of which is valid over a specified range. This presentation covers the following major color space definitions [8]: RGB (Red, Green, and Blue), CMYK (Cyan, Magenta, Yellow, and Black Key), CIE (Commission Internationale de l'Eclairage), YUV (luminance and chrominance channels), etc. In our approach, the RGB images are first transformed to the YUV color space.
3.2.1 RGB Color Space
A gray-level digital image can be defined as a function of two variables, f(x, y), where x and y are spatial coordinates, and the amplitude f at a given pair of coordinates is called the intensity of the image at that point. Every digital image is composed of a finite number of elements, called pixels, each with a particular location and a finite value. Similarly, for a color image, each pixel (x, y) consists of three components, R(x, y), G(x, y), and B(x, y), which correspond to the intensities of the red, green, and blue colors in the pixel, respectively.
3.2.2 YUV Color Space
Originally used for PAL (the European "standard") analog video, YUV is based on the CIE Y primary plus chrominance. The Y primary was specifically designed to follow the luminous efficiency function of human eyes. Chrominance is the difference between a color and a reference white at the same luminance. The following equations are used to convert from RGB to YUV:

Y(x, y) = 0.299 R(x, y) + 0.587 G(x, y) + 0.114 B(x, y),  (1)
U(x, y) = 0.492 (B(x, y) − Y(x, y)),  (2)
V(x, y) = 0.877 (R(x, y) − Y(x, y)).  (3)

After converting from RGB to YUV, the features of each image can be extracted by the wavelet transform.
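As a minimal sketch, the conversion of Equations (1)-(3) can be applied to a whole image at once with NumPy; the function name here is illustrative, not part of the paper.

```python
import numpy as np

def rgb_to_yuv(rgb):
    """Convert an H x W x 3 RGB array to YUV per Equations (1)-(3)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance, Eq. (1)
    u = 0.492 * (b - y)                      # blue-difference chroma, Eq. (2)
    v = 0.877 * (r - y)                      # red-difference chroma, Eq. (3)
    return np.stack([y, u, v], axis=-1)
```

Only the Y channel feeds the wavelet stage that follows; U and V are discarded by the feature extractor.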
3.2.3 Wavelet Transform
An image of size M × N can be decomposed into its wavelet coefficients by using Mallat's pyramid algorithm [9]. Mathematically, it can be described by the following recursive equations [10]:

LL^(k)(m, n) = [[LL^(k−1) *_rows H]_{2↓1} *_columns H]_{1↓2},
  m = 1, …, M/2^k; n = 1, …, N/2^k,  (4)

LH^(k)(m, n) = [[LL^(k−1) *_rows H]_{2↓1} *_columns G]_{1↓2},
  m = 1, …, M/2^k; n = N/2^k + 1, …, N/2^(k−1),  (5)

HL^(k)(m, n) = [[LL^(k−1) *_rows G]_{2↓1} *_columns H]_{1↓2},
  m = M/2^k + 1, …, M/2^(k−1); n = 1, …, N/2^k,  (6)

HH^(k)(m, n) = [[LL^(k−1) *_rows G]_{2↓1} *_columns G]_{1↓2},
  m = M/2^k + 1, …, M/2^(k−1); n = N/2^k + 1, …, N/2^(k−1).  (7)

Figure 2. Illustration of the wavelet decomposition. (a) Illustration of LL^(0); (b) Illustration of the 1st-level decomposition; (c) Illustration of the 2nd-level decomposition.
Figure 3. A sample image to illustrate the wavelet decomposition. (a) Original image; (b) 1st-level decomposition of the image; (c) 2nd-level decomposition of the image.
Here LL, LH, HL, and HH represent the four subbands of the image being decomposed; L and H indicate low- and high-frequency components, and H and G correspond to the low-pass and high-pass filters, respectively. The expression 2↓1 (1↓2) denotes downsampling along columns (rows), and k is the level of wavelet decomposition. Equations (4)-(7) indicate that any image signal can be decomposed in a specific wavelet domain. The wavelet decomposition is illustrated in Figure 2. LL^(0) is the original image. The outputs of the high-pass filters, LH^(1), HL^(1), and HH^(1), are three subimages with the same size as the low-pass subimage LL^(1), presenting image details in different directions. Figure 3 shows a sample image to illustrate the wavelet decomposition. After wavelet decomposition, the energy of the image is distributed over different subbands, each of which keeps a specific frequency component. In other words, each subband image contains one feature. Intuitively, features in different subbands can be distinguished more easily than in the original image.
3.3 Distance Measures
A distance (or similarity) measure is a way of ordering the features from a specific query point. The retrieval performance of a feature can be significantly affected by the distance
measure used. Ideally, we want to use the distance measure and feature combination that give the best retrieval performance for the collection being queried. In our experimental system, we define a measure called the sum of squared differences (SSD) to indicate the degree of distance (or dissimilarity). Since the lowest-frequency wavelet subband, LL^(k), is the most significant subband of the k-th-level wavelet decomposition, we can retrieve the desired image(s) by comparing the LL^(k) subband of the candidate images with that of the query image. Assume that LL_q^(k)(m, n) and LL_{x_n}^(k)(m, n) represent the wavelet coefficients of the query image Q and the image X_n under LL^(k), respectively. Then the distance between Q and X_n under the LL^(k) subband can be defined as

D_{LL^(k)}(Q, X_n) = Σ_m Σ_n ( LL_q^(k)(m, n) − LL_{x_n}^(k)(m, n) )^2.  (8)

Based on the observation that LL^(1), LL^(2), …, LL^(k) correspond to different resolution levels of the original image, the retrieval accuracy may be improved by considering these subimages at the same time. Therefore, the distance between Q and X_n can be modified as the weighted combination

D(Q, X_n) = Σ_k w_k D_{LL^(k)}(Q, X_n),  (9)

where w_k is the weight of the distance at the k-th resolution level.
4. Experimental Results
The development of effective retrieval techniques has been at the core of information retrieval (IR) research for more than 30 years. A number of measures of effectiveness have been proposed, but the most frequently mentioned are recall and precision. To evaluate the retrieval efficiency of the proposed method, we use the precision rate as the performance measure, as shown in Equation (10):

Precision rate = R_r / T_r,  (10)

where R_r is the number of relevant retrieved items and T_r is the number of all retrieved items. In this preliminary experiment, 1000 images downloaded from the WBIIS [11] and SIMPLIcity [12] databases are used to demonstrate the effectiveness of our system.
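The feature extraction of Section 3.2.3 and the distance of Section 3.3 can be sketched end to end as follows. The paper does not fix the filter pair H and G, so the simplest low-pass/high-pass pair (Haar averaging/differencing) is assumed here, along with even image dimensions; all function names are illustrative.

```python
import numpy as np

def haar_step(ll):
    """One 2-D decomposition level in the spirit of Eqs. (4)-(7),
    using Haar filters as a stand-in for the unspecified H and G.
    Assumes even dimensions. Returns (LL, LH, HL, HH)."""
    lo = (ll[0::2, :] + ll[1::2, :]) / 2.0   # row low-pass + downsample
    hi = (ll[0::2, :] - ll[1::2, :]) / 2.0   # row high-pass + downsample
    LL = (lo[:, 0::2] + lo[:, 1::2]) / 2.0   # column low-pass
    LH = (lo[:, 0::2] - lo[:, 1::2]) / 2.0
    HL = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    HH = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def lowpass_pyramid(y, levels):
    """LL(1) ... LL(k) low-pass subimages of a Y-component image."""
    out, ll = [], y
    for _ in range(levels):
        ll = haar_step(ll)[0]
        out.append(ll)
    return out

def ssd(a, b):
    """Sum of squared differences between two subimages, Eq. (8)."""
    return float(np.sum((a - b) ** 2))

def combined_distance(query_lls, cand_lls, weights):
    """Weighted combination over resolution levels, Eq. (9)."""
    return sum(w * ssd(q, c)
               for w, q, c in zip(weights, query_lls, cand_lls))
```

Ranking candidates then amounts to sorting the database by `combined_distance` against the query's pyramid; equal weights w_k reproduce the setting used in the experiments of this section.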
The user can query with an external image or an image from the database. The difference between these two options is that when an external image is used, its features need to be extracted, whereas if the image is already in the database, its features have already been extracted and stored in the database along with the image. Therefore, when a query is submitted using an image from the database, only the index of that image is transferred to the server. In both cases the features used are imposed by the database selected at the start. In this experiment, we use only images from the database as query images. To verify the effectiveness of our approach, we conducted a series of experiments using different types of feature sets, i.e., the Y-component image, the Y-component LL^(1) subimage, the Y-component LL^(2) subimage, …, the Y-component LL^(6) subimage, and some combinations of these
Figure 4. Retrieved results via the comparison of the original RGB images.
subimages. In our experiments, a butterfly image (shown in Figure 3) is used as the query image. The size of the image is 85 × 128. The top ten matches are retrieved. Figure 4 to Figure 13 show the retrieved results, where items are ranked in ascending order of distance to the query image, from left to right and then from top to bottom. Figure 4 shows the GUI of our system and the results retrieved using the SSD distance between the original RGB bitmap of the query image and those of the images in the database. Figure 5 shows the results retrieved using the SSD distance between the Y-component of the same query image and those of the database images. Note that the number of features of the Y-component image is one-third that of the original RGB image; there is therefore a two-thirds reduction in the size of the feature set. Figure 6 shows the results retrieved using the SSD distance between the LL^(1) subimage of the Y-component of the query image and those of the database images. The number of features of the LL^(1) subimage is one-fourth that of the Y-component image, a further three-fourths reduction in the size of the feature set. Figure 7 to Figure 11 show the results retrieved using the SSD distance between the LL^(2) to LL^(6) subimages of the Y-component of the query image and those of the database images. Table 1 shows the precision rate and the reduction rate of the test image under the Y-component low-pass subimages with different resolution levels. Table 2 and Table 3 show the precision rate and the reduction rate under some combinations of these Y-component low-pass subimages. From Table 1 to Table 3, we find that the best precision rate, i.e.,
0.8, occurs when the combination of LL^(2) and LL^(3), or the combination of LL^(3), LL^(4), and LL^(5), is applied for distance measuring. In our experiments, the subimages were combined with equal weights. Figure 12 shows the retrieved results based on the combination
of the Y-component LL^(2) and LL^(3). Figure 13 shows the retrieved results based on the combination of the Y-component LL^(3), LL^(4), and LL^(5). Of the two combinations, the latter is better in terms of the reduction rate. Therefore, through the combination of the Y-component LL^(3), LL^(4), and LL^(5), we can obtain not only the best precision rate but also a good reduction rate. Based on the above observations, we conclude that a combination of features at different resolution levels may be better than any single feature in terms of the precision rate.
5. Conclusions
The goal of CBIR is to provide the user with a way to retrieve images from large image collections based on visual similarity. In this paper, we propose a CBIR method based on color space transformation and the wavelet transform. Each image is first transformed from the standard RGB color space to the YUV space; the Y component of the image is then further transformed to the wavelet domain. The wavelet transform is applied to extract low-level features from the images owing to its superiority in multiresolution analysis and spatial-frequency localization. To achieve QbE, the system compares the most significant wavelet subimages of the Y-component of the query image with those of the images in the database to find good matches. We find that by combining features at different resolution levels, we can obtain not only a better precision rate but also a good reduction rate. Since only a preliminary experiment has been made to test our approach, much work remains to improve this system. Since several features may be used simultaneously, it is necessary to develop a scheme that can integrate the similarity scores resulting from the matching processes. A long-term aim is to combine semantic annotations and low-level features to improve retrieval performance; that is, the retrieved images should be somehow related to the objects contained in the scenes.
References
[1] Gudivada, V. and Raghavan, V. (1995), "Content-Based Image Retrieval Systems," IEEE Computer, 28(9), pp. 18-22.
[2] Smeulders, A. W. M., Worring, M., Santini, S., Gupta, A., and Jain, R. (2000), "Content-Based Image Retrieval at the End of the Early Years," IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(12), pp. 1349-1380.
[3] Liang, K.-C. and Kuo, C.-C. (1999), "WaveGuide: A Joint Wavelet-Based Image Representation and Description System," IEEE Trans. on Image Processing, 8(11), pp. 1619-1629.
[4] Huang, X.-Y., Zhang, Y.-J., and Hu, D. (2003), "Image Retrieval Based on Weighted Texture Features Using DCT Coefficients of JPEG Images," Proc. 4th Pacific Rim Conf. on Multimedia, Information, Communications and Signal Processing, pp. 1571-1575.
[5] Huang, Y.-L. and Chang, R.-F. (1999), "Texture Features for DCT-Coded Image Retrieval and Classification," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 3013-3016.
[6] Bae, H.-J. and Jung, S.-H. (1997), "Image Retrieval Using Texture Based on DCT," Proc. Int. Conf. on Information, Communications and Signal Processing, Singapore, pp. 1065-1068.
[7] Nadler, M. and Smith, E. P. (1993), Pattern Recognition Engineering, New York: Wiley-Interscience.
[8] Dunn, S. (1999), Digital Color. http://davis.wpi.edu/~matt/courses/color/.
[9] Huang, L. and Huang, X. (2001), "Multiresolution Recognition of Offline Handwritten Chinese Characters with Wavelet Transform," Proc. Int. Conf. on Document Analysis and Recognition, pp. 631-634.
[10] Mallat, S. G. (1989), "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Trans. on Pattern Analysis and Machine Intelligence, 11(7), pp. 674-693.
[11] Wang, J. Z. (1996), Content Based Image Search Demo Page. http://bergman.stanford.edu/~zwang/project/imsearch/wbiis.html.
[12] Wang, J. Z., Li, J., and Wiederhold, G. (2001), "SIMPLIcity: Semantics-Sensitive Integrated Matching for Picture Libraries," IEEE Trans. on Pattern Analysis and Machine Intelligence, 23(9), pp. 947-963.
Table 1. Precision rate and reduction rate of the test image under the Y-component low-pass subimages with different resolution levels.

| Feature set | Precision rate | Number of coefficients | Reduction rate |
| --- | --- | --- | --- |
| Original RGB image | 0.7 | 32640 (85 × 128 × 3) | 0 |
| Y-component image | 0.7 | 10880 (85 × 128) | 0.6667 |
| LL^(1) subimage | 0.7 | 2752 (43 × 64) | 0.9157 |
| LL^(2) subimage | 0.7 | 704 (22 × 32) | 0.9784 |
| LL^(3) subimage | 0.6 | 176 (11 × 16) | 0.9946 |
| LL^(4) subimage | 0.6 | 48 (6 × 8) | 0.9985 |
| LL^(5) subimage | 0.5 | 12 (3 × 4) | 0.9996 |
| LL^(6) subimage | 0.5 | 4 (2 × 2) | 0.9999 |

Table 2. Precision rate and reduction rate of the test image under some combinations of the Y-component low-pass subimages with different resolution levels.

| Feature set | Precision rate | Number of coefficients | Reduction rate |
| --- | --- | --- | --- |
| LL^(2) + LL^(3) | 0.8 | 880 (22 × 32 + 11 × 16) | 0.9730 |
| LL^(2) + LL^(4) | 0.6 | 752 (22 × 32 + 6 × 8) | 0.9770 |
| LL^(2) + LL^(5) | 0.6 | 716 (22 × 32 + 3 × 4) | 0.9781 |
| LL^(2) + LL^(6) | 0.4 | 708 (22 × 32 + 2 × 2) | 0.9783 |
| LL^(2) + LL^(3) + LL^(4) | 0.7 | 928 (22 × 32 + 11 × 16 + 6 × 8) | 0.9716 |
| LL^(2) + LL^(3) + LL^(4) + LL^(5) | 0.7 | 940 (22 × 32 + 11 × 16 + 6 × 8 + 3 × 4) | 0.9712 |
| LL^(2) + LL^(3) + LL^(4) + LL^(5) + LL^(6) | 0.5 | 944 (22 × 32 + 11 × 16 + 6 × 8 + 3 × 4 + 2 × 2) | 0.9711 |

Table 3. Precision rate and reduction rate of the test image under some combinations of the Y-component low-pass subimages with different resolution levels. (Cont.)

| Feature set | Precision rate | Number of coefficients | Reduction rate |
| --- | --- | --- | --- |
| LL^(3) + LL^(4) | 0.7 | 224 (11 × 16 + 6 × 8) | 0.9931 |
| LL^(3) + LL^(5) | 0.7 | 188 (11 × 16 + 3 × 4) | 0.9942 |
| LL^(3) + LL^(6) | 0.5 | 180 (11 × 16 + 2 × 2) | 0.9945 |
| LL^(3) + LL^(4) + LL^(5) | 0.8 | 236 (11 × 16 + 6 × 8 + 3 × 4) | 0.9928 |
| LL^(4) + LL^(5) | 0.7 | 60 (6 × 8 + 3 × 4) | 0.9982 |
| LL^(5) + LL^(6) | 0.5 | 16 (3 × 4 + 2 × 2) | 0.9995 |
| LL^(4) + LL^(5) + LL^(6) | 0.5 | 64 (6 × 8 + 3 × 4 + 2 × 2) | 0.9980 |
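The reduction rates in Tables 1-3 follow directly from the coefficient counts relative to the 32640 coefficients of the 85 × 128 × 3 RGB image; the following one-liner (an illustrative helper, not part of the paper) reproduces them:

```python
def reduction_rate(n_coeffs, n_original=32640):
    """Fraction of feature coefficients saved relative to the
    85 x 128 x 3 RGB image (32640 coefficients)."""
    return 1.0 - n_coeffs / n_original
```

For example, `reduction_rate(704)` recovers the 0.9784 listed for the LL^(2) subimage.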
Figure 5. Retrieved results based on the Y-component images. Figure 6. Retrieved results based on the Y-component LL^(1) subimages. Figure 7. Retrieved results based on the Y-component LL^(2) subimages. Figure 8. Retrieved results based on the Y-component LL^(3) subimages.
Figure 9. Retrieved results based on the Y-component LL^(4) subimages. Figure 10. Retrieved results based on the Y-component LL^(5) subimages. Figure 11. Retrieved results based on the Y-component LL^(6) subimages. Figure 12. Retrieved results based on the combination of the Y-component LL^(2) and LL^(3) subimages.
Figure 13. Retrieved results based on the combination of the Y-component LL^(3), LL^(4), and LL^(5) subimages.