GRADIENT MAGNITUDE SIMILARITY DEVIATION ON MULTIPLE SCALES FOR COLOR IMAGE QUALITY ASSESSMENT

Bo Zhang, Pedro V. Sander, Amine Bermak, Fellow, IEEE

Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
Hamad Bin Khalifa University, Education City, Doha, Qatar

ABSTRACT

Recently, various image quality assessment (IQA) metrics based on gradient similarity have been developed. In this paper, we extend the work on gradient magnitude similarity deviation (GMSD) and propose a more efficient metric. First, a novel similarity index is proposed, which gives the flexibility to tune the masking parameter to more closely match the human visual system (HVS). Then, we propose a multiscale GMSD method by incorporating scores of luminance distortion at different scales. Furthermore, a method for measuring chromatic distortions in YIQ color space based on our metric is proposed. The final IQA index, MS-GMSDc, is obtained by combining luminance and chrominance scores. Experimental results on four comprehensive datasets show that, compared with 14 state-of-the-art IQA methods, our method achieves the best performance for both grayscale and chromatic image assessment.

Index Terms: Image Quality Assessment (IQA), Multiscale, Chromatic Distortion, Gradient Magnitude Similarity

1. INTRODUCTION

Image quality assessment (IQA) has become an important issue in image processing tasks such as image compression, restoration, transmission and enhancement. In the past decade, various IQA methods have been developed that agree well with subjective evaluation. Generally, there are two categories of full-reference IQA methods. The IQA indices in the first category are developed based on the properties of the human visual system (HVS) [1-5] and attempt to mimic the processing stages of early vision. However, the psychological properties of the HVS are still far from fully understood, making it difficult for IQA methods to model the HVS accurately.
On the other hand, the IQA methods in the second category operate efficiently based on an overall characteristic of the HVS: vision is highly adapted for extracting structural information [6-13]. The well-known structural similarity index measure (SSIM) [6] falls into this category and evaluates distortion through local structures. A multi-scale extension of SSIM (MS-SSIM) [8] assesses the SSIM at different scales and produces much better performance than its single-scale counterpart.

Inspired by the success of SSIM in utilizing structural information, recent methods study the distortions of the image gradient [9, 11], because the gradient can implicitly magnify the distortions of local structures. Among gradient-based IQA methods, the recently proposed gradient magnitude similarity deviation (GMSD) [9] produces the final score by pooling the gradient similarity map with the standard deviation. This content-adaptive pooling method is quite efficient and makes GMSD competitive among various IQA methods. However, GMSD only measures distortions at a single scale and cannot measure chromatic distortions. More details of this method are introduced in Section 2.

In order to propose a better IQA metric, in Section 3.1 we introduce a degree-of-masking term into the similarity index, so that the measure of gradient similarity is more consistent with the HVS. Then we propose a multi-scale GMSD (MS-GMSD) in Section 3.2, which gives better flexibility in evaluating different viewing conditions and thus yields a better assessment of luminance distortions. Further, Section 3.3 introduces a method for evaluating chromatic distortions in YIQ space. The final chromatic multi-scale GMSD (MS-GMSDc) is obtained by combining the luminance and chrominance assessments. In order to demonstrate that multi-scale evaluation and the chrominance term are both effective, we compare MS-GMSD and MS-GMSDc individually with other methods (Section 4).
Comprehensive results show that our proposed method outperforms 14 other state-of-the-art methods in assessing both grayscale and chromatic images.

978-1-5090-4117-6/17/$31.00 ©2017 IEEE. ICASSP 2017.

2. RELATED WORK

The image gradient plays an important role in the human visual system (HVS) and can reflect both contrast and structure information. In gradient magnitude similarity deviation (GMSD) [9], horizontal and vertical gradients are first calculated for both the distorted image d and the reference image r by convolving with Prewitt filters along the two directions. We denote the computed directional gradients by $G_{x,p}$ and $G_{y,p}$, where $p \in \{r, d\}$ is the image index. The gradient magnitude maps of r and d at pixel location i are then calculated as:

$$G_r(i) = \sqrt{G_{x,r}^2(i) + G_{y,r}^2(i)} \tag{1}$$

$$G_d(i) = \sqrt{G_{x,d}^2(i) + G_{y,d}^2(i)} \tag{2}$$

With the gradient magnitude maps $G_r$ and $G_d$, the gradient magnitude similarity (GMS) map is obtained through a pixel-wise computation:

$$\mathrm{GMS}(i) = \frac{2\,G_r(i)\,G_d(i) + c}{G_r^2(i) + G_d^2(i) + c} \tag{3}$$

where c serves as the numerical stability term. The GMS map measures the distortion level at each pixel: gradient distortion is more severe at locations where there is less gradient similarity. Unlike typical IQA metrics, which usually give the final assessment score by averaging the similarity map, GMSD computes the final score as the standard deviation of the GMS map:

$$\mathrm{GMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(\mathrm{GMS}(i) - \mathrm{GMSM}\bigr)^2} \tag{4}$$

where N is the number of pixels and GMSM denotes the mean value of the GMS map. Because this pooling method reflects the variation of the local quality degradation and adapts to local content, it makes GMSD very consistent with human mean opinion scores (MOS).

3. MULTI-SCALE GMSD AND COLOR EXTENSION

3.1. Similarity Index with Masking Control

Since the SSIM index [6] was introduced, quite a few IQA indices, including GMSD, calculate the similarity index with the Dice index, which has the form $(2ab + c)/(a^2 + b^2 + c)$. The Dice index exhibits a masking effect and qualitatively conforms to an HVS characteristic: differences in physical quantities (such as luminance and contrast) become less perceptible as these quantities increase. However, the degree of masking in the Dice index is fixed and may deviate from the optimal value for the HVS. Therefore, we introduce one more parameter to allow tuning the degree of masking. In the case of gradient similarity, the index in Eq. (3) becomes:

$$\mathrm{GMS}(i) = \frac{2\,G_r(i)\,G_d(i) - \alpha\,G_r(i)\,G_d(i) + c}{G_r^2(i) + G_d^2(i) - \alpha\,G_r(i)\,G_d(i) + c} \tag{5}$$

where α is the masking degree coefficient and $\alpha \in [0, 2]$. The degree of masking is smaller when α gets larger. With α = 0, Eq. (5) reduces to Eq. (3). In the special case of α = 2, the index becomes $c / ((G_r(i) - G_d(i))^2 + c)$, which depends only on the gradient difference, so there is no masking for large gradients. Because Eq. (5) provides complete control over the degree of masking, it is expected to correlate with the HVS more accurately under a proper parameter setting.

Fig. 1: Multiscale GMSD and its extension to chromatic IQA, MS-GMSDc. The reference image r and the distorted image d are compared at Scale 0, Scale 1, ..., Scale M through GMSD_j(G_r, G_d), giving MS-GMSD for grayscale images. The chrominance term Δchrom is computed from the I and Q channels at Scale M, and the overall IQA score MS-GMSDc combines MS-GMSD and Δchrom with the weight γ as in Eq. (10).

3.2. Multiscale GMSD

The subjective assessment of image quality may vary with viewing distance, and a good IQA index should be able to assess the image quality at different scales [8]. In this work, we incorporate the idea of multi-scale assessment and apply it to the gradient magnitude similarity deviation. The multi-scale GMSD (MS-GMSD) method is illustrated in Fig. 1. Both the reference image r and the distorted image d are iteratively downscaled by half in each dimension, forming image pyramids that contain a set of images with lower resolution. The original scale has index 0, and the downscaled images have indices 1 through M. At each scale we calculate GMSD using Eq. (4) and (5), and the GMSD score at the j-th scale is denoted by $\sigma_j(r, d)$ (j = 0, 1, ..., M). The overall multi-scale GMSD score is then calculated as:

$$\text{MS-GMSD} = \sqrt{\sum_{j=0}^{M} w_j\,\sigma_j^2(r, d)} \tag{6}$$

where $w_j$ controls the weight of the j-th scale, and the weights are normalized such that $\sum_{j=0}^{M} w_j = 1$. To determine the weights, we should consider the fact that human vision is most sensitive to medium frequencies. As our parameter tuning results will show, GMSD receives higher weights at these scales.

3.3. Extension to Color Image Assessment

We will show in Section 4 that MS-GMSD exhibits competitive performance in assessing image quality. However, like most IQA indices, it can only assess distortions in grayscale images. In the case of the saturation degradation in Fig. 2, MS-GMSD gives no response, since the distorted image still shows the same luminance as the reference.
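The luminance pipeline of Sections 2, 3.1 and 3.2 can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation, and several details are assumptions not stated in the text: the Prewitt responses are divided by 3, the stability constant defaults to c = 170 (a value suited to 8-bit intensities), downscaling uses 2x2 average pooling, the masking term enters both numerator and denominator with a minus sign, and the per-scale scores are pooled as the square root of the weighted sum of squares. All function names are hypothetical.

```python
import numpy as np


def prewitt_gradient_magnitude(img):
    """Gradient magnitude from 3x3 Prewitt responses (valid region only)."""
    img = np.asarray(img, dtype=np.float64)
    # Horizontal response: right column minus left column over three rows.
    gx = (img[:-2, 2:] + img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - img[1:-1, :-2] - img[2:, :-2]) / 3.0
    # Vertical response: bottom row minus top row over three columns.
    gy = (img[2:, :-2] + img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - img[:-2, 1:-1] - img[:-2, 2:]) / 3.0
    return np.sqrt(gx ** 2 + gy ** 2)


def gms_map(ref, dist, alpha=0.5, c=170.0):
    """Gradient magnitude similarity with masking coefficient alpha in [0, 2]."""
    g_r = prewitt_gradient_magnitude(ref)
    g_d = prewitt_gradient_magnitude(dist)
    cross = g_r * g_d
    # alpha = 0 gives the Dice form; alpha = 2 leaves only the difference term.
    return ((2.0 - alpha) * cross + c) / (g_r ** 2 + g_d ** 2 - alpha * cross + c)


def gmsd(ref, dist, alpha=0.5, c=170.0):
    """Standard-deviation pooling of the GMS map (population std, 1/N)."""
    return float(np.std(gms_map(ref, dist, alpha, c)))


def halve(img):
    """Downscale by two in each dimension with 2x2 average pooling (assumed)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])


def ms_gmsd(ref, dist, weights, alpha=0.5, c=170.0):
    """Pool per-scale GMSD scores; weights[j] belongs to scale j (0 = original)."""
    sigmas = []
    for _ in weights:
        sigmas.append(gmsd(ref, dist, alpha, c))
        ref, dist = halve(ref), halve(dist)
    sigmas = np.asarray(sigmas)
    # Assumed pooling: square root of the weighted sum of squared deviations.
    return float(np.sqrt(np.sum(np.asarray(weights) * sigmas ** 2)))
```

For example, `ms_gmsd(ref, dist, weights=[0.25, 0.25, 0.25, 0.25])` evaluates four scales with placeholder equal weights on two grayscale arrays of at least 24 x 24 pixels; the weights are an input precisely because the tuned values depend on the training data.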
Since chromatic information is an important aspect of perception, we extend MS-GMSD to assess chromatic distortions. We first convert the images to YIQ space, where the Y channel represents the luminance information, whereas the I and Q channels represent the chrominance components. In this way, we can treat luminance and chrominance information individually. The luminance distortion is still measured using our MS-GMSD index, while the chrominance dissimilarities of the I and Q channels are defined as the root mean square errors between the two images:

$$\Delta_I = \sqrt{\frac{1}{N_M}\sum_{i=1}^{N_M}\bigl(Y_{I,r}^{M}(i) - Y_{I,d}^{M}(i)\bigr)^2} \tag{7}$$

$$\Delta_Q = \sqrt{\frac{1}{N_M}\sum_{i=1}^{N_M}\bigl(Y_{Q,r}^{M}(i) - Y_{Q,d}^{M}(i)\bigr)^2} \tag{8}$$

where $\Delta_I$ and $\Delta_Q$ represent the chrominance dissimilarity for the I and Q channels respectively, and $N_M$ is the number of pixels at Scale M. It should be noted that chrominance dissimilarity is only evaluated at Scale M, because the spatial resolution of vision for color stimuli is much lower than for luminance. The overall chrominance dissimilarity is calculated by:

$$\Delta_{chrom} = \sqrt{\Delta_I^2 + \Delta_Q^2} \tag{9}$$

Fig. 2: (a) Reference image. (b) Image with saturation distortion. IQA methods that only consider the luminance information cannot assess chromatic degradation.

The final assessment score MS-GMSDc is the weighted sum of the chrominance and luminance dissimilarities:

$$\text{MS-GMSD}_c = \gamma\,\text{MS-GMSD} + (1 - \gamma)\,(\beta_1\,\Delta_{chrom}) \tag{10}$$

where $\beta_1$ is the parameter that scales $\Delta_{chrom}$, and γ is the weight balancing the two terms. Because the HVS is more sensitive to luminance than to chrominance, we should always have a higher weight for MS-GMSD unless the luminance distortion is unnoticeable. That is, γ should increase monotonically as a function of MS-GMSD. Also, γ has the range [0, 1]. Considering these requirements, we propose the weight coefficient as a logistic function:

$$\gamma = \frac{2}{1 + \beta_2 \exp(-\beta_3\,\text{MS-GMSD})} - 1 \tag{11}$$

where $\beta_2$ and $\beta_3$ are parameters with positive values. The complete procedure of our proposed MS-GMSDc is illustrated in Fig. 1.

4. EXPERIMENTS AND COMPARISON

4.1. Databases and Evaluation Methods

In order to compare our IQA index with other methods, we use four comprehensive datasets in the experiment: TID2013 [14], TID2008 [15], CSIQ [1] and LIVE [16]. In each dataset, each distorted image is given a corresponding mean opinion score (MOS) assessed by different subjects. Among these four datasets, TID2013 contains the most comprehensive set of chromatic distortion types.
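The chromatic extension of Section 3.3 (Eqs. 7 to 11) can be sketched as below. It is an illustrative NumPy sketch under stated assumptions, not the authors' code: the RGB-to-YIQ matrix uses the common NTSC coefficients, downscaling to Scale M uses 2x2 average pooling, the two channel errors are combined as a Euclidean norm, and the weight takes the logistic form γ = 2 / (1 + β2 exp(-β3 MS-GMSD)) - 1. The function names are hypothetical, and the luminance score is passed in as a plain number so that the block stays self-contained.

```python
import numpy as np

# Common NTSC RGB -> YIQ coefficients (assumed; the text does not list them).
RGB_TO_YIQ = np.array([[0.299, 0.587, 0.114],
                       [0.5959, -0.2746, -0.3213],
                       [0.2115, -0.5227, 0.3112]])


def rgb_to_yiq(rgb):
    """Convert an (H, W, 3) RGB array to YIQ along the last axis."""
    return np.asarray(rgb, dtype=np.float64) @ RGB_TO_YIQ.T


def halve(img):
    """Downscale by two in each spatial dimension with 2x2 averaging (assumed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])


def chroma_dissimilarity(rgb_ref, rgb_dist, num_halvings=3):
    """RMSE of the I and Q channels at the coarsest scale, combined by a norm."""
    yiq_r, yiq_d = rgb_to_yiq(rgb_ref), rgb_to_yiq(rgb_dist)
    for _ in range(num_halvings):
        yiq_r, yiq_d = halve(yiq_r), halve(yiq_d)
    diff = yiq_r[..., 1:] - yiq_d[..., 1:]          # keep I and Q only
    delta_i, delta_q = np.sqrt(np.mean(diff ** 2, axis=(0, 1)))
    return float(np.hypot(delta_i, delta_q))


def luminance_weight(ms_gmsd, beta2, beta3):
    """Assumed logistic weight; it grows towards 1 as MS-GMSD increases."""
    return 2.0 / (1.0 + beta2 * np.exp(-beta3 * ms_gmsd)) - 1.0


def ms_gmsd_c(ms_gmsd, delta_chrom, beta1, beta2, beta3):
    """Weighted sum of the luminance score and the scaled chrominance term."""
    gamma = luminance_weight(ms_gmsd, beta2, beta3)
    return float(gamma * ms_gmsd + (1.0 - gamma) * beta1 * delta_chrom)
```

A pure saturation change leaves the Y channel untouched, so the luminance score stays at zero while `chroma_dissimilarity` becomes positive; the final score then comes entirely from the chrominance term, which is the behaviour the saturation example in Fig. 2 calls for.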
We use four criteria to evaluate the consistency between objective scores and subjective MOS values [17]: the Spearman rank-order correlation coefficient (SROCC) and the Kendall rank-order correlation coefficient (KROCC) evaluate prediction monotonicity, while the Pearson linear correlation coefficient (PLCC) and the root mean squared error (RMSE) measure prediction accuracy. A better objective index should have higher SROCC, KROCC and PLCC, and lower RMSE. Readers can refer to [17] for the details of these criteria.

In order to make a comprehensive comparison, we choose 14 state-of-the-art FR IQA indices as well as PSNR in our experiments. These indices are: SSIM [6], MS-SSIM [8], IW-SSIM [17], IFC [18], VIF [19], NQM [20], VSNR [21], MAD [1], GSM [11], RFSIM [12], FSIM [7], FSIMc [7], VSI [10] and GMSD [9], among which FSIMc and VSI are IQA metrics designed for color image assessment.

4.2. Parameter Settings

In order to determine the parameters in our index, we tune on a training set consisting of the first reference images of TID2008, choosing the parameters that lead to higher SROCC values. For the masking control in Eq. (5), we found that α = 0.5 provides the best results. We use four resolution scales with M = 3, and the corresponding weights in Eq. (6) are: w0 = 0.096, w1 = 0.596, w2 = 0.289 and w3 = 0.019. These weights conform to the fact that human vision is most sensitive to structures with medium frequencies. For MS-GMSDc, we tune the parameters on a subset of TID2013 because this dataset contains more types of chromatic distortions. The resulting parameters for MS-GMSDc are: β1 = 0.01, β2 = 0.32 and β3 = 15.

4.3. Performance Comparison

The performance results of the chosen IQA metrics are listed in Table 1. In order to demonstrate the benefits of both the multi-scale assessment and the extension to chrominance, we compare the results for both MS-GMSD and MS-GMSDc.
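The four evaluation criteria of Section 4.1 can be computed along the following lines. This is a simplified NumPy sketch with hypothetical function names: KROCC is implemented as the plain tau-a variant without tie correction, and PLCC and RMSE are computed directly on the raw scores, whereas IQA studies usually apply a nonlinear regression to the objective scores first (the fitting curve mentioned for Fig. 3); that mapping step is omitted here.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


def krocc(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    i, j = np.triu_indices(len(x), k=1)
    signs = np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return float(signs.sum() / len(i))


def rmse(x, y):
    """Root mean squared error between two score vectors."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

Because SROCC and KROCC depend only on rank order, any monotonic rescaling of the objective scores leaves them unchanged, which is why they are read as measures of prediction monotonicity rather than accuracy.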
From Table 1, one sees that MS-GMSD consistently outperforms GMSD on all the datasets, showing that incorporating assessment scores from multiple scales is crucial for the IQA index. Even without the chrominance component, our proposed MS-GMSD achieves a performance advantage over other methods on TID2008 and CSIQ, and performance comparable to FSIM and MAD on the LIVE database. On the TID2013 dataset, MS-GMSD is not as good as FSIMc and VSI, because the latter two are color IQA indices. However, it is still more competitive than the other grayscale-specific methods. All these results demonstrate that MS-GMSD achieves the best performance in assessing grayscale image quality. On the other hand, MS-GMSDc can efficiently assess images with chromatic distortion and improves substantially over MS-GMSD on TID2013, while maintaining the performance on the other datasets.

Table 1: Performance Comparison of the IQA Indices (SROCC, KROCC, PLCC and RMSE on TID2008, CSIQ, LIVE and TID2013 for PSNR, SSIM, MS-SSIM, IW-SSIM, IFC, VIF, NQM, VSNR, MAD, GSM, RFSIM, FSIM, FSIMc, VSI, GMSD, MS-GMSD and MS-GMSDc; numeric entries not legible).

Fig. 3: Scatter plots of subjective MOS versus objective IQA scores on the TID2013 color dataset: (a) PSNR, (b) SSIM, (c) MAD, (d) FSIMc, (e) GMSD, (f) proposed MS-GMSDc. The red line is the fitting curve using the function in [16]. Examples of inaccuracies due to the use of a grayscale assessment are circled in dashed yellow.

To further illustrate the effectiveness of the chrominance term, the scatter plots of subjective MOS versus IQA scores on TID2013 are shown in Fig. 3.
The fitting curve with the logistic function in [16] is also plotted. In Fig. 3, methods that only utilize the luminance information do not correlate well with MOS for some images because they do not respond to chromatic distortions. In contrast, our proposed MS-GMSDc assesses chromatic distortions accurately and thus gives more consistent results than the other methods.

5. CONCLUSION

In this paper, we propose a multi-scale GMSD that uses a better similarity index to assess distortions at different scales, and then extend it to MS-GMSDc for chromatic distortion assessment. The experimental results validate the effectiveness of the approach and show that our method achieves the best performance in assessing both grayscale and color images.
References

[1] Eric C. Larson and Damon M. Chandler, "Most apparent distortion: full-reference image quality assessment and the role of strategy," Journal of Electronic Imaging, vol. 19, no. 1, pp. 011006, 2010.
[2] Jeffrey Lubin, "A human vision system model for objective picture quality measurements," in Broadcasting Convention, 1997. International. IET, 1997, pp. 498-503.
[3] John Ross and Harriet D Speed, "Contrast adaptation and contrast masking in human vision," Proceedings of the Royal Society of London B: Biological Sciences, vol. 246, no. 1315, pp. 61-70, 1991.
[4] Weisi Lin and C-C Jay Kuo, "Perceptual visual quality metrics: A survey," Journal of Visual Communication and Image Representation, vol. 22, no. 4, pp. 297-312, 2011.
[5] Zhou Wang and Alan C Bovik, "Modern image quality assessment," Synthesis Lectures on Image, Video, and Multimedia Processing, vol. 2, no. 1, pp. 1-156, 2006.
[6] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, and Eero P Simoncelli, "Image quality assessment: from error visibility to structural similarity," Image Processing, IEEE Transactions on, vol. 13, no. 4, pp. 600-612, 2004.
[7] Lin Zhang, Lei Zhang, Xuanqin Mou, and David Zhang, "FSIM: a feature similarity index for image quality assessment," Image Processing, IEEE Transactions on, vol. 20, no. 8, pp. 2378-2386, 2011.
[8] Zhou Wang, Eero P Simoncelli, and Alan C Bovik, "Multiscale structural similarity for image quality assessment," in Signals, Systems and Computers, 2004. Conference Record of the Thirty-Seventh Asilomar Conference on. IEEE, 2003, vol. 2, pp. 1398-1402.
[9] Wufeng Xue, Lei Zhang, Xuanqin Mou, and Alan C Bovik, "Gradient magnitude similarity deviation: a highly efficient perceptual image quality index," Image Processing, IEEE Transactions on, vol. 23, no. 2, pp. 684-695, 2014.
[10] Lin Zhang, Ying Shen, and Hongyu Li, "VSI: a visual saliency-induced index for perceptual image quality assessment," Image Processing, IEEE Transactions on, vol. 23, no. 10, pp. 4270-4281, 2014.
[11] Anmin Liu, Weisi Lin, and Manish Narwaria, "Image quality assessment based on gradient similarity," Image Processing, IEEE Transactions on, vol. 21, no. 4, pp. 1500-1512, 2012.
[12] Lin Zhang, Lei Zhang, and Xuanqin Mou, "RFSIM: A feature based image quality assessment metric using Riesz transforms," in Image Processing (ICIP), 2010 17th IEEE International Conference on. IEEE, 2010, pp. 321-324.
[13] Guan-Hao Chen, Chun-Ling Yang, and Sheng-Li Xie, "Gradient-based structural similarity for image quality assessment," in Image Processing, 2006 IEEE International Conference on. IEEE, 2006, pp. 2929-2932.
[14] Nikolay Ponomarenko, Lina Jin, Oleg Ieremeiev, Vladimir Lukin, Karen Egiazarian, Jaakko Astola, Benoit Vozel, Kacem Chehdi, Marco Carli, Federica Battisti, et al., "Image database TID2013: Peculiarities, results and perspectives," Signal Processing: Image Communication, vol. 30, pp. 57-77, 2015.
[15] Nikolay Ponomarenko, Vladimir Lukin, Alexander Zelensky, Karen Egiazarian, M Carli, and F Battisti, "TID2008-a database for evaluation of full-reference visual quality assessment metrics," Advances of Modern Radioelectronics, vol. 10, no. 4, pp. 30-45, 2009.
[16] Hamid Rahim Sheikh, Muhammad Farooq Sabir, and Alan Conrad Bovik, "A statistical evaluation of recent full reference image quality assessment algorithms," Image Processing, IEEE Transactions on, vol. 15, no. 11, pp. 3440-3451, 2006.
[17] Zhou Wang and Qiang Li, "Information content weighting for perceptual image quality assessment," Image Processing, IEEE Transactions on, vol. 20, no. 5, pp. 1185-1198, 2011.
[18] Hamid Rahim Sheikh, Alan Conrad Bovik, and Gustavo De Veciana, "An information fidelity criterion for image quality assessment using natural scene statistics," Image Processing, IEEE Transactions on, vol. 14, no. 12, pp. 2117-2128, 2005.
[19] Hamid Rahim Sheikh and Alan C Bovik, "Image information and visual quality," Image Processing, IEEE Transactions on, vol. 15, no. 2, pp. 430-444, 2006.
[20] Niranjan Damera-Venkata, Thomas D Kite, Wilson S Geisler, Brian L Evans, and Alan C Bovik, "Image quality assessment based on a degradation model," Image Processing, IEEE Transactions on, vol. 9, no. 4, pp. 636-650, 2000.
[21] Damon M Chandler and Sheila S Hemami, "VSNR: A wavelet-based visual signal-to-noise ratio for natural images," Image Processing, IEEE Transactions on, vol. 16, no. 9, pp. 2284-2298, 2007.