Artwork Recognition for Panorama Images Based on Optimized ASIFT and Cubic Projection


Dayou Jiang and Jongweon Kim

Abstract

Few studies have been published on object recognition for panorama images. To prevent the infringement of artworks in 360-degree images, this paper puts forward an efficient method for recognizing artworks inside 360-degree images. First, we employ an improved cubic projection to transform the distorted panorama image. Then, we use an optimized Affine Scale Invariant Feature Transform (ASIFT) algorithm to extract local features from the transformed image. Finally, feature points are matched under a one-to-one mapping constraint. The overall performance of the method is investigated on a panorama dataset, and the experimental results are compared with those of other well-known local feature extraction methods and with those obtained on the original panorama image. The experimental results show that the proposed method improves accuracy by around 30% for relatively highly distorted panorama images and reduces the computing time.

Index Terms: Artwork, recognition, panorama image, ASIFT, cubic projection.

I. INTRODUCTION

Recently, with the continuous improvement of panorama photography and the advent of easy-to-use 360-degree cameras such as the SAMSUNG GEAR and the LG G5, 360-degree videos and images have quickly become very popular. A 360-degree image captures all viewing directions simultaneously and gives users a sense of immersion when viewed [1]. As a result, the risk that copyrighted artworks are photographed without permission is greater than before, and copyright infringement of artworks in 360-degree images will become a pressing issue. The detection and recognition of unauthorized artworks in panorama images is therefore required.
Manuscript received October 10, 2017; revised January 19,
Dayou Jiang is with Dept. of Copyright Protection, Sangmyung University, Seoul, Korea (dyjiang@cclabs.kr).
Jongweon Kim is with Dept. of Electronics Engineering, Sangmyung University, Seoul, Korea (corresponding author; jwkim@smu.ac.kr).

Although much research on image recognition has been published, few studies have been conducted on 360-degree images, let alone on artworks inside panorama images. However, the two problems are much alike, and methods that have proven useful for image recognition are often worth applying to object recognition inside panorama images. Since the 1990s, content-based methods have been well accepted for solving image recognition problems. The image content is described by extracting low-level visual features, and these methods achieve good performance in both accuracy and speed [2]. A variety of local feature extraction algorithms have been proposed in recent years. Among them, the most representative and widely used is the Scale Invariant Feature Transform (SIFT) [3]. Algorithms such as Speeded Up Robust Features (SURF) [4], Affine SIFT [5], Oriented FAST and Rotated BRIEF (ORB) [6], Binary Robust Invariant Scalable Keypoints (BRISK) [7], and Fast Retina Keypoint (FREAK) [8] have also been employed because they achieve quite good performance. Currently, deep learning methods such as AlexNet [9], ZFNet [10], GoogLeNet [11], and ResNet [12] are usually used to train on these local feature vectors and obtain classification models for large-dataset tasks such as the Large Scale Visual Recognition Challenge (LSVRC). With regard to panorama image recognition, Xiao [13] introduced the problem of scene viewpoint recognition and also studied the canonical view biases exhibited by people taking photos of places. Yang [14] addressed the problem of room structure recognition from a 360° cylindrical panorama.
The original panorama was transformed into four perspective-projected sub-images. An algorithm to detect and recognize road lane markings from panorama images was presented in [15]. Zhang [16] advocated the use of 360° full-view panoramas in scene understanding and proposed a whole-room context model in 3D. A region-based convolutional neural network (R-CNN) was trained and tested on an indoor panorama image dataset in [17]. The work in [18] developed a novel panorama-to-panorama matching process that either aggregates the features of individual images in a group or explicitly constructs a larger panorama. An improved ASIFT algorithm for indoor panorama image matching was analyzed and compared with SIFT, SURF, and ASIFT in [19].

The literature on artwork recognition for panorama images is sparse. An artwork identification methodology for 360-degree images, which transforms the 360-degree image into a three-dimensional sphere and wraps it with a polyhedron, was studied in [20]. The results showed that the method considerably increases identification precision for artwork displayed on a monitor in a seriously distorted position. In addition, different local features were analyzed for feature matching. However, although employing more polyhedron faces improves performance further, it also means that more time is spent on feature detection and matching. Furthermore, the visual experience of the panorama image becomes worse.

The aim of this paper is to develop an efficient method for artwork recognition in 360-degree images. Two issues are therefore considered in solving the problem: (1) use less time to achieve better performance; (2) use a feasible and simple projection to ensure a good visual experience of the images transformed from panorama images.

The remainder of the paper is organized as follows: the panorama image transform, feature extraction, and feature matching methods are discussed in Section II. Section III presents the experimental results and the performance of the proposed artwork recognition method on the panorama artwork dataset. Finally, conclusions are given in Section IV.

doi: /ijmlc

II. ALGORITHM IMPROVEMENT

A. Panorama Image Transformation

Panorama images captured with 360° cameras are mainly in equirectangular format with an aspect ratio of 2:1. The equirectangular projection has been widely used to map a 3D scene onto a two-dimensional plane. However, this projection introduces serious distortions, especially near the two poles. Hence, using the original equirectangular panorama image directly is a poor choice for artwork recognition.

Recently, a few studies on the projection of 360-degree images and videos have been reported. Kim [21] proposed an automatic framework for generating content-aware 2D normal-view perspective videos from 360° videos, based on the Pannini projection model [22]. A measurement study of Oculus 360-degree video streaming, which uses an offset cubic map, is presented in [23]. Soon afterwards, Google engineer Brown [24] presented the Equi-Angular Cubemap method, which offers better results and a more efficient use of resources and aims to solve VR video quality issues.

The cubic map has fewer distortions. It is also readily available and easily obtained. Therefore, the cubic map is selected and improved. The original cubic map converts the panorama into the six faces of a cube, each of which is a rectilinear image. Fig. 1 shows an example of a panoramic image with low distortion on the small screen. Fig. 2 shows the standard cubic map image of that panorama.
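As an illustration of how a single rectilinear cube face can be sampled from an equirectangular panorama, a minimal NumPy sketch follows. It is not the authors' implementation: the function name `cube_face`, the axis and sign conventions, the nearest-neighbour sampling, and the optional `yaw_deg` and `pitch_deg` arguments are all assumptions made for this example, and roll is omitted for brevity.

```python
import numpy as np


def rot_x(a):
    """Rotation about the x axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])


def rot_y(a):
    """Rotation about the y axis by angle a (radians)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])


def cube_face(pano, size, yaw_deg=0.0, pitch_deg=0.0, fov_deg=90.0):
    """Sample a size x size rectilinear view from an equirectangular image.

    Hypothetical helper: with yaw = pitch = 0 and a 90 degree field of view
    it returns the front face of a standard cubic map (nearest neighbour).
    """
    h, w = pano.shape[:2]
    half = np.tan(np.radians(fov_deg) / 2.0)
    # Pixel centres on the image plane z = 1; x points right, y points down.
    u = (np.arange(size) + 0.5) / size * 2.0 - 1.0
    x, y = np.meshgrid(u * half, u * half)
    rays = np.stack([x, y, np.ones_like(x)], axis=-1)
    # Assumed convention: pitch about x first, then yaw about y.
    # With image y pointing down, a positive pitch tilts the view upward.
    m = rot_y(np.radians(yaw_deg)) @ rot_x(np.radians(pitch_deg))
    rays = rays @ m.T
    # Ray direction -> longitude in [-pi, pi] and latitude in [-pi/2, pi/2].
    lon = np.arctan2(rays[..., 0], rays[..., 2])
    lat = np.arctan2(rays[..., 1], np.hypot(rays[..., 0], rays[..., 2]))
    # Longitude/latitude -> pixel indices in the equirectangular image.
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return pano[row, col]
```

Calling `cube_face(pano, 512, yaw_deg=90)` under these conventions would give the face to the right of the front face; the remaining faces follow from other yaw and pitch values. Bilinear interpolation would give smoother output than the nearest-neighbour lookup used here.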
The improved method ensures that the object is displayed clearly in the transformed images. This is achieved by adjusting the pitch angle, the yaw angle, and the field of view (FOV). Fig. 3 shows a schematic diagram of the yaw, pitch, and roll of the camera (redrawn from [23]). In Fig. 4, the front and back faces are highlighted. The x, y, and z directions control the pitch, yaw, and roll of the view, respectively. For differently distorted images on different screens, the pitch and roll angles are changed to suit the situation. The FOV is fixed at 90°. The transform matrix is given by

$$M = \mathrm{rot}_y(\mathrm{yaw})\,\mathrm{rot}_x(\mathrm{pitch})\,\mathrm{rot}_z(\mathrm{roll})^{T} \tag{1}$$

The pitch for the large monitor ranges from 20 to 50 degrees of southern latitude, and for the small monitor from 10 to 30 degrees of southern latitude.

Fig. 1. Example of a panorama image.
Fig. 2. Corresponding cubic map image of Fig. 1.
Fig. 3. Schematic diagram of yaw, pitch, roll camera. (Axis labels: x pitch, y yaw, z roll.)
Fig. 4. Schematic diagram of standard cubic map. (Face labels: front, left, right, back.)

As for panorama projection, map projections such as Mercator, Goode homolosine, Natural Earth, and the cubic map come first to mind. However, most of them cannot be used for artwork recognition. On the one hand, the panorama must first be transformed into a 3D sphere and then mapped to the projection, and the whole process takes too much time; on the other hand, those projections still have unavoidable distortions. The exception is the cubic map.

B. Optimized ASIFT

The ASIFT algorithm proceeds through the following steps:

1) Images are transformed by changing the latitude and the longitude of the viewpoint. The images are rotated and then tilted by t. In the algorithm, the tilt t is a subsampling in the height direction:

$$t = \frac{1}{\cos\theta} \tag{2}$$

$$\mathit{height}' = \mathit{height} / t \tag{3}$$

2) Perform the rotations and tilts by changing the values of θ and φ so that the simulated images cover as many views as possible.


3) Use the SIFT algorithm to detect features in all simulated images.

The irregular sampling of the ASIFT parameters θ and φ is shown in Fig. 5 (redrawn from [5]).

Fig. 5. Sampling of the parameters θ and φ.

The sampling steps were validated by SIFT with simulated tilt. Δt is experimentally fixed to √2, while Δφ changes with t. For example, when tilt = 2, t = √2 and Δφ is set to 45°; when tilt = 3, t = 2 and Δφ is set to 30°. Thus, with tilt = 2, four simulated images with angle steps of 45° are generated; with tilt = 3, six simulated images with angle steps of 30° are obtained. Fig. 6 shows the simulated images of ASIFT. The top row, from left to right, shows transform 1 (t = 1, φ = 0°), transform 2 (t = √2, φ = 0°), and transform 3 (t = √2, φ = 45°); the bottom row, from left to right, shows transform 4 (t = √2, φ = 90°), transform 5 (t = √2, φ = 135°), and transform 6 (t = 2, φ = 0°). As shown in Fig. 6, the simulated images of transform 3 and transform 5 are larger than the other affine-transformed images, which means that they need more time for feature detection and extraction, so skipping some of these transforms is worthwhile. Moreover, for panorama images the effect of the rotation transformation in the height direction is small, so using fewer rotation transforms saves running time while achieving performance comparable to that of the full set of transforms. In addition, to speed up the algorithm, a local pool of CPU workers is employed.

Fig. 6. Examples of simulated ASIFT images.

C. One-To-One Mapping

The basic feature matching algorithm used in the experiment is UBCMATCH [25], which finds the closest descriptor D_B in image B for each descriptor D_A in image A. The matches can be filtered for uniqueness with a threshold on the ratio between the distance to the best matching keypoint and the distance to the second best one, but some erroneous matches still appear.
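The ratio-test matching described above, followed by a one-to-one filter of the kind this section is concerned with, could be sketched as below. This is an illustrative NumPy version, not the VLFeat code used in the paper: the function names `ratio_match` and `one_to_one`, the use of squared Euclidean distances, and the rule of keeping the smallest-distance match for each descriptor of image B are assumptions. The sketch also assumes that image B has at least two descriptors.

```python
import numpy as np


def ratio_match(desc_a, desc_b, thresh=3.0):
    """Match each descriptor of A to its nearest descriptor of B.

    A match is kept only if best_distance * thresh <= second_best_distance.
    Returns a list of (index_in_a, index_in_b, distance) tuples.
    """
    a = np.asarray(desc_a, dtype=float)
    b = np.asarray(desc_b, dtype=float)
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(a))
    best, second = order[:, 0], order[:, 1]
    keep = dist[rows, best] * thresh <= dist[rows, second]
    return [(int(i), int(best[i]), float(dist[i, best[i]])) for i in rows[keep]]


def one_to_one(matches):
    """For every descriptor of B, keep only its smallest-distance match."""
    best_for_b = {}
    for i, j, d in matches:
        if j not in best_for_b or d < best_for_b[j][2]:
            best_for_b[j] = (i, j, d)
    return sorted(best_for_b.values())
```

Because the ratio test is applied per descriptor of A, several descriptors of A can still select the same descriptor of B; the second function removes exactly those many-to-one cases.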
Several test evaluations showed that the mismatched points are usually located in a central region. Sometimes a point in the original artwork is matched with many points in the training image after using UBCMATCH with a threshold of 3.0. Therefore, one-to-one mapping is adopted to resolve these one-to-many mismatches. Fig. 7 shows one keypoint of the panorama matched with two different keypoints of the original artwork image. An example of using one-to-one mapping for feature matching is shown in Fig. 8. Compared with using UBCMATCH alone, one-to-one mapping reduces the number of mismatched keypoints, which clearly illustrates its positive effect.

Fig. 7. Feature matching result of using ubcmatch.
Fig. 8. Feature matching result of using one-to-one mapping.

III. SIMULATION RESULTS

Experiments are carried out on the artwork panorama image dataset, which is created from 50 famous artworks obtained from the Google website. These images are shown in Fig. 9. The images are downloaded in JPG format at different sizes. The largest file is 7.22 MB and the smallest is 89.1 KB, while at least half of them are smaller than 1 MB. The panorama images are captured and automatically stitched with an LG 360, which has dual wide-angle cameras. The default size of panorama image is with 72 DPI. To make the simulation more realistic, panorama images with three different degrees of distortion are captured from three different positions on two different screens, whose sizes are 32 inches and 79 inches. Thus, a panorama artwork dataset is created in which 300 panorama images are evenly distributed over 6 categories. Larger distortion means that the artwork inside the panorama image lies farther from the equator. The experiments are performed on a computer with a Xeon(R) 3.50 GHz CPU, 8.00 GB of RAM, and the 64-bit Windows 7 Professional K operating system.


Fig. 9. Well-known artworks used in the experiment.

Large images yield a large number of feature points, most of which cannot be "true" matched. Therefore, reducing the size of the experiment images not only reduces the computing time but also improves the percentage of "true" matches. The panorama images are resized to , the resized artworks' sizes are , and the generated transformed images' sizes are

In the feature extraction process, the SIFT method uses the default parameters, whereas the SIFT-based ASIFT method uses the Parallel function with 4 local pools to accelerate processing. The original ASIFT runs SIFT 5 times (all of tilt 1 and tilt 2), while the improved ASIFT runs SIFT 4 times (all of tilt 1, the first and third views of tilt 2, and the first view of tilt 3). The UBCMATCH threshold for feature matching is set to 3.0 because a value of 2.0 leads to too many erroneous matches for highly distorted panorama images. As shown in Fig. 10 and Fig. 11, with a threshold of 2.0 the panorama image has fewer matching lines with its corresponding original image than with a different original image, which leads to false recognition. To ensure the reliability of the experiments, the preset parameters mentioned above were all selected after several test evaluations.

Fig. 10. Feature matching results of using SIFT with corresponding original artwork under ubcmatch threshold 2.0.
Fig. 11. Feature matching results of using SIFT with different original artwork under ubcmatch threshold 2.0.

Three kinds of comparison experiments are conducted to measure the performance of the proposed method. First, an example is constructed using SIFT to evaluate the recognition performance on images generated by the cubic projection and on the original panorama. For a low-distortion image displayed on the 32-inch monitor, the feature matching results are shown in Fig. 12-15.

Fig. 12. Feature matching results of using original panorama with
Fig. 13. Feature matching results of using cubic projected image with
Fig. 14. Feature matching results of using original panorama with different original artwork.
Fig. 15. Feature matching results of using cubic projected image with

Fig. 12 and Fig. 14 show that the original panorama has some true matches with the corresponding artwork and some erroneous matches with different artworks. Fig. 12 and Fig. 13 show that the cubic projected image has more true matches with the corresponding original artwork than the panoramic image does. Moreover, Fig. 13 and Fig. 15 show that the cubic projected image has fewer erroneous matches than the panoramic image with different artworks. Therefore, the results indicate that the proposed panorama transform method, the cubic map, is more suitable for artwork recognition inside panoramas.

Secondly, comparison experiments are conducted to estimate the performance of SIFT and ASIFT. Fig. 16-19 show the feature matching results of the ASIFT algorithm. Since the experiments with SIFT have already been presented, here only the matching results of ASIFT with cubic projected

images are displayed. Compared with Fig. 13 and Fig. 15, Fig. 16 and Fig. 17 show fewer matched points, but ASIFT in fact outperforms SIFT in feature extraction and matching for more distorted images. Fig. 18 shows more true-matched points than Fig. 16, but Fig. 19 has more error-matched points than Fig. 17. Therefore, ASIFT with more affine transforms may be less efficient for artwork recognition in panoramas. Further experiments on differently distorted images should be conducted to verify the performance of ASIFT with different affine transforms.

Fig. 16. Feature matching results of using ASIFT with 2 transforms (1-2) for
Fig. 17. Feature matching results of using ASIFT with 2 transforms (1-2) for
Fig. 18. Feature matching results of using ASIFT with 5 transforms (1-5) for
Fig. 19. Feature matching results of using ASIFT with 5 transforms (1-5) for

Although ASIFT uses the Parallel function with 4 local pools to accelerate processing, its long computing time is still a problem for real-time recognition. Therefore, a way to achieve good performance in less time is required. The following example is taken from the experiment that compares ASIFT with the optimized ASIFT. High-distortion images displayed on the 79-inch monitor are used. The experimental results are shown in Fig. 20-23.

Fig. 20. Feature matching results of using ASIFT with 5 transforms (1-5) for
Fig. 21. Feature matching results of using ASIFT with 5 transforms (1-5) for
Fig. 22. Feature matching results of using ASIFT with 4 transforms (1,2,4,6) for
Fig. 23. Feature matching results of using ASIFT with 4 transforms (1,2,4,6) for

Finally, the whole panorama artwork dataset is used to measure the performance of the proposed method. In all the experiments, the parameters are preset as described above. The numbers of matched points are computed and sorted in descending order.
Since the parameters were specially selected after several test evaluations, the erroneous matches can be ignored. So, to simplify the computation, all point matches are regarded as true matches. Whether a recognition is true or false is determined by the number of matches for each image. If the artwork image in the panorama has the largest number of matched points when it is matched with its corresponding


original artwork, the recognition result is true; otherwise, it is judged false. The accuracy is the percentage of correctly recognized images in each panorama category. The experimental results of the proposed method compared with those of other methods are shown in Fig. 24.

Fig. 24. Results of using proposed method on artwork dataset.

In Fig. 24, 'C' represents the cubic map projection; 'O' stands for the original panorama; 'S' is the SIFT feature method; 'AS' is the Affine SIFT method; '1-5' refers to affine transforms 1 to 5; and '1246' refers to the 1st, 2nd, 4th, and 6th affine transforms. As shown in Fig. 24, for the small screen with high distortion, the proposed method, cas(1246), has relatively low accuracy for artwork recognition, but it is still the best among these methods. Moreover, for the large-screen panorama images and for the low-distortion and middle-distortion panorama images on the small screen, the proposed method performs better than the other methods, and its recognition accuracy is above 85% in every case. It can be concluded that the proposed panorama transform method greatly improves recognition compared with using the original panorama image; that ASIFT has a great advantage over SIFT for image recognition; and that the optimized ASIFT takes less time to achieve performance comparable to, or even better than, that of ASIFT. In sum, the proposed artwork recognition method for 360-degree images, which is based on cubic projection and optimized ASIFT, is an efficient method.

IV. CONCLUSION

Few studies on artwork recognition for panorama images have been performed. The advantage of the proposed method is that it provides efficient artwork recognition for panorama images with a good visual experience and better performance. In the paper, we used cubic projection to transform distorted panorama images and employed an optimized ASIFT to reduce the computing time and improve the recognition accuracy. In addition, we adopted a one-to-one mapping constraint to remove a large number of erroneous feature matches.
The results demonstrate that both the accuracy and the computing time are clearly improved. Even when the panorama images are seriously distorted, satisfactory results can be obtained. However, the algorithm may have shortcomings in real situations, where artworks can be displayed anywhere. The method would also be more efficient if a GPU were used to speed up the feature extraction and matching tasks. Therefore, a powerful GPU will be considered to accelerate the algorithm. In addition, larger artwork datasets covering as many situations as possible will be built in future work.

ACKNOWLEDGMENT

This work was supported by Ministry of Culture, Sports and Tourism (MCST) and Korea Creative Content Agency (KOCCA) in the Culture Technology (CT) Research & Development Program

REFERENCES

[1] T. Ho and M. Budagavi, Dual-fisheye lens stitching for 360-degree imaging, IEEE International Conference on Acoustics, Speech and Signal Processing,
[2] G. Zhen, L. Zhuo, J. Zhang, and X. G. Li, A comparative study of local feature extraction algorithms for web pornographic image recognition, Informatics and Computing, pp ,
[3] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol. 60, no. 2, pp ,
[4] H. Bay, T. Tuytelaars, and L. V. Gool, SURF: Speeded up robust features, in Proc. European Conference on Computer Vision (ECCV), 2006, vol. 3951, no. 3, pp
[5] J. M. Morel and G. Yu, ASIFT: A new framework for fully affine invariant image comparison, SIAM Journal on Imaging Sciences, vol. 2, no. 2, pp ,
[6] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, ORB: An efficient alternative to SIFT or SURF, in Proc. International Conference on Computer Vision (ICCV), 2011, pp
[7] L. Stefan, M. Chli, and R. Y. Siegwart, BRISK: Binary robust invariant scalable keypoints, in Proc. International Conference on Computer Vision (ICCV), 2011, pp
[8] A. Alahi, R. Ortiz, and P.
Vandergheynst, FREAK: Fast retina keypoint, in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, pp ,
[10] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, in Proc. European Conference on Computer Vision, 2014, pp


[11] C. Szegedy et al., Going deeper with convolutions, in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp
[12] K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, Deep residual learning for image recognition, in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp
[13] J. X. Xiao, K. A. Ehinger, A. Oliva, and A. Torralba, Recognizing scene viewpoint using panorama place representation, in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp
[14] H. Yang and H. Zhang, Indoor structure understanding from single 360 cylindrical panorama image, Computer-Aided Design and Computer Graphics, pp ,
[15] C. Li, L. Creusen, L. Hazelhoff, and P. H. N. de With, Detection and recognition of road markings in panorama images, in Proc. Asian Conference on Computer Vision, 2016, pp
[16] Y. D. Zhang, S. R. Song, P. Tan, and J. X. Xiao, Pano-context: A whole-room 3d context model for panorama scene understanding, in Proc. European Conference on Computer Vision, 2014, pp
[17] F. Deng, X. Zhu, and J. Ren, Object detection on panorama images based on deep learning, in Proc. Control, Automation and Robotics (ICCAR), 2017, pp
[18] A. Iscen, G. Tolias, Y. Avrithis, T. Furon, and O. Chum, Panorama to panorama matching for location recognition, in Proc. ACM on International Conference on Multimedia Retrieval, 2017, pp
[19] H. Fu, D. H. Xie, R. F. Zhong, Y. Wu, and Q. Wu, An improved ASIFT algorithm for indoor panorama image matching, in Proc. Ninth International Conference on Digital Image Processing, 2017, vol
[20] X. Jin and J. W. Kim, Artwork identification for 360-degree panorama images using polyhedron-based rectilinear projection and keypoint shapes, Applied Sciences, vol. 7, no. 5, p. 528,
[21] Y. W. Kim et al., Automatic content-aware projection for 360 videos, Computing Research Repository (CoRR),
[22] T. K. Sharpless, B. Postle, and D. M.
German, Pannini: A new projection for rendering wide angle perspective images, in Proc. the Sixth International Conference on Computational Aesthetics in Graphics, Visualization and Imaging, 2010, pp
[23] C. Zhou, Z. Li, and Y. Liu, A measurement study of Oculus 360 degree video streaming,
[24] C. Brown. Bring pixels front and center in VR video. [Online]. Available: d-center-vr-video,
[25] VLFeat.org. [Online]. Available:

Dayou Jiang He received his M.S. degree in Computer application Technology from YanBian University, China, in He is currently pursuing the Ph.D. degree in Copyright Protection, Sangmyung University, Korea. His research interests are image retrieval, image identification, music identification, and digital forensics.

Jongweon Kim received the Ph.D. degree from University of Seoul, major in signal processing in He is currently a professor of Dept. of Electronics Engineering and director of Creative Content Labs at Sangmyung University in Korea. He has extensive practical experience in digital signal processing and copyright protection technology in institutional, industrial, and academic environments. His research interests are in the areas of copyright protection technology, digital rights management, digital watermarking, and digital forensic marking.
