A2-RL: Aesthetics Aware Reinforcement Learning for Automatic Image Cropping


Debang Li, Huikai Wu, Junge Zhang, Kaiqi Huang
NLPR, Institute of Automation, Chinese Academy of Sciences
Homepage: debangli.github.io/a2rl

Abstract

Image cropping aims at improving the aesthetic quality of images by adjusting their composition. Most previous methods rely on the sliding window mechanism, which requires fixed aspect ratios, cannot produce cropping regions of arbitrary size, and usually generates tens of thousands of candidate windows, making it very time-consuming. Motivated by these challenges, and inspired by how humans crop images, we formulate aesthetic image cropping as a sequential decision-making process and propose an Aesthetics Aware Reinforcement Learning (A2-RL) framework to address this problem. In particular, the proposed method develops an aesthetics aware reward function which especially benefits image cropping. Similar to human decision making, and to better utilize historical experience, we use an LSTM-based state representation that includes both the current and historical observations. We train the agent with an actor-critic architecture in an end-to-end manner. The agent is evaluated on several popular cropping databases that are unseen during training. Experimental results show that our method achieves state-of-the-art performance with far fewer candidate windows and much less time than previous methods.

1 Introduction

Image cropping is a common task in image editing which aims to extract well-composed regions from ill-composed images. It can improve the visual quality of images, because composition plays an important role in image quality. An excellent automatic image cropping algorithm can give editors professional advice and save them a lot of time (Kao, He, and Huang 2017). In the past decades, many researchers have devoted their efforts to proposing novel methods (Yan et al. 2013; Fang et al. 2014; Chen et al. 2017a) for automatic image cropping. Most of these methods follow a three-step pipeline: 1) densely extract candidates with the sliding window method on the input image, 2) extract carefully designed features from each region, and 3) use a classifier or ranker to grade each window and find the best region.
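To make the cost of this pipeline concrete, the sketch below enumerates sliding-window candidates the way such methods typically do. It is a minimal illustration; the scales, aspect ratios, and stride are assumptions of ours, not the exact settings of any cited method.

```python
def sliding_window_candidates(img_w, img_h,
                              scales=(0.5, 0.6, 0.7, 0.8, 0.9),
                              aspect_ratios=(1.0, 4 / 3, 3 / 4, 16 / 9),
                              stride_frac=0.05):
    """Enumerate crop candidates (x, y, w, h) over a fixed grid.

    Every candidate is tied to one of the predefined aspect ratios,
    which is exactly the limitation discussed above: windows of other
    shapes can never be proposed.
    """
    candidates = []
    for s in scales:
        for ar in aspect_ratios:
            h = int(img_h * s)
            w = int(h * ar)
            if w > img_w or h > img_h:
                continue
            step_x = max(1, int(img_w * stride_frac))
            step_y = max(1, int(img_h * stride_frac))
            for y in range(0, img_h - h + 1, step_y):
                for x in range(0, img_w - w + 1, step_x):
                    candidates.append((x, y, w, h))
    return candidates

# Even this modest grid yields thousands of windows on a 1000x800 image,
# each of which must be scored by an aesthetics model.
print(len(sliding_window_candidates(1000, 800)))
```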
Although these works have achieved quite good performance, they may not find the best result due to the limitations of the sliding window method, which requires fixed aspect ratios and cannot cover cropping regions of arbitrary size. What is more, these sliding window based methods usually need tens of thousands of windows, which is very time-consuming. Although one can set several different aspect ratios and densely extract candidates, this inevitably costs a lot of time and is still unable to cover all cases. We also believe that the sliding window method differs from the human cropping process and is not natural for aesthetic quality evaluation. According to our observation, a human usually takes a whole view of the input image and makes sequential decisions to find the best region, almost never searching patch by patch as the sliding window method does. Based on the above observations, in this paper we formulate automatic image cropping as a sequential decision-making process and propose an Aesthetics Aware Reinforcement Learning (A2-RL) model to solve it. The sequential decision-making based automatic cropping process is illustrated in Figure 1. To our knowledge, we are the first to put forward a reinforcement learning based method for automatic image cropping.

Figure 1: Illustration of the sequential decision-making based automatic cropping process. The cropping agent starts from the whole image and takes actions to find the best cropping window in the input image. At each step, it takes an action (yellow and red arrows) and transforms the previous window (dashed yellow rectangle) into a new one (red rectangle). The agent takes a termination action and stops at step T. We also use the VFN (Chen et al. 2017b) to evaluate the input image and the cropped image; the higher score of the cropped image validates the capability of our agent.

The A2-RL model can finish the cropping process within several steps, or at most a dozen or so, and can return results of arbitrary shape, which overcomes the disadvantages of the sliding window method. In particular, the A2-RL model develops a novel aesthetics aware reward function which especially benefits image cropping. Inspired by human decision making, historical experience is also explored to assist the current decision through an LSTM-based state representation. We test the model on three popular cropping databases (Yan et al. 2013; Fang et al. 2014; Chen et al. 2017a) that are unseen during training, and the experimental results demonstrate that our method obtains state-of-the-art cropping performance with far fewer candidate windows and much less time than related methods.

2 Related Work

Image cropping aims at improving the composition of images, which is very important for their aesthetic quality. There are a number of previous works on aesthetic quality assessment. Many early works (Ke, Tang, and Jing 2006; Datta et al. 2006; Luo, Wang, and Tang 2011; Dhar, Ordonez, and Berg 2011) focus on designing handcrafted features based on intuitions from human perception or photographic rules. For example, low-level features such as colorfulness and the rule of thirds are proposed to discriminate whether an image is pleasing or not (Datta et al. 2006). High-level attributes, including composition and content, are also used to describe images (Dhar, Ordonez, and Berg 2011). Recently, thanks to the fast development of deep learning and newly proposed large-scale databases (Murray, Marchesotti, and Perronnin 2012), many new works (Kong et al. 2016; Mai, Jin, and Liu 2016; Deng, Loy, and Tang 2017) accomplish aesthetic quality assessment with convolutional neural networks.

Previous automatic image cropping methods can be divided into two classes: attention-based and aesthetics-based methods. The basic approach of attention-based methods (Suh et al. 2003; Stentiford 2007; Park et al. 2012; Chen et al. 2016) is to find the most visually salient regions in the original image. Attention-based methods can find cropping windows that draw more attention from people, but they may not generate very pleasing cropping windows, because they hardly consider image composition (Chen et al. 2017a). Aesthetics-based image cropping methods, by contrast, aim to find the most pleasing cropping window in the original image. As these methods use aesthetic quality as the criterion, they use almost the same features as aesthetic quality assessment. Some of these works (Nishiyama et al. 2009; Fang et al. 2014) use aesthetic quality classifiers to discriminate the quality of candidate windows. Other works use a RankSVM (Chen et al. 2017a) or a RankNet (Chen et al. 2017b) to grade each candidate window. There are also change-based methods (Yan et al. 2013), which compare original images with cropped images so as to discard distracting regions and retain high-quality ones. Most current aesthetics-based methods (Fang et al. 2014; Chen et al. 2017a; 2017b) still rely on the sliding window method to obtain candidate windows. As discussed above, the sliding window method uses fixed aspect ratios and cannot produce windows of arbitrary size. What is more, these methods need tens of thousands of candidates to finish the cropping process. In this paper, we propose a reinforcement learning based strategy to search for the cropping window.
With this method, we can find the final result with only several candidates, or a dozen or so, of arbitrary size. Reinforcement learning based strategies have been successfully applied in many domains of computer vision, including object detection (Caicedo and Lazebnik 2015; Jie et al. 2016), image captioning (Ren et al. 2017) and visual relationship detection (Liang, Lee, and Xing 2017). The active object localization method (Caicedo and Lazebnik 2015) achieves the best performance among detection algorithms that do not use region proposals. The Tree-RL method (Jie et al. 2016) uses reinforcement learning to obtain region proposals and achieves results comparable to RPN (Ren et al. 2015) with far fewer region proposals. To the best of our knowledge, we are the first to put forward a deep reinforcement learning based method for automatic image cropping.

3 Aesthetics Aware Reinforcement Learning

When a person crops an image, he may first take a look at the whole image and pick a patch as an initial result. After that, he may keep searching for better cropping windows based on the whole image and previous cropping results until he obtains a satisfactory one. Inspired by this observation, we think automatic image cropping can be formulated as a sequential decision-making process. In such a process, an agent interacts with the environment and takes a series of actions to optimize a target. As illustrated in Figure 2, for our problem the agent receives an observation from the input image and the cropping window. It then samples an action from the action space according to the observation and historical experience. The sampled action is executed by the agent to manipulate the shape and position of the cropping window. After each action, the agent receives a reward according to the aesthetic score of the cropped image. Its target is to find the most pleasing window in the original image by maximizing the accumulated reward. In this section, we first introduce the state space, the action space and the aesthetics aware reward of our A2-RL model, then we detail the architecture of the aesthetics aware reinforcement learning (A2-RL) model and the training process.

3.1 State and Action Space

At each step, the agent decides which action to execute according to the current state. The state must provide the agent with comprehensive information for better decisions. As the A2-RL model formulates automatic image cropping as a sequential decision-making process, the state can be represented as s_t = {o_1, o_2, ..., o_{t-1}, o_t}, where o_t is the current observation of the agent. This formulation is similar to the human decision-making process, which considers both the current observation and historical experience. The A2-RL model uses the features of the cropping window and the input image as the current observation o_t, so the agent can learn about both global and local information. Both the local feature and the global feature are extracted from the FC6 layer of the pre-trained View Finding Network (VFN) (Chen et al. 2017b), which is modified from the original AlexNet (Krizhevsky, Sutskever, and Hinton 2012). In the A2-RL model, we use an LSTM unit to memorize the historical observations {o_1, o_2, ..., o_{t-1}} and combine them with the current observation o_t to form the state s_t.
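As a concrete illustration of this state representation, the minimal sketch below runs an LSTM over the concatenation of a retained global feature and the per-step local feature, so the hidden state summarizes {o_1, ..., o_t}. The 1000-dimensional features and 1024-dimensional state match the description in Figure 2; the module name and the single LSTM cell are our own assumptions for illustration.

```python
import torch
import torch.nn as nn

class HistoryAwareState(nn.Module):
    """Fold the observation history {o_1, ..., o_t} into a fixed-size state.

    Each observation o_t is the concatenation of the (retained) global
    image feature and the local feature of the current crop window.
    """

    def __init__(self, feat_dim=1000, state_dim=1024):
        super().__init__()
        self.lstm = nn.LSTMCell(2 * feat_dim, state_dim)

    def forward(self, f_global, f_local, hidden=None):
        o_t = torch.cat([f_global, f_local], dim=-1)  # current observation
        h_t, c_t = self.lstm(o_t, hidden)             # carries history forward
        return h_t, (h_t, c_t)                        # h_t acts as the state s_t

# One step: features for the whole image and the current crop (batch of 1).
state_fn = HistoryAwareState()
f_global = torch.randn(1, 1000)   # extracted once and retained
f_local = torch.randn(1, 1000)    # re-extracted after every action
s_t, hidden = state_fn(f_global, f_local)
```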

Figure 2: Illustration of the A2-RL model architecture. In the forward pass, the feature of the cropped window is extracted by the VFN (Chen et al. 2017b) and concatenated with the feature of the whole image, which is extracted and retained beforehand. The concatenated feature vector is then fed into the actor-critic branch, which has two outputs: the actor output is used to sample an action from the action space so as to manipulate the cropping window, and the critic output estimates the expected reward at the current state. In addition, the feature of the cropping window is fed into an image quality evaluation branch, whose output is the aesthetic score of the current cropping window and is stored to compute rewards for actions. In this model, both the global feature and the local feature are 1000-dimensional vectors; the three fully-connected layers and the LSTM layer all output 1024-dimensional feature vectors.

We choose 14 pre-defined actions to form the action space, which can be divided into four groups: scaling actions, position translation actions, aspect ratio translation actions and a termination action. The first three groups aim to adjust the size, position and shape of the cropping window, and include 5, 4 and 4 actions respectively. These three groups follow definitions similar to those in (Jie et al. 2016), but with a different scale: all of these actions adjust the shape and position by 0.05 times the original image size, which can capture more accurate cropping windows than a larger step would. The termination action is a trigger: when it is chosen, the agent stops the cropping process and outputs the current cropping window as the final result. Theoretically, the agent can reach windows of any size and position on the original image. The observation and action space are illustrated in Figure 2 for an intuitive overview.
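The sketch below spells out one plausible encoding of these 14 actions as window updates of 0.05 times the original image size. The paper only specifies the group sizes and the step scale, so the exact action layout and direction conventions here are assumptions of ours.

```python
ALPHA = 0.05  # every action moves or resizes by 0.05x the original image size

def apply_action(window, action, img_w, img_h):
    """Apply one of the 14 actions to a crop window (x1, y1, x2, y2).

    Returns the new window and whether the termination action was chosen.
    Assumed, illustrative layout: 0-4 scaling, 5-8 position translation,
    9-12 aspect-ratio change, 13 termination.
    """
    x1, y1, x2, y2 = window
    dx, dy = ALPHA * img_w, ALPHA * img_h
    if action == 13:                                  # termination: stop here
        return window, True
    if action == 0:   x1, y1, x2, y2 = x1 + dx, y1 + dy, x2 - dx, y2 - dy
    elif action == 1: x1, y1 = x1 + dx, y1 + dy      # shrink from top-left
    elif action == 2: x2, y1 = x2 - dx, y1 + dy      # shrink from top-right
    elif action == 3: x1, y2 = x1 + dx, y2 - dy      # shrink from bottom-left
    elif action == 4: x2, y2 = x2 - dx, y2 - dy      # shrink from bottom-right
    elif action == 5: x1, x2 = x1 - dx, x2 - dx      # move left
    elif action == 6: x1, x2 = x1 + dx, x2 + dx      # move right
    elif action == 7: y1, y2 = y1 - dy, y2 - dy      # move up
    elif action == 8: y1, y2 = y1 + dy, y2 + dy      # move down
    elif action == 9:  y1 = y1 + dy                  # shorter (from top)
    elif action == 10: y2 = y2 - dy                  # shorter (from bottom)
    elif action == 11: x1 = x1 + dx                  # narrower (from left)
    elif action == 12: x2 = x2 - dx                  # narrower (from right)
    # clamp to the image and ignore actions that would collapse the window
    x1, y1 = max(0.0, x1), max(0.0, y1)
    x2, y2 = min(float(img_w), x2), min(float(img_h), y2)
    if x2 - x1 < 1 or y2 - y1 < 1:
        return window, False
    return (x1, y1, x2, y2), False
```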
3.2 Aesthetics Aware Reward

The goal of the A2-RL model is to find the most pleasing cropping window in the original input image, so the reward function must guide the agent toward a cropping window with better aesthetic quality at each step. In the A2-RL model, the VFN (Chen et al. 2017b) is used to give each cropped image a quality score. When an action is executed, the difference between the aesthetic scores of the previous and the current cropping window is used to compute the reward for that action. More precisely, if the aesthetic score of the current window is higher than that of the previous one, the agent receives a positive reward; if the score becomes lower, the agent gets a negative reward. For an input image I, we denote the score given by the VFN as S_VFN(I). The current cropped image and the previous cropped image are denoted as I_t and I_{t-1} respectively, and we use a sign function to compute the basic reward for the current action:

    r_{basic,t} = sign(S_VFN(I_t) - S_VFN(I_{t-1}))    (1)

This is the foundation of our aesthetics aware reward function. We also add other heuristic constraints for better cropping policies. We consider the aspect ratios of well-composed images to be limited to a particular range: in the A2-RL model, if the aspect ratio of the current window is lower than 0.5 or higher than 2, the agent receives a negative reward as a punishment for the corresponding action. We empirically set this negative reward to -5, because we want the agent to learn a strict rule that never lets such a situation happen. To speed up the cropping process, we give the agent an additional negative reward -t at every step, where t is the number of steps the agent has taken since the beginning; this lowers the reward when the agent takes too many steps. Let ar denote the aspect ratio of the current window. The whole reward r_t for the agent taking action a_t under state s_t can be formulated as:

    r_t(s_t, a_t) = -5,                 if ar < 0.5 or ar > 2
    r_t(s_t, a_t) = r_{basic,t} - t,    otherwise    (2)
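Equations (1) and (2) translate directly into code. A minimal sketch follows, assuming the caller supplies the VFN scores of the previous and current crops; the function name and argument layout are illustrative.

```python
import math

def step_reward(prev_score, score, window, t):
    """Aesthetics aware reward of Eq. (2).

    prev_score, score: S_VFN of the previous and current crops.
    window: (x1, y1, x2, y2) of the current crop; t: steps taken so far.
    """
    w, h = window[2] - window[0], window[3] - window[1]
    ar = w / h  # aspect ratio of the current window
    if ar < 0.5 or ar > 2.0:
        return -5.0                       # hard punishment for extreme shapes
    # sign of the aesthetic score change, Eq. (1)
    r_basic = math.copysign(1.0, score - prev_score) if score != prev_score else 0.0
    return r_basic - t                    # step penalty discourages long episodes
```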

3.3 A2-RL Model

With the state space, action space and reward function defined, we now introduce the architecture of our Aesthetics Aware Reinforcement Learning (A2-RL) model, illustrated in Figure 2. The A2-RL model starts with a 5-layer convolution block and a fully-connected layer for feature representation. The model then splits into two branches: an actor-critic branch and an image quality evaluation branch. The actor-critic branch is composed of three fully-connected layers and an LSTM layer, where the LSTM layer memorizes the historical observations. The actor-critic branch has two outputs. The first is the policy output, also named the Actor; the second is the value output, also named the Critic. The policy output is a fourteen-dimensional vector, each dimension corresponding to the probability of taking the relevant action. The value output is an estimate of the current state, i.e., the expected accumulated reward in that situation. The image quality evaluation branch outputs an aesthetic quality score for the cropped image, which is used to compute the reward.

In the image cropping process, the A2-RL model provides the agent with the probability of each action under the current state. As shown in Figure 2, the model first feeds the cropped image into the feature representation unit and extracts its feature. This feature is then combined with the global feature, which was extracted in the first forward pass and retained for the following process. The combined feature vector is fed into the actor-critic branch. According to the policy output, the agent samples the relevant action and adjusts the size and position of the cropping window correspondingly. For example, in Figure 2, the agent executes the sampled action to shrink the cropping window from the left and top by 0.05 times the image size. The forward pass continues until the termination action is sampled.

3.4 Training the A2-RL Model

To make the A2-RL model learn good cropping policies, our training process is based on the asynchronous advantage actor-critic (A3C) algorithm (Mnih et al. 2016). Different from the original A3C, we replace the asynchronous mechanism with mini-batches to increase diversity. In the training stage, we use the advantage function (Mnih et al. 2016) and an entropy regularization term (Williams and Peng 1991) to form the optimization objective of the policy output. We use R_t to denote the accumulated reward at step t:

    R_t = \sum_{i=0}^{k-1} \gamma^i r_{t+i} + \gamma^k V(s_{t+k}; \theta_v)

where \gamma is the discount factor, V(s_t; \theta_v) is the value output under state s_t, and k ranges from 0 to t_max, the maximum number of steps before updating. The optimization objective of the policy output is to maximize the advantage function R_t - V(s_t; \theta_v) and the entropy of the policy output H(\pi(s_t; \theta)), where \pi(s_t; \theta) is the probability distribution of the policy output and H is the entropy function. The entropy term increases the diversity of actions, which helps the agent learn flexible policies. The optimization objective of the value output is to minimize (R_t - V(s_t; \theta_v))^2 / 2. The gradients of the actor-critic branch can therefore be formulated as

    \nabla_\theta \log \pi(a_t | s_t; \theta) (R_t - V(s_t; \theta_v)) + \beta \nabla_\theta H(\pi(s_t; \theta))

for the actor and

    \nabla_{\theta_v} (R_t - V(s_t; \theta_v))^2 / 2

for the critic, where \beta controls the influence of the entropy term. The whole training procedure of the A2-RL model is described in Algorithm 1, where T_max and t_max denote the maximum numbers of steps the agent takes before termination and before updating the network, respectively.
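The sketch below shows how these two gradients are typically realized with automatic differentiation: a minimal single-rollout version under our own simplifying assumptions (the logits and values come from the actor-critic branch, the rewards from the reward function above, and the default gamma and beta values are placeholders, since the paper's numeric settings are not given here).

```python
import torch
import torch.nn.functional as F

def actor_critic_loss(policy_logits, actions, values, rewards, R,
                      gamma=0.99, beta=0.01):
    """n-step actor-critic objective in the style used to train A2-RL.

    policy_logits: (T, A) actor outputs for the T steps of one rollout.
    actions: length-T sampled action indices; values: (T,) critic outputs.
    rewards: list of T rewards; R: detached bootstrap value (0 on termination).
    """
    log_probs = F.log_softmax(policy_logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)          # H(pi(s_t; theta))
    policy_loss, value_loss = 0.0, 0.0
    for i in reversed(range(len(rewards))):
        R = rewards[i] + gamma * R                      # discounted return R_t
        advantage = R - values[i]                       # R_t - V(s_t; theta_v)
        # minimize -log pi * advantage - beta * H  (maximizes the objective)
        policy_loss = policy_loss \
            - log_probs[i, actions[i]] * advantage.detach() \
            - beta * entropy[i]
        value_loss = value_loss + 0.5 * advantage.pow(2)  # (R_t - V)^2 / 2
    return policy_loss + value_loss

# Usage: loss = actor_critic_loss(logits, acts, vals, rews, R); loss.backward()
```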
Algorithm 1: Training procedure of the A2-RL model
Input: original image I
 1: f_local = f_global = Feature_extractor(I)
 2: previous_score = S_VFN(I)
 3: T = 0
 4: repeat
 5:   t = 0
 6:   repeat
 7:     o_t = concat(f_global, f_local)
 8:     s_t = LSTM_AC(o_t)  // actor-critic branch with LSTM
 9:     Perform a_t according to the policy \pi(a_t | s_t; \theta) and obtain the cropped image I_c
10:     f_local = Feature_extractor(I_c)  // FC6 feature of the crop
11:     score = S_VFN(I_c)
12:     r_t = reward(previous_score, score, I_c, T)
13:     t = t + 1; T = T + 1
14:     previous_score = score
15:   until t == t_max or a_t is the termination action
16:   R = 0 if a_t is the termination action, else V(s_t; \theta_v)
17:   for i in {t-1, ..., 0} do
18:     R = r_i + \gamma R
19:     d\theta = d\theta + \nabla_\theta \log \pi(a_i | s_i; \theta)(R - V(s_i; \theta_v)) + \beta \nabla_\theta H(\pi(s_i; \theta))
20:     d\theta_v = d\theta_v + \nabla_{\theta_v} (R - V(s_i; \theta_v))^2 / 2
21:   end for
22:   Update \theta with d\theta and \theta_v with d\theta_v
23: until T >= T_max or a_t is the termination action

4 Experiments

4.1 Experimental Settings

In this section, we introduce how we obtain the training data, the implementation details of the training procedure, and the evaluation data and metrics.

Training Data. To train our network, we select images from a large-scale aesthetics image database named AVA (Murray, Marchesotti, and Perronnin 2012), which consists of a large number of training and testing images, all labeled with aesthetic scores from one to ten by several people. As the score distribution is extremely unbalanced, we simply divide the images into three classes, low, middle and high quality, corresponding to scores from one to four, four to seven and seven to ten respectively. We choose about 3000 images from each class to compose the training set, so there are 9000 images in the training set in total. With this training data, the A2-RL model can learn cropping policies for images of diverse quality, which helps the model generalize well to different images.

Implementation Details. The RMSProp (Tieleman and Hinton 2012) algorithm is used to optimize the A2-RL model. We set the batch size to 32; the learning rate, the entropy weight \beta and the reward discount factor \gamma are kept fixed in all experiments. T_max and t_max are set to 50 and 10 respectively. We also choose 900 images from the AVA database as a validation set, constructed in the same way as the training set. As the A2-RL model aims to produce a cropping window with a higher aesthetic score than the original image, on the validation set we use the improvement in aesthetic score from the original image to the cropped image as the metric. We train the network on the training set for 20 epochs, validate the model after every epoch, and choose the model with the best average improvement on the validation set as our final A2-RL model.

Evaluation Data and Metrics. To evaluate the cropping ability of our agent, we test it on three automatic image cropping databases that are unseen during training: the CUHK Image Cropping Database (Yan et al. 2013), the Flickr Cropping Database (Chen et al. 2017a) and the Human Cropping Database (Fang et al. 2014). The first two databases use the same evaluation metrics, while the last one uses a different metric; we adopt the same metrics as the original papers for fair comparison. The CUHK Image Cropping Database has 950 testing images, each annotated by three different expert photographers. The Flickr Cropping Database contains 348 testing images, each with a single annotation. On these two databases, previous works (Yan et al. 2013; Chen et al. 2017a; 2017b) measure cropping accuracy with the average intersection-over-union (IoU) and the average boundary displacement. Denoting the ground truth window of image i as W_i^g and the cropping window as W_i^c, the average IoU over N images is

    (1/N) \sum_{i=1}^{N} area(W_i^g \cap W_i^c) / area(W_i^g \cup W_i^c)    (3)

The average boundary displacement computes the average distance between the four edges of the ground truth window and those of the cropping window. For image i, we denote the four edges of the ground truth window as B_i^g(l), B_i^g(r), B_i^g(u), B_i^g(b), and correspondingly the four edges of the cropping window as B_i^c(l), B_i^c(r), B_i^c(u), B_i^c(b). The average boundary displacement over N images is

    (1/N) \sum_{i=1}^{N} \sum_{j \in \{l,r,u,b\}} |B_i^g(j) - B_i^c(j)| / 4    (4)

The Human Cropping Database contains 500 testing images, each annotated by ten people. Because it has more annotations per image than the first two databases, the evaluation metric is slightly different: previous works (Fang et al. 2014; Kao, He, and Huang 2017) on this database use the top-k maximum intersection-over-union (IoU). This metric is similar to the average IoU above, except that it computes the IoU between a proposed cropping window and each of the ten ground truth windows and keeps the maximum; top-k means that the k best cropping windows are used to compute the result.
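For completeness, here is one straightforward implementation of these metrics. Windows are (x1, y1, x2, y2) boxes; the helper names are illustrative, and the top-k variant assumes the proposals are already sorted by score.

```python
def iou(a, b):
    """Intersection-over-union of two windows (x1, y1, x2, y2), Eq. (3)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda w: (w[2] - w[0]) * (w[3] - w[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def boundary_displacement(a, b):
    """Average displacement of the four edges between two windows, Eq. (4)."""
    return sum(abs(a[i] - b[i]) for i in range(4)) / 4.0

def topk_max_iou(proposals, ground_truths, k):
    """Top-k maximum IoU: best IoU between the k best (score-sorted)
    proposals and any of the ground-truth windows of one image."""
    return max(iou(p, g) for p in proposals[:k] for g in ground_truths)
```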
Table 1: Cropping accuracy on the Flickr Cropping Database (Chen et al. 2017a), reported as average IoU and average boundary displacement error for eDN, RankSVM+DeCAF_7, VFN+SW, A2-RL w/o LSTM and A2-RL (ours). The best results are highlighted in bold.

4.2 Evaluation of Cropping Accuracy

In this section, we compare the cropping accuracy of our A2-RL model with conventional sliding window based cropping methods to validate its effectiveness. As the sliding window based VFN (Chen et al. 2017b) achieves the best results among methods that do not use supervised cropping data, we mainly compare our A2-RL model with this cropping method. Our A2-RL model uses an actor-critic based reinforcement learning method to search for the best cropping window sequentially with only a few candidate windows, whereas the original VFN-based cropping method uses sliding windows to densely extract candidates. We also compare with several other baselines on these databases.

CUHK and Flickr Cropping Databases. As the previous VFN method (Chen et al. 2017b) is only evaluated on the CUHK and Flickr cropping databases, we also mainly compare our framework with VFN on these two databases. Notably, the original VFN uses not only sliding window candidates but also the ground truth windows of the test images as candidates, which leads to remarkably high performance on these two databases. Since the A2-RL model aims to search for the best cropping window, and in practice no ground truth window is available to a cropping algorithm, in this experiment we do not use any ground truth window for either framework, for fair comparison. It is also worth mentioning that the A2-RL model has never seen images from either database during training. Besides the two frameworks discussed above, we also compare with some other cropping methods. We choose the best attention-based method, eDN, reported in (Chen et al. 2017a), on behalf of the attention-based cropping algorithms. This method computes image saliency maps with the algorithm from (Vig, Dorr, and Cox 2014) and searches for the best cropping window by maximizing the difference in average saliency between the cropping window and the remaining region. We also choose the best result (RankSVM+DeCAF_7) reported in (Chen et al. 2017a) as another baseline; in this method, the aesthetic feature DeCAF_7 is extracted from AlexNet and a RankSVM is trained to find the best cropping window among all candidates. For all the sliding window based methods, including eDN, RankSVM+DeCAF_7 and VFN+SW (sliding window), the results are reported with the same sliding window setting as (Chen et al. 2017a).

Table 2: Cropping accuracy on the CUHK Image Cropping Database (Yan et al. 2013), reported per annotation (I, II, III) as average IoU and average boundary displacement error for eDN, RankSVM+DeCAF_7, LearnChange, VFN+SW, A2-RL w/o LSTM and A2-RL (ours). The best results are highlighted in bold.

Table 3: Cropping accuracy on the Human Cropping Database (Fang et al. 2014), reported as top-1 to top-5 maximum IoU for (Fang et al. 2014), (Kao, He, and Huang 2017), A2-RL w/o LSTM and A2-RL (ours). The best results are highlighted in bold.

Experiments on the Flickr Cropping Database are shown in Table 1. VFN+SW and A2-RL are the two main frameworks under comparison: VFN+SW is the original VFN framework (Chen et al. 2017b), which uses sliding windows (SW) to densely extract crop candidates, while A2-RL is our reinforcement learning based framework. Similarly, we list the cropping accuracy of the above methods on the CUHK Image Cropping Database in Table 2. As there are three annotations for each image, following previous works (Yan et al. 2013; Chen et al. 2017a; 2017b) we list the result for each annotation separately; all symbols in Table 2 have the same meaning as in Table 1. In addition, we report the best result of (Yan et al. 2013), the work in which this database was proposed. Notably, that method is trained with supervised cropping data on this database, so the comparison is not entirely fair to us; as the method is change-based, we denote it as LearnChange in Table 2. From Tables 1 and 2, we can see that our A2-RL model consistently outperforms the other cropping algorithms on both databases, which demonstrates its effectiveness.

Human Cropping Database. We also evaluate our A2-RL model on the Human Cropping Database. Following previous works (Fang et al. 2014; Kao, He, and Huang 2017) on this database, the top-k maximum intersection-over-union (IoU) is used as the cropping accuracy metric, as discussed above. We choose the two state-of-the-art methods (Fang et al. 2014; Kao, He, and Huang 2017) as our baselines. The cropping accuracy of our A2-RL model and these baselines is listed in Table 3. As the A2-RL model finds one cropping window at a time, we let the agent search the input image k times for the top-k IoU. From Table 3, we can see that our A2-RL model again outperforms the state-of-the-art methods.

4.3 Evaluation of Time Efficiency

In this section, we study the time efficiency of our A2-RL model. We compare the A2-RL model with the sliding window based VFN model on the Flickr Cropping Database; the experimental results are shown in Table 4. Avg Steps and Avg Time denote the average number of steps and the average time each method needs to finish the cropping process on a single image. We augment the number of sliding windows in this experiment: VFN+SW, VFN+SW+ and VFN+SW++ in Table 4 correspond to different numbers of candidate windows, where VFN+SW follows the original setting (Chen et al. 2017b). Notably, all results in Table 4 are evaluated on the same machine, which has a single NVIDIA GeForce Titan X Pascal GPU with 12 GB of memory and an Intel Core i7-6800K CPU with 6 cores.

Table 4: Time efficiency comparison on the Flickr Cropping Database, reported as average IoU, average displacement, average steps and average time (s) for VFN+SW, VFN+SW+, VFN+SW++ and A2-RL (ours). The best results are highlighted in bold.
From Table 4, we can see that the cropping accuracy improves as we augment the number of sliding windows, but the time consumed grows substantially as well. Unsurprisingly, our A2-RL model needs far fewer steps and costs much less time than the other methods: the average number of steps the A2-RL model takes is more than ten times smaller than that of the sliding window based method, yet the A2-RL model still achieves better cropping accuracy. This result shows the capability of our RL-based model: with the novel aesthetics aware reward and the history-preserving state representation, the model learns to use as few actions as possible to obtain a more pleasing image.

Figure 3: Image cropping examples on the Flickr Cropping Database (Chen et al. 2017a): (a) VFN+Sliding Window, (b) A2-RL (ours), (c) ground truth. Best viewed in color.

4.4 Experiment Analysis

In this section, we analyze the experimental results and study our model.

RL Search vs. Sliding Window. From Tables 1, 2 and 4, we can see that A2-RL consistently beats VFN+SW in both cropping accuracy and time efficiency. The main difference between the two methods is the way the cropping candidates are obtained, so we conclude that our proposed RL-based search is superior to the sliding window strategy. Although the sliding window approach can extract candidates densely, it still fails to find very accurate candidates because of its fixed aspect ratios; in contrast, our A2-RL model can find cropping windows of arbitrary size.

Observation + Historical Experience vs. Observation Only. We use an LSTM unit to memorize the historical observations {o_1, o_2, ..., o_{t-1}} and combine them with the current observation o_t to form the state s_t. Here we study the effect of this LSTM unit. We remove the LSTM unit from the A2-RL model, so the agent uses only the current observation o_t as the state s_t when making decisions. We train a new agent under this setting and evaluate it on the three databases above. The results are shown in Tables 1, 2 and 3, where the new agent is denoted as A2-RL w/o LSTM. We find that the cropping accuracy of the new model is much lower than that of the original A2-RL model, which demonstrates the importance of historical experience.

4.5 Qualitative Analysis

We show cropping results of different methods on several images from the Flickr Cropping Database in Figure 3. We can see that the A2-RL model finds better cropping windows, with arbitrary aspect ratios, than VFN with sliding windows, which illustrates the capability of our model intuitively. More qualitative results are given in the supplementary material due to the page limit.

5 Conclusion

In this paper, we formulated aesthetic image cropping as a sequential decision-making process and proposed a novel Aesthetics Aware Reinforcement Learning (A2-RL) model to address this problem. With the aesthetics aware reward and the LSTM-based state representation, which includes both the current and historical observations, the A2-RL model learns good policies for automatic image cropping. The agent finishes the cropping process within several steps, or at most a dozen or so, and obtains cropping windows of arbitrary size. Experiments on several unseen databases show that our model achieves state-of-the-art cropping accuracy with far fewer candidate windows and much less time.

References

Caicedo, J. C., and Lazebnik, S. 2015. Active object localization with deep reinforcement learning. In Proceedings of the IEEE International Conference on Computer Vision.

Chen, J.; Bai, G.; Liang, S.; and Li, Z. 2016. Automatic image cropping: A computational complexity study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Chen, Y.-L.; Huang, T.-W.; Chang, K.-H.; Tsai, Y.-C.; Chen, H.-T.; and Chen, B.-Y. 2017a. Quantitative analysis of automatic image cropping algorithms: A dataset and comparative study. In 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).
Chen, Y.-L.; Klopp, J.; Sun, M.; Chien, S.-Y.; and Ma, K.-L. 2017b. Learning to compose with professional photographs on the web. arXiv preprint.
Datta, R.; Joshi, D.; Li, J.; and Wang, J. Z. 2006. Studying aesthetics in photographic images using a computational approach. In European Conference on Computer Vision. Springer.
Deng, Y.; Loy, C. C.; and Tang, X. 2017. Image aesthetic assessment: An experimental survey. IEEE Signal Processing Magazine 34(4).
Dhar, S.; Ordonez, V.; and Berg, T. L. 2011. High level describable attributes for predicting aesthetics and interestingness. In 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Fang, C.; Lin, Z.; Mech, R.; and Shen, X. 2014. Automatic image cropping using visual composition, boundary simplicity and content preservation models. In Proceedings of the 22nd ACM International Conference on Multimedia.
Jie, Z.; Liang, X.; Feng, J.; Jin, X.; Lu, W.; and Yan, S. 2016. Tree-structured reinforcement learning for sequential object localization. In Advances in Neural Information Processing Systems.
Kao, Y.; He, R.; and Huang, K. 2017. Automatic image cropping with aesthetic map and gradient energy map. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Ke, Y.; Tang, X.; and Jing, F. 2006. The design of high-level features for photo quality assessment. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1.
Kong, S.; Shen, X.; Lin, Z.; Mech, R.; and Fowlkes, C. 2016. Photo aesthetics ranking network with attributes and content adaptation. In European Conference on Computer Vision. Springer.
Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems.
Liang, X.; Lee, L.; and Xing, E. P. 2017. Deep variation-structured reinforcement learning for visual relationship and attribute detection. arXiv preprint.
Luo, W.; Wang, X.; and Tang, X. 2011. Content-based photo quality assessment. In 2011 IEEE International Conference on Computer Vision (ICCV).
Mai, L.; Jin, H.; and Liu, F. 2016. Composition-preserving deep photo aesthetics assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning.
Murray, N.; Marchesotti, L.; and Perronnin, F. 2012. AVA: A large-scale database for aesthetic visual analysis. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Nishiyama, M.; Okabe, T.; Sato, Y.; and Sato, I. 2009. Sensation-based photo cropping. In Proceedings of the 17th ACM International Conference on Multimedia. ACM.
Park, J.; Lee, J.-Y.; Tai, Y.-W.; and Kweon, I. S. 2012. Modeling photo composition and its application to photo rearrangement. In IEEE International Conference on Image Processing (ICIP).
Ren, S.; He, K.; Girshick, R.; and Sun, J. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks.
In Advances in Neural Information Processing Systems (NIPS).
Ren, Z.; Wang, X.; Zhang, N.; Lv, X.; and Li, L.-J. 2017. Deep reinforcement learning-based image captioning with embedding reward. arXiv preprint.
Stentiford, F. 2007. Attention based auto image cropping. In Workshop on Computational Attention and Applications, ICVS, volume 1.
Suh, B.; Ling, H.; Bederson, B. B.; and Jacobs, D. W. 2003. Automatic thumbnail cropping and its effectiveness. In Proceedings of the 16th Annual ACM Symposium on User Interface Software and Technology.
Tieleman, T., and Hinton, G. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4(2).
Vig, E.; Dorr, M.; and Cox, D. 2014. Large-scale optimization of hierarchical features for saliency prediction in natural images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Williams, R. J., and Peng, J. 1991. Function optimization using connectionist reinforcement learning algorithms. Connection Science 3(3).
Yan, J.; Lin, S.; Bing Kang, S.; and Tang, X. 2013. Learning the change for automatic image cropping. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.


More information

Deep Neural Network Architectures for Modulation Classification

Deep Neural Network Architectures for Modulation Classification Deep Neural Network Architectures for Modulation Classification Xiaoyu Liu, Diyu Yang, and Aly El Gamal School of Electrical and Computer Engineering Purdue University Email: {liu1962, yang1467, elgamala}@purdue.edu

More information

Global Contrast Enhancement Detection via Deep Multi-Path Network

Global Contrast Enhancement Detection via Deep Multi-Path Network Global Contrast Enhancement Detection via Deep Multi-Path Network Cong Zhang, Dawei Du, Lipeng Ke, Honggang Qi School of Computer and Control Engineering University of Chinese Academy of Sciences, Beijing,

More information

tsushi Sasaki Fig. Flow diagram of panel structure recognition by specifying peripheral regions of each component in rectangles, and 3 types of detect

tsushi Sasaki Fig. Flow diagram of panel structure recognition by specifying peripheral regions of each component in rectangles, and 3 types of detect RECOGNITION OF NEL STRUCTURE IN COMIC IMGES USING FSTER R-CNN Hideaki Yanagisawa Hiroshi Watanabe Graduate School of Fundamental Science and Engineering, Waseda University BSTRCT For efficient e-comics

More information

Hand Gesture Recognition by Means of Region- Based Convolutional Neural Networks

Hand Gesture Recognition by Means of Region- Based Convolutional Neural Networks Contemporary Engineering Sciences, Vol. 10, 2017, no. 27, 1329-1342 HIKARI Ltd, www.m-hikari.com https://doi.org/10.12988/ces.2017.710154 Hand Gesture Recognition by Means of Region- Based Convolutional

More information

Mobile Cognitive Indoor Assistive Navigation for the Visually Impaired

Mobile Cognitive Indoor Assistive Navigation for the Visually Impaired 1 Mobile Cognitive Indoor Assistive Navigation for the Visually Impaired Bing Li 1, Manjekar Budhai 2, Bowen Xiao 3, Liang Yang 1, Jizhong Xiao 1 1 Department of Electrical Engineering, The City College,

More information

Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems

Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems Tiny ImageNet Challenge Investigating the Scaling of Inception Layers for Reduced Scale Classification Problems Emeric Stéphane Boigné eboigne@stanford.edu Jan Felix Heyse heyse@stanford.edu Abstract Scaling

More information

arxiv: v2 [cs.cv] 28 Mar 2017

arxiv: v2 [cs.cv] 28 Mar 2017 License Plate Detection and Recognition Using Deeply Learned Convolutional Neural Networks Syed Zain Masood Guang Shu Afshin Dehghan Enrique G. Ortiz {zainmasood, guangshu, afshindehghan, egortiz}@sighthound.com

More information

Counterfeit Bill Detection Algorithm using Deep Learning

Counterfeit Bill Detection Algorithm using Deep Learning Counterfeit Bill Detection Algorithm using Deep Learning Soo-Hyeon Lee 1 and Hae-Yeoun Lee 2,* 1 Undergraduate Student, 2 Professor 1,2 Department of Computer Software Engineering, Kumoh National Institute

More information

arxiv: v1 [cs.cv] 27 Nov 2016

arxiv: v1 [cs.cv] 27 Nov 2016 Real-Time Video Highlights for Yahoo Esports arxiv:1611.08780v1 [cs.cv] 27 Nov 2016 Yale Song Yahoo Research New York, USA yalesong@yahoo-inc.com Abstract Esports has gained global popularity in recent

More information

A New Framework for Supervised Speech Enhancement in the Time Domain

A New Framework for Supervised Speech Enhancement in the Time Domain Interspeech 2018 2-6 September 2018, Hyderabad A New Framework for Supervised Speech Enhancement in the Time Domain Ashutosh Pandey 1 and Deliang Wang 1,2 1 Department of Computer Science and Engineering,

More information

Video Encoder Optimization for Efficient Video Analysis in Resource-limited Systems

Video Encoder Optimization for Efficient Video Analysis in Resource-limited Systems Video Encoder Optimization for Efficient Video Analysis in Resource-limited Systems R.M.T.P. Rajakaruna, W.A.C. Fernando, Member, IEEE and J. Calic, Member, IEEE, Abstract Performance of real-time video

More information

Integrated Digital System for Yarn Surface Quality Evaluation using Computer Vision and Artificial Intelligence

Integrated Digital System for Yarn Surface Quality Evaluation using Computer Vision and Artificial Intelligence Integrated Digital System for Yarn Surface Quality Evaluation using Computer Vision and Artificial Intelligence Sheng Yan LI, Jie FENG, Bin Gang XU, and Xiao Ming TAO Institute of Textiles and Clothing,

More information

Synthetic View Generation for Absolute Pose Regression and Image Synthesis: Supplementary material

Synthetic View Generation for Absolute Pose Regression and Image Synthesis: Supplementary material Synthetic View Generation for Absolute Pose Regression and Image Synthesis: Supplementary material Pulak Purkait 1 pulak.cv@gmail.com Cheng Zhao 2 irobotcheng@gmail.com Christopher Zach 1 christopher.m.zach@gmail.com

More information

Learning to Understand Image Blur

Learning to Understand Image Blur Learning to Understand Image Blur Shanghang Zhang, Xiaohui Shen, Zhe Lin, Radomír Měch, João P. Costeira, José M. F. Moura Carnegie Mellon University Adobe Research ISR - IST, Universidade de Lisboa {shanghaz,

More information

arxiv: v2 [cs.cv] 11 Oct 2016

arxiv: v2 [cs.cv] 11 Oct 2016 Xception: Deep Learning with Depthwise Separable Convolutions arxiv:1610.02357v2 [cs.cv] 11 Oct 2016 François Chollet Google, Inc. fchollet@google.com Monday 10 th October, 2016 Abstract We present an

More information

Tutorial of Reinforcement: A Special Focus on Q-Learning

Tutorial of Reinforcement: A Special Focus on Q-Learning Tutorial of Reinforcement: A Special Focus on Q-Learning TINGWU WANG, MACHINE LEARNING GROUP, UNIVERSITY OF TORONTO Contents 1. Introduction 1. Discrete Domain vs. Continous Domain 2. Model Based vs. Model

More information

Lesson 08. Convolutional Neural Network. Ing. Marek Hrúz, Ph.D. Katedra Kybernetiky Fakulta aplikovaných věd Západočeská univerzita v Plzni.

Lesson 08. Convolutional Neural Network. Ing. Marek Hrúz, Ph.D. Katedra Kybernetiky Fakulta aplikovaných věd Západočeská univerzita v Plzni. Lesson 08 Convolutional Neural Network Ing. Marek Hrúz, Ph.D. Katedra Kybernetiky Fakulta aplikovaných věd Západočeská univerzita v Plzni Lesson 08 Convolution we will consider 2D convolution the result

More information

ROAD RECOGNITION USING FULLY CONVOLUTIONAL NEURAL NETWORKS

ROAD RECOGNITION USING FULLY CONVOLUTIONAL NEURAL NETWORKS Bulletin of the Transilvania University of Braşov Vol. 10 (59) No. 2-2017 Series I: Engineering Sciences ROAD RECOGNITION USING FULLY CONVOLUTIONAL NEURAL NETWORKS E. HORVÁTH 1 C. POZNA 2 Á. BALLAGI 3

More information

Supplementary Material: Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs

Supplementary Material: Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs Supplementary Material: Deep Photo Enhancer: Unpaired Learning for Image Enhancement from Photographs with GANs Yu-Sheng Chen Yu-Ching Wang Man-Hsin Kao Yung-Yu Chuang National Taiwan University 1 More

More information

Playing Atari Games with Deep Reinforcement Learning

Playing Atari Games with Deep Reinforcement Learning Playing Atari Games with Deep Reinforcement Learning 1 Playing Atari Games with Deep Reinforcement Learning Varsha Lalwani (varshajn@iitk.ac.in) Masare Akshay Sunil (amasare@iitk.ac.in) IIT Kanpur CS365A

More information

SIMULATION-BASED MODEL CONTROL USING STATIC HAND GESTURES IN MATLAB

SIMULATION-BASED MODEL CONTROL USING STATIC HAND GESTURES IN MATLAB SIMULATION-BASED MODEL CONTROL USING STATIC HAND GESTURES IN MATLAB S. Kajan, J. Goga Institute of Robotics and Cybernetics, Faculty of Electrical Engineering and Information Technology, Slovak University

More information

Attention-based Multi-Encoder-Decoder Recurrent Neural Networks

Attention-based Multi-Encoder-Decoder Recurrent Neural Networks Attention-based Multi-Encoder-Decoder Recurrent Neural Networks Stephan Baier 1, Sigurd Spieckermann 2 and Volker Tresp 1,2 1- Ludwig Maximilian University Oettingenstr. 67, Munich, Germany 2- Siemens

More information

Hash Function Learning via Codewords

Hash Function Learning via Codewords Hash Function Learning via Codewords 2015 ECML/PKDD, Porto, Portugal, September 7 11, 2015. Yinjie Huang 1 Michael Georgiopoulos 1 Georgios C. Anagnostopoulos 2 1 Machine Learning Laboratory, University

More information

Playing CHIP-8 Games with Reinforcement Learning

Playing CHIP-8 Games with Reinforcement Learning Playing CHIP-8 Games with Reinforcement Learning Niven Achenjang, Patrick DeMichele, Sam Rogers Stanford University Abstract We begin with some background in the history of CHIP-8 games and the use of

More information

Visual Quality Assessment for Projected Content

Visual Quality Assessment for Projected Content Visual Quality Assessment for Projected Content Hoang Le, Carl Marshall 2, Thong Doan, Long Mai, Feng Liu Portland State University 2 Intel Corporation Portland, OR USA Hillsboro, OR USA {hoanl, thong,

More information

Impact of Automatic Feature Extraction in Deep Learning Architecture

Impact of Automatic Feature Extraction in Deep Learning Architecture Impact of Automatic Feature Extraction in Deep Learning Architecture Fatma Shaheen, Brijesh Verma and Md Asafuddoula Centre for Intelligent Systems Central Queensland University, Brisbane, Australia {f.shaheen,

More information

Compact Deep Convolutional Neural Networks for Image Classification

Compact Deep Convolutional Neural Networks for Image Classification 1 Compact Deep Convolutional Neural Networks for Image Classification Zejia Zheng, Zhu Li, Abhishek Nagar 1 and Woosung Kang 2 Abstract Convolutional Neural Network is efficient in learning hierarchical

More information

Park Smart. D. Di Mauro 1, M. Moltisanti 2, G. Patanè 2, S. Battiato 1, G. M. Farinella 1. Abstract. 1. Introduction

Park Smart. D. Di Mauro 1, M. Moltisanti 2, G. Patanè 2, S. Battiato 1, G. M. Farinella 1. Abstract. 1. Introduction Park Smart D. Di Mauro 1, M. Moltisanti 2, G. Patanè 2, S. Battiato 1, G. M. Farinella 1 1 Department of Mathematics and Computer Science University of Catania {dimauro,battiato,gfarinella}@dmi.unict.it

More information

Convolutional Neural Networks for Small-footprint Keyword Spotting

Convolutional Neural Networks for Small-footprint Keyword Spotting INTERSPEECH 2015 Convolutional Neural Networks for Small-footprint Keyword Spotting Tara N. Sainath, Carolina Parada Google, Inc. New York, NY, U.S.A {tsainath, carolinap}@google.com Abstract We explore

More information

Palmprint Recognition Based on Deep Convolutional Neural Networks

Palmprint Recognition Based on Deep Convolutional Neural Networks 2018 2nd International Conference on Computer Science and Intelligent Communication (CSIC 2018) Palmprint Recognition Based on Deep Convolutional Neural Networks Xueqiu Dong1, a, *, Liye Mei1, b, and Junhua

More information

Derek Allman a, Austin Reiter b, and Muyinatu Bell a,c

Derek Allman a, Austin Reiter b, and Muyinatu Bell a,c Exploring the effects of transducer models when training convolutional neural networks to eliminate reflection artifacts in experimental photoacoustic images Derek Allman a, Austin Reiter b, and Muyinatu

More information

Classification of Digital Photos Taken by Photographers or Home Users

Classification of Digital Photos Taken by Photographers or Home Users Classification of Digital Photos Taken by Photographers or Home Users Hanghang Tong 1, Mingjing Li 2, Hong-Jiang Zhang 2, Jingrui He 1, and Changshui Zhang 3 1 Automation Department, Tsinghua University,

More information

Size Does Matter: How Image Size Affects Aesthetic Perception?

Size Does Matter: How Image Size Affects Aesthetic Perception? Size Does Matter: How Image Size Affects Aesthetic Perception? Wei-Ta Chu, Yu-Kuang Chen, and Kuan-Ta Chen Department of Computer Science and Information Engineering, National Chung Cheng University Institute

More information

Recognition: Overview. Sanja Fidler CSC420: Intro to Image Understanding 1/ 78

Recognition: Overview. Sanja Fidler CSC420: Intro to Image Understanding 1/ 78 Recognition: Overview Sanja Fidler CSC420: Intro to Image Understanding 1/ 78 Textbook This book has a lot of material: K. Grauman and B. Leibe Visual Object Recognition Synthesis Lectures On Computer

More information

Convolutional Neural Network-based Steganalysis on Spatial Domain

Convolutional Neural Network-based Steganalysis on Spatial Domain Convolutional Neural Network-based Steganalysis on Spatial Domain Dong-Hyun Kim, and Hae-Yeoun Lee Abstract Steganalysis has been studied to detect the existence of hidden messages by steganography. However,

More information

Scalable systems for early fault detection in wind turbines: A data driven approach

Scalable systems for early fault detection in wind turbines: A data driven approach Scalable systems for early fault detection in wind turbines: A data driven approach Martin Bach-Andersen 1,2, Bo Rømer-Odgaard 1, and Ole Winther 2 1 Siemens Diagnostic Center, Denmark 2 Cognitive Systems,

More information