arxiv: v1 [stat.ml] 10 Nov 2017

Poverty Prediction with Public Landsat 7 Satellite Imagery and Machine Learning

arXiv:1711.03654v1 [stat.ML] 10 Nov 2017

Anthony Perez, Department of Computer Science, Stanford, CA 94305, aperez8@stanford.edu
Christopher Yeh, Department of Computer Science, Stanford, CA 94305, chrisyeh@stanford.edu
George Azzari, Department of Earth System Science, gazzari@stanford.edu
Marshall Burke, Department of Earth System Science, mburke@stanford.edu
David Lobell, Department of Earth System Science, dlobell@stanford.edu
Stefano Ermon, Department of Computer Science, ermon@cs.stanford.edu

Abstract

Obtaining detailed and reliable data about local economic livelihoods in developing countries is expensive, and such data are consequently scarce. Previous work has shown that it is possible to measure local-level economic livelihoods using high-resolution satellite imagery. However, such imagery is relatively expensive to acquire, is often not updated frequently, and is mainly available for recent years. We train CNN models on free and publicly available multispectral daytime satellite images of the African continent from the Landsat 7 satellite, which has collected imagery with global coverage for almost two decades. We show that despite these images' lower resolution, we can achieve accuracies that exceed previous benchmarks.

1 Introduction

Policy makers and philanthropic organizations rely on data about local economic livelihoods to direct their efforts to the places that most need aid [12], [6], [9]. Traditionally, such data have come from expensive and logistically challenging household surveys. As a result, nationally representative surveys are conducted only intermittently: 39 of 59 African countries conducted fewer than two surveys between 2000 and 2010 from which nationally representative poverty measures could be constructed [12]. Consequently, both policymakers and researchers lack key data with which to target anti-poverty programs or to measure their effectiveness.

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Previous work [8] introduced transfer learning methods for estimating economic livelihood indicators from satellite imagery in 5 African countries: Malawi, Nigeria, Rwanda, Tanzania, and Uganda. Reasoning that nighttime light intensity is correlated with urban development, Jean et al. trained a convolutional neural network (CNN) to predict nighttime light intensity from daytime satellite images. They then trained simpler models on image features extracted by the CNN to estimate an
Asset Wealth Index (AWI). Jean et al. [8] found that the CNN features were useful for predicting asset wealth within the poorest segment of the population, especially when compared to established methods. Building on [8], we make the following contributions:

1. Publicly available, freely distributable satellite imagery with a long time series. Satellite images pulled from the Google Static Maps API (as done by [8]) are limited by Google's licensing terms, cannot be redistributed, and carry no information about the date on which each image was taken. In contrast, we use publicly available, freely distributable multispectral satellite imagery from Landsat 7, available from 1999 to today.

2. Multispectral satellite imagery. Although Landsat 7 images have lower resolution than the zoom-level 16 Google Static Maps images used by Jean et al. (15-30 m/px instead of approximately 2.5 m/px), we achieve equivalent or better results by incorporating spectral bands beyond the visible-spectrum information available in Google Static Maps.

2 Data and Preprocessing

We created yearly composite satellite images of the African continent for 2004 to 2015 from imagery captured by the Landsat 7 satellite [1]. Each annual composite is created by taking, for every pixel, the median of all cloud-free observations available during that year. This kind of compositing has been used successfully in similar applications to obtain clear satellite imagery. Landsat 7 images have 9 spectral bands (counting both the low-gain and the high-gain thermal band), with resolutions ranging from 60 meters per pixel (m/px) to 15 m/px [1]. We apply pan-sharpening to the RGB bands [10], [2] to produce 15 m/px versions, since this technique has been shown to be beneficial in a variety of satellite imagery tasks [10]. As in [8], we use transfer learning with nighttime lights labels from DMSP [3]. We bin the nighttime light intensities into 3 classes: low, medium, and high brightness.
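The two preprocessing steps above can be sketched in NumPy. The first is a per-pixel median over the cloud-free observations of one year, and the second is binning nightlight intensities into three classes. This is a minimal illustration and not the authors' pipeline. It assumes that a boolean cloud mask is already available for every acquisition. The two brightness cut-offs in `NIGHTLIGHT_THRESHOLDS` are hypothetical placeholders, because the paper does not report the values it used.

```python
import warnings

import numpy as np


def annual_median_composite(stack: np.ndarray, cloudy: np.ndarray) -> np.ndarray:
    """Per-pixel median of the cloud-free observations in one year.

    stack:  (T, H, W, B) array of T acquisitions with B bands each.
    cloudy: (T, H, W) boolean mask, True where the pixel is cloud-covered.
    Returns an (H, W, B) composite; pixels that are never clear stay NaN.
    """
    # Replace cloudy observations with NaN so nanmedian ignores them.
    clear = np.where(cloudy[..., None], np.nan, stack.astype(float))
    with warnings.catch_warnings():
        # np.nanmedian warns on all-NaN slices (never-clear pixels); silence that.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(clear, axis=0)


# Hypothetical cut-offs separating low / medium / high brightness;
# the paper does not report the values it used.
NIGHTLIGHT_THRESHOLDS = (3.0, 35.0)


def bin_nightlights(intensity, thresholds=NIGHTLIGHT_THRESHOLDS) -> np.ndarray:
    # np.digitize maps x < t0 -> 0, t0 <= x < t1 -> 1, x >= t1 -> 2.
    return np.digitize(np.asarray(intensity, dtype=float), thresholds)
```

Taking the median rather than the mean keeps a single residual haze or shadow observation from pulling the composite value, which is one plausible reason this compositing choice is popular.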
Likewise, our AWI labels come from Demographic and Health Surveys (DHS) for multiple countries in Africa for the years of interest. However, the coverage of these surveys is sparse compared to the large amount of satellite imagery at our disposal. We therefore sample training imagery (Figure 1) more densely near locations where labeled survey data are available, in order to make the image distributions across the transfer learning domains more similar. We take care to divide the sampled images into training, validation, and test splits such that there is no spatial overlap among the splits (though some images within each split may overlap).

Figure 1: Sampling of daytime satellite images based on their nighttime light intensity: (left) low, class 0; (middle) medium, class 1; (right) high, class 2. The locations are divided into training (blue), validation (orange), and test (green) splits. Note that this visualization shows overlapping images, because images displayed at their actual size would be difficult to see.

3 Methods

3.1 CNNs

Most existing CNN models are designed to work with 3-channel RGB images and are thus not directly compatible with multi-band satellite images. We therefore adapted several existing architectures
to work on multi-band satellite images: 18- and 34-layer ResNets [7] and VGG-F [11]. We trained each model twice, once using all bands and once using only the RGB bands. When using only the RGB bands, we initialized the CNNs with weights pre-trained on the ImageNet dataset [5]. When using all 9 bands, we modified the filters of the first convolutional layer to have a depth of 9 instead of 3. In other words, the first-layer weight tensor has shape [F, F, 9, 64] instead of the usual [F, F, 3, 64], where F is the filter size: 7 for ResNets and 11 for VGG-F. The weights for the RGB bands are initialized as usual, and the weights for each non-RGB band are set to the mean of the 3 RGB weights at the same position in the same filter. We trained each CNN model to predict the nighttime light intensity class (0, 1, or 2) from Landsat 7 daytime satellite images. We train for 60 epochs and keep the weights from the epoch in which the model achieved the highest accuracy on the validation split. We then run the trained models on images corresponding to the DHS survey locations and save the features extracted by the last layer of the CNN.

3.2 Multiple Resolution Imagery

A challenge in dealing with satellite imagery is that its bands may have different resolutions, as explained above. A naive workaround is to upsample all bands to the same, highest resolution, but this alone may cause artifacts due to duplicated pixel values and may make poor use of pretrained weights. Instead of relying on upsampling alone, we upsample all bands to the highest resolution of 15 m/px using nearest-neighbor interpolation and then apply dilated convolutions (also called atrous convolutions) [4] in the first layer of the CNN. At a high level, the goal of modifying the first layer is to preserve the ability to initialize the network from weights pre-trained on RGB image datasets (such as ImageNet) while removing potential artifacts caused by the mismatched resolutions of the multispectral imagery.
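The first-layer adaptation can be illustrated in NumPy. The sketch covers expanding [F, F, 3, 64] weights to 9 input bands using the per-position RGB mean, and it checks that nearest-neighbor upsampling followed by a dilated kernel reads back the original coarse pixels, which is what the dilation rates of 1, 2, and 4 given for VGG-F in this section rely on. This is a minimal sketch, not the authors' code. The band order (RGB in input channels 0-2) and all function names are hypothetical, and no actual convolution is performed.

```python
import numpy as np


def expand_first_layer(rgb_weights: np.ndarray, n_bands: int = 9) -> np.ndarray:
    # rgb_weights: [F, F, 3, C] pre-trained first-layer filters (e.g. from ImageNet).
    # Hypothetical band order: RGB occupy input channels 0-2 and the extra
    # bands follow in channels 3 .. n_bands - 1.
    if rgb_weights.ndim != 4 or rgb_weights.shape[2] != 3:
        raise ValueError("expected weights of shape [F, F, 3, C]")
    # Mean over the 3 RGB channels at each spatial position of each filter.
    mean_rgb = rgb_weights.mean(axis=2, keepdims=True)    # [F, F, 1, C]
    extra = np.repeat(mean_rgb, n_bands - 3, axis=2)       # [F, F, n_bands - 3, C]
    return np.concatenate([rgb_weights, extra], axis=2)    # [F, F, n_bands, C]


def nearest_upsample(band: np.ndarray, factor: int) -> np.ndarray:
    # Nearest-neighbor upsampling: each coarse pixel becomes a factor x factor block.
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def dilated_window(image: np.ndarray, row: int, col: int, size: int, rate: int) -> np.ndarray:
    # The size x size grid of pixels that a kernel dilated by `rate` reads
    # when its top-left tap sits at (row, col).
    span = rate * (size - 1) + 1
    return image[row:row + span:rate, col:col + span:rate]
```

For a 60 m band upsampled by a factor of 4, `dilated_window(up, 0, 0, 3, 4)` returns exactly the top-left 3x3 block of the original coarse band, and shifting the window by 4 fine pixels (one stride step) moves it by exactly one coarse pixel. A standard convolution layer has a single dilation rate, so a plausible way to realize per-band rates in a framework is to split the input channels by native resolution, convolve each group with its own rate, and sum the results. This is an assumption; the paper does not describe its implementation.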
The dilated convolutional layers we implement vary with the overall model architecture. The VGG-F model begins with an 11x11 stride-4 convolution in its first layer, whose weights form an [11, 11, 9, 64] tensor initialized from ImageNet as described above. The convolutional windows in the first layer are then dilated to match the resolution of the original bands: the filters corresponding to the 15 m bands are dilated at a rate of 1, those for the 30 m bands at a rate of 2, and those for the 60 m bands at a rate of 4. A stride of 4 is still applied, but no original pixels are "skipped" by the convolution, because nearest-neighbor upsampling to 15 m duplicates pixel values by a factor equal to the dilation rate. For example, each pixel in a 60 m band is replaced with a 4x4 block of pixels of the same value in the upsampled image. The dilation rate of 4 applied to this band realigns the convolutional window with the original pixel values, so the stride of 4 skips only duplicated values. The ResNet family of models, as described in [7], uses a 7x7 stride-2 convolution in the first layer of the network. Our specialized implementation uses a stride of 1 and adds dilation in the same manner as our VGG-F first layer.

3.3 From Image Features to Poverty Metric

We tested several models for predicting poverty metrics from the image features extracted by the CNNs, including ridge regression and gradient-boosted trees (GBTs). We trained each model with leave-one-country-out cross-validation.

4 Results

Table 1 provides a quantitative comparison of several models trained using our methods, as well as results from [8]. We also show results from training ridge regression and GBT models on only the scalar nightlights value at each DHS survey location. The squared correlation coefficient (r²) between nightlights and the AWI was 0.57, a value that several of our models surpassed.
However, applying a non-linear method such as GBTs to predict the AWI from nightlights yields a stronger r² of 0.66. One major finding in [8] was that their convolutional features make much stronger AWI predictions in the poorer segment of the wealth distribution, and we see similar behavior in our models. Figure 2a and Figure 2b show the results for the VGG-F, 9 Band / ridge model. In Figure 2b, we
consider the case of training and predicting only on cluster locations that fall below a certain poverty percentile. As in [8], we achieve a significantly higher r² than nightlights when training on only the poorer datapoints. Our VGG-F model trained on Landsat 7 imagery surpasses the results reported in [8] for models trained on Google Static Maps imagery. However, this holds only when training and test folds are sampled in a manner that is agnostic to country borders (marked "Pooled" or "Block CV Pooled" in Figure 2b). Restricting train and test folds to each consist of exactly the data from a single country results in significantly poorer performance when training on the poorest data (marked "OOC Overall" in Figure 2b).

In Figure 2a, we examine leave-one-country-out training with a ridge regression model using image features extracted by the VGG-F 9 Band CNN. We train the model on DHS survey data from 4 of the 5 countries and then have it predict only on datapoints from the left-out country that fall below a particular wealth percentile threshold. We see that when the model is applied to a country it has not seen before, its performance suffers the most in poorer areas.

Table 1: Results for mean out-of-country predictions. Results are obtained by repeating, for each country, the process of training on 4 countries and predicting locations in the 5th. Aggregate Residual r² is the squared correlation between the residuals of predictions from nightlights and the residuals of predictions from the model, aggregated across all five countries.

Model | Mean Train r² | Mean Test r² | Aggregate Residual r²
Nightlights / GBT | 0.63 | 0.66 | 1.0
VGG-F, RGB / ridge | 0.71 | 0.64 | 0.69
VGG-F, 9 Band / ridge | 0.68 | 0.63 | 0.70
ResNet-18, 9 Band / ridge | 0.69 | 0.64 | 0.73
ResNet-34, 9 Band / ridge | 0.70 | 0.65 | 0.74
Jean et al. [8] | 0.53 | 0.54 | 0.76
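The evaluation protocol described above can be sketched with NumPy alone. It consists of leave-one-country-out training, r² restricted to datapoints below a wealth percentile, and the Aggregate Residual r² of Table 1. This is an illustrative reconstruction under stated assumptions, not the authors' code. Ridge regression is solved in closed form with an unpenalized intercept and a hypothetical `alpha`. r² is taken to be the squared Pearson correlation, as the paper defines it. The GBT variant is not shown, and all function names are hypothetical.

```python
import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    # Closed-form ridge; centering X and y leaves the intercept unpenalized.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Squared Pearson correlation coefficient.
    return float(np.corrcoef(y_true, y_pred)[0, 1] ** 2)


def leave_one_country_out(X, y, country, alpha: float = 1.0) -> np.ndarray:
    # For each country: train on all other countries, predict the held-out one.
    preds = np.empty_like(y, dtype=float)
    for c in np.unique(country):
        test = country == c
        w, b = ridge_fit(X[~test], y[~test], alpha)
        preds[test] = X[test] @ w + b
    return preds


def r2_below_percentile(y_true, y_pred, q: float) -> float:
    # r^2 on only the poorest fraction q (0 < q <= 1) of the given datapoints.
    keep = y_true <= np.quantile(y_true, q)
    return r2(y_true[keep], y_pred[keep])


def residual_r2(y_true, pred_nightlights, pred_model) -> float:
    # Squared correlation between the two models' residuals (y - y_hat).
    return r2(y_true - pred_nightlights, y_true - pred_model)
```

Passing the same predictions as both arguments of `residual_r2` returns 1.0. This is consistent with the Nightlights / GBT row of Table 1, where the model being compared is the nightlights baseline itself. To mimic Figure 2a, `r2_below_percentile` would be applied to the predictions for each held-out country separately.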
5 Conclusion

Our results show that the current state of the art in satellite-based poverty prediction lends itself to predicting relative wealth within a single country where some ground truth data are available, but may struggle to extrapolate across country borders. Some combination of nightlights and predictions from the proposed models may yield further improvements. Furthermore, while we trained models to predict economic livelihood using only a single year of Landsat 7 imagery, our predictions could be extended to every year in which Landsat 7 has been active (since 1999). This opens up the possibility of analyzing changes in local economic conditions over time at a much finer granularity than before, especially in developing countries that typically experience long intervals between nationally representative household surveys.

Figure 2: (a) Results of leave-one-country-out training of a ridge regression model on DHS survey data for each left-out country. "All Countries" indicates the aggregated predictions across all 5 countries. We compute the r² value using only the datapoints below a wealth percentile threshold within the test set, and the horizontal axis plots this threshold. For example, a value at 0.5 on the horizontal axis is the r² computed from the poorest half of the datapoints and their corresponding predictions. (b) The horizontal axis specifies a wealth percentile; during training and testing, all data above this percentile are ignored. The vertical axis plots the r² between predictions and the true AWI. "OOC Overall" corresponds to out-of-country predictions (data are divided into folds by country). "Nightlights GBT" and "Nightlights Ridge" operate in the same manner, using gradient-boosted trees and ridge regression, respectively. "Pooled" and "Block CV Pooled" correspond to cross-validated r² values. This cross-validation is agnostic to country, so training and testing data may come from the same country. The Block CV Pooled model additionally removes any training imagery that overlaps with test imagery.
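The overlap-removal step behind "Block CV Pooled" can be sketched as follows. The sketch assumes that each image footprint is an axis-aligned square of the same side length, centered on its sample location in a projected (metric) coordinate system. The footprint model, the coordinate convention, the `side` value used below, and the function names are all hypothetical; the paper does not describe how overlap was tested.

```python
import numpy as np


def overlaps_any(train_xy: np.ndarray, test_xy: np.ndarray, side: float) -> np.ndarray:
    # Two equal, axis-aligned square footprints overlap iff their centers are
    # less than `side` apart along both the x and the y axis.
    d = np.abs(train_xy[:, None, :] - test_xy[None, :, :])   # (n_train, n_test, 2)
    return np.any(np.all(d < side, axis=2), axis=1)           # (n_train,)


def block_cv_train_mask(xy: np.ndarray, test_mask: np.ndarray, side: float) -> np.ndarray:
    # Start from every non-test location, then drop those whose footprint
    # overlaps any test footprint.
    train = ~test_mask
    keep = train.copy()
    keep[train] = ~overlaps_any(xy[train], xy[test_mask], side)
    return keep
```

The pairwise comparison costs O(n_train * n_test) memory and time. This is fine for a few thousand survey locations, but for continent-scale samples a spatial index or a grid-cell lookup would likely be preferable.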

References
[1] Landsat 7: Description of Spectral Bands. https://landsat.usgs.gov. [Online; accessed 8-June-2017].
[2] Panchromatic Image Sharpening of Landsat 7 ETM+. https://landsat.usgs.gov/panchromatic-image-sharpening-landsat-7-etm. [Online; accessed 9-June-2017].
[3] Version 4 DMSP-OLS Nighttime Lights Time Series. [Online; accessed 9-June-2017].
[4] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. CoRR, abs/1412.7062, 2014.
[5] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
[6] Devarajan. Rev. Income Wealth 59, S9-S15, 2013.
[7] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. CoRR, abs/1603.05027, 2016.
[8] N. Jean, M. Burke, M. Xie, W. M. Davis, D. B. Lobell, and S. Ermon. Combining satellite imagery and machine learning to predict poverty. Science, 2016.
[9] Jerven. Poor numbers: How we are misled by African development statistics and what to do about it. Cornell Univ. Press, 2013.
[10] K. Kpalma, M. Chikr El-Mezouar, and N. Taleb. Recent trends in satellite image pan-sharpening techniques. In 1st International Conference on Electrical, Electronic and Computing Engineering, Vrnjačka Banja, Serbia, June 2014.
[11] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[12] World Bank. PovcalNet online poverty analysis tool. http://iresearch.worldbank.org/povcalnet/, 2015.