High-performance computing for soil moisture estimation

S. Elefante (1), W. Wagner (1), C. Briese (2), S. Cao (1), V. Naeimi (1)
(1) Department of Geodesy and Geoinformation, Vienna University of Technology, Vienna, Austria
(2) EODC Earth Observation Data Centre for Water Resources Monitoring, Vienna, c/o University of Technology, Vienna, Austria

BIDS Conference, Santa Cruz de Tenerife, 15-17 March 2016
Contents
- Introduction
- Case Studies
  - Case 1: Envisat ASAR WS (150m)
  - Case 2: Sentinel-1 C-SAR IW HR (20m)
- Lessons learned & Outlook
Surface Soil Moisture Estimation
[Diagram: preprocessing and parameters of the soil moisture estimation chain]
Earth Observation Satellites for Microwave Missions
The Earth Observation Data Centre (EODC)
- The EODC is a public-private partnership that aims to provide a collaborative IT infrastructure for archiving, processing, and distributing EO data (www.eodc.eu)
- Through multi-national partnerships spanning science, the public sector, and the private sector, users get direct access to EO data storage and can run data-intensive geoscientific models
- TU Wien is an EODC partner
EODC environment
- Virtual Machines (VMs)
- Supercomputer
- 24/7 Operations & Rolling Archive
- Petabyte-Scale Disk Storage
- Tape Storage
The Vienna Scientific Cluster (VSC-3)
- 2,020 nodes, each with 64 GB of RAM
- Each node equipped with two 8-core CPUs (Intel Xeon Processor E5-2650 v2 from the Ivy Bridge-EP family)
- Middleware installed: Simple Linux Utility for Resource Management (SLURM)
- Distributed storage volume managed by BeeGFS
- Energy-efficient cooling provided by the mineral-oil-based CarnotJet system
- VSC-3 was ranked 85th in the November 2014 worldwide TOP500 supercomputer list
Contents
- Introduction
- Case Studies
  - Case 1: Envisat ASAR WS (150m)
  - Case 2: Sentinel-1 C-SAR IW HR (20m)
- Lessons learned & Outlook
Preprocessing chain on VSC-3
- Sentinel-1 Toolbox (S1TBX) Version 1.1.1
- Python 2.7
- SAR Geophysical Retrieval Toolbox (SGRT) 2.2
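The slides do not show how the chain is invoked; below is a minimal sketch of how a Python wrapper could call the S1TBX command-line Graph Processing Tool (gpt) for one scene. The graph file, scene paths, and graph parameter names are hypothetical placeholders, not the actual SGRT configuration.

    import subprocess

    # Hypothetical paths: the processing graph and scene names are placeholders.
    GPT = "/opt/s1tbx/gpt"                       # S1TBX Graph Processing Tool
    GRAPH = "preprocessing_graph.xml"            # e.g. calibration + terrain correction
    SCENE = "ASA_WSM_scene.N1"                   # one Envisat ASAR WS scene
    OUTPUT = "/scratch/out/ASA_WSM_scene.tif"

    # Run the graph on a single scene; -q caps the number of parallel threads,
    # matching the "number of cores used by S1TBX" setting in the tests.
    subprocess.check_call([GPT, GRAPH, "-q", "8",
                           "-Pinput=" + SCENE,
                           "-Poutput=" + OUTPUT])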
Case 1: Envisat ASAR WS (150m)
[Diagram: data are read from the S1 & ASA archive (testing NFS solution) and written to the VSC-3 BeeGFS storage]
Input:
- Data type: Envisat ASAR WS (150m)
- Number of files: 31,199
- Single input dataset size: 12-692 MB
- Total input data size: 5.401 TB
- 20% of the 8-year ASA WS archive!
The processing
We were happy to start the processing, but the jobs crashed when around 200 nodes were working simultaneously!
Why? (Possible reasons)
- Input: 200 nodes simultaneously reading files from the S1 & ASA NFS archive
- Output: 200 nodes x 30 files = 6,000 files to write to the BeeGFS storage
The Sentinel & Envisat archive (NFS testing solution):
- 93 NFS disks (83 x 4 TB, 10 x 8 TB)
- Connected to the VSC-3 system through high-speed InfiniBand (40 Gb/s)
The distributed BeeGFS VSC-3 volume:
- 360 spinning disks
- Connected through around 160 Gb/s bandwidth (evaluated experimentally)
What if we use the RAM disk/cache?
Data can be temporarily stored in the RAM disk/cache.
Advantages:
- Data access is much faster than on any physical storage
Disadvantages:
- Part of the RAM available on the node is used to store the data (the full 64 GB of RAM is no longer available for processing)
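A minimal sketch of the staging step, assuming the RAM disk is the standard Linux tmpfs mount /dev/shm; the helper below refuses to cache a file that would not fit, since every cached byte reduces the RAM left for processing:

    import os
    import shutil

    RAMDISK = "/dev/shm"   # tmpfs mount backed by node RAM (assumed standard Linux setup)

    def free_bytes(path):
        # Free space on the file system that holds `path`
        st = os.statvfs(path)
        return st.f_bavail * st.f_frsize

    def stage_to_ramdisk(src):
        # Copy one file into the RAM disk, refusing if it would not fit;
        # whatever is cached here is RAM the processing can no longer use.
        size = os.path.getsize(src)
        if size > free_bytes(RAMDISK):
            raise RuntimeError("not enough RAM-disk space for %s" % src)
        dst = os.path.join(RAMDISK, os.path.basename(src))
        shutil.copy(src, dst)
        return dst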
Case 1: Output data in the RAM disk (caching)
[Diagram: data are read from the S1 & ASA archive (testing NFS solution); the output data are cached on the node and written to the VSC-3 BeeGFS storage]
Jobs:
- Number of independent jobs: 15,600
- Number of images per job: 2
The output data are temporarily stored in the RAM disk (caching); only at the end of the processing are the data copied to BeeGFS.
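The submission mechanics are not shown in the slides; one plausible way to queue 15,600 independent jobs is a SLURM job array, sketched below. The script and file names (job_list.txt, process_pair.py) are hypothetical, and real SLURM installations cap array sizes, so the actual setup may have differed.

    import subprocess

    N_JOBS = 15600  # one array task per job, two ASAR WS images per task

    # Build a batch script where each array task picks its pair of input
    # files from a pre-built list (job_list.txt: one line per task).
    lines = ["#!/bin/bash",
             "#SBATCH --job-name=asar_ws",
             "#SBATCH --array=1-%d" % N_JOBS,
             'LINE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" job_list.txt)',
             "python process_pair.py $LINE"]

    with open("submit_asar_ws.sh", "w") as f:
        f.write("\n".join(lines) + "\n")

    subprocess.check_call(["sbatch", "submit_asar_ws.sh"])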
Case 1: SLURM processing queue
[Plot: SLURM queue; 450 nodes assigned, annotated with the start and the end of the processing]
Case 1: Processing speed when caching only the output data
Poor scalability!
[Plot: average processing time vs. file size; for a given data file size the processing time changes significantly]
Case 1: Input and output data in the RAM disk (caching)
[Diagram: data are read from the S1 & ASA archive (testing NFS solution); the S1 toolbox keeps the files open; input and output data are cached on the node, and results are written to the VSC-3 BeeGFS storage]
The input image and the output data are temporarily stored in the RAM disk (caching); only at the end of the processing are the data copied to BeeGFS.
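Putting the two ideas together, here is a sketch of a per-job wrapper that stages the input into the RAM disk (so the files that S1TBX keeps open and re-reads live in RAM, not on NFS), processes locally, and touches BeeGFS only once at the end. The chain entry point (run_chain.py) and the output directory are hypothetical placeholders:

    import os
    import shutil
    import subprocess
    import tempfile

    RAMDISK = "/dev/shm"
    BEEGFS_OUT = "/global/beegfs/output"   # hypothetical final output directory

    def process_scene(scene_path):
        # Stage in, process in RAM, copy out once: one sequential read from
        # NFS, one sequential write to BeeGFS, everything else hits RAM.
        workdir = tempfile.mkdtemp(dir=RAMDISK)
        try:
            local_in = os.path.join(workdir, os.path.basename(scene_path))
            shutil.copy(scene_path, local_in)      # single read from the NFS archive
            out_dir = os.path.join(workdir, "out")
            os.mkdir(out_dir)
            subprocess.check_call(["python", "run_chain.py", local_in, out_dir])
            for name in os.listdir(out_dir):       # copy results to BeeGFS at the end
                shutil.copy(os.path.join(out_dir, name), BEEGFS_OUT)
        finally:
            shutil.rmtree(workdir)                 # give the RAM back to the node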
Case 1: SLURM processing queue (input and output caching)
[Plot: SLURM queue; more than 600 nodes assigned]
Case 1: Processing speed when caching input and output data
Great scalability!
[Plot: processing speed]
Contents
- Introduction
- Case Studies
  - Case 1: Envisat ASAR WS (150m)
  - Case 2: Sentinel-1 C-SAR IW HR (20m)
- Lessons learned & Outlook
From the previous experiments we learned that:
- We need to cache input and output data
- The storage volume can handle the massive data amount when input and output data are stored in the RAM disk (caching)
Can we evaluate the time needed to read/write Sentinel data on a given storage system?
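A simple way to answer this is sketched below: time a cold, sequential read of one image from the mount under test, and run the same script from a growing number of nodes at once (the sample path is a placeholder):

    import time

    def read_time(path, block=64 * 1024 * 1024):
        # Time a full sequential read of one image; launch this from many
        # nodes simultaneously to see how the storage scales with load.
        start = time.time()
        total = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(block)
                if not chunk:
                    break
                total += len(chunk)
        elapsed = time.time() - start
        return elapsed, total / (1024.0 ** 2) / elapsed  # seconds, MB/s

    # e.g.: elapsed, rate = read_time("/eodc/archive/S1A_sample.zip")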
Reading performance: NFS storage and BeeGFS
[Two plots: elapsed time to read an image (sec) vs. number of nodes. S1 & ASA storage (NFS testing solution): 10 minutes with 170 nodes! VSC-3 storage (BeeGFS): 1 minute with 500 nodes.]
A standard sample of S1-A images was considered: size around 1 GB, range 0.7-1.7 GB.
Case 2: Sentinel-1 IW HR preprocessing
Input:
- Data type: Sentinel-1 IW GRD HR (20m)
- Number of files: 1,075
- Single input dataset size: 0.8-1.7 GB
- Total input data size: 1.2 TB
Jobs:
- Number of independent jobs: 1,075
- Number of images per job: 1
[Timeline: processing elapsed from 0 to 3.5 h]
The input image and the output data are cached; only at the end of the processing are the data copied to BeeGFS.
Case 2: S1 processing speed when caching input and output data
Great scalability!
50 minutes to process an S1 image (still to be improved)
Examples of processing on VSC-3

| Test | n. 1 | n. 2 | n. 3 | n. 4 |
|---|---|---|---|---|
| SAR product mode | ASAR GM | ASAR WS | ASAR WS | S-1 IW GRDH |
| Spatial resolution | 1 km | 150 m | 150 m | 20 m |
| Total number of data files | 189,621 | 31,199 | 31,199 | 1,075 |
| Images per job / total number of jobs | 8 / 23,703 | 2 / 15,600 | 2 / 15,600 | 1 / 1,075 |
| Input data file size range | 1-73 MB | 12-692 MB | 12-692 MB | 0.8-1.7 GB |
| Total input data size | 1.579 TB | 5.401 TB | 5.401 TB | 1.2 TB |
| Max. simultaneously running nodes | 417 | 454 | 612 | 396 |
| Cores used by Sentinel-1 Toolbox | 4 | 8 | 8 | 8 |
| Input data caching on node | False | False | True | True |
| Output data caching on node | True | True | True | True |
| Average processing time (seconds/MB) | 9.18 | 5.65 | 2.39 | 2.69 |
| Elapsed time incl. SLURM queueing | 3.5 days | 4 days | 8 hours | 3.5 hours |
| Estimated elapsed time using only 1 node | 167 days | 353 days | 353 days | 37 days |

90% of the 8-year ASA GM archive in 3.5 days instead of 167 days!
20% of the 8-year ASA WS archive in 8 hours instead of 353 days!
Contents
- Introduction
- Case Studies
  - Case 1: Envisat ASAR WS (150m)
  - Case 2: Sentinel-1 C-SAR IW HR (20m)
- Lessons learned & Outlook
Lessons learned & Outlook
Lessons learned:
- Proved the capability to process EO Big Data within the EODC platform:
  - 90% of the 8-year ASA GM archive in 3.5 days instead of 167 days
  - 20% of the 8-year ASA WS archive in 8 hours instead of 353 days
- The optimal processing strategy is to temporarily store the input and the output data in the RAM disk (caching) during the processing
- The bottleneck for EO Big Data processing is the data I/O
Outlook:
- A new archive/storage volume will be put in place this year
- Reduce the processing time for each image (e.g. using multicore processing; a sketch follows below)
- Processing data at global scale
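For the multicore outlook, one plausible direction is sketched below: split a single image into independent tiles and process them with a worker pool, so all cores of a node work on the same image instead of one image per job. The per-tile function is a hypothetical stand-in:

    from multiprocessing import Pool

    def process_tile(tile):
        # Stand-in for the real per-tile work (e.g. resampling one block
        # of an image); the actual function is not shown in the slides.
        return tile

    def process_image_parallel(tiles, workers=16):
        # Use all cores of a VSC-3 node (2 x 8 cores) on one image.
        pool = Pool(processes=workers)
        try:
            return pool.map(process_tile, tiles)
        finally:
            pool.close()
            pool.join()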
Thank you very much for your attention!
The authors would like to thank B. Bauer-Marschallinger, R. Brunnthaler, A. Dostalova, S. Hahn, S. Hasenauer, and H. Thüminger for their help and support.
This study was supported by the Austrian Research Promotion Agency (FFG) through P4EODC - Preparing for the Initial Services of the EODC (project number 36786) and WetMon - Enabling an operational Sentinel-1 wetland monitoring service for the EODC (project number 848001), and by EUMETSAT through the Satellite Application Facility on Support to Hydrology and Water Management (H-SAF).
http://rs.geo.tuwien.ac.at/remote-sensing/
From Byte to Petabyte: the processing challenge
[Graphic: data volume scale from 1 Byte through 1 Kilobyte, 1 Megabyte, 1 Gigabyte, and 1 Terabyte to 1 Petabyte]
Currently employed satellite data
Envisat
- Instrument: Advanced Synthetic Aperture Radar (ASAR)
- Period: 1 March 2002 - 8 April 2012
- Acquisition modes: Global Monitoring (GM) with 1 km resolution; Wide Swath (WS) with 150 m resolution
Sentinel-1
- Instrument: C-Band SAR
- Period: Sentinel-1A launched on 3 April 2014; Sentinel-1B planned to be launched in 2016
- Acquisition mode: Interferometric Wide Swath (IW) with High Resolution (HR), 20 m
Case 1: Envisat ASAR GM (1km)
[Diagram: data are read from the NFS archive and written to the internal VSC-3 BeeGFS storage]
Input:
- Data type: Envisat ASAR GM (1km)
- Number of files: 189,621
- Single input dataset size: 1-73 MB
- Total input data size: 1.579 TB
Jobs:
- Number of independent jobs: 23,703
- Number of images per job: 8
SLURM processing queue
Case 1: Processing speed
Poor scalability!
Directly reading vs. caching
[Plot: reading from cached data vs. reading directly from NFS]