Hiding Virtual Computing and Supercomputing inside a Notebook: GISandbox Science Gateway & Other User Experiences Eric Shook

Eric Shook
Domain Champion for GIS, XSEDE
Department of Geography, Environment and Society, University of Minnesota
eshook@umn.edu

My Background
Started as a Computer Science graduate student working with Open Science Grid and the LHC (eventually as a grid and systems administrator)
Switched to Geography, and now my research is cyberinfrastructure-based geographic information science and systems (CyberGIS)
Helped develop two XSEDE Science Gateways: GISolve and the CyberGIS Gateway
Now developing my own science gateway: GISandbox

My Perspective
My background as a grid and cluster administrator and science gateway developer has influenced my perspective as a user
I can appreciate the difficulties in developing usable systems (and their documentation!)
I can really appreciate bugs, early prototypes, and associated difficulties with shared systems
But as a user, I appreciate an easy-to-use system. :)

My Quest
It should be as easy as or easier than using my desktop computer*
* Whether it be supercomputers, virtual computers, containers, clouds, ...

Outline: My Quest, Use Cases, GISandbox, Considerations

Singularity Containers

Comparing Spatial Computing Systems
A small interdisciplinary team aiming to compare the performance of several established and cutting-edge systems for processing big spatial data:
  Spatial databases
  Spark-based systems
  Parallel programming languages for spatial data
Perfect use case for containerization:
  Set up and configure the container and software first
  Run each system on XSEDE, profiling the execution
  Tweak system configurations to improve performance
  Wash, rinse, and repeat
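
The run, profile, tweak, repeat loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `profile_run`, `tune`, `fake_spatial_job`, and the `partitions` setting are hypothetical names that do not come from the talk, and the stand-in job simply sleeps instead of launching a real spatial system inside a container.

```python
import time
from statistics import median


def profile_run(run, config, repeats=3):
    """Call run(config) several times and return the median wall-clock seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run(config)
        timings.append(time.perf_counter() - start)
    return median(timings)


def tune(run, configs, repeats=3):
    """Profile every candidate configuration; return (best_config, results)."""
    results = [(profile_run(run, cfg, repeats), cfg) for cfg in configs]
    # Compare on the timing only, so configurations never need to be orderable.
    _, best_config = min(results, key=lambda pair: pair[0])
    return best_config, results


# Hypothetical stand-in for one containerized system: more partitions, slower run.
def fake_spatial_job(config):
    time.sleep(0.01 * config["partitions"])


best, results = tune(fake_spatial_job, [{"partitions": p} for p in (8, 2, 4)])
print(best)
```

In a real comparison, `run` would start the containerized system with the given configuration, and each round of tweaks would add new entries to `configs`.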

Early User Experience on Bridges
Bridges supports Singularity (common across XSEDE)
Documentation for Singularity is minimal
  A quick search on other XSEDE systems yielded similar results
Bridges did not have a base image (shout-out to Roberto and the PSC admins for creating one for me)
Even with a base image we hit a showstopping issue: permissions
  We could not tweak the systems software or rebuild the image
  Doing so required root-level permission, which Singularity does not allow
IMPORTANT: This is not a negative reflection on Bridges, because I believe I would have had this same experience on any XSEDE machine. I just happen to use Bridges heavily, so it was the best testbed for me.

Early User Experience on Bridges
That said...
XSEDE and PSC support for helping me explore Singularity was phenomenal
PSC and Bridges folks responded to my request immediately and helped me consider my options
It's just that some users may not know to ask...

Build and Configure Containers On-Site
For containers to be successful on XSEDE, I believe users must be able to build and configure them directly on the supercomputer
  Tedious to copy, tweak, rebuild, and copy back
  Optimizations are difficult if hardware is different
Containers currently work for stable software (read: traditional HPC community), but are unworkable for rapidly evolving software (read: many non-traditional HPC communities).
I argue containers should best serve these new communities: they should make their lives easier.

Paleoscape Model

Paleoscape Model and Human Origins
Simulate Climate and Vegetation during the Last Glacial Maximum (~140,000 years ago)
  Republic of South Africa
  Altered coastline, climate, flora, and fauna
Simulate Humans Using Agent-based Models
Images courtesy of Curtis Marean + Paleoscape Team

Paleoscape XSEDE Supercomputers
Coupled model approach (Shook et al. 2015) won Best Accelerating Discovery Paper at XSEDE'15

The Challenge for Many Users: Terminals
Command-line interface
Batch queuing system
Split architecture: head nodes and compute nodes
Steep learning curve
Image: Bridges supercomputer at the Pittsburgh Supercomputing Center

Science Gateways
"Science gateways allow science & engineering communities to access shared data, software, computing services, instruments, educational materials, and other resources specific to their disciplines." (sciencegateways.org)
Lower the barrier to entry for science disciplines
Common platform for collaborative science
Scientists use the science gateway's allocation, so they do not need to write an allocation proposal of their own
XSEDE Science Gateways: https://www.xsede.org/gateways-listing
Science Gateways Community Institute: https://sciencegateways.org

GISandbox
Play place for researchers and educators to learn about, experiment with, and advance geographic information systems and science (gisandbox.org)
GISandbox User Interface (Jupyter Notebooks)
Screenshot labels: Text cell, Code cell, (Interactive) Output

GISandbox Architecture (10,000 foot view)
Jetstream Cloud Computing Resource (credits: https://github.com/koldunovn/python_for_geosciences)
Comet supercomputer at the San Diego Supercomputer Center (source: http://ucsdnews.ucsd.edu)
Bridges supercomputer at the Pittsburgh Supercomputing Center (source: http://insidehpc.com)

GISandbox Architecture (10,000 foot view)
Jupyter magic command to run code cells on the Comet or Bridges supercomputers
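
One way a notebook could hand a code cell to a supercomputer is to wrap the cell's source in a batch script for the scheduler. The sketch below assumes a Slurm scheduler and is not GISandbox's actual implementation: `SYSTEMS`, `build_batch_script`, and the partition names are hypothetical and used only for illustration.

```python
import shlex

# Hypothetical per-system settings; real partition names and limits may differ.
SYSTEMS = {
    "comet": {"partition": "compute"},
    "bridges": {"partition": "RM"},
}


def build_batch_script(cell_code, system, minutes=10, python="python3"):
    """Wrap the source of one notebook cell in a single-node Slurm batch script."""
    if system not in SYSTEMS:
        raise ValueError(f"unknown system: {system!r}")
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    hours, mins = divmod(minutes, 60)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={SYSTEMS[system]['partition']}",
        "#SBATCH --nodes=1",
        f"#SBATCH --time={hours:02d}:{mins:02d}:00",
        "",
        # Quote the cell so the shell passes it to Python unchanged.
        f"{python} -c {shlex.quote(cell_code)}",
        "",
    ]
    return "\n".join(lines)


script = build_batch_script("print('hello from a compute node')", "comet", minutes=90)
print(script)
```

Inside a notebook, a function like this could be registered with IPython's `register_cell_magic` decorator, with the resulting script submitted through `sbatch` on the target system. Submission, authentication, and returning the job output to the notebook are left out of the sketch.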

GISandbox Architecture (10,000 foot view)
Jetstream Cloud Resource: dedicated virtual machine, 24 cores, 60 GB RAM
Comet: 1,944 nodes, 24 cores per node; standard memory: 128 GB; large memory: 1.5 TB; GPU + Haswell processors
Bridges: hybrid system; regular memory: 128 GB; large memory: 12 TB or 3 TB; supports DBs, Spark, and more

GISandbox Behind-The-Scenes*
Jetstream Virtual Machine
  Ansible scripts to build the system and software
  Ubuntu
  JupyterHub + CILogon (to use XSEDE credentials for login)
  Python 2, Python 3, R
  GIS and other specific software
Jupyter Improvements
  Code cell -> Supercomputing job
Bridges and Comet
  CPU & Storage Allocations
  GIS and other specific software will match the VM
* ECSS: Davide Del Vento, Jun Wang, and now Andrea Zonca to the rescue!!

GISandbox Architecture (Details)
Slide courtesy: Sergiu Sanielevici and Davide Del Vento

GISandbox Architecture (A Dream?)
GISandbox Container (identical software to the Jetstream instance)
Right now that is too much work, but it shouldn't be
Slide courtesy: Sergiu Sanielevici and Davide Del Vento

Considerations From a User Perspective
Storage
  Many XSEDE service providers (SPs) have limited storage allocations
  Containers may put pressure on XSEDE to increase storage for users and projects
Reinventing the wheel
  As I work through these difficulties I am encountering others who are doing the same thing
  A mechanism for users and SPs to share experiences (and code) would be helpful

Considerations From a User Perspective
Supercomputer DMZ
  We need a demilitarized zone on each supercomputer to build and configure images. Something like a supercomputer running a VM running a container for isolation.
  (Systems hat) Security is next to impossible, but (User's hat) for containers to be truly useful we need to work toward figuring this out.
Collaboration
  Need to figure out how to share containers and data between containers/users

Thank You
Eric Shook
eshook@umn.edu
I want to thank Davide Del Vento, Jun Wang, Eroma Abeysinghe, Andrea Zonca, and many from PSC for all their help!