Data Quality: The elephant in the (big data) room

Chris Park, Data Scientist, UK Data Service. DataFirst Data Quality Workshop, Cape Town, South Africa, 6-7 July 2017.

Janitors? "Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labour of collecting and preparing unruly digital data, before it can be explored for useful nuggets." (New York Times)

Data Science

2016 CrowdFlower Survey

Key Messages: Data cleaning and data quality are important in the era of Big Data et al. It might be cheaper to store data now, but it is harder to keep track of, standardize, and curate data for secondary research. The way forward is to work across disciplines and sectors, e.g. academia, government, and industry, to provide standardized access to, and use of, data that has the potential to provide public value, e.g. energy data.

UK Data Service: Curator of the UK's largest collection of digital social and economic research data. Serving the data needs of social and economic researchers since 1967. Promotes data sharing and reproducibility, a topic of increasing importance, e.g. data as academic output. Has undergone a number of key transformations in response to changing user needs.

Decline of Survey Data, 1980-2010. Journals: AER (American Economic Review), JPE (Journal of Political Economy), QJE (Quarterly Journal of Economics), ECMA (Econometrica). Source: Chetty, R. (2012). The Transformative Potential of Administrative Data for Microeconometric Research. Retrieved from http://conference.nber.org/confer/2012/si2012/ls/chettyslides.pdf

Rise of Administrative Data, 1980-2010. Journals: AER (American Economic Review), JPE (Journal of Political Economy), QJE (Quarterly Journal of Economics), ECMA (Econometrica). Source: Chetty, R. (2012). The Transformative Potential of Administrative Data for Microeconometric Research. Retrieved from http://conference.nber.org/confer/2012/si2012/ls/chettyslides.pdf

Human Activity

Same architecture, different infrastructure. In response to changing user needs, the service is also diversifying into new and emerging forms of data with public impact, e.g. energy data.

Smarter Household Energy Data: A partnership between the UK Data Service, the UCL Centre for Energy Epidemiology, and DataFirst. Explores ways to scale up research using household energy data, including its benefits and barriers. Energy research is important: energy is the linchpin of modern economic activity; efficient use can help reduce negative impact on the environment and help consumers save money on their bills; linking with sociodemographic data can help identify and support fuel-poor households.

Energy Research: The key lies in linking energy data with administrative data such as building and sociodemographic data. Topics studied include: forecasting based on machine learning, which helps with estimating supply; helping consumers save money on their bills by shifting energy consumption to lower-tariff times of the week; and disaggregating energy use to break down consumption to the appliance level.
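The load-shifting topic can be made concrete with a little arithmetic. The sketch below is only an illustration: the two-rate tariff, its prices, the off-peak window, and the function names `daily_cost` and `shift_load` are all assumptions, not figures or methods from the talk.

```python
# Hypothetical two-rate tariff. Prices and the off-peak window are invented.
PEAK_RATE = 0.30                   # currency units per kWh (assumed)
OFF_PEAK_RATE = 0.10               # currency units per kWh (assumed)
OFF_PEAK_HOURS = set(range(0, 7))  # 00:00 to 06:59 (assumed)


def daily_cost(hourly_kwh):
    """Cost of one day, given 24 hourly consumption values in kWh."""
    if len(hourly_kwh) != 24:
        raise ValueError("expected 24 hourly readings")
    return sum(
        kwh * (OFF_PEAK_RATE if hour in OFF_PEAK_HOURS else PEAK_RATE)
        for hour, kwh in enumerate(hourly_kwh)
    )


def shift_load(hourly_kwh, from_hour, to_hour, kwh):
    """Return a copy of the profile with `kwh` moved from one hour to another."""
    if kwh > hourly_kwh[from_hour]:
        raise ValueError("cannot shift more energy than was used in that hour")
    shifted = list(hourly_kwh)
    shifted[from_hour] -= kwh
    shifted[to_hour] += kwh
    return shifted


# A flat profile with one evening appliance run at 18:00.
baseline = [0.5] * 24
baseline[18] = 2.0

# Move 1.5 kWh of that run to 02:00, inside the assumed off-peak window.
shifted = shift_load(baseline, from_hour=18, to_hour=2, kwh=1.5)
saving = daily_cost(baseline) - daily_cost(shifted)
```

Total consumption is unchanged; the saving is simply the shifted energy multiplied by the difference between the two assumed rates.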

Barriers to Energy Research: Heavily anonymized data, e.g. limited ability to link with other datasets. Limited and biased samples, e.g. recruitment-based studies. One-time datasets, e.g. sprawl and limited reproducibility. Data governance and provenance issues, e.g. no standard documentation.

Barriers to Energy Research: Missing and duplicate observations, and a lack of standardized missing-value markers, e.g. NA, NULL, 99, 99, etc. Timestamp formats: different combinations of date, time, and date + time columns, and inconsistent handling of time zones. Daylight saving time creates false features: duplicates when the clock turns back 1 hour, and missing observations when the clock shifts forward 1 hour. 80-90% of time is spent on janitorial work.
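A minimal sketch of the two problems on this slide, using only the Python standard library. The marker set, the `Europe/London` zone, the 2017 dates, and the helper names are assumptions chosen for illustration, and `zoneinfo` needs an IANA time zone database to be available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # needs an IANA time zone database on the system

# Assumed marker set, taken from the examples on the slide. Treating "99" as
# missing also discards any genuine reading of 99, which is why ad hoc codes are risky.
MISSING_MARKERS = {"", "NA", "NULL", "99"}


def parse_reading(raw):
    """Return None for an assumed missing-value marker, else the value as a float."""
    text = raw.strip().upper()
    return None if text in MISSING_MARKERS else float(text)


def local_labels(start_utc, periods, tz):
    """Naive local wall-clock labels for consecutive half-hourly UTC instants."""
    return [
        (start_utc + timedelta(minutes=30 * i)).astimezone(tz).strftime("%Y-%m-%d %H:%M")
        for i in range(periods)
    ]


def to_utc(naive_local, tz, fold=0):
    """Attach a zone to a naive local time; fold=1 picks the second pass of a repeated hour."""
    return naive_local.replace(tzinfo=tz, fold=fold).astimezone(timezone.utc)


LONDON = ZoneInfo("Europe/London")

# Clocks went back on 29 October 2017: 01:00 and 01:30 local each occur twice.
autumn = local_labels(datetime(2017, 10, 29, 0, 0, tzinfo=timezone.utc), 4, LONDON)

# Clocks went forward on 26 March 2017: 01:00 and 01:30 local never occur.
spring = local_labels(datetime(2017, 3, 26, 0, 0, tzinfo=timezone.utc), 4, LONDON)

# The same local label maps to two different UTC instants on the autumn date.
first = to_utc(datetime(2017, 10, 29, 1, 30), LONDON, fold=0)
second = to_utc(datetime(2017, 10, 29, 1, 30), LONDON, fold=1)
```

Four distinct half-hourly readings produce only two distinct local labels in `autumn`, and `spring` jumps from 00:30 to 02:00. Storing timestamps in UTC, or with an explicit offset, is one common way to keep such readings from looking like duplicates or gaps.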

Key Messages: Data cleaning and data quality are important in the era of Big Data et al. It might be cheaper to store data now, but it is harder to keep track of, standardize, and curate data. The way forward is through collaborative projects between academia, government, and industry that facilitate access to, and use of, data with policy implications, e.g. energy data.

From Dumb to Smart: Meters

Why Smart Meters? Better control and oversight over one's own energy use. No more estimated bills, and no more meter readers visiting your home. Researchers can have access to raw, unadjusted data. Opportunity to standardize how energy data is stored and shared to encourage reproducibility.

Smart Meter Roll-out Plans in Europe

Data Quality Challenges. Figure retrieved from https://www.intechopen.com/source/html/50727/media/fig2.png

Lessons learned and way forward: Academia, industry, and government all have something to offer. Smart meters provide a unique opportunity to demonstrate how data-driven innovation across industries and sectors can create public value. Wanted: a unified, standardized, and secure interface to smart meter data that can help researchers and policymakers.

Smart Meter Research Portal: Serve as a knowledge base for intervention and longitudinal studies using energy data across the sociotechnical spectrum. Provide seamless access to standardized smart meter data at half-hourly, daily, or monthly resolutions. Facilitate a secure data linkage service within an ISO-certified, trusted digital repository. Use cutting-edge technology based on the big data platform at the UK Data Service.
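Serving the same readings at several resolutions amounts to re-aggregating the half-hourly series. The sketch below is one possible way to do that with the standard library; the `aggregate` function, the `(timestamp, kWh)` layout, and the sample values are assumptions, not the portal's actual interface. Timestamps are assumed to be in UTC already, so daylight saving shifts do not distort the day boundaries.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Key formats used to group readings; only these two resolutions are sketched.
KEY_FORMATS = {"daily": "%Y-%m-%d", "monthly": "%Y-%m"}


def aggregate(readings, resolution):
    """Sum (timestamp, kWh) pairs into daily or monthly totals."""
    if resolution not in KEY_FORMATS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    totals = defaultdict(float)
    for timestamp, kwh in readings:
        totals[timestamp.strftime(KEY_FORMATS[resolution])] += kwh
    return dict(totals)


# Two days of invented half-hourly readings (48 per day), 0.25 kWh each.
start = datetime(2017, 7, 6)
half_hourly = [(start + timedelta(minutes=30 * i), 0.25) for i in range(96)]

daily = aggregate(half_hourly, "daily")
monthly = aggregate(half_hourly, "monthly")
```

Here `daily` holds one total per calendar day and `monthly` a single total for July 2017. Keeping the half-hourly series as the stored form and deriving coarser views on request avoids maintaining three copies that can drift apart.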

Data Service as a Platform

Key Messages: Data cleaning and data quality are important in the era of Big Data et al. It might be cheaper to store data now, but it is harder to keep track of, standardize, and curate data. The way forward is to work across disciplines and sectors, e.g. academia, government, and industry, to provide standardized access to, and use of, data that has the potential to provide public value, e.g. energy data.

Chris Park, Big Data Network Support, UK Data Service. chris.park@essex.ac.uk