Serving the humanities: daydreams and nightmares


Steven Krauwer, CLARIN ERIC
Future of Language Resources

Overview
- CLARIN in a nutshell
- The dream
- The vision
- Phasing
- CLARIN ERIC
- The nightmares
- Action lines
- The future of language resources

CLARIN in a nutshell
Common Language Resources and Technology Infrastructure (http://www.clarin.eu)
Basic idea:
- European federation of digital repositories with language data and tools (text, speech, multimodal, gesture)
- with access to language and speech technology tools through web services to retrieve, manipulate, enhance, explore and exploit data
- with uniform single sign-on access to archives and tools
- target audience: humanities and social sciences scholars
- to cover all EU and associated countries and all languages relevant for the target audience

The CLARIN dream
- give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
- give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943)
- find European TV news interviews that involve speakers with a Bavarian accent
- summarize all articles in European newspapers of August 2010 about OCR in Finnish
- show me the pronoun systems of the languages of Nepal

The vision: the role of language
Language is at the heart of many disciplines in the Humanities and Social Sciences (HSS), e.g.
- as an object of study
- as a means of human communication
- as a means of human expression
- as a record of our history
- as part of one's cultural identity
- as a carrier of knowledge and information
CLARIN wants to support them all.
Language and speech technology are part of this (e.g. in the form of computational linguistics or speech science), but just a part!

The vision: what CLARIN wants to offer
- CLARIN makes it possible for the researcher to find resources (metadata search), and to refer to them in a persistent way (persistent identifiers)
- CLARIN allows for content search in and across collections
- CLARIN offers access to web services and workflows to perform complex linguistic and content operations and visualisations
- CLARIN covers both historical and contemporary language material in all modalities
- CLARIN serves both expert and non-expert users
- CLARIN offers access to depositing and long-term preservation services
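As a rough illustration of the first point on this slide, metadata search combined with persistent identifiers can be sketched in Python. The record fields, the identifier values, the URLs and the function names below are all hypothetical and chosen for this sketch only; they do not reflect CLARIN's actual metadata schema or any real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical metadata record for one language resource."""
    pid: str        # persistent identifier (made-up value, illustration only)
    title: str
    language: str
    modality: str   # e.g. "text" or "speech"
    location: str   # where the resource currently lives; may change over time


# Toy catalogue; every entry is invented for this sketch.
CATALOGUE = [
    Record("pid:example/0001", "Newspaper archive", "sl", "text",
           "https://repo-a.example/news"),
    Record("pid:example/0002", "TV news interviews", "de", "speech",
           "https://repo-b.example/tv"),
    Record("pid:example/0003", "Parliamentary debates", "de", "text",
           "https://repo-b.example/parl"),
]


def search_metadata(catalogue, **criteria):
    """Return the records whose metadata fields equal all given criteria."""
    return [
        record for record in catalogue
        if all(getattr(record, field) == value
               for field, value in criteria.items())
    ]


def resolve(catalogue, pid):
    """Map a persistent identifier to the current location of the resource."""
    for record in catalogue:
        if record.pid == pid:
            return record.location
    raise KeyError(f"unknown persistent identifier: {pid}")


hits = search_metadata(CATALOGUE, language="de", modality="speech")
print([record.pid for record in hits])
print(resolve(CATALOGUE, hits[0].pid))
```

The point of the indirection is that a citation using the identifier, rather than the location, keeps working when a repository moves the resource: only the catalogue entry changes.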

Phasing of CLARIN
Does CLARIN exist? Yes and no.
- 2008-2011: CLARIN Preparatory Phase Project, EC funded
  Goal: designing the infrastructure technically and organisationally, and lining up the players
- 2012-2015: Construction Phase, jointly funded by the participating countries, no EC funding
  Goal: building the European infrastructure
- 2015- : Exploitation Phase, jointly funded by the participating countries, no EC funding
  Goal: making and keeping it running, populating it, and ensuring that it follows new trends in technology and research

CLARIN ERIC
- CLARIN ERIC is the governance and coordination body, but will not run or fund operational data services
- An ERIC is a new type of intergovernmental legal entity, created by the EC, essentially a consortium of countries, with no end point
- CLARIN ERIC member countries pay a modest annual fee
- Countries will each set up a national CLARIN consortium that will provide data and linguistic services and create data and tools
- It is up to the countries to decide how to shape and fund their CLARIN consortia and how to relate them to other activities at the national level (e.g. research programmes, digitisation programmes, etc.)
- CLARIN ERIC was established by the EC on Feb 29th 2012, with 9 founding members: AT, BG, CZ, DE, DK, EE, NL, PL, DLU
- More are in the pipeline, but we want all European countries in!

What is so nice about ERICs?
- They are legal entities, not projects, which helps to make them more sustainable
- Members are governments, committing themselves for longer periods of time (min. 5 years)
- CLARIN ERIC is a sign of recognition by governments and the EC of the importance of sharing language resources
- Closeness to funding agencies may help to enforce the use of standards and the sharing of data in projects they fund
- Good starting point for international collaboration, as third countries can join or make collaboration agreements (e.g. through agencies or data centres)
- ERICs may submit proposals for EC funding
- But: the bulk of the funding depends on funding mechanisms and cycles in the participating countries

The CLARIN nightmare (1)
- give me digital copies of all contemporary documents in European archives that discuss the Great Plague of England (1348-1350)
- give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943)
- find European TV news interviews that involve speakers with a Bavarian accent
- summarize all articles in European newspapers of August 2010 about OCR in Finnish
- show me the pronoun systems of the languages of Nepal

The CLARIN nightmare (2)
give me all negative articles about Islam or about soccer in the Slovenski Narod daily newspaper (1868-1943)
- Newspapers are commercial products: are we sure that we can access them freely?
- Many digitized newspapers are just pictures: how can we analyze their structure, and do we have usable OCR to read them?
- Topic and attitude extraction tools exist, but do they exist for Slovene, do they fit together, and will the same tools still be available in 5 years' time?
- How to formulate such a query without technical knowledge?

The CLARIN nightmare (3)
Do HSS scholars realize at all that they should be interested in these things?
- Some do, most don't; we should make an effort to show them the potential benefits of adopting these new methods
- Showcases and visualisation tools are indispensable
- Distinguish between the lost generation and the future generation
Are the tools offered by language and speech technology the answers to the problems of HSS scholars as they see them?
- Technologists have a strong tendency to offer more and better gearboxes to people who are just waiting for a bus with comfortable seats
- Use and adaptation of existing tools may always require intervention by technologically skilled people

CLARIN's answer: Action lines (1)
- Coverage: consolidate 9 members, reach out to others, 15 members in 3 years, 20 in 5 years
- Legal: common license templates promoted for new and legacy data, collaborate with others, talk to legislators about IPR, establish Access and Authentication for single sign-on
- Integration of data: standards action plan, tools for mapping, tools for curation; identify priority areas for cross-border research
- Integration of services: interoperability, identify chainable services, work on showcases that convince potential users
- Preservation: identify at least 1 centre per country, work on change of culture, follow broader data initiatives; in 3 years all data and tools from funded projects deposited
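To make "identify chainable services" a little more concrete, here is a minimal Python sketch of one possible check: each service declares the format it consumes and the format it produces, and a chain is accepted only if consecutive services line up. The `Service` class, the format labels and the toy services are assumptions made for this illustration; they are not part of any CLARIN specification.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Service:
    """Hypothetical description of a processing service."""
    name: str
    consumes: str               # format the service expects as input
    produces: str               # format the service returns
    run: Callable[[Any], Any]


def check_chain(services):
    """List mismatches between consecutive services (empty if chainable)."""
    problems = []
    for first, second in zip(services, services[1:]):
        if first.produces != second.consumes:
            problems.append(
                f"{first.name} produces {first.produces!r}, "
                f"but {second.name} expects {second.consumes!r}"
            )
    return problems


def run_chain(services, data):
    """Run the services in order; refuse to start if formats do not line up."""
    problems = check_chain(services)
    if problems:
        raise ValueError("; ".join(problems))
    for service in services:
        data = service.run(data)
    return data


# Toy stand-ins; real tokenisers and topic filters are far more involved.
tokenise = Service("tokenise", "text", "tokens",
                   lambda text: text.lower().split())
keep_topic = Service("keep_topic", "tokens", "tokens",
                     lambda tokens: [t for t in tokens
                                     if t in {"soccer", "match"}])
count = Service("count", "tokens", "number", len)

print(run_chain([tokenise, keep_topic, count],
                "The soccer match was a soccer classic"))
print(check_chain([count, tokenise]))
```

Declaring formats up front means a mismatch is reported before any data is processed, which is the kind of "do they fit together?" question raised on the nightmare slides.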

CLARIN's answer: Action lines (2)
- Ease of access: Knowledge Sharing Infrastructure to support ease of access, awareness, training and support, curricula development, centres of expertise; portal targeting different audiences; emphasis on interfaces and visualization
- Crossing borders: use language as a vehicle to collaborate with other disciplines; inter-research-infrastructure and international collaboration; explore industrial collaboration models
- Sustainability: demonstrate societal impact; review sustainability models; after 3 years: vision and strategy

The future of language resources
- There is still a lot of work to do, but it is good to realize that CLARIN is not a project: it has a start but no fixed end
- Legacy resources need to be upgraded; new resources should comply with community standards from the start
- Further development of language and speech technology for all languages (from big to small) is essential, but it should be kept in mind that proven technologies may not work for older variants of languages and may require adaptation
- Much effort is needed to ensure adoption of digital methods in the humanities and social sciences (education, showcases, visualisation)
- None of the problems above are local to Europe: international collaboration is the best way forward

Congratulations, LDC!
Questions?