How to Open up? (Digital) Libraries at the Service of (Digital) Scholars. Laurent Romary Inria team ALMAnaCH


Upside-down
- Libraries as scientific content openers
  - Forgetting the old duties
- A pragmatic view based on a national and institutional experience
  - Policies and infrastructures
- It's the digital turn, stupid!
  - Identifying the role and limits of technologies

A paradigm change
[Diagram labels: Provides content; Publishes content; Consumes content; Publishes content]

A paradigm change
Once upon a time
- Collection development
- Budget management
- Cataloguing and warehousing
A tough history
- Serials crisis
- Big deals
- Green and gold OA
Diving in the (cold waters of a) new world
- Coordinating scholarly publication management
- Digital content management: licensing, persistent identifiers, access statistics
- Drafts, reports, publications, theses, and data
- No management of physical content nor of open access (it's in the genes)
- High-Low low stewardship? Research and learning material

A FAVOURABLE CONTEXT

Inria: a research organisation with a vision
[Key-figures panel, as extracted: 4,400 People (60 % paid by Inria); 8 Research Centers in France; International 41 conferences 66 Scientific publications 4,450 180 Project teams 250 Active patents 1,000 Doctoral students 100 Post-Doctoral 300 R&D engineers 3,500 Scientists; A BUDGET OF 265M]
Vision for a scientific information policy
- Maximising the dissemination of our scientific assets (visibility and swift dissemination of knowledge), for a reasonable price
- Constitution of a reliable and sovereign institutional corpus (documentation, preservation, access), with clear public governance principles
- Contribution to shaping the scientific communication landscape in terms of editorial processes and usage made of scientific productions

Inria scientific information policy in concrete terms
Setting up priorities
- Printed material as disposable goods
- Deposit mandate on scientific publications
- Rejecting hybrid open access
- Engaging in developing new publication models
Consequences
- Less collection development on the basis of our acquisitions (national consortia, national licences)
- More collection development as part of the digital curation activities

Two additional contextual elements
A strong political support
- State: loi pour une république numérique (Law for a Digital Republic)
  - Published in September 2016
  - Two articles on open access and text and data mining
- Higher education and research
  - A wide network of institutions favouring the use of publication repositories
A technical infrastructure
- Centred on a national service unit: CCSD (CNRS, Inria, Université de Lyon)
- Development of a comprehensive scientific information management platform

The infrastructure at our service

HAL: a multi-purpose publication repository
Services
- Centralised publishing environment: preprints, articles, theses
- Institutional collections and portal
- Individual webpages
Library support
- Decentralised moderation/curation: paper, metadata
- Support to setting up collections (persons, teams, laboratories)
- Raising awareness: from pre-print to final publications
Technologies
- Long-term archiving
- Persistent identifiers
- Deep metadata scheme
- Import-export facilities: TEI, BibTeX, EndNote, etc.
- And Grobid

Going beyond the limitations of PDF publishing: text mining
Cow (structured data) vs. hamburger (unstructured data)
"Converting PDF to XML is a bit like converting hamburgers into cows. You may be best off printing it and then scanning the result through a decent OCR package." Michael Kay (http://lists.xml.org/archives/xml-dev/200607/msg00509.html)
Inspired from: Duncan Hull

Structuring content
GROBID: information extraction from PDF documents
- Metadata: title, authors, affiliations, abstracts
- Bibliographical references (with Crossref consolidation)
Standards-based representation
- TEI (Text Encoding Initiative) as a reference format
State-of-the-art performance (CRF models)
- Cf. M. Lipinski, et al., 2013
- Used at EPO/ResearchGate/Mendeley/CERN/NASA
Integrated in HAL
- Automatic metadata extraction for author's deposit
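To make the idea of TEI as a reference format more tangible, here is a minimal sketch of how header metadata (title, authors, affiliations, abstract) could be read back from a TEI document. The sample document is hypothetical and heavily simplified; real extractor output is richer, and the function name read_header is an assumption made for illustration only.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, simplified TEI header (not actual extractor output).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>An Example Paper</title></titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName><forename>Ada</forename><surname>Lovelace</surname></persName>
              <affiliation><orgName>Example University</orgName></affiliation>
            </author>
            <author>
              <persName><forename>Alan</forename><surname>Turing</surname></persName>
            </author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
    <profileDesc><abstract><p>A short abstract.</p></abstract></profileDesc>
  </teiHeader>
</TEI>"""


def read_header(tei_xml):
    """Return title, authors (with optional affiliation) and abstract."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    authors = []
    for author in root.findall(".//tei:analytic/tei:author", TEI_NS):
        forename = author.findtext("tei:persName/tei:forename", default="", namespaces=TEI_NS)
        surname = author.findtext("tei:persName/tei:surname", default="", namespaces=TEI_NS)
        # Affiliation is optional: None when the element is absent.
        org = author.findtext("tei:affiliation/tei:orgName", default=None, namespaces=TEI_NS)
        authors.append({"name": f"{forename} {surname}".strip(), "affiliation": org})
    abstract = root.findtext(".//tei:profileDesc/tei:abstract/tei:p", default="", namespaces=TEI_NS)
    return {"title": title, "authors": authors, "abstract": abstract}


meta = read_header(SAMPLE_TEI)
print(meta["title"], [a["name"] for a in meta["authors"]])
```

A structure of this kind is what makes a deposit form pre-fillable: the depositor checks extracted fields instead of typing them.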

Sciencesconf: a conference management tool
Services
- Online conference management tool
- Abstracts, full paper, peer review, conference program, registration
Technologies
- Modular services
- Unique authentication system
- Connection to HAL authorities
- Upload to HAL
Library support
- Supervises the creation of a conference instance
- Creates conference series
- Moderates the integration of the papers as a collection in HAL

épisciences: an overlay journal platform
Services
- Overlay certification environment
  - Post-publication peer review
  - Traditional reviewing environment
  - Invited editors, special issues
Technologies
- OAI-PMH interface with repositories
  - Publication: arXiv, HAL, CWI
  - Data: Nakala, Ortolang
- Versioning
- Cataloguing facilities
Library support
- Setting up new journals
- Editorial support
- Experimenting with new usages: e.g. data journals
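As an illustration of what an OAI-PMH interface with a repository involves on the client side, the sketch below builds a harvesting request URL. The six verbs and the oai_dc metadata prefix come from the OAI-PMH protocol itself; the base URL is a placeholder, and the helper name oai_request_url is an assumption, not part of any platform mentioned here.

```python
from urllib.parse import urlencode

# The six request verbs defined by the OAI-PMH protocol.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL.

    Use `from_` for the protocol's `from` argument, since `from` is a
    reserved word in Python.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            query["from" if key == "from_" else key] = value
    return f"{base_url}?{urlencode(query)}"


# Placeholder endpoint, not a real repository address.
url = oai_request_url(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    from_="2024-01-01",
)
print(url)
```

An overlay journal only needs requests of this kind (plus GetRecord for a specific identifier) to pull the metadata of a submitted preprint from its home repository.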

AureHAL: authorities management
Services
- General-purpose authorities
  - Authors
  - Organisations
  - Journals
  - National and EU projects
Technologies
- Connection to external databases: ORCID, IdRef, VIAF, etc.
- Integration in the French organisational framework
- APIs
- Standardised export formats (TEI, again)
Library support
- Curates the creation and deduplication of authorities
- Checks them against external databases (e.g. EU projects)

AN ENABLING INFRASTRUCTURE FOR ADDITIONAL SERVICES

1. Setting up an institutional digital library
IFIP - International Federation for Information Processing
- One of the major scholarly societies in Information & Communications
- Organized in technical committees and working groups
Objective: setting up a sustainable digital library
- All volumes published as IFIP conferences
- Visibility, trust, technical facilities
Publication agreement with Springer
- Systematic publication in the AICT, LNBIP and LNCS series
- 3-year embargo before free online dissemination
- Provision of metadata (XML Springer) and author post-review manuscripts (PDF)
- Note: XML Springer formats often contain full text
For Inria: an experiment in ingesting and aggregating legacy collections

IFIP DL - architecture
[Architecture diagram; labels: XML-Springer + PDF; XML-TEI + PDF; Validation, enrichment; TEI-ODD specification; XSLT]

IFIP: technical setting
- Standardized ingestion workflow
- Towards a unified TEI-based format for scholarly papers
  - Cf. EU PEER project (massive ingestion of preprints)
  - Istex, OpenEdition, HAL, EPO
  - More flexible and extensible than JATS (e.g. full text)
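The ingestion step, converting publisher XML to TEI and then validating it, can be sketched as follows. The slides rely on XSLT style sheets and TEI-ODD specifications for this; the sketch swaps in plain Python with ElementTree purely for illustration. The input element names (Article, ArticleTitle, Author, GivenName, FamilyName) are hypothetical and do not reproduce the actual Springer schema, and the output is a minimal fragment rather than a schema-valid TEI document.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical publisher record; element names are illustrative only.
PUBLISHER_XML = """<Article>
  <ArticleTitle>An Example Paper</ArticleTitle>
  <Author><GivenName>Ada</GivenName><FamilyName>Lovelace</FamilyName></Author>
  <Author><GivenName>Alan</GivenName><FamilyName>Turing</FamilyName></Author>
</Article>"""


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI}}}{tag}"


def to_tei(publisher_xml):
    """Map the hypothetical publisher record onto a minimal TEI header."""
    src = ET.fromstring(publisher_xml)
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = src.findtext("ArticleTitle", default="")
    analytic = ET.SubElement(
        ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("biblStruct")),
        tei("analytic"),
    )
    for author in src.findall("Author"):
        pers = ET.SubElement(ET.SubElement(analytic, tei("author")), tei("persName"))
        ET.SubElement(pers, tei("forename")).text = author.findtext("GivenName", default="")
        ET.SubElement(pers, tei("surname")).text = author.findtext("FamilyName", default="")
    return root


def quality_issues(tei_root):
    """Very small stand-in for schema validation: report missing essentials."""
    ns = {"tei": TEI}
    issues = []
    if not tei_root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=ns).strip():
        issues.append("missing title")
    if not tei_root.findall(".//tei:analytic/tei:author", ns):
        issues.append("no authors")
    return issues


record = to_tei(PUBLISHER_XML)
print(quality_issues(record))
```

Keeping conversion and checking as two separate functions mirrors the workflow in the slides, where validation and enrichment follow the transformation.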

IFIP: editorial management
The central role of library staff
- Quality check: format, content
- Collection management: conference series and volumes, technical committees, working groups, etc.
- Maintenance of authorities in AureHAL: authors, affiliations
  - Approximate affiliations and duplicates
- Creation and maintenance of XML schemas and XSLT style sheets
Next step: introducing more automation
- Objective: limiting boring, repetitive tasks
- Automatic quality check
- Entity matching

The entity-matching problem
- Inria Sophia-Antipolis: Prof Philippe Robert, PUPH at CMRR CHU Nice Hospital, COBTek
- Inria Paris: Philippe Robert, Research Director, head of the RAP team

Mapping
The mapping is a set of domain-specific wrappers transforming the input/output from/to the internal data model.
The internal data model is composed of:
- entities
- attributes
- relations
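A minimal sketch of such an internal data model, with one input wrapper, might look as follows. The class names, field names and the shape of the input record are all assumptions made for illustration; the slides do not specify them.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A node of the internal model: a person, an organisation, etc."""
    entity_id: str
    kind: str
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    """A typed link between two entities, identified by their ids."""
    source: str
    target: str
    kind: str


def wrap_author_record(record):
    """Input wrapper: map a hypothetical repository record to the model."""
    author = Entity(record["id"], "person", {"name": record["name"]})
    entities = [author]
    relations = []
    for org in record.get("affiliations", []):
        entities.append(Entity(org["id"], "organisation", {"name": org["name"]}))
        relations.append(Relation(author.entity_id, org["id"], "affiliated_with"))
    return entities, relations


# Hypothetical input record.
entities, relations = wrap_author_record({
    "id": "author-1",
    "name": "Ada Lovelace",
    "affiliations": [{"id": "org-1", "name": "Example University"}],
})
print(len(entities), relations[0].kind)
```

With one wrapper per source, the matching logic only ever sees entities, attributes and relations, whatever the origin of the data.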

HAL authors
- Local attributes: first name, last name, title, email, etc.
  - Comparison using distances that depend on the type of the attribute: e.g. person-name similarity is computed using Sørensen-Dice, cosine similarity and Jaro-Winkler.
- Relation attributes: co-authorship, affiliations, popularity*, years of activity*, etc.
  - Relations are computed using affiliation information
- Preliminary results are promising: F1 score of 0.964
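The three name measures can be sketched in a few lines of Python. The Sørensen-Dice and cosine variants below work on character bigrams, and the final score is an unweighted mean of the three; both choices are assumptions, since the slides do not say how the measures are parameterised or combined.

```python
import math
from collections import Counter


def bigrams(s):
    s = s.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice(a, b):
    """Sørensen-Dice coefficient on character bigrams."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * sum((x & y).values()) / total


def cosine(a, b):
    """Cosine similarity of character-bigram count vectors."""
    x, y = bigrams(a), bigrams(b)
    dot = sum(x[g] * y[g] for g in x)
    norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(sum(v * v for v in y.values()))
    return dot / norm if norm else 0.0


def jaro(a, b):
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(p != q for p, q in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3


def jaro_winkler(a, b, p=0.1, max_prefix=4):
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    j = jaro(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def name_similarity(a, b):
    # Assumption: plain average of the three measures on lowercased names.
    a, b = a.lower(), b.lower()
    return (dice(a, b) + cosine(a, b) + jaro_winkler(a, b)) / 3


print(round(name_similarity("Philippe Robert", "Ph. Robert"), 3))
```

String similarity alone cannot separate two distinct researchers who share exactly the same name, which is why relation attributes such as co-authorship and affiliations are needed alongside it.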

HAL organisation Representation and errors are more frequent

HAL organisation
- Local attributes: name, address, region, country, etc.
- Relation attributes: parent-child (e.g. Inria -> Inria Paris), authors affiliated, global relations*
  - As for authors, relations are computed using complementary information.
- Preliminary results are still too low: F1 score of 0.345
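For readers less familiar with the metric quoted for authors and organisations: the F1 score is the harmonic mean of precision and recall over the proposed matches. A minimal sketch follows; the counts used in the example are invented for illustration and are not the evaluation data behind the reported scores.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall, from raw match counts."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct matches, 2 wrong matches, 2 missed matches.
print(f1_score(8, 2, 2))
```

Because it is a harmonic mean, a low F1 can come from either side: many wrong merges (low precision) or many missed duplicates (low recall).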

2. Exploiting repository contents: AnHALytics
Objectives
- Designing a scholarly dashboard
  - Scientific profiles: publications and authors
- Experimenting with various data extraction mechanisms
Reference content from the HAL publication repositories
- Metadata: title, periods, authors, affiliations
- Full text: full-text indexing, conceptual search
Technical background
- NERD and Grobid full text
Librarians are shaping the service with developers
- Researchers, Inria management

[AnHALytics architecture diagram; labels, in extraction order: Sources; Knowledge base; Graph DB; Research entities; Index; PDF; OAI-PMH; TEI; XML TEI; Persistence; ...; PDF; OCR; XML/TEI; Assets; OCR; GROBID; (N)ERD; Keyterm; GROBID-QU; BIO GROBID; ...; Annexes; Annotators]

Making things concrete AnHALytics - demo Inria

Plugging additional technologies - 1
Grobid quantities
- Identifying measured quantities in scientific documents
  - Value, unit, measured entity
- Multiple search scenarios in a scholarly context
- Example (example 2, streptomycin)
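To show what a (value, unit, measured entity) triple looks like as data, here is a deliberately naive sketch based on a regular expression. It is not how Grobid quantities works; the unit list, the "of <word>" heuristic for the measured entity, and the example sentence are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    value: float
    unit: str
    entity: Optional[str] = None  # measured substance or object, if spotted


# Tiny illustrative unit list; longer units first so "mg/ml" wins over "mg".
UNITS = sorted(["mg/ml", "mg", "ml", "kg", "g", "°C", "mm", "cm", "m", "%"],
               key=len, reverse=True)

PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>" + "|".join(re.escape(u) for u in UNITS) + r")"
    r"(?![A-Za-z])"                          # do not match "m" inside a word
    r"(?:\s+of\s+(?P<entity>[A-Za-z-]+))?"   # naive: "<quantity> of <word>"
)


def find_measurements(text):
    return [
        Measurement(float(m.group("value")), m.group("unit"), m.group("entity"))
        for m in PATTERN.finditer(text)
    ]


# Hypothetical sentence.
for found in find_measurements("The culture received 50 mg of streptomycin at 37 °C."):
    print(found)
```

Once quantities are stored in this structured form, searches such as "all documents mentioning a dose above a given value for a given substance" become possible.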

Plugging additional services - 2
Re-publishing content
- Grobid-TEI as a pivot format in the publishing environment
- Generation of multiple derived formats
  - HTML
  - EPUB
  - Braille
How far are we from this?
- Improving the performances of Grobid (e.g. book)
- Partially implemented in various initiatives
  - Revues.org
  - Istex
There again, a central role for the library staff
- Identifying user expectations
- Maintaining formats and transforms
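The pivot-format idea can be illustrated with a minimal TEI-to-HTML transform. The sample TEI is hypothetical and the mapping (title to h1, div heads to h2, paragraphs to p) is an assumption chosen for brevity; a production transform would more likely be written in XSLT and cover far more of the TEI vocabulary.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, minimal TEI document.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt>
    <title>An Example Paper</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body>
    <div><head>Introduction</head><p>First paragraph.</p></div>
  </body></text>
</TEI>"""


def tei_to_html(tei_xml):
    """Derive a simple HTML fragment from a TEI document."""
    root = ET.fromstring(tei_xml)
    title = root.findtext(".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS)
    parts = [f"<h1>{escape(title)}</h1>"]
    for div in root.findall(".//tei:text/tei:body/tei:div", TEI_NS):
        head = div.findtext("tei:head", default=None, namespaces=TEI_NS)
        if head:
            parts.append(f"<h2>{escape(head)}</h2>")
        for para in div.findall("tei:p", TEI_NS):
            # itertext() also collects text nested in inline elements.
            text = "".join(para.itertext())
            parts.append(f"<p>{escape(text)}</p>")
    return "\n".join(parts)


print(tei_to_html(SAMPLE_TEI))
```

Each additional output (EPUB, Braille) would be another transform from the same TEI source, which is what makes maintaining formats and transforms a continuing library task.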

Conclusion
Digital sovereignty
- Mastering scholarly content at all stages of the publication process
- Mastering the whole scholarly process
  - Understanding where and when we need to resort to the private sector
- Mastering interoperability by means of open formats
  - A general culture of openness
Focusing our efforts
- More brain, budget etc. in shaping this digital turn
  - Stop losing time in negotiating open access: let's just do it
- Dedicated research activity on scholarly information management
  - TDM breakthroughs at the service of scientific information
Dealing with the quickly evolving digital context
- Coupling service units with adequate research teams
- Training future digital librarians
  - Formats, standards and related technologies
  - Data mining and visualisation

References
Mabe M. A. Scholarly communication: A long view. New Review of Academic Librarianship, 2010. Taylor & Francis.
Hopfield J. J. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. U.S.A., 79 (1982), pp. 2554-2558.
Berthaud Christine, Laurent Capelli, Jens Gustedt, Claude Kirchner, Kevin Loiseau, et al. EPISCIENCES - an overlay publication platform. ELPUB2014, Jun 2014, Thessaloniki, Greece. <http://www.ebooks.iospress.nl/publication/36552>. <10.3233/978-1-61499-409-1-78>. <hal-01002815v2>
Romary Laurent. Scholarly Communication. In: Mehler, Alexander and Romary, Laurent. Handbook of Technical Communication, de Gruyter, 2012, 978-3-11-022494-8. <inria-00593677>