UQ's Digitisation Journey: steep but interesting. Christine Heslehurst

ABSTRACT

Digitisation at the University of Queensland (UQ) Library was born in 2009 with the arrival of the Kirtas APT 2400 robotic scanner, the result of the Library's involvement in a state-wide digitisation project. Library staff had to learn quickly how to use the scanner and the associated software to produce high quality searchable PDF files with a file size suitable for repositories. This was the birth of the Digitisation Team. To digitise an item prior to 2009, it was necessary first to photocopy it and then to scan it.

As digitisation is now an integral component of library services, the team has developed its skills, which include the use of a range of digital equipment: the robotic scanner (Kirtas), multifunction devices, flatbed and A0 scanners, and occasional photography. Staff recruited for the unit must have skills in digital image manipulation. Staff use commercial and proprietary software to meet the required quality standard and are also involved in drafting specifications for technique improvements and outsourcing.

Services routinely undertaken include digitisation of UQ theses for the University's electronic repository (UQ eSpace) and for document delivery, book chapters and journal articles for Course materials reserves, as well as out-of-print materials and accessibility-tagged items for vision-impaired clients. Items added to the electronic repository are linked to the catalogue to facilitate easy access by internal and external clients.

This paper includes statistics (which depict the time required for digitisation and digital cleaning, as well as the variety of material digitised by the unit) and photographs (which illustrate various work functions). The paper traces the journey from 2009 to today, discussing the range of material for digitisation, the equipment used, the processes utilised, the deposit locations and the future scenario for UQ.
IN THE BEGINNING

Most staff working in libraries have been involved in digitisation for a number of years, e.g. producing readings for course reserves and document delivery files for clients. The production of these files has probably involved the use of multifunction devices/copiers (MFDs) and Infotrieve's Ariel system.

UQ Library Course readings were produced using an MFD, then sent to staff members' desktops. Files were processed using either PaperPort or Adobe Acrobat Professional, but even then the quality of the resulting file was questionable. A lot of time and effort was required (with the risk of wrist overstrain injury from repetitive actions in PaperPort) to make the files presentable, i.e. to remove the black edges and centres and to deskew or straighten the pages. The files were not keyword searchable as they had not been through optical character recognition (OCR); however, the electronic files were still a vast improvement on the photocopied readings on which students had previously had to rely.

In mid-2009, after testing various settings on the MFD (black versus grayscale, 300 dpi versus 200 dpi) in conjunction with processing the files in Adobe Acrobat Professional, we were able to introduce procedures for Course Work digitisation in the Social Sciences & Humanities Library, the largest branch library at the St Lucia campus.

KIRTAS ARRIVES

Queensland's 150th celebrations (Q150) took place across Queensland from January to December 2009. During this time, the Fryer Library and the Centre for the Government of Queensland combined skills and resources to produce a web gateway to unlock sources and scholarship on Queensland. The project became known as Queensland Past Online (QPO) and digitised out-of-print books, parliamentary debates, Research Higher Degree (RHD) theses and back issues of the Journal of the Royal Historical Society of Queensland, all pertaining to the history of Queensland. The Kirtas APT 2400 Book Scanner was acquired as part of the project, and digitisation for us changed dramatically.

Kirtas has three computers, two cameras and a robotic book cradle, and scans bound books at a rate of up to 2400 pages per hour. With a cradle that keeps books open at 110 degrees, the device provides low-stress support for rare and fragile books. The cradle capacity is approximately 4½ inches. The suction head can be set to Small or Large to suit the item, so Kirtas can scan both quite small and quite large items.

The arrival of Kirtas created great excitement about its capabilities throughout the library; for us, however, it also created trepidation: how do we use it, where do we get the staff, what is the project timeline, where do we put the files? The most important initial hurdles were learning to use Kirtas (including the Workplace Health and Safety (WH&S) issues), file size and staff recruitment.
The user's guide, while very good, didn't provide all the solutions, e.g. what brightness or contrast should be used? Technical support was by email to Singapore, whose staff at times had to refer back to the USA for an answer. This was not a quick way for novices to find a solution to a problem. How do we handle the extremely large file sizes: do we reduce the quality, can a client download the large files, can we upload the large files to UQ eSpace? Most importantly, how do we recruit staff, what is the required skill set, what is the allowed shift length, do we employ casual or contract staff, and what is the span of work hours?

A WH&S audit and risk assessment was conducted. As a result, staff operating Kirtas were only permitted to scan for two hours at a time because of the risk of repetitive strain issues. In addition, hearing protection was recommended because of the operating noise. Although the carriage is height adjustable, staff also have the option of using a height-adjustable stool during shifts. As the high resolution cameras proved very enticing to camera-enthusiast library staff, Kirtas staff also needed security cards to access the digitisation room.

We worked our way through the issues, sometimes distressingly slowly, sometimes with expertise from our staff, sometimes by accident. Mistakes were made, but we learned from them, e.g. don't start with the earliest and most difficult typed or roneoed theses; learn to use Kirtas with more modern theses that have margins, are printed rather than typed, and have no glued-in photos, maps or graphs.

During the project Kirtas ran from 8:00am to 8:00pm, operated by casual staff. These staff members were predominantly university students with great photo editing and IT skills, working two-hour shifts, so rostering (around class schedules) was challenging.

Changing to modern theses and books enabled us to identify the correct processing standards. Quality control was added to our work process to identify errors (missed pages, duplicated pages, and pages missing from the original thesis, which required a note to be added to the file) for correction before the file was marked as completed and the images deleted.

UQ eSpace's file size limit was increased to allow us to deposit large files, while exceptionally large files were split into several logical and manageable files. File size and quality issues were finally resolved by optimising files in Adobe Acrobat Professional. Difficult pages (pages with bleeding or eroded text from carbon paper or roneo, or text visible from the page beneath), maps and images (graphs with variations of grey or black shading, or fine line graphs with multiple colours) were enhanced in Adobe Photoshop before being reinserted into the final file.
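The quality-control check described above (looking for missed and duplicated pages before the images are deleted) can be illustrated with a minimal Python sketch. It assumes that each captured image has already been labelled with the printed page number it shows; the function name and the sample data are hypothetical and do not describe the Kirtas software or the unit's actual workflow.

```python
from collections import Counter


def check_page_sequence(scanned_pages, first_page, last_page):
    """Return (missing, duplicated) page numbers for one scanned item."""
    counts = Counter(scanned_pages)
    # A page is missing if it never appears in the scanned sequence.
    missing = [p for p in range(first_page, last_page + 1) if counts[p] == 0]
    # A page is duplicated if it was captured more than once.
    duplicated = sorted(p for p, n in counts.items() if n > 1)
    return missing, duplicated


# Hypothetical 10-page item: page 4 was skipped and page 7 was captured twice.
missing, duplicated = check_page_sequence([1, 2, 3, 5, 6, 7, 7, 8, 9, 10], 1, 10)
print("Missing pages:", missing)
print("Duplicated pages:", duplicated)
```

A check of this kind only finds gaps and repeats in the numbering; pages that are absent from the original thesis still need a person to confirm them and add the note to the file.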
Improved scanning techniques produced a flattening of page curvature, resulting in a noticeable improvement in the final image quality. This was particularly true for theses and hard-bound books, but paperback books remained a major problem as it is very hard to flatten out a paperback without breaking the spine.

Scanning and processing times improved with increased expertise, mentoring by experienced colleagues and the introduction of third-party software. As staff gained confidence and expertise in scanning, there were fewer pickup issues (where the suction head failed to pick up the page or picked up two pages at once), with a marked reduction in duplicate pages and in rescans for missing pages.

The metadata and file upload process into UQ eSpace was refined, resulting in a more efficient process and a more accurate record. Most of the metadata for theses and books is copied and pasted from the University catalogue. At times, however, there is no matching catalogue record; staff first check Trove in the hope that there is a record to copy, and otherwise have to create the metadata from the thesis itself. This is not an easy task for staff without library training. If a thesis contains an abstract, this is also copied (with the original formatting) into the record. Initially this was a laborious copy-and-paste task as the template field is very small, making formatting extremely difficult. However, a hint from a visiting staff member enabled us to update our load process by formatting the abstract in MS Word, then dragging and dropping it into the template. This resulted in a huge reduction in time and frustration, particularly when formatting mathematics or physics abstracts with large quantities of scientific symbols (staff shared hints and websites on where to find hard-to-identify scientific symbols).

PROJECTS

- The Queensland Past Online project was completed with the launch of the website Text Queensland, a unique and dynamic collection of full-text, searchable digitised sources on Queensland colonial and state history (http://www.textqueensland.com.au/).
- Books selected by the Queensland Parliamentary Library, belonging to the O'Donovan Library Collection. These included the Analytical and classified catalogue of the Library of the Parliament of Queensland / compiled by Denis O'Donovan, and Author-list of additions to the Parliamentary Library of Queensland 1883-1890 / by Denis O'Donovan.
- Excellence in Research for Australia (ERA), which aims to identify and promote excellence across the full spectrum of research activity in Australia's higher education institutions, 2010 and 2012. The Digitisation Unit was responsible for the digitisation of all books required for the evaluation of the quality of the research undertaken by the University of Queensland against national and international benchmarks. In 2010, 97 books were digitised, and in 2012, 67 books.
- The Economics & Business Library (ECOB) theses and MBA projects were digitised prior to the Library redevelopment and collection dispersal.
- Statistics of the Colony of Queensland (1872-1900) and Statistics of the State of Queensland (1901-1962).
- Books selected by the State Library of Queensland for special events, e.g. Memories from a Forgotten People: 150 years of Australian South Sea Islander contributions to Queensland. One of these titles was Report, together with minutes of evidence and proceedings of the commission and appendices (Queensland. Royal Commission on the Sugar Industry; William Henry Groom; published 1889).
- Special requests, e.g. Australian Journal of Communication (1982-2009) was digitised for an academic staff member (the journal's editor, who lost her personal copies in the 2011 South-east Queensland floods), and four titles were digitised for the University of Queensland Press.

HOUSEKEEPING

Extensive use is made of spreadsheets to record every facet of the digitisation process for both Kirtas items and MFD-generated Course materials items:

- Digitisation work lists: in this file each staff member has a colour that identifies their work, and each coloured tab is a work list, e.g. the red tab is for Thesis supply.
- TALS Course materials list: full details of each item required for digitisation for the semester's teaching are recorded on the list.
- Future project digitisation lists are maintained, e.g. the Honours (1500 items) and Masters theses (5500 items) lists.
- Procedures for Course materials scanning on the Konica MFD are maintained and updated as required, e.g. the installation of a new model of the Konica MFD required all the screenshots to be updated.
- Procedures for each facet of the digitisation process using the Kirtas Book Scanner and other peripherals (e.g. the A3 flatbed scanner and the A0 scanner) are maintained and updated.
- Contingency plans are in place for scheduled servicing of the cameras and for unscheduled Kirtas stoppages or breakdowns, e.g. keeping in reserve sufficient scanned items for staff to process and/or files to be uploaded to eSpace.

CURRENTLY

- The predominant work is still to digitise RHD theses, upload the metadata and files into eSpace, and then link the UQ eSpace PID to the catalogue record.
- Thesis Supply (Doc Del) for the University since 1 July 2013. This includes EFTPOS payment processing, in addition to the digitisation and supply of the files, which are delivered to the requestor by CloudStor (http://www.aarnet.edu.au/). Transferring the entire process from central Document Delivery to the Digitisation Unit streamlined the delivery of requested theses. This change has reduced the supply time and enabled verification of file delivery to the customer. While the change to prepayment by credit card has created difficulty for some overseas customers, most feedback about the service and the new process is positive.
- Digitisation and electronic tagging of items for blind and visually impaired researchers using CommonLook PDF. A plugin for Adobe Acrobat, it allows verification and correction of tagging order and semantics in a dedicated, auditable workflow and enables correct structuring of Tables, Lists and Tables of Contents.

In Semester 2, 2012, UQ Library trialled a new service to scan books on request using the Kirtas scanner, for UQ staff and research students with a visual impairment, where no ebook copy was available that would meet their needs. The service was publicised to Library staff through elinks; it was expected that Library staff would inform academic staff where relevant, through their contacts with academics. Student Services was informed of the new service, and Student Services Disability Advisers referred eligible RHD students to the scanning service. Information was also added to the Library webpage, under Services for Users with a Disability.
With regard to this project, the extent of tagging needed depends on:

- the individual needs of the client (for example, the type of visual impairment);
- the nature of the original document (for example, documents containing images, tables, footnotes etc. will require more tagging); and
- the type of screen-reader software used by the client.

The results:

- Books which are important to their work have been made accessible to five UQ researchers with a visual disability.
- The Library has worked flexibly and responsively to provide a product which is truly accessible to the client.
- There is now improved communication between the Library's Digitisation Service and Student Services' Alternative Print Service. The Alternative Print Service (which uses more basic equipment) now consults the Library regarding technical questions and more complex scanning jobs. This improves the product delivered to students.

The time spent researching and understanding the new ISO standard has had flow-on benefits for other areas of digitisation, including:

1. Discovering new methods of ensuring that page numbers in the Adobe document are consistent with the page numbers in the print book.
2. As better methods were identified, the process for all of the Digitisation Unit's work has been altered to produce a more accurate OCR of documents.
3. In our view, in the long run, this process is going to have additional benefits for the standard of electronic documents the Library can produce.

The recommendation that this trial service to research staff and students be adopted as an ongoing Library service was approved by the Library Executive.

Other current work:

- Digitisation of out-of-print books for Course materials for Teaching & Learning Services (TALS), and the upload of the files to a secure collection within eSpace accessible only by staff and students. The digitisation requirements (600 dpi, grayscale, high density with background minimisation, page separation and centred), while high, are not as stringent as those applied to items digitised for long-term preservation in eSpace.
- All Course materials digitisation for the St Lucia campus (i.e. every book chapter, journal article or Document Delivery item required for Course work teaching) is scanned and/or processed by the unit.
- Every item to be digitised is entered on the spreadsheet for statistical and tracking purposes, i.e. the course code, title, chapter/pages, scan date, reference verification and file name.
- The movement of items into and out of the unit is recorded.
- 644 items were digitised in Semester 1 (with only 10 items rejected, for either an incorrect reference or copyright issues) and to date, in Semester 2, 691 items have been digitised.
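The tracking spreadsheet described above could be modelled as a simple record type with a summary function. The following Python sketch is illustrative only: the field names follow the columns listed in the text, while the class name, the rejection-reason field, the course codes and the sample rows are hypothetical and are not taken from the unit's actual spreadsheet.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CourseItem:
    """One row of a (hypothetical) Course materials tracking list."""
    course_code: str
    title: str
    pages: str
    scan_date: Optional[date]   # None until the item has been scanned
    reference_verified: bool
    file_name: str
    rejection_reason: str = ""  # empty string means the item was accepted


def summarise(items):
    """Return the number of digitised items and a count of rejections by reason."""
    digitised = sum(1 for item in items if not item.rejection_reason)
    rejected = Counter(item.rejection_reason for item in items if item.rejection_reason)
    return digitised, dict(rejected)


# Made-up sample rows for illustration.
items = [
    CourseItem("ABCD1000", "Sample chapter", "pp. 10-35", date(2013, 3, 4), True,
               "ABCD1000_sample_chapter.pdf"),
    CourseItem("ABCD1000", "Sample article", "pp. 1-12", date(2013, 3, 5), True,
               "ABCD1000_sample_article.pdf"),
    CourseItem("WXYZ2000", "Unverified reading", "pp. 5-9", None, False, "",
               "incorrect reference"),
]
digitised, rejected = summarise(items)
print("Digitised:", digitised)
print("Rejected:", rejected)
```

Recording the rejection reason alongside each row is one way the semester totals and the split between incorrect references and copyright issues could be produced from the same list.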

WHAT NEXT

- A tender has been let to digitise 1250 RHD theses for the University. They will be sent as one batch to the successful tenderer, with the files to be delivered back to the library as they are completed for ingest into eSpace. The shipping, the return and the checking of the electronic files before ingest will all be labour intensive.
- Review of the Memorandum of Understanding with the State Library of Queensland.
- A working party has been established to investigate the purchase of new digitisation equipment for the library. Among the items are a flatbed scanner with capacity for A4 to A0 size digitisation, a new generation robotic scanner and a new generation microfilm scanner.
- Digitisation of selected pre-1954 maps (copyright free) and other selected Fryer Library items.
- A quotation has been supplied to digitise legal documentation as a special project.
- The Library is investigating the Hathi Trust digitisation specifications with the aim of joining the Hathi Trust Partnership.

FINAL THOUGHTS

- Good specifications are required for any project you are about to begin, so check the latest standards.
- Manage others' expectations; be realistic about the time taken to complete a job. While priorities can be adapted, you can't perform miracles.
- Digitisation consists of multiple processes from scanning to discovery, each of which takes time.
- Equipment dates quite quickly and technology is continually evolving, so factor in the cost of replacing your equipment.
- Be aware that it always takes longer than you expect. This is especially true of older material, e.g. when preparing items for external digitisation, every item needs to be checked for loose pages or photos which will need securing.

FURTHER READING

Centre for the Government of Queensland n.d., Text Queensland, viewed 10 September 2013, <http://www.textqueensland.com.au/>.

Hathi Trust n.d., Getting content into Hathi Trust, viewed 9 October 2013, <http://www.hathitrust.org/ingest>.

Library of Congress 2013, PDF/A-1, PDF for Long-term Preservation, Use of PDF 1.4, viewed 9 October 2013, <http://www.digitalpreservation.gov/formats/fdd/fdd000125.shtml>.

National Library of Australia n.d., Digitisation guidelines, viewed 9 October 2013, <http://www.nla.gov.au/standards/digitisation-guidelines>.

NetCentric Technologies n.d., CommonLook PDF: a plugin for Adobe Acrobat, viewed 9 October 2013, <http://www.pdfa.org/commercial/commonlook-plugin-for-adobe-acrobat/>.

State Library of Queensland n.d., Digital Standard 2: Digital capture & format, version 3.01, viewed 9 October 2013, <http://www.slq.qld.gov.au/ data/assets/pdf_file/0009/139815/digital_standards_2.pdf>.

The APT BookScan 2400 & 2400RA n.d., viewed 10 September 2013, <http://www.pimage.com/kirtas%20apt%202400_16.pdf>.

The University of Michigan University Library Digital Library Production Services Digital Conversion Unit 2010, viewed 9 October 2013, <http://www.hathitrust.org/documents/umdigitizationspecs20100827-ccbylicense.pdf>.