Records Management Factsheet 06: Scanning
Version 3.0, August 2017

Contents
Introduction; Preparation; Resolution; Colour; Formats; Quality control; Indexing; Access and security; Retention and preservation; Optical character recognition; Discarding hard copies; Further information

Introduction
Scanning paper records provides many benefits, such as improved access to information and reduced storage costs (either by discarding the originals or by placing them in cheaper, long-term storage). Before carrying out any scanning, however, it is advisable to assess its cost-effectiveness. For instance, it may be inappropriate to scan records with very short retention periods (e.g. 6 months or less), or records that are unlikely to be consulted before disposal.

Where records require long (e.g. over 7 years) or permanent retention, thought must be given to the ongoing preservation of the digital images (see Factsheet 07: Preservation). Sometimes the original may need to be retained after scanning, for example if a professional accrediting body refuses to accept scanned copies. Where records have historic value, it would also be inappropriate to dispose of the physical original. For further advice on historical records, contact the University Archivist.

In addition, a decision should be made about the date range of the records to be scanned. There are a number of options:
- to fix a date after which all newly received records will be scanned;
- to scan records relating to a particular range of years (e.g. 6 years old or less);
- to scan all existing paper records, regardless of date;
- to scan on demand (i.e. scan records only when access is required).

The quantity, size and condition of the records, as well as the required resolution and speed, will determine the type (e.g. rotary, flatbed or planetary) and quality of scanner to be used. If the volume of material is high, it may be necessary to outsource the work to a professional scanning bureau.
The University's Printing Services team can provide a scanning service. Contact Andrew Gambling for further details.

University of Portsmouth
Preparation
Records should be prepared before scanning: for example, staples and paper clips may need to be removed, and sheets extracted from folders and arranged in batches. For multi-page records it may be necessary to mark the start of each item by inserting a header sheet or placing a barcode on the first page. If an automatic setting is to be used, any double-sided records will need to be separated from single-sided items.

If using a sheet feeder, it may be advisable in some cases to photocopy the following before scanning them:
- fragile records that might be damaged by the scanner;
- records that are too large to be scanned and have to be reduced in size;
- records where the quality of the image (such as the contrast) will be improved by photocopying beforehand.

If proving the authenticity of the images will be important, it should be made clear whether photocopies rather than originals have been scanned (for example, by stamping the papers with the word "photocopy").

Unless a planetary scanner is being used, large records (e.g. A2 size) will need to be scanned in sections, taking care that the sections overlap so that no information is lost at the edges of the images.

Finally, if notes attached to records obscure the text, it may be necessary to carry out the scanning twice: first with the notes attached and then with them removed.

The file size of your scan will be affected by the settings you use.

Resolution
The higher the resolution (measured in dots per inch, dpi), the larger the file size will be: doubling the resolution roughly quadruples the number of pixels, and therefore the uncompressed file size. To determine the appropriate resolution, tests should be carried out on a set of sample records. If a record contains both text and graphics, it may in certain cases be necessary to scan it twice, using an optimal setting first for the text and then for the graphics.

Colour
A decision should be made whether to scan records in black and white, greyscale or colour. Greyscale will be ideal for documents that have poor definition or low contrast.
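As a rough illustration of how resolution and colour depth drive file size, the short Python sketch below calculates the uncompressed size of a scanned page and the number of colours each bit depth can represent. It assumes an A4 page (210 × 297 mm) and ignores file headers, row padding and compression, so real files will differ; the function names are illustrative only.

```python
# Rough uncompressed size of a scanned page: every pixel stores `bits_per_pixel` bits.
# Assumption: A4 page (210 x 297 mm); headers, row padding and compression are ignored.

MM_PER_INCH = 25.4


def uncompressed_size_bytes(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int) -> int:
    """Pixel count at the given resolution multiplied by bits per pixel, converted to bytes."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * bits_per_pixel // 8


def colours(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel of this bit depth can hold."""
    return 2 ** bits_per_pixel


if __name__ == "__main__":
    settings = [("black and white", 1), ("greyscale / 8-bit colour", 8), ("24-bit colour", 24)]
    for label, bits in settings:
        for dpi in (200, 300, 600):
            mb = uncompressed_size_bytes(210, 297, dpi, bits) / 1_000_000
            print(f"{label:26} {dpi:4} dpi: {mb:7.1f} MB ({colours(bits):,} colours)")
```

At 300 dpi, for example, an A4 page works out at roughly 1.1 MB in black and white, 8.7 MB in greyscale and 26.1 MB in 24-bit colour before compression, which is one reason why test scans on sample records are worthwhile.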
For colour records, tests should be conducted to establish the level of accuracy required:
- 8-bit colour provides 256 different colours;
- 24-bit colour provides approximately 16 million different colours.

Using 24-bit colour will increase file sizes considerably (an uncompressed 24-bit image is three times the size of an 8-bit one) and should only be used if it is essential to reproduce every tonal variation exactly.

Formats
The longer scans are held, the greater the risk that they will become unreadable. Wherever possible it is best to use open file formats that support lossless storage (for example PDF, PDF/A or TIFF) in order to reduce the danger of records becoming trapped in obsolete technology. These are formats whose specifications have been published, so that they are not dependent on the continued support of one particular company.
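The difference between lossless and lossy compression can be seen in a small experiment. The Python sketch below applies the standard zlib module (a lossless method) to some made-up pixel values and confirms that decompression restores them exactly; it then discards the lowest bits of each value as a toy stand-in for lossy compression. This is an illustration only: real lossy formats such as JPEG work very differently.

```python
import zlib

# Made-up row of 8-bit greyscale pixel values, repeated to resemble a page of scanned text.
pixels = bytes([250, 251, 250, 249, 30, 31, 30, 250] * 1000)

# Lossless: zlib (DEFLATE) makes the data smaller, and decompression restores every byte.
packed = zlib.compress(pixels, level=9)
restored = zlib.decompress(packed)
print(len(pixels), "->", len(packed), "bytes; identical after decompression:", restored == pixels)

# Toy "lossy" step (not how JPEG works): clearing the lowest 3 bits of each value
# throws information away, so the original values can no longer be recovered.
quantised = bytes(p & 0b11111000 for p in pixels)
print("identical after discarding detail:", quantised == pixels)
```

The lossless round trip is what allows compressed files to remain exact copies; once detail has been discarded, as in the second step, there is no way to prove what the original values were.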
File sizes of images can be reduced by using compression, which may be either lossless (the file is made smaller, but decompressing it restores an exact copy of the original) or lossy (some information is discarded, so the decompressed file is not an exact copy). Lossy compression may compromise the evidential value of the images, and should not be used if it is essential to preserve every detail of a record. In some cases, it may be helpful to store two sets of images: one set of compressed files suitable for everyday use (which can be retrieved and printed quickly), and one set of uncompressed files that could be made available for evidential purposes (should it be necessary to prove the integrity of the information, for example in the event of a legal dispute).

Examples of file formats (this list is not exhaustive)
- BMP (Bitmap): a proprietary graphics format used by Microsoft Windows. It supports 24-bit colour and is usually stored uncompressed (its optional compression is lossless).
- GIF (Graphics Interchange Format): provides 8-bit colour and is suitable for images that contain blocks of colour (e.g. logos, banners). It is a widely used format and uses lossless compression.
- JPEG (Joint Photographic Experts Group): a non-proprietary format whose specification is freely available. It provides 24-bit colour and uses lossy compression; it is ideal for storing complex colour images (such as photographs).
- JPEG 2000: a published international standard; it offers both lossy and lossless compression, and its image quality at smaller file sizes is higher than that of JPEG.
- PDF (Portable Document Format): a format developed by Adobe Systems; it is widely used and its specification has been published as an open standard. It is ideal for text documents and makes the content harder to edit than a word-processed file. It can also be used with OCR technology to create searchable documents.
- PDF/A: an open standard designed for long-term storage (archiving). It is intended to preserve the visual appearance of documents, regardless of the systems used for creating or storing the files. One of the key differences between PDF and PDF/A is that the latter requires fonts to be embedded in the file, so that typefaces will always render as intended.
- PNG (Portable Network Graphics): an open standard that can provide 8- or 24-bit colour and uses lossless compression. It is suitable for all types of graphics.
- PSD (Photoshop Document): a proprietary format used by Adobe Photoshop for images. It may be either compressed or uncompressed.
- TIFF (Tagged Image File Format): a widely supported image format with a published specification. It can handle monochrome, greyscale and up to 24-bit colour; it allows lossless compression, and multi-page documents can be saved as a single TIFF file.

Quality control
When the scanning is completed, the copies should be checked to make sure they are legible and that everything has been captured, including the smallest details. It is also important to ensure that no pages have been omitted from multi-page or double-sided records, and that pages have been scanned in order. Depending on the quantity of material, either every image or a representative sample will need to be checked. The checking process for each scanning project should be documented.

If the quality of any of the images is not adequate, the papers should be rescanned and the first copies replaced. It may be necessary to record in the metadata of the scanned record a confirmation that the image is accurate, together with who checked it and when. The EDM system allows for this.
Alternatively, image enhancement or editing software can be used to improve the quality of the copies: for example, de-skewing will improve the alignment, and de-speckling will remove random black marks. When enhancing an image, however, care should be taken to ensure that it remains an accurate representation of the original. Image processing may affect the evidential weight of an electronic document, so in some cases it may be advisable to retain two copies: one before enhancement and one after.

The scanners themselves should also be checked regularly to ensure that they are working properly. Test targets (such as BS PD 0023: Test target for assessing output quality of black-and-white document scanners) can be used to assess whether a scanner is performing consistently and according to its specification; the targets measure characteristics such as legibility, thin-line detection and dimensional accuracy.

Indexing
A unique reference (such as a system-generated sequence number) should be allocated to each text or image file to aid retrieval. In addition, the entire collection of records will need to be indexed (e.g. by date, surname or subject), so that individual items can be located and retrieved easily and quickly. The index should make it clear whether the images are single-page items or form part of multi-page records. It may also be helpful to record additional metadata (such as the date and time when the images were captured, who carried out the scanning, and the settings used), especially if the records are likely to be required for evidential purposes. The index and other metadata could be stored in a database, a spreadsheet or an image management system.

Access and security
The scanned records must be accessible to all staff who require them to carry out their work; if any of the records are personal or confidential, access will need to be restricted to authorised staff only.
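The indexing and access guidance above could be captured in a simple structured record. The Python sketch below is hypothetical: the field names, the "SCAN-" reference prefix and the can_view check are illustrative assumptions, not the fields used by the University's EDM system. It allocates a unique sequence-based reference, shows whether an item is part of a multi-page record, stores capture metadata, and flags personal or confidential items for restricted access.

```python
from dataclasses import dataclass, field
from datetime import datetime
from itertools import count

# Hypothetical sequence used to generate unique references (SCAN-000001, SCAN-000002, ...).
_sequence = count(1)


def next_reference() -> str:
    return f"SCAN-{next(_sequence):06d}"


@dataclass
class IndexEntry:
    # Index fields (illustrative): what staff will search by.
    subject: str
    surname: str
    record_date: str          # date of the original record, e.g. "2016-03-14"
    page_count: int           # 1 for a single-page item, more for a multi-page record
    # Capture metadata, useful if the records may be needed as evidence.
    operator: str             # who carried out the scanning
    settings: str             # e.g. "300 dpi, greyscale, TIFF"
    restricted: bool = False  # True for personal or confidential records
    reference: str = field(default_factory=next_reference)
    captured_at: datetime = field(default_factory=datetime.now)

    @property
    def is_multi_page(self) -> bool:
        return self.page_count > 1


def can_view(entry: IndexEntry, user_is_authorised: bool) -> bool:
    """Unrestricted records are open to staff who need them; restricted ones only to authorised staff."""
    return user_is_authorised or not entry.restricted


if __name__ == "__main__":
    entry = IndexEntry("Admissions", "Smith", "2016-03-14", 3,
                       operator="A. Operator", settings="300 dpi, greyscale, TIFF", restricted=True)
    print(entry.reference, entry.is_multi_page, can_view(entry, user_is_authorised=False))
```

In practice the same fields would more likely live in the EDM system, a database or a spreadsheet than in code; the sketch simply shows which pieces of information travel together.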
The most secure place to store the scanned records is the EDM system, where they are subject to a comprehensive audit trail, are regularly backed up and are protected from software and hardware failure. The K drive is also backed up, but it is not audited.

Retention and preservation
Retention periods should be assigned to the scanned records in line with the University Retention Schedules, so that they are retained only as long as necessary to conduct business and to comply with legal and regulatory requirements. In addition, the records will need to be reviewed periodically (e.g. every five years) and, if necessary, converted to new file formats so that they do not become trapped in obsolete technology. In some cases it may be necessary to keep records of the migration and conversion processes in order to prove that the integrity of the data has not been compromised.

Optical character recognition
Optical character recognition (OCR) software can be used to convert scanned images into searchable PDFs or editable text files. OCR cannot, however, guarantee complete accuracy, so it is essential to check the output carefully after conversion. Using a sheet feeder can help to ensure the papers are scanned evenly, which improves accuracy. Quality can also be improved by keeping the scanners clean, so that dirt does not produce unwanted marks. In addition, some scanning systems allow thresholds to be set in order to control quality, and will notify the operator if any captured images fall below the required standard.

Converting scanned records into plain text files allows data to be extracted and entered into computer systems without the need to input it manually. However, since plain text files are editable and are not exact copies of the originals, they are unlikely to be legally admissible.
If the evidential value of a record is important, it would be advisable to retain two copies: one as a plain text file, and a second as an unalterable image file.
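Where the scanning or OCR software reports a quality score for each page, the threshold check mentioned above amounts to flagging any page that falls below an agreed minimum. The Python sketch below assumes a hypothetical per-page confidence score from 0 to 100 and a default threshold of 90; the actual measure, scale and threshold will depend on the system in use and should be set through testing on sample records.

```python
from typing import NamedTuple


class PageResult(NamedTuple):
    reference: str
    confidence: float  # hypothetical 0-100 quality/confidence score reported by the software


def pages_below_threshold(results: list[PageResult], threshold: float = 90.0) -> list[str]:
    """Return the references of pages scoring below the threshold, for checking or rescanning."""
    return [page.reference for page in results if page.confidence < threshold]


if __name__ == "__main__":
    batch = [
        PageResult("SCAN-000001", 97.5),
        PageResult("SCAN-000002", 82.0),
        PageResult("SCAN-000003", 91.2),
    ]
    for reference in pages_below_threshold(batch):
        print(f"Check or rescan {reference}: below the quality threshold")
```

Pages that pass the threshold still need the manual checks described under Quality control, since a high score does not prove that the text was recognised correctly.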
Discarding hard copies
Before discarding any hard copies following scanning, a risk assessment should be carried out to balance the consequences of losing original evidence against the costs of retaining it. In most cases it will be acceptable to dispose of the originals, provided the scans have been checked for quality and indexed, and adequate procedures are in place for security, disaster recovery and preservation.

If it is likely that the scanned images will be required for evidential purposes, then clear, consistent procedures must be developed that comply with the British Standards Institution Code of Practice for Legal Admissibility and Evidential Weight of Information Stored Electronically, so that the copies can be adequately authenticated. Compliance with this code will involve documenting various aspects of the scanning in detail, such as the date and time when the images were captured, the operator who carried out the scanning, and the settings and quality control criteria used. In addition, the procedures may need to be audited by a relevant regulatory body: for example, HM Revenue and Customs would need to approve a scanning system before any financial records were discarded.

It is the responsibility of the business owner of the records to ensure that the relevant professional, statutory and regulatory bodies are willing to accept a scanned copy in place of the original. This agreement must be obtained in writing before the originals are disposed of.

In a few cases, it will be advisable to retain the hard copies because of their legal nature and the importance of being able to produce an original signed record. Although electronic records are legally admissible under the terms of the Civil Evidence Act 1995 (provided their authenticity can be demonstrated), the evidential weight of an original signed document is still likely to be greater.

If the original record is disposed of, this should be recorded.
If the original record is retained, its storage location must be recorded so that it can be retrieved quickly if required.

BS BIP 0008 Code of Practice for Legal Admissibility and Evidential Weight of Information Stored Electronically (Appendix H.9) states:
"All copies of documents (photocopy, microform or electronic) will be treated by a court of law as secondary evidence, with a potential reduction of weight of evidence if the authenticity of the copy is questioned. For example, where the content of a document is under question, the original or copy should be treated with equal weight, but if a signature is being disputed, then the original document is likely to carry more weight than a copy of it."

The ICSA Guide to Document Retention (Chapter 6) also explains:
"Despite all the rules that now allow copies of documents to be tendered as evidence, there will be circumstances in which the original will be of much greater evidential value. A copy of a document is, for example, unlikely to be given as much weight as the original where the validity of a signature is at issue."

It may also be advisable to retain the paper originals if they are of low quality, so that it will be possible to demonstrate that a poor-quality copy accurately represents the original.

Finally, no original paper records may be destroyed if litigation is pending or ongoing.

Further information
If you require any further information, please contact the University Records Manager (recordsmanagement@port.ac.uk or ext. 3390) or visit the records management web pages at www.port.ac.uk/records.