Catalog Records Retrieved by Personal Author Using Derived Search Keys

Alan L. LANDGRAF and Frederick G. KILGOUR: The Ohio College Library Center

This investigation shows that search keys derived from personal author names possess a sufficient degree of distinctness to be employed in an efficient computerized interactive index to a file of MARC catalog records having 167,7 4 personal author entries.

Previous papers in this series and experience at the Ohio College Library Center have established that truncated derived search keys are efficient for retrieval of entries by name-title and title from large on-line computerized files of catalog records. [1-4] Experiments reported in the earlier papers were based on the assumption that each key had a probable use equal to all other keys. However, Guthrie and Slifko have shown that random selection of entries, rather than keys, yields results closer to actual experience but with a higher number of entries per reply. [6] For example, they found on retrieving from a file of 87,72 records using a 4,5 (four characters of main entry, five characters of title) key that when the basis of the search was random keys there was one entry per reply 81. percent of the time, but when the basis was random records, there was one entry per reply.7 percent of the time.

This paper presents the results of experimentation with search keys to be used in constructing an author index to a large file of on-line catalog records. An interactive environment is assumed, with the interrogator employing a remote terminal. A companion paper describes the findings of an investigation into retrieval efficiency of search keys derived from corporate author names. [7]

MATERIALS AND METHODS

The investigation employed a MARC file containing approximately 2, monographic records from which a computer program extracted 167,74 personal-name keys. The program extracted these keys from main entry, series statement, added entry, and series added entry fields.
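As an illustration of how a program of this kind might derive keys from personal names, the following is a minimal Python sketch, not the program used in the investigation. It assumes names arrive as "Surname, Forename Middle" strings and are folded to upper case; both are assumptions made for the example. The function names `derive_key` and `degree_of_distinctness` and the sample names are hypothetical. A key is written n,m,i: n characters from the surname, m from the forename, and i (0 or 1) from the middle name, with short segments left-justified and padded with blanks. Distinctness is the number of distinct keys divided by the number of entries.

```python
def derive_key(name: str, n: int, m: int, i: int = 0) -> str:
    """Derive an n,m,i search key from 'Surname, Forename Middle'.

    The input format and the upper-casing are assumptions of this sketch.
    Each segment is truncated, left-justified, and padded with blanks.
    """
    surname, _, rest = name.partition(",")
    given = rest.split()
    forename = given[0] if given else ""
    middle = given[1] if len(given) > 1 else ""
    segments = [
        surname.strip().upper()[:n].ljust(n),
        forename.upper()[:m].ljust(m),
        middle.upper()[:i].ljust(i),  # a blank when no middle name exists
    ]
    return "".join(segments)


def degree_of_distinctness(keys: list[str]) -> float:
    """Distinct keys divided by total entries, expressed as a percentage."""
    return 100.0 * len(set(keys)) / len(keys)


# Hypothetical sample entries, invented for illustration only.
NAMES = [
    "Smith, John A",
    "Smith, John B",
    "Smith, Jane",
    "Smithson, Joan",
    "Long, Philip L",
    "Li, Wei",
]

if __name__ == "__main__":
    for structure in [(3, 1, 0), (4, 2, 1), (8, 7, 1)]:
        keys = [derive_key(name, *structure) for name in NAMES]
        print(structure, round(degree_of_distinctness(keys), 1))
```

On this toy list the short 3,1 key collapses all four Smith and Smithson entries into one key, while the longer structures separate them. The same trade-off between key length and distinctness is what the investigation measures on the full file.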
The basic key structure consisted of sixteen characters: the first eight from the surname, the first seven from the forename, and the first character from the middle name (8,7,1). If the surname and forename contained fewer characters than the key segment to be derived, the segment was left-justified and padded out with blanks. If there was no middle name or middle initial, a blank was used.

Another program derived shorter keys from the 8,7,1 structure ranging from, to,2,1. Next, a sort program arranged the shorter keys in alphabetical order. A statistics collection program then processed the alphabetical file. This program counted the number of distinct keys and built a frequency distribution of names per distinct key and cumulative frequency distributions of names per distinct key in percentile groups.

Journal of Library Automation Vol. 6/2 June 197

Fig. 1. Number of Names Retrieved 9, 99, and 99. Percent of the Time for Different Key Structures. Axes: likelihood; no. of characters extracted from the surname.

RESULTS

Figure 1 presents the findings at three levels of likelihood for retrieving n
or fewer names when a variety of search key combinations were employed, ranging from three to six characters from the surname, zero to three characters from the first name, and with or without the middle initial. Table 1 is an extraction from Figure 1 and contains the number of names retrieved at a level of 9 percent likelihood for the various search keys employed. Figure 2 has the same structure as Figure 1 but contains the degree of distinctness as percentages, (no. of distinct keys / no. of entries) x 100 percent. Table 2 records distinctness arranged by number of characters per key. Figure 3 is a graphical representation of the degrees of distinctness of the various keys. In this figure, different types of lines connect points representing key structures that contain an equal number of characters.

The bottom line in Table 1 may be read as saying that 9 percent of the time a 4,2,1 key will retrieve five or fewer names from a file of 167,74 personal name keys. The bottom line of Table 2 states that from the same file the 4,2,1 key yields a single name 64.1 percent of the time.

Table 1. Number of Names Retrieved With 9 Percent Likelihood
No. of Characters: 4 6 7
No. of Names Retrieved: (>2) (>2) (>2) (>2) 26 2 16 171 18 17 12 8 8 16 9 6
Key Structure: , 4,,1,,2 4,1,1,1 6,,1, 4,2,2,1 4,1,1 6,1,2,1,1,,1 4,2,1

DISCUSSION

This experiment has shown the degree of distinctness, that is to say, the number of distinct keys divided by the total number of entries from which all keys were derived, to be a useful tool in determining what key structures may be efficiently used. As seen by comparing Figure 1 with Figure 2 and Table 1 with Table 2, there is a high degree of correlation between distinctness and the likelihood of retrieving a certain number of names 9,
99, or 99. percent of the time. Thus, the investigator can eliminate many undesirable key structures on the merits of distinctness alone and pool his remaining resources toward studying in detail other structures.

Fig. 2. Degree of Distinctness in Percent for Different Key Structures. Axis: no. of characters extracted from the surname.

Table 2. Distinctness by Number of Characters Per Key
No. of Characters: 4 6 7
Degree of Distinctness: 2. 9.9 17.1 19.2 4.8 .7 44. 24.6 44.9 44.9 49.9 7. 7.1 48. .8 6. 61.4 62.1 64.1
Key Structure: , 4,,1,,2 4,1,1,1 6,,1, 4,2,2,1 4,1,1 6,1,2 4,,1,1,,1 4,2,1

When the 8,7,1 key was tested, it yielded a uniqueness percentage of
68.8 that represents the upper limit of uniqueness in this experiment. From Table 2 it is apparent that the bottom three keys yield a percentage of uniqueness near the upper limit. Table 2 shows a distinct jump in percentage of uniqueness between the n,0 and n,1 key structures. Another sharp increase occurs between n,m and n,m,1 structures. Each section of the key is derived from a Markov string, and it appears from the discontinuities between sections that the parts of personal names are not highly correlated.

Fig. 3. Degree of Distinctness. Lines Connect Points Whose Key Structures Have an Equal Number of Characters. (Upper limit marked at 68.78.)

As pointed out in previous papers, a key structure that possesses a relatively high degree of distinctness also yields a small percentage of replies containing many entries. For the name-only search key, this effect could be reduced by performing the retrieval in two steps when necessary. First, the full names for each author whose name matches the entered search key would be displayed; names appearing with more than one work would be displayed only once. Next, the retriever would choose the name desired and request all of the titles associated with it. However, some title displays could be excessive: William Shakespeare's name appears with more than works. A paper currently in preparation at OCLC describes an algorithm whose interactive use resolves this type of search problem. [8]

CONCLUSION

This investigation has yielded findings showing that there are several truncated search keys derived from personal names that are sufficiently specific to perform efficiently as an author index to a file of 161,74 personal names, thereby providing an on-line index that will make it possible for a terminal user to obtain a listing of all titles by a given author in an on-line catalog.

ACKNOWLEDGMENT

This study was supported in part by Office of Education contract OEC--72-2289 ( 6) and Council on Library Resources grant CLR-26.

REFERENCES

1. P. L. Long and F. G. Kilgour, A Truncated Search Key Title Index, Journal of Library Automation :17-2 (March 1972).
2. F. G. Kilgour, P. L. Long, E. B. Leiderman, and A. L. Landgraf, Title-Only Entries Retrieved by Use of Truncated Search Keys, Journal of Library Automation 4:27-1 (Dec. 1971).
3. F. G. Kilgour, P. L. Long, and E. B. Leiderman, Retrieval of Bibliographic Entries from a Name-Title Catalog by Use of Truncated Search Keys, Proceedings of the American Society for Information Science 7:79-82 (197).
4. F. G. Kilgour, P. L. Long, A. L. Landgraf, and J. A. Wyckoff, The Shared Cataloging System of the Ohio College Library Center, Journal of Library Automation :17-18 (Sept. 1972).
5. Long and Kilgour, A Truncated Search Key, p. 18.
6. Gerry P. Guthrie and Steven D. Slifko, Analysis of Search Key Retrieval on a Large Bibliographic File, Journal of Library Automation :96-1 (June 1972).
7. K. B. Rastogi, A. L. Landgraf, and P. L. Long, Corporate Author Entry Record Retrieved by Use of Derived Truncated Search Keys, Journal of Library Automation, in press.
8. J. A. Wyckoff, A Technique for Extending Searches through Large Numbers of Duplicate Matches, in preparation.