The Emperor's New Repository

Similar documents
MITOCW MITCMS_608S14_ses03_2

Interviewing Techniques Part Two Program Transcript

************************************************************************ Financial Literacy in Grades 9 and 10 The Arts Music AMU1O and AMG2O

QUICKSTART COURSE - MODULE 7 PART 3

MITOCW watch?v=fp7usgx_cvm

MITOCW R3. Document Distance, Insertion and Merge Sort

Common Phrases (2) Generic Responses Phrases

The Slide Master and Sections for Organization: Inserting, Deleting, and Moving Around Slides and Sections

>> Counselor: Hi Robert. Thanks for coming today. What brings you in?

SOAR Study Skills Lauri Oliver Interview - Full Page 1 of 8

Jenna: If you have, like, questions or something, you can read the questions before.

Using Google Analytics to Make Better Decisions

Editing Your Novel by: Katherine Lato Last Updated: 12/17/14

even describe how I feel about it.

#022: JOB INTERVIEW IN ENGLISH COURSE

Celebration Bar Review, LLC All Rights Reserved

Eight Steps to a Vision of Greatness

How to Put Rainbow on Your Resume

Dialog on Jargon. Say, Prof, can we bother you for a few minutes to talk about thermo?

LEARN AND EARN GUIDE. Find Out How to Make Money as a Copywriter While You re Learning to Write Copy!

MITOCW R7. Comparison Sort, Counting and Radix Sort

Ep 195. The Machine of Your Business

The ENGINEERING CAREER COACH PODCAST SESSION #1 Building Relationships in Your Engineering Career

Transcript of the podcasted interview: How to negotiate with your boss by W.P. Carey School of Business

The Open University xto5w_59duu

Crushing Self-Doubt & Insecurity

Tips On Starting Your WooCommerce Online Store with Michael Tieso

Ep #181: Proactivation

CLICK HERE TO SUBSCRIBE

[00:03:00] There is another movement, which is essentially the pupils of the eyes expanding and contracting.

Week 1. Getting Started

BOOK MARKETING: Profitable Book Marketing Ideas Interview with Amy Harrop

NFL Strength Coach of the Year talks Combine, Training, Advice for Young Strength Coaches

WEBSITE PROPOSAL OBJECTION ANSWER SCRIPTS

NCC_BSL_DavisBalestracci_3_ _v

Formulas: Index, Match, and Indirect

Pricing Strategies. Don't be a guesser

Heuristics: Rules of Thumb

How to Help People with Different Personality Types Get Along

Laura is attempting to bake and decorate a cake, with no success. LAURA It didn t work. Damn it! It didn t work. Final Draft 7 Demo

#1 CRITICAL MISTAKE ASPERGER EXPERTS

Transcript of Interview with Studio Superstar Phi Nelson

COLD CALLING SCRIPTS

Subject: Subject: A PARODY ON HOW WE SAY THE LORD S PRAYER THE LORD'S PRAYER

Class 1 - Introduction

Hello and welcome to the CPA Australia podcast. Your weekly source of business, leadership, and public practice accounting information.

Do Not Quit On YOU. Creating momentum

Preparing for Your Interview

ONTARIO MINISTRY OF EDUCATION

OKAY. TODAY WE WANT TO START OFF AND TALK A LITTLE BIT ABOUT THIS MODEL THAT WE TALKED ABOUT BEFORE, BUT NOW WE'LL GIVE IT A

Today what I'm going to demo is your wire project, and it's called wired. You will find more details on this project on your written handout.

"The Lottery Shotgun Method:

So, again, that was addressing that main problem of how to attract new members. Even though people in that stage, you know, it's not just about

Contribute to CircuitPython with Git and GitHub

Power of Podcasting #30 - Stand Out From The Crowd Day 3 of the Get Started Podcasting Challenge

MITOCW R9. Rolling Hashes, Amortized Analysis

Autodesk University See What You Want to See in Revit 2016

ECO LECTURE 36 1 WELL, SO WHAT WE WANT TO DO TODAY, WE WANT TO PICK UP WHERE WE STOPPED LAST TIME. IF YOU'LL REMEMBER, WE WERE TALKING ABOUT

Welcome to another episode of Getting the Most. Out of IBM U2. This is Michael Logue, and I'll be your host

MITOCW watch?v=guny29zpu7g

Autodesk University Automated Programming with FeatureCAM

ECOSYSTEM MODELS. Spatial. Tony Starfield recorded: 2005

2015 Farnoosh, Inc. 1 EPISODE 119 [ASK FARNOOSH] [00:00:33]

Ep #2: 3 Things You Need to Do to Make Money as a Life Coach - Part 2

Proven Performance Inventory

MITOCW R22. Dynamic Programming: Dance Dance Revolution

GOAL CLARITY ROADMAP

MITOCW ocw lec11

Zoë Westhof: Hi, Michael. Do you mind introducing yourself?

Mike Wynn - ArtofAlpha.com

Referral Request (Real Estate)

7 Killer Replies To Common Sales Objections That WORK How to overcome the most common insurance phone sales conversations

Why Affiliate Marketing Doesn t Work for You

Become A Blogger Premium

HI. I'M TOM WYRICK AND I'LL BE YOUR INSTRUCTOR THIS SEMESTER IN ECON 155. IT'S THE PRINCIPLES OF MACROECONOMICS, BUT THIS IS

Graphs and Charts: Creating the Football Field Valuation Graph

Covering Cover Letters Why A Cover Letter is More Than Just A Dust Jacket

MITOCW R11. Principles of Algorithm Design

While you already see the notes needed, which are C, E, G, and D, we must understand what the term 'add' means.

How Can I Deal With My Anger?

Julie #4. Dr. Miller: Well, from your forms that you filled out, seems like you're doing better.

Environmental Stochasticity: Roc Flu Macro

Easy Steps to My Best Bonus Offers by Sunny Suggs! Click here to join now!

What to Do When They Say, 'Tell Us About Your Research' - Advice - The Chronicle of Higher Education

MITOCW ocw f08-lec36_300k

BOOSTING AFFILIATE PROFITS

26 AdWords Mistakes: How They Are Killing Your Profits (And How To Fix Them) Contents

3 SPEAKER: Maybe just your thoughts on finally. 5 TOMMY ARMOUR III: It's both, you look forward. 6 to it and don't look forward to it.

What You Need to Learn

How to Close a Class

This course involves writing and revising a research paper on a topic of your choice, and helping other students with their research papers.

CLICK HERE TO SUBSCRIBE

CLICK HERE TO SUBSCRIBE

They want to say "goodbye" to their current job situation and make their living as a freelance web writer.

Ep #207: Being a Good Employee

Transcript: Say It With Symbols 1.1 Equivalent Representations 1

PASSIVE INCOME DESIGNERS BUSINESS DESIGN FREEDOM EPISODE #001 THE 7 MYTHS OF PASSIVE INCOME

Freezer Paper Piecing with Tara Faughnan

Buying and Holding Houses: Creating Long Term Wealth

Great. We're gonna start off by you sharing, just say your name, say your year in school. I think you all are sophomores, right?

Transcription:

The Emperor's New Repository I don't know the first thing about building digital repositories. Maybe that's a strange thing to say, given that I work in a repository development group now, and worked on the original DSpace project years ago, and worked on a few repository research projects in between. If that qualifies me to say anything about repositories, though, it's just that I don't know much about what I'm doing, and I don't think many other people do, either. I'd better qualify that some. It's not that there aren't smart people working on repositories -- there are plenty. It's not that few repository projects have good, important objectives -- many do. And it's not that we haven't learned anything in the past 10-15 years since "digital repositories" grew from a buzzword into a strategic program in many libraries -- we've learned a lot. But I don't know what this smart group of people with solid goals and lessons learned add up to yet. Does that still sound strange? If it sounds strange to you, try this thought experiment. Say you get a new job as a director of a small library. In a new town. Without a library. So it's your job to build a library where one doesn't exist. What do you do first? I've never run or built a library before, but I'd guess that you'd need to start with siting property and working with a city to plan infrastructure. Then come architectural design and construction bids, and when you're far enough along to plan what goes in the building itself you split out budget lines for staffing different departments, and furniture and shelving of several different kinds, and a floor plan, and meeting rooms and utility functions. Maybe you only can hire a few people, but you know you need to cover collection development, public and technical services, computing, accounting and payroll, and maintenance, whether with three people on staff or 30. See? I don't know the first thing about running a library, but I know you'd better plan for at least all these things. You're probably thinking of

other things I didn't mention, whether you've ever run a library or not. Now suppose you were hired to be a digital repository project director. For a new repository, which doesn't exist yet. What do you do first? Eh? I don't know, either. And that's what I mean! Collect what you know Given how long I've been around people and projects aiming to "build repositories", and how little confidence I have that we know what it really means to build a repository, I'd guess there are plenty of you who don't know, either. On one hand, sometimes it's okay not to know -- if you have a good goal in mind, like collecting faculty research, or making rare local materials available, the details of how you achieve your goal are less important than regularly measuring against the yardstick itself. I'd bet, though, that there are plenty of projects that don't even have this much clarity driving them. Absent such clarity, there are a lot of mistakes you can make along the way to achieving clarity in determining why you want a repository. The first mistake to avoid is fetishizing software products or projects. Over the years I've had a lot of conversations with friends and colleagues who've asked "do you think Greenstone/Fedora/DSpace/EPrints/ContentDM/etc. is what we should use?" My answer these days is almost always the same: "it depends on what you're doing, but you can always start with one or another and decide after you have some experience, and they're all a good place to start, so just pick one and get started." There isn't any single answer and there isn't any clear winner. In an era when we're still trying to figure out what it means to build a repository, it's great that there are so many options, and it's wonderful that there are so many free software options. If you think you need specialized software to build your repository, then the key thing is to get started with a tool that looks like a roughly good fit. You're going to learn so much along

the way that the details of whether that tool's the best long term fit or not are going to become obvious to you as you build up experience loading your content and making it available. The flip side of avoiding agonizing over which tool to pick is that you shouldn't hesitate, once a project takes some turns you don't like, to acknowledge that maybe you've made a bad choice with a particular toolkit, or that maybe you just approached the project the wrong way, with the wrong materials at first, or with the wrong staff, or even just at the wrong time. There's a software development axiom from Frederick Brooks, author of _The Mythical Man Month_, that applies well here: "Plan to throw one away; you will, anyhow." To plan for mistakes means to be ready to learn from them when they happen, and to minimize the cost in energy and expense when things go wrong. Start with a small collection, minimal staff, and a short timetable, and see what you can learn by building something quickly. All the feature comparison spreadsheets and RFPs in the world won't help you make a good decision if you haven't already started up the learning curve for yourself. There's a broader point to remember here, too, that software isn't usually part of our collection development plans. We librarians know approval plans, cataloging standards, and search strategies, but selecting and implementing software isn't our strength. There, I said it: we're not good at this. But that's okay! Like a reference librarian assigned to select materials in a new collection area, we can learn as we go. The key steps are to get started, and to expect to make mistakes, but to be ready to learn from your mistakes. I'll let you in on another secret of repository building. Adding new software isn't always the best approach to building a repository. Sometimes it's not only a mistake to introduce new layers of software between content and users or between content and staff, it's just a bad idea, period. You're probably comfortable, by now, with putting a web page online, and setting up a directory on a web server with some new files, or if you're not comfortable performing these tasks, you probably have colleagues or staff who are capable and experienced with basic web publishing. If you have a

small collection of digitized or born digital items, and your primary goal is to get it online, the easiest thing to do might be to just put it online in a simple directory or two with some web pages listing and describing it all. Make a backup, too, or two, of course. But most of us with web sites can get some new files linked from a library's home page pretty easily these days. If you can do that, your users can find it from your home page, and even if they never visit your home page, the major search engines can find your items and crawl and index them, so maybe your users can find things that way. If you think this sounds defeatist or desperate, don't think that -- I mean just the opposite. Sometimes the shortest path between users and content is simply putting the content where users have a chance at finding it. The web does this very well for us. I've administered enough web sites and odd software packages over the years to know well that the content that lasts online the longest is almost always the content with the least amount of stuff around it that can go wrong. Whether it's an old programming language or a bad script or an odd toolkit few people ever used, if the only way to get to your stuff is through some oddly-shaped software that's out of date and unreliable, someday nobody's going to be able to get to your stuff anymore. If, on the other hand, the same content is simply available over the web like any other web content, just a bunch of files in a directory on a web server, then that content can be indexed and reindexed by Google et al., copied and recopied onto different servers by you and your staff, and migrated across changes in web server and operating system software choices over time. We are the repository The second mistake to avoid is forgetting that we are the repository. Every software repository I've helped to build has faced complex issues of planning and policy which had little to do with technology and everything to do with how to build a sustainable program for ensuring access over time. Remove any special software from the equation completely -- like in the case of "just a few directories on a web server" -- and planning and policy issues still come into play. Like with any other materials in a library, it's ultimately

up to those of us working in the library to set, maintain, and uphold policies for collection development, access, maintenance, and retention. If repository technology gets in the way of making policy choices that fit in with your broader institutional mission, something might be wrong. Optimize for access A third mistake that's easy to make is to over-think what a "digital object" might be. I fall into this trap all the time, even with a few rounds of experience under my belt. If you focus on making some content available, using one specialized tool or another or none at all, and that content is useful to your community, its users will tell you how they want to use it. This is another concept echoed both in the "release early and release often" mantra of free software development and the "don't make me think" school of usability testing. Real feedback from real users will tell you the most about features your repository should add or improve, or when one degree more or less of descriptive metadata will make items easier to find or will cost you a ton without really helping people. If, before "just giving it to people", you spend months or years designing specifications for "complex objects" and hammering structural metadata and content files into shape to match before ever giving your users a chance to see that content, you might just find that you've spent a lot of time and energy without knowing a bit about what people want to do with your stuff. It's possible that you'll guess correctly about what to call files and how to relate files and metadata to each other and store all of that on disk, but that might not help your users at all. Give three programmers a pile of files and ask them to arrange it and you'll get five system designs for how to do it in four different programming languages and database models, but none of that tells you whether your users will use any of it. The best thing about letting users drive what you do, when it works (I know how hard it can be just to get feedback sometimes), is that it lets you build incrementally. Maybe you start with a simple set of collections published in a few directories and not a lot more. If your users are happy

with that, maybe you can stop there. But if they ask for the ability to search across it all, or to browse it all by subjects, or for specialized functions like being able to zoom in on large images, for instance, maybe that's a reason to look deeper at a specialized software package to augment or replace your "files on disk" setup. If you try a new package, look to your user community to tell you whether it does the job better. I know this advice isn't going to "solve your repository problem" for you. But if you avoid over-thinking software choices and you avoid overthinking the complexity of your content, you might find that there are immediate, cost-effective choices available to you that can help your community soon and teach you a lot along the way. And if you focus on optimizing access to meet your community's needs, you might find that the policies you'll need to sustain digital materials over time might match how you do everything else already. In five, ten, and twenty years, after all, any software you use today is likely to be obsolete, but it'll still be your responsibility to make and keep your content available and useful to your community.