Starting Small: Practical First Steps in Digital Preservation
By Helen K. Bailey, Library Fellow for Digital Curation and Preservation, MIT Libraries
This past summer I was fortunate to attend the ALA Annual Conference for the first time as the recipient of the Jan Merrill-Oldham Professional Development Grant. This award, created by the ALCTS Preservation and Reformatting Section (PARS) and sponsored by the Library Binding Institute, was established in 2011 to honor the career and influence of Jan Merrill-Oldham, distinguished leader, author, and mentor in the field of library and archives preservation. Thanks to this generous grant, I was able not only to attend ALA Annual but also to present this paper at the PARS Digital Conversion Interest Group session. Many thanks to ALCTS, PARS, LBI, and Dartmouth College Library for giving me the opportunity to attend and present at this wonderful conference.
This presentation offers one example of how a small institution can add digital preservation principles to its collections management activities without many additional resources. The process was carried out at Dartmouth College from 2010 to 2012. My original presentation slides can be found on SlideShare. This article is a brief summary of the full presentation.
Libraries everywhere are seeing a constant increase in e-journals, e-books, databases, and other electronic resource subscriptions, and are concerned with whether and how these materials are being preserved. LOCKSS and Portico are excellent preservation services, but they do not preserve the entirety of libraries’ vast and growing digital collections.
Many larger institutions have digital preservation repositories in which they deposit and manage much of this digital content. However, smaller institutions may not have the resources to set up such a repository, or may be searching for an interim solution until a repository is in place. Below are some simple steps that a digital content manager can take in situations such as these.
Step One: Take Inventory
The first step in managing digital collections is to create a human-readable inventory. There are many ways this could be done, but one easy way to manage a relatively small number of resources is with a simple spreadsheet. Ideally this spreadsheet should be shared and backed up (as a Google Doc or on a shared server space) so that the information doesn’t reside with one individual.
Kinds of information to include:
- Descriptive information about each resource. This doesn’t need to be exhaustive, just enough information to identify which resources are being managed.
- Resource record or other linking ID so resources can be linked to catalog records, etc.
- File formats. This could be as simple as a list of the file formats encountered in each resource. However, a separate master list of all the file formats managed within the collection offers a quick overview of which files may need preservation migration.
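As an illustration of the inventory described above, here is a minimal Python sketch that produces a starting spreadsheet (as a CSV file) and a master list of file formats. It assumes, hypothetically, that each resource lives in its own top-level folder under a collection directory, and it uses file extensions as a rough stand-in for file formats, which is only an approximation. The function name, the column names, and the empty record_id column (to be filled in by hand from the catalog) are all assumptions made for this example.

```python
import csv
from collections import Counter
from pathlib import Path

# Hypothetical column names; adjust to whatever the inventory needs.
FIELDNAMES = ["resource", "record_id", "file_count", "formats"]


def build_inventory(collection_root: Path, inventory_csv: Path) -> Counter:
    """Write one CSV row per top-level resource folder.

    Returns a Counter of file extensions across the whole collection,
    which can serve as the master list of formats.
    """
    master_formats = Counter()
    rows = []
    for resource in sorted(p for p in collection_root.iterdir() if p.is_dir()):
        # Count files by lowercase extension (a rough proxy for format).
        formats = Counter(
            f.suffix.lower() or "(no extension)"
            for f in resource.rglob("*")
            if f.is_file()
        )
        master_formats.update(formats)
        rows.append({
            "resource": resource.name,
            "record_id": "",  # assumption: added manually from the catalog
            "file_count": sum(formats.values()),
            "formats": "; ".join(sorted(formats)),
        })
    with inventory_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return master_formats
```

Calling build_inventory(Path("collection"), Path("inventory.csv")) with real paths would write the CSV, which can then be opened in any spreadsheet program, shared, and annotated by hand.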
Step Two: Document the Bits
The next step is to verify and document bit integrity. The easiest way to do this is to create a manifest of all the files, with a checksum for each file. Many tools can do this, but we chose BagIt, a packaging and transfer specification developed by the Library of Congress and the California Digital Library.
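To make the idea of a checksum manifest concrete, here is a minimal sketch using only the Python standard library. It is not a BagIt implementation: a real bag also has a declaration file and a payload directory, which this example omits. It only writes one line per file in the "checksum, then relative path" style that BagIt manifests use. The function names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(content_dir: Path, manifest_path: Path,
                   algorithm: str = "sha256") -> int:
    """Write '<checksum>  <relative path>' for every file under content_dir.

    Returns the number of files listed. The manifest file itself is skipped
    in case it is stored inside content_dir.
    """
    manifest_resolved = manifest_path.resolve()
    files = sorted(
        p for p in content_dir.rglob("*")
        if p.is_file() and p.resolve() != manifest_resolved
    )
    with manifest_path.open("w", encoding="utf-8", newline="\n") as out:
        for path in files:
            relative = path.relative_to(content_dir).as_posix()
            out.write(f"{file_checksum(path, algorithm)}  {relative}\n")
    return len(files)
```

Storing the manifest alongside the inventory, rather than only on the drive it describes, means a damaged drive cannot silently take its own record of correctness with it.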
Step Three: Back Up
Good, redundant backup is necessary to ensure safety of the digital content over time. This can be achieved using backed-up servers, but for a really inexpensive solution we used redundant external hard drives. We purchased three 2TB external hard drives (different but reputable brands), at a total cost of $600, and copied all of the content to each drive.
We then sent one of the redundant drives to our off-site storage library and one to the archive, and kept one in preservation services. Thus, we had three copies of all the files, stored in separate locations on campus. We also catalogued each hard drive in our library system to document its location and allow for future retrieval.
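Copying content to several drives is only useful if each copy is complete, so it can help to compare every copy against the source before the drives are sent to storage. The following is a minimal sketch of that check using Python's standard shutil and filecmp modules. The function names and the example mount points are hypothetical, and this is one possible approach, not a record of the procedure used at Dartmouth.

```python
import filecmp
import shutil
from pathlib import Path


def verify_copy(source: Path, destination: Path) -> list[str]:
    """Return relative paths that are missing from or differ in destination."""
    relative_files = sorted(
        p.relative_to(source).as_posix()
        for p in source.rglob("*")
        if p.is_file()
    )
    filecmp.clear_cache()  # avoid reusing results from an earlier comparison
    # shallow=False compares file contents, not just size and timestamp.
    _match, mismatch, errors = filecmp.cmpfiles(
        source, destination, relative_files, shallow=False
    )
    return sorted(mismatch + errors)


def copy_and_verify(source: Path, destinations: list[Path]) -> dict[Path, list[str]]:
    """Copy the source tree to each destination and report any problem files."""
    problems = {}
    for destination in destinations:
        shutil.copytree(source, destination, dirs_exist_ok=True)
        problems[destination] = verify_copy(source, destination)
    return problems
```

A call such as copy_and_verify(Path("master"), [Path("/mnt/drive1"), Path("/mnt/drive2"), Path("/mnt/drive3")]) would return an empty list for each drive when every file was copied intact.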
Step Four: Manage Content over Time
All of the previous work leads to this step, which is what really ensures the long-term usability of the content. We developed a procedure to retrieve each hard drive every six months, run the BagIt validation, spot-check a few files to make sure they were usable, migrate any files in formats in danger of obsolescence, record any changes in the inventory list, and then return the drive to its storage location. We also added new content as needed and periodically replaced the hard drives with newer models.
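The periodic validation amounts to recomputing each file's checksum and comparing it with the stored manifest. The sketch below shows that logic with the Python standard library. It is a stand-in for illustration and not the BagIt validation tooling itself. It assumes a hypothetical manifest of "checksum, then relative path" lines using SHA-256, stored outside the directory being audited.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_drive(content_dir: Path, manifest_path: Path) -> dict[str, list[str]]:
    """Compare files under content_dir with the checksums in the manifest.

    Returns relative paths grouped as 'missing' (listed but absent),
    'changed' (checksum mismatch), and 'unexpected' (present but unlisted).
    """
    expected = {}
    for line in manifest_path.read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, relative = line.split(maxsplit=1)
            expected[relative] = checksum
    actual = {
        p.relative_to(content_dir).as_posix()
        for p in content_dir.rglob("*")
        if p.is_file()
    }
    listed = set(expected)
    return {
        "missing": sorted(listed - actual),
        "changed": sorted(
            rel for rel in listed & actual
            if sha256_of(content_dir / rel) != expected[rel]
        ),
        "unexpected": sorted(actual - listed),
    }
```

An empty report for all three categories suggests the bits are unchanged since the manifest was written. Files that appear under "unexpected" after new content is added are a reminder to regenerate the manifest and update the inventory.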
These basic steps for introducing digital preservation to collections management aren’t perfect or comprehensive, but they do have some benefits:
- It’s cheap. Our total equipment cost at Dartmouth as of June 2012 was $600 for hard drives.
- There’s very little staff time required. There was no configuration, no real software setup, and all the processes can run in the background while busy library staff do other things.
- It doesn’t require a lot of technical expertise.
However, this system is not an ideal permanent solution. Here are some drawbacks to keep in mind:
- It’s not scalable. As collections grow to many terabytes, a repository becomes necessary to automate these tasks.
- It’s low on security. This would not be a good process to use for sensitive material.
- It’s prone to human error because the overall process is manual.
- Delivery and discovery have to be addressed separately. These steps are simply a failsafe for ensuring that digital content is not lost.
Resources for More Information
- Digital Curation and Preservation Bibliography by Charles W. Bailey, Jr.
- Digital Preservation Handbook, first compiled by Neil Beagrie and Maggie Jones and now maintained and updated by the Digital Preservation Coalition.
- Digital Preservation Management tutorial and workshop, developed by Anne R. Kenney and Nancy Y. McGovern and maintained by Cornell University Library, 2003–2006; extended and maintained by ICPSR, 2007–2012; and now extended and maintained by MIT Libraries, 2012–.
- Reference Model for an Open Archival Information System, by the Consultative Committee for Space Data Systems.
- The Signal: Digital Preservation Blog by the Library of Congress.