Archiving of Electronic Business Reference Sources
1998 Publishers Open Forum
January 12, 1998
Prepared by Bill Kinyon
With the assistance of John Campbell and Sharmon Kenyon
BRASS Business Reference Sources Committee
Along with the increasing availability of information in electronic formats have come questions about the continued access to that information. This is especially true in libraries, where information is supposedly stored for anyone to use, for as long as anyone wants to use it. Librarians have long fought the battle of preserving printed materials, seeking ways to keep them from disintegrating, rotting, fading, or anything else that would render them unusable. When the first electronic documents appeared, many librarians probably rejoiced that at last they had a format which presented no preservation problems. It is now obvious that electronic information has presented a whole new set of challenges.
An extensive examination of archiving in general, in both its policy and technical aspects, has appeared in the Task Force on Archiving of Digital Information's 1996 report Preserving Digital Information, prepared for the Commission on Preservation and Access and the Research Libraries Group.
Much of the discussion in the literature has concerned the archiving of electronic journals. Librarians have been discussing this aspect of the archiving issue for some time and are still trying to determine how to resolve it. Thus far, many publishers have seemed content to let librarians handle it. Indeed, an official at Elsevier and a co-editor of an electronic journal have stated that libraries will be the ones to archive electronic journals (Manoff et al., 1992, pp. 124-125).
The Researcher's Need for Historical Information
Although the archiving of journals is certainly a concern for business librarians, a more pressing one may be the archiving of other types of information found in electronic business reference sources. These include company financial reports; demographic, economic, and industry statistics; directory listings; and bibliographic records. This part of the archiving issue appears to have gone largely unaddressed in the professional library science literature.
The concern of business librarians about the inaccessibility of historical information stems from their clientele's demand for that information. ("Historical" can mean different time frames to different people, but an often-used definition is anything older than a year.) Demand has always been, and will continue to be, very high, since many business researchers are interested in assessing trends and patterns that evolve over time, often going back twenty years or more. The frequency of the data also matters: sometimes annual statistics will suffice, but a researcher's project frequently requires quarterly, monthly, weekly, or even daily statistics. Business librarians are now caught between researchers' needs and the increasing inaccessibility of the data.
Technological Constraints on the Preservation of Historical Information
In the last few years, business librarians have begun to feel the burden of the challenges posed by electronic information they have acquired through purchase or subscription. For example, a CD-ROM produced several years ago may be unreadable by currently available CD-ROM drives, or a magnetic tape produced many years ago may be readable only on a tape drive that is now obsolete (Task Force, 1996, p. 6).
The physical nature of the media (e.g., magnetic or optical) also presents difficulties for archiving electronic information. Estimates of the lifetime of magnetic media vary widely, but even the longest fall far short of what is needed for very long-term storage. As described by Neavill and Sheble (1995, p. 15), magnetic media are subject to two types of decay: static decay, which is deterioration over time even when the media are not in use, and dynamic decay, which is deterioration resulting from use. Optical media, though promoted as a viable alternative to magnetic media for long-term storage because they are not subject to dynamic decay, are still subject to static decay. According to the Task Force on Electronic Information Systems of the American Physical Society, "the lifetime of current optical media is 'finite, on the order of years'" (as quoted in Neavill & Sheble, 1995, p. 16).
Differences in the Librarian's and the Publisher's Perspectives
Business librarians and publishers may perceive the accessibility of historical business information differently. From the librarian's perspective, the data should extend as far back as it has been recorded; from the publisher's perspective, only as far back as it is profitable to extend the backfiles. Furthermore, librarians feel that anything they originally received in electronic format should remain accessible to them, even when the software or hardware changes, either through reissue in a new format or through addition to the more current information. Since they paid for it once, librarians feel they should not have to pay for it again to keep using it. In this view, it is the publisher's responsibility to continue providing all of the data for which libraries have paid.
On the other hand, publishers have to look at this issue from an economic standpoint and determine if they can actually afford to reissue the data or develop a means of providing it with the more current information. Is there really a market for this older data, a market that will be large enough to bring a profit, or at least prevent a loss? If there is a market, how much will it cost to develop a method of providing the data?
Another issue is that of new products and new data. If librarians buy a product now, what assurance do they have that its data will still be accessible several years, or even two or three years, from now? Again, publishers and librarians may be at odds: librarians want this information to remain available, while publishers wonder whether they can afford to keep it available.
An Alternative to Publisher Archiving
If the publisher cannot do the archiving, one alternative would be for a certified archiving service to assume that role (Task Force, 1996, pp. 22-24). One possible approach is consortial collections, in which libraries work together to collect and archive electronic sources on networked sites. Neavill and Sheble (1995, p. 14) cite one such effort, the CICNet Electronic Journal Collection, a cooperative journal archiving project started in 1992 by the Committee on Institutional Cooperation, a consortium of major research universities. However, such consortial arrangements would not be a dependable option for libraries that are not members of the group. As Neavill and Sheble (1995, p. 14) put it: "Libraries should not depend on the charity of a few sites such as CICNet to provide universal access to these publications." And even if such arrangements work for electronic journals, they might well prove much more difficult to implement for electronic business data.
Archiving: The Concept of Data Migration
Even once it is decided who will be responsible for archiving, the technical question of how to perform it remains. The overarching concept behind the process is data migration. According to the Task Force on Archiving of Digital Information (1996, p. 6): "[Data] migration is the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to preserve the integrity of digital objects and to retain the ability of clients to retrieve, display, and otherwise use them in the face of constantly changing technology." Jeff Rothenberg (1995, p. 45) has discussed the major difficulty of migration: it must occur frequently enough that the data is copied before it becomes inaccessible through physical unreadability or obsolescence. He uses the analogy of links in a chain: a single break may make the data inaccessible, barring a major recovery effort. Rothenberg believes that the current pace of technological change may require migration as often as every few years.
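Rothenberg's chain analogy can be illustrated with a small check. The Python sketch below is hypothetical and is not drawn from any of the sources cited here: it assumes each copy is described by the year it was made and an assumed number of years it stays usable (before the medium decays or the equipment to read it becomes obsolete). The chain holds only if each migration takes place before the previous copy stops being usable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Copy:
    """One link in a migration chain (hypothetical fields for illustration)."""
    year: int          # year the data was written to this medium
    usable_years: int  # assumed years before the copy becomes unreadable or obsolete

    @property
    def last_usable_year(self) -> int:
        return self.year + self.usable_years


def first_broken_link(chain: List[Copy]) -> Optional[int]:
    """Return the index of the first copy made too late, or None if the chain is unbroken.

    A copy is made too late if it was written after the previous copy's
    last usable year, i.e. the data had already become inaccessible.
    """
    for i in range(1, len(chain)):
        if chain[i].year > chain[i - 1].last_usable_year:
            return i
    return None


# Hypothetical history: tape written in 1980, migrated to CD-ROM in 1990, then to disk in 2001.
history = [Copy(1980, 15), Copy(1990, 10), Copy(2001, 8)]
print(first_broken_link(history))  # 2: the 1990 CD-ROM was only usable through 2000
```

The lifetimes in the example are invented; in practice they would have to be estimated for each medium and format, which is exactly where the uncertainty described above lies.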
Two Approaches to Archiving: Data Refreshing and Data Translation
Two approaches to the archival process are data refreshing and data translation (Task Force, 1996, pp. 5-7).
Data refreshing involves copying data from one medium to another. "Refreshing digital information by copying will work as an effective preservation technique only as long as the information is encoded in a format that is independent of the particular hardware and software needed to use it and as long as there exists software to manipulate the format in current use" (Task Force, 1996, p. 5).
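Because refreshing copies the bitstream unchanged, a common safeguard (not discussed in the sources cited here) is to compare a checksum of the new copy with one of the original. The Python sketch below assumes the old and new media are both reachable as ordinary file paths; the function names `sha256_of` and `refresh` are hypothetical. It protects only the bits: as the Task Force points out, the copy is still useless if no current software can interpret its format.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 64 KB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, target: Path) -> str:
    """Copy a file bit-for-bit to a new location and verify the copy.

    Returns the verified SHA-256 digest; raises OSError if the copy differs.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    original, copied = sha256_of(source), sha256_of(target)
    if original != copied:
        raise OSError(f"refresh of {source} failed: checksum mismatch")
    return copied


# Example: "refresh" a small file from one directory (old medium) to another (new medium).
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old_medium" / "record.dat"
    old.parent.mkdir()
    old.write_bytes(b"sample record\n")
    new = Path(tmp) / "new_medium" / "record.dat"
    print(refresh(old, new) == sha256_of(old))  # True
```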
Data translation involves translating data into a standard form that would, in theory, be compatible with any other system. Rothenberg (1995, p. 46) states that the problem with this approach is that new forms of data are not always compatible with previous formats: "Old documents cannot always be translated into unprecedented forms in meaningful ways, and translating a current file back into a previous form is frequently impossible." An alternative is emulation: programs called emulators can mimic the behavior of hardware. The problem with emulation is that detailed specifications of the hardware must be saved in a system-independent digital form, so that the specifications will not themselves require emulation before they can be used to build an emulator (Rothenberg, 1995, p. 47).
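As a simple illustration of translation into a widely readable form, the Python sketch below converts records from a hypothetical fixed-width legacy layout into CSV. The field layout and sample record are invented for illustration. It also hints at Rothenberg's point: anything not captured in the layout definition (for example, field meanings the old software understood implicitly) does not survive the translation.

```python
import csv
import io

# Hypothetical fixed-width layout of a legacy file: (field name, start column, end column).
LAYOUT = [("company", 0, 20), ("year", 20, 24), ("revenue", 24, 34)]


def translate(fixed_width_text: str) -> str:
    """Translate fixed-width records into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in LAYOUT])
    for line in fixed_width_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        writer.writerow([line[start:end].strip() for _, start, end in LAYOUT])
    return out.getvalue()


# Build one sample record in the hypothetical layout.
legacy = f"{'Example Co':<20}{'1997':<4}{'1250000':>10}\n"
print(translate(legacy), end="")
# company,year,revenue
# Example Co,1997,1250000
```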
Archiving Web-based information raises another technical issue, since its evanescent nature presents different challenges. Brewster Kahle (1997, pp. 82-83) has suggested that one solution, though only a partial one, is to take periodic "snapshots" of Web sites. His project, the Internet Archive, uses software that crawls the Internet, downloads freely available documents, and updates them as they change.
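The snapshot idea can be sketched as a store that keeps timestamped copies of each document and adds a new copy only when the content has changed. This Python sketch is not how the Internet Archive actually works; the class `SnapshotStore` and its `capture` method are hypothetical, and the `fetch` function is passed in so the example runs without a network connection (in practice it might wrap `urllib.request.urlopen(url).read()`). A real crawler would also call `capture` on a schedule and discover new URLs by following links.

```python
import hashlib
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple


class SnapshotStore:
    """Keep timestamped copies of documents, storing a new copy only when content changes."""

    def __init__(self) -> None:
        # url -> list of (capture time, SHA-256 digest, content)
        self.snapshots: Dict[str, List[Tuple[datetime, str, bytes]]] = {}

    def capture(self, url: str, fetch: Callable[[str], bytes]) -> bool:
        """Fetch url and record a snapshot if it differs from the latest one.

        Returns True if a new snapshot was stored, False if nothing changed.
        """
        content = fetch(url)
        digest = hashlib.sha256(content).hexdigest()
        history = self.snapshots.setdefault(url, [])
        if history and history[-1][1] == digest:
            return False  # unchanged since the last capture
        history.append((datetime.now(timezone.utc), digest, content))
        return True


# Simulated Web page whose content changes between crawls.
pages = {"http://example.com/": b"version 1"}
store = SnapshotStore()
print(store.capture("http://example.com/", pages.__getitem__))  # True
print(store.capture("http://example.com/", pages.__getitem__))  # False
pages["http://example.com/"] = b"version 2"
print(store.capture("http://example.com/", pages.__getitem__))  # True
print(len(store.snapshots["http://example.com/"]))  # 2
```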
Questions for Publishers
Business librarians want to hear from business reference publishers about archiving information and providing access to it. As a starting point for discussion, publishers could address the following questions.
1. Have you already done archiving? If so, what factors did you consider in determining what data to preserve? What technical approach did you take to archiving? Describe any difficulties that you encountered.
2. What plans do you have for making available historical data that is no longer accessible because of the obsolescence of the software and/or hardware? Will there be additional charges to access this data?
3. What plans do you have for handling this issue in relation to reference sources that in the future will only be available electronically? Will you be prepared to continue to make all data accessible for as long as a library subscribes?
4. It is likely that some libraries will discontinue their subscription to a source. With a printed source, they would usually retain the older information. With an electronic source, that may not be the case. Will you make arrangements so that these libraries could continue to access the data for which they have already paid?
5. What happens when you, the publisher, discontinue a source? Will it be maintained electronically for some period of time? Depending on the type of data, it could well continue to be useful for several years. Again, with a printed source, a library would have access to the information. What will happen with regard to an electronic source?
6. Do you see a role for the government or standards agencies in the archiving process?
7. As documents are migrated in the archival process, how would you designate for researchers what the "real" document is for citation purposes?
References
Kahle, B. (1997). Preserving the Internet. Scientific American, 276(3), 82-83.
Manoff, M., Dorschner, E., Geller, M., Morgan, K., & Snowden, C. (1992). Report of the Electronic Journals Task Force, MIT Libraries. Serials Review, 18(1-2), 113-129.
Neavill, G., & Sheble, M. A. (1995). Archiving Electronic Journals. Serials Review, 21(4), 13-21.
Rothenberg, J. (1995). Ensuring the Longevity of Digital Documents. Scientific American, 272(1), 42-47.
Task Force on Archiving of Digital Information. (1996). Preserving Digital Information: Report of the Task Force on Archiving of Digital Information. Washington, DC: Commission on Preservation and Access. Also available on ERIC microfiche: ED395602. Also available at the Research Libraries Group Website.
Contact at BRASS Business Reference Sources Committee
Main Reference Department
University of Georgia Libraries
Athens, GA 30602-1641
FAX (706) 542-4144
Comments to: firstname.lastname@example.org
Publishers Open Forum, ALA Midwinter Conference, January 12, 1998
Disclaimer: This publication has been placed on the web for the convenience of BRASS members. Information and links will not be updated. Posted 17 November 1997.