Guest Editorial: Introduction to This Special Issue on the Bibliomining Process

Scott Nicholson

Libraries are under siege. Material costs continue to rise, forcing more selective collection development; funding agencies demand data-based justification for services; and the Internet is replacing the library as the primary source of information. In addition, legal threats to the privacy of patron records are pushing librarians to take extremely destructive action against their institutional records.

One solution to many of the problems is the bibliomining process. Libraries can use a data warehouse to capture information about their materials without associating personally identifiable information with their use. Statistical and data-mining techniques can then be used with the data warehouse to understand patterns of use. Understanding these patterns allows for

  • better decisions about collection development;
  • thorough data-based justification of library services;
  • customized library services to compete with the Internet; and
  • a more complete understanding of how the library is used.

These methods and tools are heavily used in the corporate sector to systematically capture and clean data and to customize services for users. The significant difference between corporate use and library use of these tools is user privacy; libraries are more interested in patterns of use exhibited by groups than in behaviors exhibited by individuals. By definition, a pattern is something done by a group of people; therefore, the goal is only to discover patterns of behavior. During the data-cleansing process, the connections between usage records and patrons are broken so that these patterns cannot be traced back to individual users.
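The cleansing step described above can be illustrated with a minimal sketch. It is not drawn from any of the articles in this issue; the record layout, the field names (patron_id, patron_group, subject, month), and the sample values are all hypothetical, and a real warehouse would involve far more than this.

```python
from collections import Counter

# Hypothetical raw circulation records, as an operational system might hold them.
RAW_RECORDS = [
    {"patron_id": "P001", "patron_group": "Biology", "subject": "Genetics", "month": "2003-01"},
    {"patron_id": "P002", "patron_group": "Biology", "subject": "Genetics", "month": "2003-01"},
    {"patron_id": "P001", "patron_group": "Biology", "subject": "Statistics", "month": "2003-02"},
    {"patron_id": "P003", "patron_group": "History", "subject": "Archives", "month": "2003-02"},
]

# Fields assumed to identify an individual; they never reach the warehouse.
IDENTIFYING_FIELDS = {"patron_id"}


def cleanse(records):
    """Return copies of the records with the identifying fields removed."""
    return [
        {key: value for key, value in record.items() if key not in IDENTIFYING_FIELDS}
        for record in records
    ]


def usage_patterns(warehouse):
    """Count uses for each (patron_group, subject) pair."""
    return Counter((row["patron_group"], row["subject"]) for row in warehouse)


warehouse = cleanse(RAW_RECORDS)
patterns = usage_patterns(warehouse)

for (group, subject), uses in sorted(patterns.items()):
    print(f"{group:10s} {subject:12s} {uses}")
```

Only the group-level attributes survive into the warehouse, so the resulting counts describe departments rather than people. Very small groups can still point to an individual, so a production design would likely need additional safeguards, such as suppressing low counts.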

This is not the first special issue on this topic; in 1996, Library Administration and Management had a themed issue on the mining of library data. However, most of the data discussed in that issue came from automation systems. While those systems are still a valuable resource for data, the introduction of digital library services through the Internet has greatly increased the amount of data available. As the percentage of library budgets spent on electronic materials grows, so does the need for updated management tools to understand the use of those materials. The goal of this special issue is to raise awareness of some current and forthcoming methods and tools useful in understanding library use.

Two articles in this issue deal primarily with data warehousing. The first, “The Bibliomining Process: Data Warehousing and Data Mining for Library Decision-Making,” presents an overview of the entire process with some discussion of methods that can be employed to create the data warehouse. The other article is “Traces in the Clickstream: Early Work on a Management Information Repository at the University of Pennsylvania,” written by Joe Zucca, who has been in charge of the construction of a data warehouse at the University of Pennsylvania Library.

Also included in this special issue are two brief communications about some applications of the bibliomining process. In “A Study of the Use of the Carlos III University of Madrid Library’s Online Database Service in Scientific Endeavor,” Carlos A. Suárez-Balseiro and others from Madrid explore patterns of e-journal use by different departments in an academic setting. Eugene Garfield, founder of the Institute for Scientific Information and the Web of Knowledge, and his colleagues present a new way of using citation data with visual data mining through HistCite to gain a clearer understanding of essential literature on a topic.

This special issue includes three full articles exploring different bibliomining methods for gaining insight from library data. Christos Papatheodorou and his coauthors from Greece describe two methods of extracting profiles of user groups in “Mining User Communities in Digital Libraries.” Irene Wormell from Sweden presents a technique for developing library portals that are representative of user needs in “Matching Subject Portals with the Research Environment.” Finally, in “An Architecture for Behavior-Based Library Recommender Systems,” Andreas Geyer-Schulz and his colleagues from Germany present a system that will recommend works for users based upon patterns of past use.

The articles in this international collection all use library data to help in library decision-making processes. Some of these research articles are focused on service-building while others are designed to assist library personnel in meeting the needs of users. The one common factor in all of these articles is the need to maintain library data. Library administrators contemplating the wholesale deletion of library records are advised to consider the long-term impact of their decision; the interim step of a data warehouse can maintain some data important for decision-making and the building of services while still protecting user confidentiality.

To learn more about this topic, visit the Bibliomining Information Center at http://bibliomining.org.


   Scott Nicholson (srnichol@syr.edu) is an Assistant Professor at the Syracuse (N.Y.) University School of Information Studies.