Information Technology and Libraries (ITAL), Volume 22, Number 4, December 2003

Contents

Guest Editorial

Introduction to This Special Issue on the Bibliomining Process
SCOTT NICHOLSON

Feature Articles

The Bibliomining Process: Data Warehousing and Data Mining for Library Decision-Making
SCOTT NICHOLSON

Mining User Communities in Digital Libraries
CHRISTOS PAPATHEODOROU, SARANTOS KAPIDAKIS, MICHALIS SFAKAKIS, AND ALEXANDRA VASSILIOU

Matching Subject Portals with the Research Environment
IRENE WORMELL

An Architecture for Behavior-Based Library Recommender Systems
ANDREAS GEYER-SCHULZ, ANDREAS NEUMANN, AND ANKE THEDE

Communications

Traces in the Clickstream: Early Work on a Management Information Repository at the University of Pennsylvania
JOE ZUCCA

A Study of the Use of the Carlos III University of Madrid Library’s Online Database Service in Scientific Endeavor
CARLOS A. SUÁREZ-BALSEIRO, ISABEL IRIBARREN-MAESTRO, AND ELÍAS SANZ CASADO

Mapping the Output of Topical Searches in the Web of Knowledge and the Case of Watson-Crick
EUGENE GARFIELD, A. I. PUDOVKIN, AND V. S. ISTOMIN



Guest Editorial

   Introduction to This Special Issue on the Bibliomining Process
SCOTT NICHOLSON

Editor's Note: The full text of this editorial is available.


Feature Articles

   The Bibliomining Process: Data Warehousing and Data Mining for Library Decision-Making
SCOTT NICHOLSON

Bibliomining, or data mining for libraries, is the application of data mining and bibliometric tools to data produced from library services. This article outlines the bibliomining process with emphasis on data warehousing issues. Methods for cleaning and anonymizing library data are presented with examples.

Editor's Note: The full text of this article is available.

Scott Nicholson (srnichol@syr.edu) is an Assistant Professor at the Syracuse (N.Y.) University School of Information Studies.


   Mining User Communities in Digital Libraries
CHRISTOS PAPATHEODOROU, SARANTOS KAPIDAKIS, MICHALIS SFAKAKIS, AND ALEXANDRA VASSILIOU

Interest in the analysis of library user behavior has been increasing rapidly since the advent of digital libraries and the Internet. In this context, the authors analyze the queries posed to a digital library and recorded in the Z39.50 session log files, and construct communities of users with common interests, using data-mining techniques. One of the main concerns of this study is the construction of meaningful communities that can be used for improving information access. Analysis of the results brings to the surface some of the important properties of the task, suggesting the feasibility of a common methodology.

Christos Papatheodorou (papatheodor@ionio.gr) is an Assistant Professor and Sarantos Kapidakis (sarantos@ionio.gr) is an Associate Professor in the Department of Archive and Library Sciences, Ionian University, Corfu, Greece. Michalis Sfakakis (msfaka@ekt.gr) is a Researcher at the National Documentation Center, Athens, Greece. Alexandra Vassiliou (alex@lib.demokritos.gr) is Head of the Documentation Department at the Library, National Center for Scientific Research, Demokritos, Athens, Greece.


   Matching Subject Portals with the Research Environment
IRENE WORMELL

This article presents methods for testing the usefulness of bibliometric methods for the evaluation of information resources located at subject portals. Two subject portals for social sciences have been selected as objects for the study: SamWebb at Gothenburg University Library in Sweden and Bisigate at the Aarhus Business School Library, Denmark. To show how to capture the local users’ views and requirements in the development of portals, this article explores the results of the analyses targeting one of the selected institutions, Gothenburg University’s Department of Political Sciences. The study produced various types of lists as well as maps for monitoring the research and publication pattern of the department. These reports allow exploration and visualization of the research results of the institution in a form that is easy to read and understand for portal users. The content of the lists and maps was designed to provide information about which journals are relevant for the ongoing research activities in the department, and to identify useful links to professional institutions, organizations, persons, most cited publications, and authors. The study gathered quantitative data to measure how well the information resources of the portals match the research profile of the institutions.

Irene Wormell (Irene.wormell@hb.se) is a Professor at the Swedish School of Library and Information Science, Borås, Sweden.


   An Architecture for Behavior-Based Library Recommender Systems
ANDREAS GEYER-SCHULZ, ANDREAS NEUMANN, AND ANKE THEDE

Library systems are a very promising application area for behavior-based recommender services. By utilizing lending and searching log files from online public access catalogs through data mining, customer-oriented service portals in the style of Amazon.com could easily be developed. Reductions in the search and evaluation costs of documents for readers, as well as an improvement in customer support and collection management for the librarians, are some of the possible benefits. In this article, an architecture for distributed recommender services based on a stochastic purchase incidence model is presented. Experiences with a recommender service that has been operational within the scientific library system of the Universität Karlsruhe since June 2002 are described.

Editor's Note: The full text of this article is available.

Andreas Geyer-Schulz (geyer-schulz@em.uni-karlsruhe.de), Andreas Neumann (neumann@em.uni-karlsruhe.de), and Anke Thede (thede@em.uni-karlsruhe.de) are Researchers at the Schroff Chair of Information Services and Electronic Markets, Institute for Information Engineering and Management, Department of Economics and Business Engineering, Universität Karlsruhe (TH), Germany.


Communications

   Traces in the Clickstream: Early Work on a Management Information Repository at the University of Pennsylvania
JOE ZUCCA

For the past three years, the University of Pennsylvania (Penn) library has been building a data repository and developing computer functionality to support management information needs. This article traces the origin and evolution of Penn’s management information system (MIS) program, known as the data farm. It addresses problems pertaining to the collection, storage, anonymization, and normalization of data, and looks at current work on a database-driven model for future MIS functions.

Editor's Note: The full text of this article is available.

Joe Zucca (zucca@pobox.upenn.edu) is the Assessment, Planning, and Publications Librarian, University of Pennsylvania Library, Philadelphia.


   A Study of the Use of the Carlos III University of Madrid Library’s Online Database Service in Scientific Endeavor
CARLOS A. SUÁREZ-BALSEIRO, ISABEL IRIBARREN-MAESTRO, AND ELÍAS SANZ CASADO

The identification of variables affecting university research is one of the chief factors in the evaluation of these institutions. This paper explores the relationships between the use of the information resources available through the Carlos III University of Madrid Library’s Online Database Service and the results of scientific endeavor within the institution. A two-dimensional analysis was performed, combining the number of database accesses as identified in the monthly activity records furnished by the IRIS CD-ROM database management module with the level of research activity represented by an aggregate index of the results of scientific endeavor, calculated by principal components methods.

Carlos A. Suárez-Balseiro (csbgv@bib.uc3m.es) and Isabel Iribarren-Maestro (iiribarr@bib.uc3m.es) are Research Fellows, and Elías Sanz Casado (elias@bib.uc3m.es) is Chair, Director of the Departamento de Biblioteconomía y Documentación, Facultad de Humanidades, Comunicación y Documentación, Universidad Carlos III de Madrid, Madrid.


   Mapping the Output of Topical Searches in the Web of Knowledge and the Case of Watson-Crick
EUGENE GARFIELD, A. I. PUDOVKIN, AND V. S. ISTOMIN

HistCite™ is a system that generates chronological maps of subject (topical) collections resulting from searches of the Institute for Scientific Information Web of Science (WoS) or Science Citation Index, Social Sciences Citation Index, and Arts and Humanities Citation Index on CD-ROM. WoS export files are created in which all cited references for source documents are captured. These bibliographic collections are processed by HistCite, which generates chronological tables as well as historiographs that highlight the most-cited works in and outside the collection. Articles citing the 1953 primordial Watson-Crick paper on the structure of DNA will be used as a demonstration. Real-time dynamic genealogical historiographs will be shown. HistCite also includes a module for detecting and editing errors or variations in cited references. Export files of five thousand or more records are processed in minutes on a PC. Ideally the system will be used to help the searcher quickly identify the most significant work on a topic and enable the searcher to trace its year-by-year historical development.

Eugene Garfield (garfield@codex.cis.upenn.edu) is Chairman Emeritus, Thomson ISI, Philadelphia. A. I. Pudovkin (aipud@online.ru) is Chief Scientist at the Institute of Marine Biology, Russian Academy of Sciences, Vladivostok, Russia. V. S. Istomin (vi@mail.wsu.edu) is a Systems Analyst at the Center for Teaching, Learning, and Technology, Washington State University.