Possibilities for Open Source Software in Libraries

Eric Lease Morgan

This short essay, based on a presentation given at the 2001 American Library Association (ALA) Annual Conference, enumerates a number of possibilities for open source software (OSS) in libraries and how it can be leveraged to provide better and more effective digital library collections and services.



OSS Briefly Defined

Open source software (OSS) is both a philosophy and a process. It is a philosophy describing the intended use of software and methods of distribution. OSS is often equated with GNU software and described as free software, but the term "free" should be equated more with the Latin word liberat (meaning to liberate), and not necessarily gratis (meaning without return made or expected). In the words of Richard Stallman, the founder of the GNU software project, we should "think of 'free' as in 'free speech,' not as in 'free beer.'" 1 In this regard, the ideology behind OSS is not unlike some of the basic principles of librarianship in America. 2

OSS is also a process for the creation and maintenance of software. This is not a formalized process, but rather a process of convention with common characteristics between software projects. First and foremost, the developer of a software project almost always is trying to solve a specific computer problem - commonly called "scratching an itch." The developer realizes other people may have the same problem, and consequently the developer makes the project's source code available on the Internet in the hopes that other people can use it too. If there seems to be a common need for the software, a mailing list is usually created to facilitate communication, and hopefully the list is archived. Since the software is almost always in a state of flux, developers need some sort of version-control software to help manage the project's components. The most common version-control software is called CVS (Concurrent Versions System).

Codevelopers then "hack away" at the project, adding features they desire or fixing bugs of previous releases. As these features and fixes are created, the source code's modifications, in the form of "diff" files, are sent back to the project's leader. The leader examines the diff files, assesses their value, and decides whether or not to include them into the master archive. The cycle then begins anew. Much of a project's success relies on the primary developer's ability to foster communication and a sense of community around the project. Once accomplished, the "two heads are better than one" philosophy takes effect and the project matures. A highly recommended book titled The Cathedral and the Bazaar by Eric S. Raymond outlines this process in much greater detail. 3
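As a rough illustration of the "diff" files mentioned above, the following sketch uses Python's standard difflib module to produce a unified diff between two versions of a tiny source file. The file name and the file contents are hypothetical, chosen only for the example; real projects typically generate such files with the diff utility or their version-control software.

```python
import difflib


def make_patch(original: str, modified: str, filename: str = "example.pl") -> str:
    """Return a unified diff describing how to turn `original` into `modified`."""
    diff_lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        modified.splitlines(keepends=True),
        fromfile=f"a/{filename}",  # hypothetical path of the released version
        tofile=f"b/{filename}",    # hypothetical path of the codeveloper's version
    )
    return "".join(diff_lines)


# Hypothetical file contents: a codeveloper fixes a typo in a print statement.
original = "print 'Hello, wrld';\nexit;\n"
modified = "print 'Hello, world';\nexit;\n"

patch = make_patch(original, modified)
print(patch)
```

The resulting text records only the lines that changed, plus a little context, which is why it is small enough to mail to a project's leader for review.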

OSS Contrasted with Homegrown Systems

Some people may remember the homegrown integrated library systems developed in the '70s and '80s, and these same people may wonder how OSS is different from those humble beginnings. There are two distinct differences. The first is the present-day existence of the Internet. This global network of computers enables people to communicate over much greater distances, and it is much less expensive than twenty-five years ago. Consequently, developers are not as isolated as they once were and the flow of ideas travels more easily between developers - people who are trying to scratch an itch. Yes, there were telephone lines and modems, but the processes for using them were not as seamlessly integrated into the computing environment (and there were always long-distance communications charges to contend with). 4

Second, the state of computer technology and its availability has dramatically increased in the past twenty-five years. At that time, computers, especially the type used for large-scale library operations, were almost always physically large, extremely expensive, remote devices whose access was limited to a small group of specialized individuals. Today, the computers on most people's desktops have enough RAM, CPU horsepower, and disk space to support a college campus of twenty-five years ago. 5

In short, the OSS development process is not like the homegrown library systems of the past simply because there are more people with more computers who are able to examine and explore the possibilities of solving more computing problems. In the days of the homegrown systems people were more isolated in their development efforts and more limited in their choice of computing hardware and software resources.

State of OSS in Libraries

What is the state of OSS in libraries today? Daniel Chudnov has been the profession's evangelist for the past two or three years, the original author of jake (jointly administered knowledge environment), and the maintainer of the oss4lib.org domain as well as its mailing list. Chudnov has done a lot to raise the awareness of OSS in libraries. To that end he, Gillian Mayman, and others maintain a list of open source system projects. These projects include a lot of software designed specifically for libraries such as (but not limited to):

  • Document delivery applications (Prospero by Eric Schnell)
  • Z39.50 clients and servers (Yaz and SimpleServer by Sebastian Hammer, Zeta Perl by Rocco Carbone, and JZKit by Knowledge Integration, Ltd.)
  • Systems to manage collections (Catalog by Senga, Greenstone by Ian H. Witten et al., ROADS funded by JISC via the eLib Programme, and OSCR by Wally Grotophorst)
  • MARC record readers and writers (MARC.pm by Bearden et al., m[n]m by Robert McDonald et al., and XMLMARC by Lane Medical Library)
  • Integrated library systems (Avanti by Peter Schlumpf, Koha by Rosalie Blake and Rachel Hamilton-Williams, OpenBook by the Technology Resource Foundation, and OSDLSP by Jeremy Frumkin and Art Rhyno)
  • Systems to read and write bibliographies (bib2html by Stephanie Galland, bp by Dana Jacobsen, gBib by Alejandro Sierra and Felipe Bergo, and Pybliographer by Frederic Gobry)

For a more comprehensive list, visit www.oss4lib.org.

Yet the state of OSS in libraries is more than sets of computer programs. It also includes the environment where the software is intended to be used - a socioeconomic infrastructure. Put another way, any computing problem can roughly be divided into 20 percent technology issues and 80 percent people issues. It is this 80 percent of the problem that concerns us here. Given the current networked environment, the affinity of OSS development to librarianship, and the sorts of projects enumerated above, what can the library profession do to best take advantage of the current milieu? This question was posed to the OSS4Lib mailing list in April and May of 2000 and generated a lively discussion. 6 A number of themes presented themselves, each of which is elaborated upon below:

  • National leadership
  • Mainstreaming, workshops, and training
  • Usability and packaging
  • Economic viability
  • Redefining the integrated library system (ILS)
  • Open source data

National Leadership

One of the strongest themes mentioned was the need for national leadership. It was first articulated by David Dorman as the Open Source Library Network (OSLN). Karen Coyle and Aaron Trehab elaborated on the idea by suggesting that organizations such as ALA/LITA, DLF, OCLC, or RLG help fund and facilitate methods for providing credibility, publicity, stability, and coordination to library-based OSS projects. While OSS is almost always driven by individuals, the individuals of OSS still need to be provided with resources such as time, money, and computer hardware and software. It is widely believed that individualism can only go so far because after a time, individuals lose interest and pass projects on to others. Libraries are in it for the long term and cannot afford to implement workflows based on software whose lifetime is measured in "Internet years." National leadership, in the form of institutionalized support, will make OSS in libraries more of a reality much in the same way RedHat has helped make Linux a viable operating system and the World Wide Web Consortium (W3C), supported by MIT, provides guidelines and standards for the Web.

Mainstreaming, Workshops, and Training

Along these same lines was the expressed desire for the mainstreaming of OSS articulated by Carol Erkens, Rachel Cheng, and Peter Schlumpf. This mainstreaming process would include presentations, workshops, and training sessions on local, regional, and national levels. These activities would describe and demonstrate OSS for libraries. They would enumerate the advantages and disadvantages of OSS. They would provide extensive instruction on the staffing, installation, and maintenance issues of OSS. This mainstreaming process is an effort to promote and market OSS as a viable means for implementing sustainable digital library collections and services.

Usability and Packaging

In its present state, OSS is much like microcomputer computing of the '70s as stated by Blake Carver. It is very much a build-it-yourself enterprise; the systems are not very usable when it comes to installation. This point was echoed by Cheng, who helped facilitate a NERCOMP workshop on OSS. Schlumpf pointed to the need for easier installation methods so maintainers of systems can focus on managing content and not software. Using OSS should not be like owning an automobile in the '20s; you shouldn't necessarily need to know how to fix it in order to make it go. Packaging, and to a lesser extent, usability, are features supported in software by commercial institutions. Again, RedHat, a company distributing versions of the Linux operating system, has made its money by making it easier to install and maintain Linux-based computers. Microsoft writes software intended to seamlessly integrate with Intel-based computers. Microsoft's success is not based so much on the features of its applications, but rather the way the applications integrate with each other. The developers of OSS, including the ones in libraries, would benefit from similar installation procedures and integration processes.

Economic Viability

As pointed out by Eric Schnell and David Dorman, OSS needs to be demonstrated as an economically viable method of supporting software and systems. Libraries have spent a lot of time, effort, and money on resource sharing. Why not pool these same resources together to create software that will satisfy our professional needs? OSS cannot be equated with the homegrown systems of the past - spaghetti code and GOTO statements should be ancient history. More importantly, a globally networked computer environment provides a means of sharing expertise in a manner not feasible twenty-five years ago. We need to demonstrate to administrators and funding sources that money spent developing software empowers our collective whole. It is an investment in personnel and infrastructure. OSS is not a fad, yet it will not necessarily become a complete replacement for commercial software. On the other hand, OSS offers opportunities not necessarily available from the commercial sector.

Redefining the ILS

There are many open source library applications available today, and each satisfies a particular need. Maybe each of these individual applications can be brought together into a collective, synergistic whole as described by Jeremy Frumkin, and we could redefine the ILS. Presently our ILSs manage things like books pretty well. With the addition of 856 fields in MARC records they are beginning to assist in the management of networked resources too, but libraries are more than books and networked resources. Libraries are also about services: reserves, reading lists, bibliographies, reader advisory services of many types, digitization, current awareness, reference, to name but a few. Maybe the existing OSS can be glued together to form something more holistic - a sum greater than its parts.

OSS provides an opportunity for traditional library vendors as described by Schnell. Instead of writing computer programs, library vendors could support the documentation, installation, and integration of OSS for libraries in exchange for a fee. Libraries would feel much more comfortable with the applications running on their computers if those applications did not seem so much beyond their control.

Open Source Data

OSS relates to data as well as systems, as described by Thomas Krichel. The globally networked computer environment allows us to share data as well as software. Why not selectively feed URLs to Internet spiders to create our own, subject-specific indexes? Why not institutionalize services like the Open Directory Project or build on the strength of INFOMINE to share records in a manner similar to the manner of OCLC?

Systematically describing Internet-accessible information resources with things like the Resource Description Framework (RDF) provides the means of implementing the Semantic Web. Libraries are about the collection, organization, dissemination, and evaluation of data and information for the purposes of facilitating knowledge. These are the same principles behind the Semantic Web - a tool for answering the perennial question, "Can you find me more like this one?" The library profession purports to excel at the classification of data and information. RDF represents one way to accomplish this goal in a globally networked environment. If we, as librarians, were to contribute to the efforts of the Semantic Web, then we would also be contributing to the efforts of open source data. 7
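To make the idea of describing a resource with RDF a little more concrete, here is a minimal sketch that writes a few statements about a Web page as N-Triples, one of the plain-text serializations of RDF. It assumes the Dublin Core element vocabulary for the property names, and the resource URL is a made-up placeholder; neither choice comes from the discussion above, and a production system would more likely rely on an RDF library than on string formatting.

```python
# Namespace of the Dublin Core element set (an assumed choice of vocabulary).
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def describe(resource_url: str, properties: dict) -> str:
    """Return one N-Triples statement per (property, value) pair."""
    lines = [
        f'<{resource_url}> <{DC}{name}> "{escape_literal(value)}" .'
        for name, value in properties.items()
    ]
    return "\n".join(lines)


# Hypothetical URL; the descriptive values are taken from this essay itself.
triples = describe(
    "http://example.org/essays/oss-in-libraries",
    {
        "title": "Possibilities for Open Source Software in Libraries",
        "creator": "Eric Lease Morgan",
        "subject": "Open source software",
    },
)
print(triples)
```

Each line is a subject, a property, and a value. Records that share a subject or creator value can then be brought together by software, which is one way of approaching the "more like this one" question.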

Another way to contribute to the open source data concept is to integrate ourselves into information creation processes of our hosting communities. Libraries do not exist in a vacuum. They are all a part of some sort of community. Each of these communities creates information and increasingly makes it available in digital form. By becoming a part of this process libraries may be able to make the information more accessible to a wider audience over a longer period of time. Again, this is fostering a concept of open source data.

Possibilities of OSS in Libraries

OSS presents many possibilities for libraries. First and foremost, it presents an opportunity to take control of library services and collections relying on computer software. The time and effort spent buying (read "licensing") software could be devoted to learning how to use the software. Time spent developing our own solutions to problems develops staff expertise. OSS lowers the barrier to this learning process because staff will not be limited by such things as who is allowed to use the software; since the software is freely given away, it is very easy to download, install, give it a whirl, and evaluate whether or not more time should be spent on it.

Quality OSS rises to the top in the same way cream rises to the top of fresh milk. OSS goes through an informal review process. This process has nothing to do with market hype or self-promotion. Consequently, if you identify a piece of OSS and you know of other people using it, then you will know exactly what you are getting. There should be no surprises. This presents itself as another opportunity for libraries, not so much in terms of library services or collections, but in the time spent evaluating products. In the words of Shiyali Ramamrita Ranganathan, "Save the time of the reader." 8

Instead of feeling helpless about how our online catalogs work (or don't work), or instead of wishing for some sort of software widget to "automagically" appear, OSS provides a framework - possibilities for resource sharing - in order to take control of our situation. We all have similar problems, needs, and desires when it comes to using computers in our libraries. If we were to take a greater stake in the use of OSS, then we would be more able to share our ideas among ourselves. This sharing of ideas will bring more minds together and ultimately create more robust solutions. Effective communication will still have to take place, but that is where the leadership comes into play. OSS does not solve communication problems.

OSS provides the means to give back to the Internet. By contributing OSS to the community at large, others will benefit from our experience. Similarly, if we, as a profession, contribute to the idea of open source data through the systematic description of Internet resources, then we will be helping people satisfy their information needs. We will be bringing like things together for the purposes of creating knowledge, not just gathering information. While information has never been free, the processes behind OSS can make it less expensive.

Conclusion

I am always excited about libraries and librarianship. The discussions on the oss4lib mailing list exemplify some of the opportunities for our profession. As Ben Ostrowsky put it, "[y]ears from now, this will be known as The Week It All Came Together." We can hope so. Let's hope the momentum can be sustained. Let's build on our strengths, continue to pool our resources, and spend our time, money, and energy on ways to improve our situation instead of bemoaning the perceived limitations. As Gordon Paynter said, "These are social problems, rather than technical." Let's explore our alternatives.



References and Notes

   1. The ideas behind GNU software and its definition as articulated by Richard Stallman can be found at www.gnu.org/philosophy/free-sw.html. Accessed Jan. 10, 2002.

   2. I elaborated on the similarities and differences between OSS and librarianship via a book review of The Cathedral and the Bazaar appearing in Information Technology and Libraries 19, no. 2 (June 2000): 105. See www.lita.org/ital/1902_books.html#anchor387677. Accessed Jan. 10, 2002.

   3. The Cathedral and the Bazaar is also available online at www.tuxedo.org/~esr/writings/cathedral-bazaar. Accessed Jan. 10, 2002.

   4. As an interesting aside, read "Stalking the Wily Hacker" by Clifford Stoll in the Communications of the ACM 31, no. 5 (May 1988): 484. The essay describes how Clifford tracked a hacker via a seventy-five-cent error in his telephone bill. It is on the Web in many places. Try http://eserver.org/cyber/stoll2.txt. Accessed Jan. 10, 2002.

   5. It is believed a past chairman of IBM, Thomas Watson, said in 1943, "I think there is a world market for maybe five computers."

   6. An archive of the oss4lib mailing list is available at www.geocrawler.com/lists/3/SourceForge/6067/0. Accessed Jan. 10, 2002.

   7. I personally think the ideas behind the Semantic Web are very intriguing. For more information about this effort see www.w3.org/2001/sw. Accessed Jan. 10, 2002.

   8. Ranganathan's Five Laws of Library Science are: (1) books are for use; (2) every book its reader; (3) every reader his book; (4) save the time of the reader; and (5) a library is a growing organism.


   Eric Lease Morgan (emorgan@nd.edu) is Head of the Digital Access and Information Architecture Department at the University Libraries of Notre Dame, Indiana.