Open Source, Open Standards

Karen Coyle

When people speak of open source software they are referring to computer code: programs that run. But code is only the final step in the information technology process. Before writing code, the information technology professional must analyze the nature of the problem to be solved and the best way to solve it. When software projects fail, the failure is more often than not attributable to shortcomings in the planning and analysis phase rather than in the coding itself. Open source software poses particular challenges for planning, since the code will be worked on by different programmers and will evolve over time. The success of an open source project depends on a clearly shared vision of the software's goals and on strong definitions of its basic functions and how they will work. This all-important work of defining often takes place through standards, and the development of standards that everyone can use has become a movement in itself: open standards.



Open standards are publicly available standards that anyone can incorporate into their software. An example from the library environment is the MARC record standard. The original documentation for the MARC record was published by the American National Standards Institute. 1 The most common application of the standard, the MARC21 records that libraries adhere to, is also published and available for use. No one owns the MARC record format; there are no fees for its use and no restrictions on who can use it in their products. Any software developer who wishes to write for library systems therefore has access to a vital part of what such a system needs: the basic data structure that libraries use today.
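
The record structure that Z39.2 defines is simple enough to sketch in code. The short Python fragment below builds one "transmission format" record out of a 24-character leader, a directory of 12-character entries (tag, length, starting position), and the variable fields themselves. The sample field content is invented for illustration, and this is in no way a complete MARC21 implementation.

    # A minimal sketch of the Z39.2 (ISO 2709) record structure: a
    # 24-character leader, a directory of 12-character entries, then
    # the variable fields.  Field content here is invented.
    FT = "\x1e"   # field terminator
    RT = "\x1d"   # record terminator

    def build_record(fields):
        """Serialize (tag, data) pairs into one Z39.2 transmission record."""
        directory, body = "", ""
        for tag, data in fields:
            data += FT
            directory += f"{tag}{len(data):04d}{len(body):05d}"  # tag, length, offset
            body += data
        directory += FT
        base = 24 + len(directory)              # base address of data
        length = base + len(body) + 1           # total record length
        leader = f"{length:05d}nam  22{base:05d}   4500"
        return leader + directory + body + RT

    # "\x1f" introduces a subfield; the two indicators precede the subfields.
    record = build_record([
        ("245", "10\x1faOpen source, open standards /\x1fcKaren Coyle."),
    ])
    print(repr(record))

The point is not the details but the openness: because the structure is published, any developer can write a serializer or parser along these lines without asking anyone's permission.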

This may seem so obvious that its importance is hard to grasp. In fact, the library world has probably made more use of open standards than practically any other industry. Let's face it, "open" is practically our middle name. Examples from the non-open world of proprietary software might help us understand the importance of our preference for open standards, and the examples are not hard to find: Microsoft Windows versus the Macintosh operating system; VHS versus Betamax; Nintendo versus Sega. In each case you have unique products that are inherently incompatible. As a matter of fact, this incompatibility is purposeful and actually enhanced by the companies in question as part of their market strategy. If you need to compete, then openness is a disadvantage. If you need to cooperate, then openness is the way to go.

Goals of Open Standards

Open standards can serve multiple needs. The most common one is the need for interoperability. Interoperability refers to communication between systems or system parts. In the highly networked world of the twenty-first century, the ability of computer systems to exchange data in order to carry out basic functions is absolutely vital, since most systems operate in a vast and varied digital community. Our library systems communicate electronically with sources of bibliographic records, book vendors, and users. They also now connect to networked information resources outside the library and deliver them through library-maintained interfaces. Much of this communication is through open-standard interfaces, such as Z39.50, Electronic Data Interchange (EDI), and the hypertext transfer protocol (HTTP). 2 These standards operate at the point where system boundaries touch; they determine the rules of the digital membrane but do not determine how systems handle data up to that point of permeability. Internally, few systems store bibliographic data in the format prescribed by ANSI Z39.2, the basis for the MARC record, but they are able to transform the data into that format for communication with other systems.
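
HTTP is the easiest of these boundary standards to observe directly. The following Python sketch (the host name is simply a placeholder) sends a bare HTTP request over a socket and reads back the reply; whatever database or software lies behind the server, the exchange at the membrane follows the same published rules.

    # An open protocol governs only what crosses the boundary between
    # systems.  This sketch speaks plain HTTP/1.0 over a socket; the
    # host name is a placeholder, and any Web server could answer.
    import socket

    HOST = "www.example.org"   # placeholder host

    with socket.create_connection((HOST, 80)) as sock:
        sock.sendall(f"GET / HTTP/1.0\r\nHost: {HOST}\r\n\r\n".encode("ascii"))
        reply = b""
        while chunk := sock.recv(4096):
            reply += chunk

    # The status line and headers are defined by the HTTP standard;
    # how the server produced them internally is invisible to us.
    print(reply.split(b"\r\n")[0].decode("ascii", "replace"))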

Another purpose of open standards is to create the framework for a community. In many ways this is the prime reason for many library standards. The use of common cataloging rules does not so much allow libraries to intercommunicate as it creates a common look and feel across libraries that is an aid to users. It allows users to move between libraries without having to learn a whole new process for finding materials, and it makes it possible for the library profession to train librarians and hire from among a pool of candidates. The cataloging rules, published and readily available to anyone with the desire and patience to learn them, contributed to the rise of professional (rather than artisan) librarianship. Creating the rules brought members of the library community together to ponder not only the vagaries of title pages but also some basic philosophical issues about the organization of knowledge.

Today, in a world where many activities are performed through computer programs, open standards can be promulgated as a way to encourage decentralized development. Much of the work of the World Wide Web Consortium (W3C) falls into this area. The W3C is a membership-sponsored standards body that creates new standards for the Web. These standards can be used by anyone writing software for the Web. What is critical about many of them is that they lay the foundation for entirely new Web functions, functions that will work only if many different parties each develop their part of the needed software.

This is rather hard to describe but should become clear with an example. I'll use the recent development of the Platform for Privacy Preferences (P3P). 3 P3P is a set of rules that allows Web sites to describe their privacy practices in a standard way. It also allows Web users to express their "privacy preferences" using the same standard vocabulary. P3P does not specify how this will be implemented on the Web; the development of actual software is left to the rather amorphous Web community. For P3P to be part of the Web, it will be necessary for Web site owners to incorporate P3P into their sites, and for Web browsers to create a user interface to the function. But for P3P to be successful, it needs to be recognized by all major browsers (Internet Explorer, Netscape, and AOL), and it must be used by a large number of Web sites. Since many companies and institutions use software like FrontPage or ColdFusion to develop their large and complex Web sites, tools for building P3P policies will need to be included in those packages as well. By specifying a standard for privacy preferences, the W3C is attempting to set in motion a very decentralized software development project that will need to be undertaken by a wide variety of players.
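
As a sketch of what implementation might look like on the Web site side, the Python fragment below adds a P3P response header to an ordinary HTTP reply, pointing at the site's policy reference file and carrying a "compact policy" summary. The policy location and the compact policy tokens shown here are illustrative placeholders, not a vetted privacy policy.

    # A hedged sketch of a site advertising its privacy practices via
    # a P3P response header.  The policy path and the compact-policy
    # tokens (CP=...) are illustrative placeholders only.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class P3PHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            # policyref points at the site's full XML policy; CP carries
            # the compact summary that browsers can evaluate cheaply.
            self.send_header(
                "P3P", 'policyref="/w3c/p3p.xml", CP="NOI DSP COR OUR"')
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Check this response's P3P header.\n")

    HTTPServer(("localhost", 8080), P3PHandler).serve_forever()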

Sort of Open versus Really Open

Although we speak about open standards, some are more open than others. This is because there are a variety of aspects to open standards, and standards that call themselves open do not always adhere to all of these.

Open standards are:

  • standards that anyone can use to develop software or functions;
  • standards in whose development and modification anyone can participate; and
  • standards that anyone can obtain without a significant price barrier.

The best examples of standards that meet all of these criteria are those created by the Internet Engineering Task Force (IETF). The IETF dates back to pre-Internet days, when it was a group of engineers working on the first developments that eventually became the basis for that network. These engineers developed a way of chronicling and communicating their technical ideas through a series of documents called Requests for Comments or RFCs. 4 The first RFCs were almost in the form of notices ("OK, I'm going to send packets with a 5-byte header, let me know if you can read them"), but as time went on the RFCs became well-thought-out standards that had been developed by groups of volunteers. Anyone can comment on an RFC, either to point out errors or to make suggestions. Even after the technical decisions in the RFC are accepted and implemented, the RFC remains an RFC. Some RFCs improve or comment on previous ones, as technology changes or as better ideas arise.

The functioning of the IETF is like a lesson in democracy: one person or a group of people sees the need for a new or modified function for the Internet; they draft a proposal which is placed on the Internet for anyone to read and comment on; if the proposed function meets a need and is successfully tested with an actual program, it becomes part of Internet use. The IETF is open to anyone who wishes to participate. That last statement needs qualification, however: participation in the IETF requires a high level of technical knowledge and a considerable amount of a person's time. Those who make up the various IETF committees are a self-selected technocracy. And while the philosophy of the IETF is one of engineering "purity," today's committees invariably have members who represent technology companies that often have a particular bias toward their own products. Still, there is no other standards organization that is as open as the IETF, and there is still considerable input from the academic and research communities.

This can be compared to the W3C, the standards organization formed to develop and promulgate standards for the Web. Participation in the W3C is limited to members (predominantly technology companies) who pay between $5,000 and $50,000 per year to belong to the group. Compared to the IETF, this group lacks the academic and research engineers who bring a financially neutral viewpoint to the discussions. There are also almost no members who might represent a public interest viewpoint. The latter point is significant because the W3C does not limit itself to standards of engineering; there is an effort called Technology and Society (within which P3P was developed) that develops standards for functions like content filtering and privacy.

There are a number of other standards bodies, such as the National Information Standards Organization (NISO), the International Organization for Standardization (ISO), and the American National Standards Institute (ANSI). These organizations have members who participate directly in the development of standards. The standards, once developed, are not only open for use; some of them are actually mandatory within certain industries. Obtaining the actual text of the standards is, however, another matter.

Standards-making is an expensive enterprise, and standards bodies have traditionally made money on the sale of the printed form of their standards. Since many companies and organizations are required to adhere to the standards, this provided a kind of guaranteed audience for the standards documents, many of which carried rather hefty price tags. The W3C, having arisen from the Internet community (and with the example of the IETF before it), makes its standards available for open access on the Web. By comparison, the ISO document describing the Universal Character Set, toward which all modern computing is moving, is priced at about one hundred dollars. Although that is not a huge price when viewed in light of a company's research and development budget, it does make it difficult for small organizations, nonprofits, schools and libraries, and individuals to make active use of the standards. Responding to these needs and to the move toward greater openness in the standards arena, in 2000 NISO became the only national standards organization to make its standards available over the Internet for free. There is some risk in this, because it removes a significant revenue stream from the organization. The gain is that the organization should be even more successful in its primary mission: providing standards for widespread use.

Open Standards and Libraries

The first of the library technology standards was the decision, at the first annual ALA meeting in September of 1877, to standardize the catalog card at 7.5 x 12.5 cm. 5 While this was intended to make mass production of cards possible (and, by extension, more standardized production of card cabinets as well), the advantages of an open standard manifested themselves when in 1898 the Library of Congress (LC) began its printed card service. This was possible only because libraries in the United States were using the same-sized card and thus filing into cabinets that held cards of that size. We can consider the LC card service the technological predecessor of the MARC record service of the latter half of the twentieth century. The card-size standard was its key to interoperability.

The next technological standard of great interest was the computerization of those same cards through the MARC record standard. Prior to the development of what we now think of as MARC, a group of librarians led by Henriette Avram of LC developed a machine-readable record format standard for bibliographic data, ANSI Z39.2. This standard made use of other national standards, such as the ASCII character set. Although at the time only LC had the capability of producing the records (and the motivation to do so), this is arguably the most significant technological development of modern librarianship. By establishing an open standard for machine-readable records, LC created the basis for the computerization of library catalogs. That wasn't the intention in 1965 when Z39.2 was proposed, however; LC was focused on automating its card production services and creating a print-on-demand card service. Like Dewey's card-size standard, which was meant simply to reduce the cost of card-stock production, the LC standard, because it was open, came to be used in ways that its creators had not yet imagined.

Few library open standards have been as successful as the MARC standard. Arguably the most widely used standard to be developed since 1965 is Z39.50, the protocol for information retrieval from remote databases. Z39.50 takes advantage of the existence of searchable bibliographic databases in library automation systems and of the networking provided by the Internet. The protocol had a somewhat slow beginning, partly due to its complexity, but today the functionality is included in most library system packages and there are even open source versions of the software.
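
One such open source implementation is the PyZ3950 package, which follows the ZOOM abstract API for Z39.50 clients. The sketch below is offered with the caveat that the target address, database name, and query are placeholders, and that the exact API details may differ between versions of the package.

    # A sketch of a Z39.50 title search using the open source PyZ3950
    # package (ZOOM-style API).  Host, port, database, and query are
    # placeholders; API details may vary by version.
    from PyZ3950 import zoom

    conn = zoom.Connection("z3950.example.edu", 210)   # placeholder target
    conn.databaseName = "catalog"
    conn.preferredRecordSyntax = "USMARC"

    query = zoom.Query("PQF", '@attr 1=4 "open standards"')  # 1=4: title
    results = conn.search(query)
    for i in range(min(5, len(results))):
        print(results[i])
    conn.close()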

Other standards have been less successful. One example is the Common Command Language (CCL), Z39.58. CCL is a standard set of commands for searching online catalogs that was developed by NISO in 1992. When the standard came up for its five-year NISO review, the organization's members allowed it to lapse. Although some systems claim to use a common command language, these generally do not use the commands defined in the NISO standard. So how did a standard end up not being a standard after all?

The reason for creating a common command language was not unlike one of the original motivations behind a standardized set of cataloging rules: uniformity between libraries makes it easier for users to move from library to library. A common command language is especially important now that users may be working with a number of library systems almost simultaneously over the Internet. Why would such a useful standard fail? There are a number of reasons why standards might not be adopted. One obvious reason in the case of CCL is that the technology the standard responded to, the command-line interface to library databases, was eclipsed by a new technology: the Web browser.

Although some command-line searching remains, it is not the main user interface. Another reason for the lack of adoption of the CCL is something that gives standards development a tricky aspect: people seem less likely to accept standards that affect the content aspects of their computer systems. Successful standards tend to define background functions and leave a great deal of flexibility for system developers in terms of presentation. For example, the protocols that control Internet e-mail do not dictate how e-mail will be presented to the user. Everything from the command-line Pine e-mail software to the almost user-obsequious Microsoft Outlook product makes use of the same e-mail protocols. Yet another reason is that standardizing the command line gains you very little when the underlying indexes of the system are not themselves standardized. The command line is merely the interface to a much more complex set of decisions about which fields feed into which indexes, and about how the data in those language-based fields is treated for the purpose of searching.
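
The e-mail case is easy to make concrete. In the Python sketch below, which uses the standard library's smtplib (the server and addresses are placeholders), the message could just as well have been composed in Pine or in Outlook; the protocol governs only the hand-off between programs.

    # SMTP governs only the conversation between programs; it says
    # nothing about how a mail client looks.  The server name and
    # addresses below are placeholders.
    import smtplib
    from email.message import EmailMessage

    msg = EmailMessage()
    msg["From"] = "librarian@example.org"
    msg["To"] = "patron@example.org"
    msg["Subject"] = "Your requested item is available"
    msg.set_content("The book you placed on hold is ready for pickup.")

    with smtplib.SMTP("mail.example.org") as smtp:   # placeholder server
        smtp.send_message(msg)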

The lesson here is that not all aspects of systems are ideal candidates for standardization. Whether their reasons are rational or whimsical, system developers clearly express a need for a certain amount of freedom. Standards need to facilitate functionality without suppressing the creativity of system developers or their ability to meet the needs of their particular target audience. Standards work best in the underlying technology layers and less well the closer one gets to the actual user.

Some library standards currently in development might fit this bill. For example, the NISO Circulation Interchange Protocol (NCIP) standard is intended to facilitate interoperability between library systems for interlibrary loan (ILL) transactions. 6 ILL is an obvious area where communication between diverse systems is needed if the function is to be automated.

Libraries don't live by library standards alone, however. Increasingly our library systems are interacting with the wider world of technology, delivering library services over public networks. We use mainstream standards such as the Internet protocols developed by the IETF, the Web protocols of the W3C, and the character sets defined internationally by ISO. Library representatives were heavily involved in the latter effort, having already participated in the development of a similar standard known as Unicode. However, there is virtually no library participation in organizations such as the IETF or the W3C, even though the standards developed by these organizations are vital to our operations. Not only are libraries missing from these standards groups, so also are schools and nonprofit organizations, which are kept out not only by the membership fees but also by the labor requirements for active participation: the need to dedicate a significant amount of a highly skilled technical worker's time to the standards process.

While it is unlikely that individual libraries would be able to be active in a standards organization, we now have a possible model for greater library participation: in 2000, the American Library Association (ALA) joined the Open eBook Forum (OEBF), an industry group working on e-book standards. By leveraging the strength of ALA's membership, it has been possible to spread the burden of participation while at the same time providing a visible library presence in the standards process.

Conclusion

The Internet has given us an entirely new model for the cooperative development of highly complex systems, and subsequently of the standards that allow those systems to work. Although the Internet has not lived up to some of the utopian promises of its early days, it still presents a low barrier to active participation; so low, in fact, that individuals can create their own Web sites on the same network as those of major companies. We may not be able to give all of the credit to the IETF and its example of open standards, but it is clear that open standards have been an essential element in the success and widespread use of the Internet. Continuing the open standards tradition will be essential if that success is to continue.



References and Notes

   1. National Information Standards Organization (U.S.), Information Interchange Format (Bethesda, Md.: NISO Pr., 1994). National Information Standards Series, ANSI/NISO Z39.2-1994.

   2. National Information Standards Organization, Information Retrieval (Z39.50): Application Service Definition and Protocol Specification (Bethesda, Md.: NISO Pr., 1995).

   3. Full documentation on P3P is available at www.w3.org/P3P. Accessed Oct. 2, 2001.

   4. There are a number of sites that house searchable copies of the IETF RFCs. The official IETF RFC site is www.ietf.org/rfc.html. Accessed Oct. 2, 2001.

   5. Wayne A. Wiegand, Irrepressible Reformer: A Biography of Melvil Dewey (Chicago: ALA, 1996): 53-54.

   6. NISO Circulation Interchange Protocol is a draft standard, available for review and testing at www.niso.org/committees/committee_at.html. Accessed Oct. 2, 2001.



Related URLs

American National Standards Institute
www.ansi.org

Internet Engineering Task Force
www.ietf.org

International Organization for Standardization
www.iso.org

Library of Congress MARC Standards Office
lcweb.loc.gov/marc

National Information Standards Organization
www.niso.org

Open eBook Forum
www.openebook.org

World Wide Web Consortium
www.w3.org


   Karen Coyle (www.kcoyle.net) is a Systems Developer at the California Digital Library, Oakland, California.