Toward the Next Gen Catalog

By Karen G. Schneider | LibraryLand has seen much excitement since the ALA TechSource blog launched a little over a year ago. As much as Library 2.0 turns me on—Skype me, baby, 8 to the bar!—the trend that makes my heart go pitter-pat is a more subtle, water-on-stone metamorphosis, one in which long-held perceptions and attitudes are changing, souls are becoming emboldened, and librarians are pushing forward with new ideas. It's a trend loosely called "NGC," for Next-Gen Catalog, which does not refer to Lands' End mail-order shopping for college students but to the set of future services we as a profession will provide for information discovery.

The ever-imaginative Eric Lease Morgan—Head of the Digital Access and Information Architecture Department at the University Libraries of Notre Dame—kick-started the NGC movement when he established and widely announced a discussion list devoted to it. Discussion on this list has been intense, lively, and sometimes fractious, but almost always worthwhile. A little later, Karen Coyle, formerly of the University of California system and now a consultant, launched futurelib, a wiki for documenting NGC ideas and concepts. Along the way, a team at NCSU's library rolled out the most telling rebuke to the traditional catalog—a catalog with another product layered on top of it to dramatically improve the search and browse experience for users.

The NGC discussions also have librarians asking hard questions, such as…
  • How are our users actually finding information (or trying to find information)?
  • Is the catalog a starting point or a destination?
  • How can we do a better job of presenting a unified but coherent interface to our books, journals, and other media?
  • Do we need MARC?
  • Should we continue cataloging "the way we have always done it" or should we examine the costs and benefits of current practices and put our money elsewhere?
The focus of NGC is the catalog—but it's not the catalog. The NGC movement builds on all the great work done on cataloging in the past, but for the most part refuses to think of "the catalog" as a monolithic and separate entity. Instead, one of the more interesting concepts about the NGC—not universally shared in these discussions, but increasingly gaining traction—is that the NGC is no longer a stand-alone catalog unto itself but offers sets of content and services that are malleable, portable, and intellectually related to other services we offer, such as journal articles, full-text books, and multimedia.

I'm Right—No, I'm Right
Not everyone's on the same page—and that's a Good Thing. To no one's surprise, questions about the future of cataloging stir up huge debates. Cataloging has contributed so much to our profession over the last hundred years, but we do need to ask how much of it continues to be justified. The real answers must be evidence-driven, based on studies of user behavior.

As Art Rhyno, Systems Librarian at Leddy Library, University of Windsor, ruminated to me, perhaps the NGC is "a system without an interface on its own at all … something that is picked up by other indexers and uses the object descriptions typically found in a catalogue as a starting point for endless augmentation."

Parsing the Catalog
Many NGC enthusiasts slice and dice this "system" even further, pointing out that most integrated library systems can be divided into at least two functions: inventory and discovery.

The inventory function allows us to manage our resources. Traditional library catalogs do a reasonably good job of managing the acquisition, inventory, and circulation of book or book-like content that libraries physically own, such as books, videos, and CDs. But NGC advocates point out that the typical ILS fails miserably at managing content that is not owned but licensed, such as e-books or electronic journal articles.

If there are a few bumps in the gravy for inventory management, the discovery function in traditional catalogs is a near-complete failure. As I have written before, most OPACs lack good post-coordination (how the results appear on the page after a search), spell-check, relevance ranking, the ability to suggest like items, and many other modern features found in search engines. As finding aids, OPACs just plain suck.

Many librarians talk about improving the finding function of OPACs, and some have added minor improvements to their catalogs—better relevance, some spell-check. These are palliative steps that don't quite address the larger problems.

NCSU Libraries' implementation of the Endeca search engine, layered over and communicating with their ILS, takes a more pragmatic tack: it assumes that the finding function simply isn't there, as if it were the "missing module," and adds discovery back into the catalog with a separate product.

Treating inventory and discovery as separate services is a savvy tactic for three reasons: it doesn't try to turn an inventory tool into a search engine (Andrew Pace's "lipstick on a pig" argument); it keeps the inventory and discovery functions separate enough that it would be less complex to replace one or the other, if the time came to do so (that's my "twin beds" argument for keeping content-management systems separate from other services); and finally, Endeca serves as an umbrella that can conceivably tie together different types of content, not just from the traditional catalog, but from other sources such as journal article databases. You can read the hint of things to come in the ITAL article, “Toward a 21st Century Library Catalog,” by NCSU librarians Kristin Antelman, Emily Lynema, and Andrew K. Pace.
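To make the "twin beds" idea concrete, here is a minimal sketch, in Python, of a discovery layer that works only against records exported from an inventory system. Every field name and function here is my own invention for illustration—not any vendor's schema or API—but the architecture is the point: the search index lives entirely outside the ILS that produced the records.

```python
from collections import Counter, defaultdict

# Toy catalog records, as they might be exported nightly from an ILS.
# Field names are illustrative, not any vendor's actual schema.
records = [
    {"id": "b1", "title": "The Civil War: A Narrative", "format": "Book",
     "subjects": ["United States -- History -- Civil War, 1861-1865"]},
    {"id": "b2", "title": "Ken Burns's The Civil War", "format": "Video",
     "subjects": ["United States -- History -- Civil War, 1861-1865"]},
    {"id": "b3", "title": "Battle Cry of Freedom", "format": "Book",
     "subjects": ["United States -- History -- Civil War, 1861-1865",
                  "Slavery -- United States"]},
]

def build_index(recs):
    """Build a trivial keyword index over titles, outside the ILS."""
    index = defaultdict(set)
    for rec in recs:
        for word in rec["title"].lower().split():
            index[word.strip(":,.'")].add(rec["id"])
    return index

def search(index, recs, term):
    """Return matching records plus facet counts on format."""
    ids = index.get(term.lower(), set())
    hits = [r for r in recs if r["id"] in ids]
    facets = Counter(r["format"] for r in hits)
    return hits, facets

index = build_index(records)
hits, facets = search(index, records, "civil")
print(len(hits), dict(facets))  # two hits: one Book, one Video
```

The toy index can be rebuilt, swapped for Endeca, Siderean, or anything else, or thrown away entirely without touching the inventory system that produced the records—which is exactly the flexibility the layered approach buys.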

"Endeca" has become almost synonymous with "improved search for library catalogs," but it's important to note that Endeca is only one of several state-of-the-art search engines that could conceivably serve as the "discovery layer" for library catalogs—and full disclosure, at My Place Of Work we just implemented Siderean, a competing product, with what we consider to be outstanding, even luminous, results. Siderean, i411, Dieselpoint, and FAST are four major products with functionality similar to Endeca's, and they are all worth looking at (and I'm sure there are more like them).

Even at that, however, the Endeca implementation showcases some of the more pernicious problems in how we deliver services. First, the browsing facets are largely generated from Library of Congress Subject Headings—a language that was not designed for browsing collections on the Web. A search for civil war brings up a lot of facets—but how useful are they? We can joke about LC's arcane language all we want, but we can't fault LCSH for not doing well at what it was never designed to do.
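To see why precoordinated LCSH strings make awkward web facets, consider how an engine might carve them up. The headings below are genuine LCSH forms; the splitting logic is a sketch of my own, not how any particular product works:

```python
# LCSH subdivides headings with " -- "; splitting on that delimiter is
# roughly how a faceted interface might turn one precoordinated string
# into browseable pieces.
headings = [
    "United States -- History -- Civil War, 1861-1865 -- Campaigns",
    "United States -- History -- Civil War, 1861-1865 -- Fiction",
    "Spain -- History -- Civil War, 1936-1939",
]

def facet_terms(heading):
    """Split one precoordinated heading into candidate facet values."""
    return [part.strip() for part in heading.split("--")]

for h in headings:
    print(facet_terms(h))
```

A subdivision such as "Campaigns" or "Fiction" is meaningful inside the full heading, but floated on its own as a browse facet it loses most of its context—which is the usefulness question in a nutshell.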

Second, improving the catalog as a finding aid does not necessarily improve its ability to be a destination. This is not to diss the NCSU/Endeca implementation, but to point out that we can make the catalog as wonderful as it could ever be, but our users may still never find it.

Consider a college student researching a paper. He or she will search the Web, look at courseware for an instructor's recommended readings, perhaps, if prodded or instructed, use a library database, and possibly, as a last resort, search the library catalog, which may or may not provide some hard-to-use bridge between books and journal articles for any given subject.

The real task for us is to find out the student's first-tier destinations—Google, courseware, the school's e-mail system—and make our content available there. In the public world, it's not that different—though it's more challenging. A user doing a casual search of the Web should be able to discover that his or her local library offers resources of value. In an academic environment, which is a kind of closed community, we have a fighting chance of that happening. In the Web at large, it's far more complex.

One small gulp of fresh air is that Google Book Search is finally incorporating links to library holdings much more widely. I don't know why—surely the complaints of a few bloggers can't have been all that influential—and I don't know how it fits into Google's business plan (since "Don't be evil" doesn't necessarily lead to "Do be good"), but it's important. I still can't believe that major academic libraries entered into huge contracts for the Google Library Project without bargaining for that point.

Finally, we can prink the catalog to death, but as long as it points to only some of our library resources, it's only part of the story. An interesting question is whether we try to fold our journal articles, e-resources, and more into the catalog, or whether we make the catalog just one of many resources accessible through a variety of tools and disseminate these services broadly into the networked world. Perhaps it is the fetish of a monolithic unified interface that needs to be abandoned.

Ross Singer of Georgia Tech and Art Rhyno wrote an entry for the Talis Mashup contest that decouples catalog data from its source ILS and puts it into Google Desktop. It's a charmingly geeky idea "in the school of NGC," as I think of some of these experiments, and one that is a winner on its own terms, as a demonstration of the separation of the catalog data from the catalog itself.

MARC comes in for scrutiny in these discussions. Some argue for making it stricter and more complex; some argue for better profession-wide compliance with the rules; some argue for doing away with it entirely, or moving to other formats, such as MODS, an XML schema that looks and feels MARC-like.
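For the curious, here is roughly what the MODS flavor looks like, sketched in Python with nothing but the standard library. The element names and namespace come from the published MODS schema; the record itself is a toy for illustration, not a validated MODS document:

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # the published MODS v3 namespace

def minimal_mods(title, author):
    """Build a tiny MODS-flavored record: a title and a personal name.
    Element names follow the MODS schema; this sketch skips validation."""
    ET.register_namespace("", MODS_NS)  # serialize with a default namespace
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    name = ET.SubElement(mods, f"{{{MODS_NS}}}name", type="personal")
    ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = author
    return ET.tostring(mods, encoding="unicode")

print(minimal_mods("Battle Cry of Freedom", "McPherson, James M."))
```

Compare that to a MARC 245 and 100 field: the same information, but in a schema that XML-native tools can index, transform, and pass around without a MARC parser in sight.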

For all this talk about MARC, I think we need to start with the user and work backwards. We still haven't done enough work to determine where to put our emphases, how our users behave, what's really important, and what can be left behind. But in looking back at this past year, it's clear that many librarians are now fully awake to the problems and the possibilities of that thing we have called "catalog."