American Libraries
The Relevance of “Relevant Relevance”
Head of Systems, North Carolina State University Libraries, Raleigh
Column for October 2005

Though I’ve written about it before, and I enjoy a good dose of navel-gazing as much as the next librarian, I have to admit that I grow tired of endless (often fruitless?) discussions about the relevance of our profession. When Jonathan Livingston Seagull author Richard Bach was not busy writing about talking seagulls, he actually came up with some quotable quotes. One of my favorites is: “Argue your limitations and sure enough they will be yours.”

Lately, I’ve been interested in a different kind of relevance: the algorithmic relevance of our library systems in particular and of internet technologies in general, that is, the order in which automated systems return search results. Simply put, how does a system determine what’s relevant to a query?

I get a mythical vibe when librarians and vendors talk about relevance in their systems. I’ve heard these systems described as “black box” technologies: complex mathematics wrapped in proprietary technology, with impenetrable nondisclosure agreements for anyone lucky enough to get a peek under the lid. My personal experience has shown that most are so simplistic that calling them relevant is flattering, and I accuse the OPAC most vehemently. For the OPACs that do offer relevance ranking, one wonders what kind of ranking is applied to their MARC records’ controlled vocabulary. What works for Google and Yahoo sure won’t work on precision metadata; what is good for the goose is not so good for the gander. I’m sure OCLC learned this lesson when it turned WorldCat over to search engines that index non-metadata content.

Internet search engines have to account for content providers trying to fake relevance. Libraries, on the other hand, take some of the most controlled vocabulary ever created (specifically, Library of Congress Subject Headings and name authority files) and wonder what a keyword search can offer.
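To make the question of how rankings might be applied to controlled vocabulary more concrete, here is a minimal Python sketch of one textbook technique, field-weighted TF-IDF scoring, in which a keyword match counts for more when it appears in a subject heading than when it appears in a note. The field names, the weights in `FIELD_WEIGHTS`, the helper functions, and the sample records are all hypothetical. This is not a description of how any particular OPAC, Google, or Yahoo actually ranks results.

```python
import math
from collections import Counter

# Hypothetical field weights for illustration only: a hit in a subject
# heading counts more than a hit in the title, which counts more than a note.
FIELD_WEIGHTS = {"subject": 3.0, "title": 2.0, "note": 1.0}


def tokenize(text):
    """Lowercase and split on whitespace (real systems also stem, strip punctuation, etc.)."""
    return text.lower().split()


def weighted_term_counts(record):
    """Sum term frequencies across a record's fields, scaled by each field's weight."""
    counts = Counter()
    for field, text in record.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)
        for term in tokenize(text):
            counts[term] += weight
    return counts


def rank(records, query):
    """Return (record_index, score) pairs, highest field-weighted TF-IDF score first."""
    doc_counts = [weighted_term_counts(r) for r in records]
    # Document frequency: how many records contain each term at all.
    df = Counter(term for counts in doc_counts for term in counts)
    n = len(records)
    results = []
    for i, counts in enumerate(doc_counts):
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                # Rarer terms get a larger inverse document frequency;
                # the +1 keeps a term found in every record from scoring zero.
                idf = math.log(n / df[term]) + 1.0
                score += counts[term] * idf
        results.append((i, score))
    # sorted() is stable, so ties keep their original record order.
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy records, loosely modeled on MARC title (245),
# subject (650), and general note (500) fields.
records = [
    {"title": "Birds of North Carolina", "subject": "Birds North Carolina", "note": "Includes index"},
    {"title": "Seagull stories", "subject": "Fiction", "note": "A novel about birds"},
    {"title": "Coastal ecology", "subject": "Ecology", "note": "Chapter on birds and seagulls"},
]

for idx, score in rank(records, "birds"):
    print(records[idx]["title"], round(score, 2))
# Birds of North Carolina 5.0
# Seagull stories 1.0
# Coastal ecology 1.0
```

Because all three toy records contain “birds,” the IDF factor is identical for each, so the ordering comes entirely from where the word appears: the record with a matching subject heading rises to the top. Choosing those weights is the kind of decision that can sit, undisclosed, inside a vendor’s “black box.”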
Library science has contributed little to advancing keyword search, making its promise, if not mythical, then at least mystical.

Mystic rankings

OK, so I can’t decide if relevance algorithms are mythical or mystical. Keyword relevance is made mystical by our profession’s strict devotion to cataloging rules in general, and authority work in particular. I mean to slight neither, believe me. I rank (no pun intended) the WorldCat cooperative catalog and name and subject authorities among the top five greatest creations of our profession. (I’ll have to come up with the other three in future columns, I guess.) Nevertheless, the notion that relevant keyword searching and scholarly use of subject headings are somehow mutually exclusive diminishes the utility of both.

While I agree with many of his contentions, I take issue with Thomas Mann’s passionate defense of LCSH in “Research at Risk” (Library Journal, July 15). Mann asserts, “Keyword search algorithms, no matter how sophisticated their ‘relevance ranking’ capabilities, cannot turn exactly specified words into conceptual categories.” I might have added “yet” to that sentence. Mann goes on to contrast the inadequacies of Google with the scholarly precision of LCSH, and that comparison of the precision and beauty of LCSH with Google keyword searching is the primary flaw in his argument. More relevant would be a discussion of meaningful keyword relevance applied to the OPAC itself, not an apples-and-oranges contrast of still-disparate technologies. Simply put, what will be the impact of faceted search, natural language, robust keyword searching, and word clusters on the OPAC in general and on the future of authority searching specifically?

Raising the bar

It’s still debatable, but I think citation databases are better at creating relevance in their search results, certainly better than OPACs.
Of course, no two citation databases do it the same way, but at least there’s no paid placement and no dollar figures going into the relevance algorithms for scholarly and popular articles (time for that “yet” word again). Nevertheless, cracking the code of so-called relevance can be difficult, and a comprehensive evaluation of database relevance algorithms is nearly impossible to find, whether in the library literature or in empirical studies conducted by libraries. (Someone correct me if I am wrong!) Lexis-Nexis tells me my results are “sorted by relevance,” which is comforting but not ultimately convincing. Much more prevalent is “relevant equals recent,” or what I call “last in, first out” relevance: Would that it were always so.

Relevance is a tricky business, and emerging technologies will only make it more difficult. I like to describe a lot of library technology problems with my 9-foot, 11-inch plank metaphor. Much of what libraries try to do is get that plank across a 10-foot chasm. But as soon as it’s up on one side, it slips off the other. Well, as soon as we start to get a grasp on something like relevancy ranking, along comes metasearch, and suddenly there are new problems. How do systems communicate their relevance algorithms to metasearch agents, and how do those agents reconcile them?

And what of full text? I have argued in the past that even over-indexing a big MARC record can mess up the relevance of keyword searching. What happens when the entire text is included? How will Google Print indexing compare to (more relevant) MARC record keyword indexing? Proximity, Boolean operators, and the other tricks of our trade will likely be no match for the advances in full-text indexing going on outside our industry.

I remain optimistic about next-generation search and browse technology in library systems. A new age is definitely dawning. The challenge for libraries will be both understanding and embracing the changes ahead of us.

Contracts and agreements
East Hampton (Conn.) Library has chosen the ASP-hosted solution, replacing Dynix.
Glenbrook (Ill.) High School District 225, replacing CASPR; Eugene (Ore.) School District, replacing Dynix.
University of Huddersfield in England, replacing Dynix Horizon.

Announcements and alliances

Thomson Scientific has created an XML gateway for its ISI Web of Knowledge in collaboration with Ex Libris, maker of the MetaLib metasearch software. The move is good news for all metasearch software companies and for the libraries that use portal technologies to access multiple databases. Thomson joins a growing group of database providers who realize that their native web interface is not always the portal of choice.

OCLC has announced two key promotions of internal staff. Chip Nilges assumes the role of vice president, new services, and Mike Teets will be vice president, global product architecture. Nilges is most recently known for developing new services tied to WorldCat. Teets’s team most recently introduced the new WebJunction service. Teets will oversee major OCLC products, such as OCLC Connexion, WorldCat Resource Sharing, and FirstSearch.

WebFeat, a leading provider of metasearch technology to libraries, has partnered with MediaLab, creator of AquaBrowser Library, a keyword technology that presents users with faceted browsing options and a word cloud of metadata terms. New to catalog interfaces through its partnership with TLC, WebFeat is expanding into the world of citation database metadata. Queens Borough Public Library in Jamaica, New York, will be the first library to launch the new interface for both OPAC and database access.