American Libraries





Technically Speaking


By David Dorman
American Libraries Columnist
ddorma@ltnet.ltls.org

Library consultant for the Lincoln Trail Libraries System in Champaign, Illinois.

Column for October 2001


The Secrets of Search Software

While browsing the IFLA exhibits in Boston in August, I encountered a very impressive searching front end that could be used with any structured database, such as a database of MARC records. While most catalog search interfaces give the user a choice of indexes to search, this one (TurboSearch, developed and distributed by Lingomotors) prompted the user for a natural-language query. When the query was entered, the software analyzed it and determined for itself which indexes to search. At the time I saw it being demonstrated, the product was searching a 1.5-million-record database of books. I queried the system with “Find a book about the Iliad and PTSD.” The software translated this question into “‘Iliad’ AND (‘ptsd’ OR ‘post traumatic stress disorder’ OR ‘shell shock’ OR ‘combat fatigue’ OR ‘battle fatigue’).” I also suspect that the software searched only the subject index, but I could not verify that definitively from what I saw or what I was told by the vendor rep. The search retrieved only two items: two editions of the one book I had in mind, Achilles in Vietnam.

I know of no library management search software that can both recognize a Boolean operator and take advantage of a thesaurus of related concepts in a single search. There is no magic to the software’s excellent performance, of course; any person who understands how MARC records are indexed and who can imagine a bit of front-end linguistic analysis on the input string (“about” = subject) can deduce what is happening. But the software was impressive nevertheless: Not only did it get the best possible result without the user needing to know anything about how bibliographic records are indexed or what information they contain, it did so without the benefit of LCSH authority control and in one easy natural-language search. (It is also interesting to note that the database of synonyms the software used linked the phrases “shell shock” and “post traumatic stress disorder,” which LCSH does not do. Nor is “combat fatigue” or “battle fatigue” a heading or cross-reference in LCSH.)
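
The kind of front-end analysis guessed at above can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of how TurboSearch actually works, which the vendor has not disclosed. The synonym table, the cue-word table, and the function name translate_query are all assumptions made for the example; only the sample question and the expanded terms come from the demonstration described above.

```python
import re

# Hypothetical synonym table. The PTSD variants are the ones shown in the demo.
SYNONYMS = {
    "ptsd": ["ptsd", "post traumatic stress disorder", "shell shock",
             "combat fatigue", "battle fatigue"],
}

# Hypothetical cue words that suggest which index to search ("about" = subject).
INDEX_CUES = {"about": "subject", "by": "author", "called": "title"}


def translate_query(question):
    """Return (index, boolean_string) for a natural-language question."""
    words = question.strip().rstrip(".?!").split()

    # The first cue word picks the index; everything after it is the topic.
    index, topic_words = "keyword", words
    for i, word in enumerate(words):
        if word.lower() in INDEX_CUES:
            index = INDEX_CUES[word.lower()]
            topic_words = words[i + 1:]
            break

    # Split the topic on "and" into separate concepts.
    concepts = re.split(r"\s+and\s+", " ".join(topic_words), flags=re.IGNORECASE)

    clauses = []
    for concept in concepts:
        # Drop a leading article, then expand the term through the synonym table.
        term = re.sub(r"^(the|a|an)\s+", "", concept.strip(), flags=re.IGNORECASE)
        variants = SYNONYMS.get(term.lower(), [term])
        quoted = " OR ".join(f"'{v}'" for v in variants)
        clauses.append(f"({quoted})" if len(variants) > 1 else quoted)

    return index, " AND ".join(clauses)


index, boolean = translate_query("Find a book about the Iliad and PTSD.")
print(index)    # subject
print(boolean)  # 'Iliad' AND ('ptsd' OR 'post traumatic stress disorder' OR 'shell shock' OR 'combat fatigue' OR 'battle fatigue')
```

A real product would need far more than a cue-word lookup, but even this toy version shows that the behavior is explainable in terms any cataloger would recognize.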

Searching secrets

It is software such as this that makes me wonder about the future of the library profession. Don’t get me wrong: I am not implying it will put librarians out of work. I’m confident the profession is secure against that kind of displacement. And I am not referring simply to the effectiveness of the software; I wish that IOLS vendors used such effective software. What concerns me is that what the software does is a trade secret: Its very functionality is proprietary and kept hidden. And what’s the big deal with that, you might ask. My fear is that if the functionality of library searches becomes as hidden from view as the functions of some Web search engines and other commercial search software being marketed to the business community, librarians will be effectively eliminated from participating in the science of information retrieval.

Already there are many librarians who really don’t care about how searching works and are just as happy to let some black box software take care of the whole thing. I know many librarians whose eyes glaze over when uniform title or authority control is discussed.

Today most libraries still use systems that don’t hide what goes on behind the scenes. While many librarians and most users don’t bother to find out how searches really work, enough do care, and they are more effective searchers for having this understanding. At present, the functionality of the searching software used by a typical IOLS vendor, the MARC data structure, and the cataloging rules are all open for inspection. This openness is a large part of what makes the information-retrieval aspect of librarianship a skill we can all participate in.

The software I saw did display the search string that it sent to the database server after it had processed my query. But this last bit of transparency could very easily be left out of the interface. What would remain would be a proprietary black box whose functionality librarians and other users could only guess at.

If librarians could trust commercial vendors to provide us with honest and scientifically sound black box searching software, there might be little to be concerned about; but in the long run, what is hidden from view will get manipulated in ways we might consider inappropriate if we knew about them.

Hits for dollars

Recently it was revealed that a number of Internet search companies were taking money from other companies hosting Internet sites. In return, those companies’ sites got preferential treatment from the search software. Imagine the consternation that would ensue if a library automation vendor’s searching software gave priority to certain publishers’ books because those publishers paid the vendor for special treatment. If this sounds far-fetched, think about a time in the future when library search engines will have become unexplorable black boxes provided by third parties. If what happens to a search query cannot be subject to scrutiny, what potential abuses of the trust we currently have in our library vendors’ search engines could take place without our knowledge? And what would then happen to the trust that patrons put in libraries if that trust were to be abused by the searching software provided by libraries?

While the potential problem described above may not arrive any time soon, we are moving gradually toward a time when it will be all too likely to occur unless librarians take proactive measures to prevent it. There are several things that can be done. One is for the library community to band together to develop Open Source searching front ends, as well as search engines; these would, by virtue of their being Open Source, be subject to inspection by any who are interested. Another solution would be for libraries to develop searching standards and specifications that would mandate that the functionality of any searching software be fully documented. Such documentation would need to extend to pre- and post-search processing, as well as to the search itself.
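
To make the idea of documented pre- and post-search processing concrete, here is a minimal Python sketch of a search routine that records what each stage did. It is an illustration only and does not reflect any existing standard or vendor product: the SearchTrace class, the traced_search function, the stage names, and the sample titles other than Achilles in Vietnam are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SearchTrace:
    """Human-readable record of what each processing stage did to a query."""
    steps: list = field(default_factory=list)

    def record(self, stage, detail):
        self.steps.append((stage, detail))


def traced_search(query, records, trace):
    # Pre-search processing: normalize the query string.
    normalized = query.lower().strip()
    trace.record("pre-search", f"normalized {query!r} to {normalized!r}")

    # The search itself: a plain substring match on the title field.
    hits = [r for r in records if normalized in r["title"].lower()]
    trace.record("search", f"substring match on title, {len(hits)} hit(s)")

    # Post-search processing: sort by title and apply no other ranking.
    hits.sort(key=lambda r: r["title"])
    trace.record("post-search", "sorted by title, no other ranking applied")
    return hits


# Made-up sample records for the example.
records = [
    {"title": "Odyssey"},
    {"title": "The Achilles Heel"},
    {"title": "Achilles in Vietnam"},
]
trace = SearchTrace()
results = traced_search(" Achilles ", records, trace)
for stage, detail in trace.steps:
    print(f"{stage}: {detail}")
```

Because every stage writes to the trace, a paid boost or other hidden reordering in the post-search step would have to either appear in the record or contradict it.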

The library community is devoting considerable energy to developing open metadata encoding standards. I believe a similar amount of energy should be put into developing open specifications and standards for search processing.

Contracts and Agreements

  • VTLS—with the Caixa Laietana Bank and Diputacio de Barcelona, both in Spain, for upgrades from older VTLS systems to Virtua, to be installed in the public libraries served by the two institutions.
  • Endeavor—with the University of Montana for a Voyager system for the university’s four campuses; and with the National Library of New Zealand and Auburn (Ala.) University Libraries, current Endeavor customers, for ENCompass systems, the company’s digital library navigation, organization, and linking tool.
  • Ex Libris—with DaimlerChrysler AG of Stuttgart, Germany, for an Aleph 500 system to replace the library’s Dobis/Libis system; and with Curtin University of Technology, centered in Perth, Australia, for an Aleph 500 system, MetaLib/SFX, and DigiTool, to replace a DRA system at the university’s campuses in Western Australia.
  • Sirsi—with the Louisiana Library Network, a statewide consortium of 28 academic libraries, for the Link Library Management System, to replace the Network’s NOTIS system; and with the Mississippi Library Commission, for the Unicorn Library Management System and iBistro, to be installed in MLC’s resource library, to replace the commission’s Dynix system.
  • Innovative Interfaces—with the University of Burgos in Spain, for a Millennium system to replace the existing Libertas system installed in 1996.
  • TLC/Carl—with the Klein Independent School District in the northwest suburbs of Houston, for a Carl IMDS system to upgrade the district’s current Carl system; and with the Chesterfield County Public Schools near Richmond, Virginia, for TLC’s Library.Solution system for the school system’s 57 sites.

Announcements

  • EOS International has announced what it believes to be the world’s first integrated library, archive, and museum system that will be able to create records in the MARC format, the ISAD(G) format, and the Spectrum format, with records in all formats sharing common authority files.
  • TLC/Carl and VTLS have announced agreements with Syndetic Solutions to provide enhanced bibliographic data to their respective customers, TLC/Carl through its YouSeeMore product and VTLS through its Chameleon Gateway XBS OPAC.
