American Libraries
Technically Speaking
Library consultant for the Lincoln Trail Libraries System in Champaign, Illinois.

Column for October 2001

The Secrets of Search Software

While browsing the IFLA exhibits in Boston in August, I encountered a very impressive searching front end that could be used with any structured database, such as a database of MARC records. While most catalog search interfaces give the user a choice of indexes to search, this one (TurboSearch, developed and distributed by Lingomotors) prompted the user for a natural-language query. When the query was entered, the software analyzed the user’s query and determined for itself which indexes to search.

At the time I saw it being demonstrated, the product was searching a 1.5-million-record database of books. I queried the system with “Find a book about the Iliad and PTSD.” The software translated this question into “‘Iliad’ AND (‘ptsd’ OR ‘post traumatic stress disorder’ OR ‘shell shock’ OR ‘combat fatigue’ OR ‘battle fatigue’).” I also suspect that the software searched only the subject index, but I could not verify that definitely from what I saw or what I was told by the vendor rep. The search retrieved only two items: two editions of the one book I had in mind, Achilles in Vietnam. I know of no library management search software that can both recognize a boolean operator and take advantage of a thesaurus of related concepts in a single search.

There is no magic to the software’s excellent performance, of course; any person who understands how MARC records are indexed and who can imagine a bit of front-end linguistic analysis on the input string (“about” = subject) can deduce what is happening. But the software was impressive nevertheless: Not only did it get the best possible result without the user needing to know anything about how bibliographic records are indexed or what information they contain, it did so without the benefits of LCSH authority control and in one easy natural-language search.
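
The kind of front-end analysis described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not TurboSearch's actual method, which the vendor does not disclose. The synonym table, the cue-word list, the stopword list, and the function name `translate` are all assumptions invented for this example, and treating "and" as an implicit AND between the remaining terms is a simplification.

```python
import re

# Hypothetical synonym table; the single entry mirrors the example query.
SYNONYMS = {
    "ptsd": [
        "ptsd",
        "post traumatic stress disorder",
        "shell shock",
        "combat fatigue",
        "battle fatigue",
    ],
}

# Hypothetical cue words that hint at which index to search.
INDEX_CUES = {"about": "subject", "by": "author", "called": "title"}

# Hypothetical filler words dropped before building the boolean string.
STOPWORDS = {"find", "a", "an", "the", "book", "books", "and", "me"}


def translate(query: str) -> tuple[str, str]:
    """Return (index_name, boolean_string) for a natural-language query."""
    words = re.findall(r"[a-z0-9]+", query.lower())

    # The first cue word found decides the index; fall back to keyword.
    index = "keyword"
    for word in words:
        if word in INDEX_CUES:
            index = INDEX_CUES[word]
            break

    # Everything that is neither filler nor a cue word is a search term.
    terms = [w for w in words if w not in STOPWORDS and w not in INDEX_CUES]

    # Expand each term with its synonyms (OR), then join the terms (AND).
    clauses = []
    for term in terms:
        variants = SYNONYMS.get(term, [term])
        if len(variants) == 1:
            clauses.append(f"'{variants[0]}'")
        else:
            clauses.append("(" + " OR ".join(f"'{v}'" for v in variants) + ")")
    return index, " AND ".join(clauses)


if __name__ == "__main__":
    print(translate("Find a book about the Iliad and PTSD."))
```

With these assumed tables, the sample query selects the subject index and yields a boolean string of the same shape as the one the product displayed, lowercased here for simplicity. Because every table and rule is visible, anyone can see why a given search string was produced.
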
(It is also interesting to note that the database of synonyms the software used linked the phrases “shell shock” and “post traumatic stress disorder,” which LCSH does not do. Nor are “combat fatigue” or “battle fatigue” headings or cross references in LCSH.)

Searching secrets

It is software such as this that makes me wonder about the future of the library profession. Don’t get me wrong: I am not implying it will put librarians out of work. I’m confident the profession is secure against that kind of displacement. And I am not referring simply to the effectiveness of the software; I wish that IOLS vendors used such effective software. What concerns me is that what the software does is a trade secret: Its very functionality is proprietary and kept hidden.

And what’s the big deal with that, you might ask. My fear is that if the functionality of library searches becomes as hidden from view as the functions of some Web search engines and other commercial search software being marketed to the business community, librarians will be effectively eliminated from participating in the science of information retrieval. Already there are many librarians who really don’t care about how searching works and are just as happy to let some black box software take care of the whole thing. I know many librarians whose eyes glaze over when uniform title or authority control is discussed.

Today most libraries still use systems that don’t hide what goes on behind the scenes. While many librarians and most users don’t bother to find out how searches really work, enough do care and are more effective searchers for having this understanding. At present, the functionality of the searching software used by a typical IOLS vendor, as well as the MARC data structure and the cataloging rules, are all open for inspection. This openness is a large part of what makes the information-retrieval aspect of librarianship a skill we can all participate in.
The software I saw did display the search string that it sent to the database server after it had processed my query. But this last bit of transparency could very easily be left out of the interface. What would remain would be a proprietary black box whose functionality librarians and other users could only guess at. If librarians could trust commercial vendors to provide us with honest and scientifically sound black box searching software, there might be little to be concerned about; but in the long run, what is hidden from view will get manipulated in ways we might consider inappropriate if we knew about them.

Hits for dollars

Recently it was revealed that a number of Internet search companies were taking money from other companies hosting Internet sites. In return for money those company sites got preferential treatment by the search software. Imagine the consternation that would ensue if a library automation vendor’s searching software gave priority to certain publishers’ books because those publishers paid the vendor for special treatment.

If this sounds far-fetched, think about a time in the future when library search engines will have become unexplorable black boxes provided by third parties. If what happens to a search query cannot be subject to scrutiny, what potential abuses of the trust we currently have in our library vendors’ search engines could take place without our knowledge? And what would then happen to the trust that patrons put in libraries if that trust were to be abused by the searching software provided by libraries?

While the potential problem described above may not arrive any time soon, we are moving gradually toward a time when it will be all too likely to occur unless librarians take proactive measures to prevent it. There are several things that can be done.
One is for the library community to band together to develop Open Source searching front ends, as well as search engines; these would, by virtue of their being Open Source, be subject to inspection by any who are interested. Another solution would be for libraries to develop searching standards and specifications that would mandate that the functionality of any searching software be fully documented. Such documentation would need to extend to pre- and post-search processing, as well as to the search itself.

The library community is devoting considerable energy to developing open metadata encoding standards. I believe a similar amount of energy should be put into developing open search-processing specifications and standards.