What I Learned This Week


Joseph Janes

By Joseph Janes
American Libraries Columnist 

Assistant Professor, Information School, University of Washington.
intlib@ischool.washington.edu

Column for April 2005


One of the great bonuses of being on a university campus is the range of presentations I regularly have access to. It’s a privilege, and one I don’t often get the chance to take advantage of; last week I went to two, and I’m glad I did.

Monday: Clifford Lynch, demigod of the information world, nominally executive director of the Coalition for Networked Information, but more appropriately known as that incredibly smart guy who travels the world searching out the novel, cool, and important and weaves them together into thoughtful ruminations. I never leave one of his talks without several dense pages of notes and an exhilarated sense of exhaustion.

This was no exception. Lynch covered a lot of ground, largely in scholarly publishing and communication. I can’t possibly do him justice in my space here, so I’ll hit highlights.

My favorite sentence? “We are only now getting over the assumption that we write articles to be read by other people.” If you want your writing to be widely found, it has to go through an increasing number of computational processes, including spidering, indexing, data mining, and machine translation; and new generations of such beasts are constantly under development.

The repository model

A great deal of the recent discussion around scholarly communication has focused on institutional repositories, which could be maintained by universities to provide access to the scholarly work of the faculty without going through the traditional journal-publishing model.

It’s possible that this could provide significant savings; Lynch lodged this in the larger context of attempts to open materials to wider audiences. This is a concern of museums with huge collections that can’t possibly all be put on display, state-supported colleges trying to demonstrate their value to taxpayers, and the growing movement to make government-supported research results available to the public that paid for them. He also suggested that a university-based repository might capture published output as well as more of the intellectual life of the campus, including symposia, performances, and even the work of student groups.

Finally (and believe me, I’m leaving out a ton here), he tossed in an observation about Google Scholar. There has been a lot of hand-waving about this (and Google Print, and Google’s other announcements du jour); as usual, I got a new insight from Lynch. Google Scholar, he said, is the company’s first foray into the invisible web and as such will raise the cost of entry for potential new web-search companies.

No longer will a couple of guys in a garage with a great idea and a few servers be able to make an impression in the search world. The deep web will have to be involved, which means deals with content providers. All of a sudden the web-search world starts to look a lot more like the traditional content industry (and, not surprisingly, this same week the New York Times Company announced it was buying About.com).

Friday: Oren Etzioni, from the University of Washington computer science department, spoke on the future of web searching. His was a far more technical talk (and one that drew noticeably fewer librarians), plunging me deep into my background in information retrieval. His first sentence was easy enough to understand: Web searching is moving from document retrieval to question answering, because that’s what people really want.

That got my attention. He and his research team have been working on a system that would scour the web for answers to questions, admittedly in restricted domains like cities and states, but also in product reviews (automatically determining whether they’re positive or negative). There’s a great deal of interesting work here, from probabilistic information retrieval (straight out of information science, by the way) to artificial intelligence. I came away impressed by the technical quality of the work and the progress they’ve made.

I can’t tell you whether this particular system (KnowItAll) will thrive; it’s hard to imagine, though, that every major search company isn’t spending big bucks to produce a good, general-purpose question-answering system, and it’s worth assuming someone will eventually succeed.

Quite a week. I’m still processing it all, but I’ll tell you this: I’m glad I’m part of a profession that embraces and fosters change and complexity. As the nature of documents and their use evolves, and technology enables increasingly sophisticated manipulation and creation of them, a profession that clung stubbornly to tradition for tradition’s sake would be in a world of hurt . . . but that’s another story.
