Nowhere to Hide

By Joseph Janes
American Libraries Columnist
Assistant Professor, Information School, University of Washington
intlib@ischool.washington.edu
Column for September 2004
Several years ago I heard Brewster Kahle give a plenary address at a conference of the American Society for Information Science. I knew his work, of course: Back in the very early ’90s he developed WAIS—the Wide Area Information Server—which Internet librarians of a certain age will remember as one of the earliest search and publication devices on the Net. It was a little primitive and never quite fully workable, but in those heady days it was a great idea.
WAIS was way cool for its time, so I was intrigued to hear what Kahle was up to next. I’m sure I’ve gotten important aspects of this wrong, but he got up and started to hold forth about how much of the history of the Web we were losing and what a shame that was.
This was probably 1996 or so, and I was running the Internet Public Library at the time. Among the questions people often asked me then was whether we were going to build an archive of the Internet. I would politely reply that we were the Internet Public Library, not the Internet Public Archive, and that there was just too much stuff on the Net, and it changed too fast, and there was no real way anybody could save everything, and even if you could, who’d want it?
Which, of course, is precisely what Kahle said he was doing. (OK, so I was wrong.) The Internet Archive, familiarly known as the Wayback Machine, now claims to hold about 1 petabyte of the old Web, which “eclipses the amount of text contained in the world’s largest libraries, including the Library of Congress.” That’s a million gigabytes, for those of you keeping score at home. And who’s the poor soul at LC counting the bytes?
That’s a lot of stuff. I’ve used the archive several times to find old versions of web pages. Beyond the sheer postmodern tourist archeology of it all, it’s come in handy for real-life purposes as well. It’s a great tool—and also means that lots of web pages that nobody ever thought would be preserved or see the light of day again, or even remembered, are stored for at least short-term posterity and universally available.
The Google cache maintains local copies of defunct web materials and can be accessed from normal searching, and Google Groups means that ancient Usenet postings won’t soon join the choir invisible. These, along with those abandoned web pages that never get updated or maintained but somehow consistently manage to come up during searches or are linked from good sites, form what we might call the Undead Internet. Not really dead, yet no longer alive either.
The irrevocable Internet
It’s easy to conclude that once you are “on the Web,” in loose terms, you are likely there to stay, and once you write or say or publish anything there, you might not be able to get it back again. It won’t be yours alone; it belongs to the Web and its users and maybe to the ages.
The pedestrian response to this is: Don’t put anything in an e-mail or on a web page/posting/blog that you wouldn’t want in the newspaper or in front of a Senate committee. But we already knew that. Self-expression has consequences, potentially longer lasting than most people realize or desire.
We’ve always thought of the Net as ephemeral, temporary, quicksilver. It may well be that it’s not exactly the freewheeling, memoryless place we all thought it was. This makes it sound, paradoxically, more like print to me. In a realm that prizes, fosters, and even depends on dynamism and interactivity, now we have tools that freeze, that make final, archival copies of documents never intended to be “done” or, for that matter, around for very long.
One further complication: Don’t forget too that these “archives” will only be around as long as they are maintained and supported by . . . whom?
This is kinda spooky. On the Internet, nobody knows you’re a dog; they may also not know whether you still mean what you said in that e-mail or web page or Usenet posting from 1999. If there are parts or aspects of the Web that linger like zombies, this raises even more serious questions about authenticity and authority in this environment. It appears we will have to figure out how to cope with a possibly short-term unintentional archive of the once-transient that can’t necessarily be believed or trusted. Slippery turf on which to build . . . but that’s another story.