Unearthing the Invisible Web: Business Resources Search Engines Don't Find


ALA Midwinter Meeting, New Orleans
Sunday, January 20, 2002
9:30 - 11:00 a.m.
Marriott New Orleans/Mardi Gras BR G&H


BRASS Discussion Group presentation

Gary White
Head, Schreyer Business Library
Penn State University


Glenn S. McGuigan
Business Reference Librarian
Penn State Harrisburg


If you have any questions or need information, please contact Lisa O'Connor,
Steering Committee Chair, at loconnor@lms.kent.edu


This session will begin with the basics of the Invisible Web and focus on business resources that are not retrievable using standard Web search engines. It will include background information on the structure of the Web and search engine mechanics, as well as cover specific "invisible" business web sites. A bibliography of sources will be distributed. The speakers will present information and moderate the ensuing discussion.




Invisible Web

  • What is the Invisible Web?
  • Why is it important?

Search Tools

  • Search Engines
  • Directories

Why is it Invisible?

  • Static vs. Dynamic Pages
  • Databases
  • Problems with Terminology

What is the Invisible Web?

"The invisible Web," as described by Michael Dahn in the July/August issue of Online, refers to the "content that is not searchable using traditional public search engines like AltaVista or Northern Light" (Dahn, 2000).

An article in a business trade magazine estimates that the invisible Web is more than 500 times larger than the content that traditional search engines can index ("New Search Engine Provide Access to the Invisible Web," Direct Marketing, Garden City, NJ; May 2001, Anonymous).

Why is it important?

Considering the scope, it is obvious that this is content that must be acknowledged.

While this content has been available since the advent of the Web, surprisingly little has been written about it considering the massive scope of content that is hidden from the view of search tools.

Chris Sherman and Gary Price have been the critical voices discussing the value of the Invisible Web.

Search Tools: Directories

Often incorrectly referred to as "search engines."

A directory is a search tool that categorizes information into hierarchical classifications.

The Yahoo! Site is one of the most popular of these search tools.

Search Tools: Search Engines

The program of a search engine that looks for Web pages is referred to as a "spider" or a Web crawler.

A search engine is a search tool that "indexes keyword within some or all documents in Web sites. Keywords are found within a document and have contextual meaning to that topic. A search engine matches your keywords with its index" (Reding, Elizabeth E. 2001. Building an E-Business from the Ground Up. New York: McGraw Hill p. 18).

Directories | Search Engines
Inherently small | No inherent or artificial size restriction
Selected links chosen for quality | Mass quantities of links, no quality control
Poor for exhaustive searches | Good for exhaustive searches
Can include limited invisible Web content, but don't allow searching of it | Can technically include and allow searching of some invisible Web content
Often point to Web site top-level or home pages, but no deeper | Typically index the full text of many, if not all, pages on every site
Adapted from Table 2.3 Directories Vs. Search Engines in Sherman, Chris and Price, Gary. The Invisible Web: Uncovering Information Sources Search Engines Can't See. (Medford, NJ: Information Today, 2001).

Why is it invisible?

Static Vs. Dynamic Pages - The spiders of search engines cannot locate dynamically generated pages or are programmed not to locate them.

Much of the invisible Web exists behind databases.

Problems with Terminology - Invisible Web vs. Deep Web - are these sites really invisible?


Unearthing the Invisible Web for Business Information

What is the Invisible Web?

Much of the content that exists on or within the World Wide Web cannot be located through the use of most search tools. This content has been referred to as the invisible Web or the deep Web and is an important topic for discussion by librarians, especially business librarians.

"The invisible Web," as described by Michael Dahn in the July/August issue of Online, refers to the "content that is not searchable using traditional public search engines like AltaVista or Northern Light" (Dahn, 2000). The majority of this material exists in specialized databases that cannot be located by search engines. While this content has been available since the advent of the Web, surprisingly little has been written about it considering the massive scope of content that is hidden from the view of search tools.

The amount of content that is hidden from search tools is dramatically larger than the content that is retrievable. An article in the business trade magazine Direct Marketing estimates that the invisible Web is more than 500 times larger than the content that traditional search engines can index. In one of the first articles to discuss the invisible Web, published in Online magazine in 1999, author and Web editor Danny Sullivan discusses how little content search tools actually locate from the Web. His estimates from that time "showed that the best search engine (HotBot) covered only 34 percent of the 320 million pages the study estimated to exist, while the worst (Lycos) covered only 3 percent" (Sullivan, p. 30, 1999).
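
The percentages quoted from Sullivan are easier to appreciate as page counts. The short sketch below only multiplies the figures given in the quotation (320 million pages, 34 percent, 3 percent); the function and variable names are illustrative and are not taken from the study.

```python
# Figures quoted from Sullivan (1999): an estimated 320 million pages,
# with the best engine covering 34 percent and the worst 3 percent.
ESTIMATED_PAGES = 320_000_000


def pages_covered(coverage_percent, total_pages=ESTIMATED_PAGES):
    """Return how many pages a given percentage of coverage represents."""
    return total_pages * coverage_percent // 100


best = pages_covered(34)   # HotBot, per the quotation
worst = pages_covered(3)   # Lycos, per the quotation

print(f"Best engine: about {best:,} pages indexed, {ESTIMATED_PAGES - best:,} missed")
print(f"Worst engine: about {worst:,} pages indexed, {ESTIMATED_PAGES - worst:,} missed")
```

Even the best engine in that study, in other words, left more than 200 million pages unindexed.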

Valuable business resources make up a significant portion of the invisible Web. Those who rely only on search engine results for business information exclude so much material that it is worth asking what many of us have been missing by not learning about the Invisible Web.

Search Engines - What is a search engine?

To answer this question, it is necessary to look at search tools in general and examine the difference between directories and true search engines. Directories and search engines compose the two main search services on the Web.

Directory -- A directory is a search tool that categorizes information into hierarchical classifications. Rather than a robot-driven search aid, a directory is created by humans who classify and categorize Web pages. The Yahoo! Site is one of the most popular of these search tools. While directories are commonly, and mistakenly, referred to as search engines, these tools place Web sites into categories and provide access to users based upon these pre-selected items.
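
A hierarchical classification of this kind can be pictured as a tree of categories. The sketch below is only an illustration: the category names and the structure are assumptions made for the example, not the actual organization of Yahoo! or any other directory. The two site names are borrowed from the resource lists in this handout.

```python
# A hypothetical, hand-built directory: categories nest inside categories,
# and each category holds the sites a human editor has filed under it.
directory = {
    "Business": {
        "sites": [],
        "subcategories": {
            "Franchises": {"sites": ["FranchiseZone"], "subcategories": {}},
            "Manufacturing": {"sites": ["Manufacturing.Net"], "subcategories": {}},
        },
    },
}


def browse(tree, path):
    """Follow a list of category names down the hierarchy; return the sites filed there."""
    node = {"sites": [], "subcategories": tree}
    for name in path:
        node = node["subcategories"][name]
    return node["sites"]


print(browse(directory, ["Business", "Franchises"]))
```

The user reaches a site only by walking down categories that an editor has already chosen, which is why a directory stays small and selective.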

A search engine, on the other hand, is a search tool that "indexes keyword within some or all documents in Web sites. Keywords are found within a document and have contextual meaning to that topic. A search engine matches your keywords with its index" (Reding, p. 18). The program of a search engine that looks for documents on the Web is referred to as a "spider."
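
The matching of keywords against an index that Reding describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of how any particular search engine is built: the page names and texts are hypothetical, and real engines add ranking, stemming, and much more.

```python
import re
from collections import defaultdict

# Hypothetical page texts standing in for documents a spider has fetched.
pages = {
    "page1": "Franchise opportunities ranked by type of business",
    "page2": "Historical stock quotes and stock splits",
    "page3": "Business publications free of charge",
}


def build_index(documents):
    """Map each lowercase keyword to the set of documents that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every keyword in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(pages)
print(sorted(search(index, "business")))
print(sorted(search(index, "stock quotes")))
```

Only pages that the spider has actually fetched appear in the index, so anything the spider cannot reach can never be returned by a search.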

Why are some pages invisible?

Static Vs. Dynamic Pages - The spiders of search engines, i.e., the programs that search the content of pages on the Web, cannot locate dynamically generated pages.

Metatags --- Metatags are coded categories nested in Web site programming (Bushko, p. 4).
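
For readers who have not looked at page source, the sketch below shows one way a program might read metatags, assuming they take the common form of HTML meta elements with name and content attributes. The sample page and the class name are hypothetical; only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect name/content pairs from <meta> tags in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"].lower()] = attributes["content"]


# Hypothetical page source, invented for this example.
SAMPLE_HTML = """<html><head>
<title>Example Consulting</title>
<meta name="keywords" content="consulting, management, strategy">
<meta name="description" content="A hypothetical consulting firm.">
</head><body>Welcome.</body></html>"""

reader = MetaTagReader()
reader.feed(SAMPLE_HTML)
print(reader.meta["keywords"])
print(reader.meta["description"])
```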

The University of California Berkeley Library pages contain a teaching component dealing with the invisible Web. The author of the tutorial mentions the "ambiguity inherent in the Invisible Web" due to a variety of factors. In sum, the ambiguity arises because some information may be replicated in the "visible" portion of the Web and because search engines may change policies that affect their searching capabilities, such as the policy of excluding from searches any Web page that contains "?" in the URL.
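
The "?" policy mentioned in the tutorial is simple enough to sketch. The example below assumes the rule is applied literally (any URL containing "?" is skipped); the URLs and function names are hypothetical, and actual search engine policies may be more nuanced.

```python
def crawler_would_skip(url):
    """Return True when the URL contains "?", the usual sign of a dynamically generated page."""
    return "?" in url


def visible_urls(urls):
    """Keep only the URLs that a spider following this policy would visit."""
    return [url for url in urls if not crawler_would_skip(url)]


# Hypothetical URLs, invented for this example.
candidates = [
    "http://www.example.com/about.html",
    "http://www.example.com/search?company=acme",
    "http://www.example.com/reports/2001.html",
    "http://www.example.com/quote.cgi?symbol=XYZ",
]

for url in visible_urls(candidates):
    print(url)
```

Under this rule, the two pages generated in response to a query drop out of the spider's view even though anyone with the URL can open them.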

Much of the invisible Web exists behind databases. Chris Sherman, one of the main researchers in the area of the Invisible Web along with Gary Price, describes what happens when search engine spiders cannot reach the information behind these databases: "they've run smack into the entrance of a massive library with securely bolted doors" (Sherman, 11/13/01). The only thing the spider can report is the library's address; it can tell nothing about the content behind the library's doors.
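
Sherman's "bolted doors" image can be illustrated with a toy site. In the sketch below, the records, the front page, and the function names are all hypothetical. A simple spider is assumed to follow only ordinary links, while the records are produced only in answer to a typed query.

```python
from html.parser import HTMLParser

# Hypothetical company records held in a database, not in any static page.
DATABASE = {
    "acme": "Acme Corp. (hypothetical record)",
    "globex": "Globex Inc. (hypothetical record)",
}

# The only static page on the site: a search form and one ordinary link.
FRONT_PAGE = """<html><body>
<h1>Company Lookup</h1>
<form action="/search" method="get"><input name="company"><input type="submit"></form>
<a href="/about.html">About this database</a>
</body></html>"""


def search_page(company):
    """Build a results page on demand, as a database-backed site would."""
    record = DATABASE.get(company.lower(), "No match")
    return f"<html><body><p>{record}</p></body></html>"


class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags, the only paths this simple spider follows."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


spider = LinkCollector()
spider.feed(FRONT_PAGE)
print(spider.links)           # everything the spider can discover from the front page
print(search_page("acme"))    # what a person who types a query receives
```

The spider learns the address of the front page and of the "about" page, but none of the records, because no link leads to them.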

Of course, many databases available by way of the Web are commercial products that may only be searched by authorized users or subscribers. Databases such as these include Standard & Poor's Net Advantage and Dow Jones Interactive. Other databases provide varied levels of access, with some content free and some restricted to subscribers.

Problems with Terminology

On the University at Albany Libraries Web site, Laura Cohen, Network Services Librarian, asserts that the term "invisible" is not appropriate for these resources because they are not in fact "not visible." Yes, they may be invisible if one restricts searching to only a search engine, but they still exist. She embraces the term "deep Web" instead. She has a good point here --- for those of us who have worked in libraries using "library" resources side-by-side with "Web" resources, it has always been assumed that the information that one is searching for within an OPAC, a CD-ROM database or an online database is indeed separate from the information available on the Web. In fact, this is why we pay so much in subscription and licensing fees to the commercial entities that provide us with the content within these databases.

BIBLIOGRAPHY

Bushko, David. "The Metatag Edge: Hidden Tools for Cybermarketing," Consulting to Management, Vol. 12 No. 3, September 2001, pp. 4-5.

Noack, David. "Web Sites Hidden from View," Link-up, Vol. 18 No. 5, September/October 2001, p. 26.

Reding, Elizabeth E. Building an E-Business from the Ground Up. (New York: McGraw Hill, 2001).

Some Invisible Web Business Related Databases
(None of which are listed in the Price and Sherman Book)

Better Business Bureau --- BBB Information System Search
This national company database lists companies that either are members of the Better Business Bureau or have had unanswered complaints filed against them.

BPUBS.COM (Business Publications Search Engine)
A "content collection" of business publications that aims to focus upon a "collection development" approach of selecting quality business e-publications that are all free of charge.

FranchiseZone
A part of the Entrepreneur.com Web pages. Franchises are classified by type of business. Provides access to a database of franchise opportunities. The database is built on the magazine's "Franchise 500," a list of companies ranked by the magazine; other company franchises are listed in the database as well.

International Monetary Fund Publications Database
Database provides access to working papers, economic reviews and a host of other documents published by the International Monetary Fund. While some are full text, the majority appear to be citations that are then delivered full text for a fee.

Manufacturing.Net
Site provides access to four separate invisible Web databases --- associations, magazines, trade shows and yellow pages. Although it offers few links to trade articles without registration, the site does provide access to content posted on the individual publications' Web sites. Registration is free, however, and provides access to archives of articles published by Cahners Business publications.

ZIPFIND
An excellent tool for marketing information. The free portion of this product allows the user to enter a zip code as a search query and then set a radius parameter; the database produces a list of the zip codes within that radius. It also calculates distances between different zip codes.
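
A radius search of the kind ZIPFIND offers can be approximated with great-circle distances between zip code center points. The handout does not say how ZIPFIND itself computes its results, so the sketch below is an assumption: it uses the standard haversine formula, and the zip codes and coordinates are made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth

# Hypothetical zip codes with made-up center points (latitude, longitude).
CENTROIDS = {
    "00001": (40.00, -77.00),
    "00002": (40.10, -77.00),
    "00003": (41.00, -77.00),
}


def distance_miles(a, b):
    """Great-circle distance between two (latitude, longitude) points, by the haversine formula."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def zips_within(center, radius_miles, centroids=CENTROIDS):
    """List the other zip codes whose center points lie within the radius of the given one."""
    origin = centroids[center]
    return sorted(
        code
        for code, point in centroids.items()
        if code != center and distance_miles(origin, point) <= radius_miles
    )


print(zips_within("00001", 10))
print(zips_within("00001", 100))
print(round(distance_miles(CENTROIDS["00001"], CENTROIDS["00003"]), 1))
```

A production service would need a full table of zip code locations; the point here is only that both features described above reduce to the same distance calculation.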

Other Invisible Web Resources

America's Job Bank
Sponsored by the Dept. of Labor. A vast database of hundreds of thousands of jobs throughout the United States.

Library of Congress Country Studies

Occupational Outlook Handbook
Bureau of Labor Statistics searchable directory of occupations.

U.S. Census Quick Stats
Demographic and business statistics regarding the US.


The Invisible Web -- A Select Bibliography

BOOK

Sherman, Chris and Price, Gary. The Invisible Web: Uncovering Information Sources Search Engines Can't See. (Medford, NJ: Information Today, 2001).

ARTICLES

Dahn, Michael. "Spotlight on the Invisible Web," Online, Vol. 24 No. 4, pp. 57-62.

Diaz, Karen R. "The Invisible Web: Navigating the Web Outside Traditional Search Engines," Reference & User Services Quarterly, Vol. 40 No. 2, pp. 131-134.

Gibson, Paul. "Search the 'Invisible Web'," Information World Review, May 2001, p. 20.

Green, David. "Search Insider," Information World Review, No. 146, April 1999, pp. 19-20.

Guernsey, Lisa. "Search Engine Hunts for Gold Beneath the Surface of the Web," The New York Times (Late Edition), February 8, 2001, News Watch Column, p. 3.

Harley, Bruce. "The United Nations on the Invisible Web," Online, Vol. 25 No. 4, July/August 2001, pp. 36-39.

"Indexing the Invisible Web," Information World Review, No. 153, December 1999, p. 1.

Jacso, Peter. "Tools for Unearthing PDF Files," Information Today, Vol. 18 No. 5, pp. 48-49.

Lanza, Sheri R. "The End of an Era: Online World 2000," Searcher, Vol. 9 No. 1, January 2001, pp. 18-22.

Miller, Leslie. "Seeing Through the Invisible Web," USA Today, October 15, 2001, p. D.03.

O'Leary, Mick. "Invisible Web Discovers Hidden Treasures," Information Today, Vol. 17 No. 1, pp. 16-18.

Pedley, Paul. "The Invisible Web," Library Association Record, Vol. 102 No. 11, November 2000, pp. 628, 633.

Price, Gary and Sherman, Chris. "Exploring the Invisible Web," Online, Vol. 25 No. 4, July/August 2001, pp. 32-34.

Price, Gary and Sherman, Chris. "The Invisible Web," Searcher, Vol. 9 No. 6, June 2001, pp. 62-74.

Sherman, Chris. "Google Unveils More of the Invisible Web," Searchday, No. 128, Oct. 31, 2001.

Smith, C. Brian. "Getting to Know the Invisible Web," Library Journal, Summer 2001, pp. 16-18.

Sullivan, Danny. "Crawling Under the Hood," Online, Vol. 23 No. 3, May/June 1999, pp. 30-38.

"Top 25 Invisible Web Categories," Searcher, Vol. 9 No. 6, June 2001, pp. 68, 70+.

WEB SITES

Complete Planet

"Direct Search: Web Search Tools and Directories" (featuring Invisible Web Resources)

"IncyWincy: The Invisible Web Search Engine"

"Invisible Web"

"Invisible Web Gets Deeper"

"The Invisible Web Homepage" (For educators)

"InvisibleWeb.com"

"Invisible-web.net"

"The Invisible Web: Searchable Databases and Hidden Sites"

"The Invisible Web: What it is, Why it Exists, How to Find it, and Its Inherent Ambiguity"

"LLRX-Research Wire: Exposing the Invisible Web" (Law Library Resource Xchange)

Mardis, Marcia. "Uncovering the Hidden Web," ERIC Digest, October 2001. EDO-IR-2001-02.

"Specialty Search Engines…Invisible Web"

"Those Dark Hiding Places: The Invisible Web Revealed"

Webdata.com

"ZDNet Story: I Can Search the 'Invisible Web.' Here's How You Can Too"

EXAMPLE BUSINESS WEB SITES

Ad Access (Images of Historical Advertisements)
This database includes over 7,000 images of advertisements from 1911 through 1955.

Amazon.com
Popular Web site for books, videos, and other materials.

American FactFinder
This database contains information gathered by the U.S. Census Bureau, including data on the population, housing, economics, and households.

Better Business Bureau - BBB Information Systems Search
This national company database lists companies that either are members of the Better Business Bureau or have had unanswered complaints filed against them.

Bigcharts.com (Historical Stock Quotes)
Database of historical daily stock quotes from 1994 to the present.

BPUBS.COM (Business Publications Search Engine)
A "content collection" of business publications that aims to focus upon a "collection development" approach of selecting quality business e-publications that are all free of charge.

Career Builder
This career center contains a database where users can search for jobs by location, job title, and salary.

Competitive Intelligence Guide (Fuld & Company)
This Web site is a compilation of over 600 Internet resources on competitive intelligence.

Executive Compensation Search Engine
Free database that is searchable by company name, executive name, geographic location, or industry.

FranchiseZone
A part of the Entrepreneur.com Web pages. Franchises are classified by type of business. Provides access to a database of franchise opportunities. The database is built on the magazine's "Franchise 500," a list of companies ranked by the magazine; other company franchises are listed in the database as well.

FreeEDGAR
Contains a searchable database of SEC filings.

InfoSpace
InfoSpace contains several searchable databases including the yellow and white pages, classified advertisements, newspapers, and reference sources.

International Monetary Fund Publications Database
Database provides access to working papers, economic reviews and a host of other documents published by the International Monetary Fund. While some are full text, the majority appear to be citations that are then delivered full text for a fee.

Kompass
Database of international companies, products, brand names, and executives.

Manufacturing.Net
Site provides access to four separate invisible Web databases --- associations, magazines, trade shows and yellow pages. Although it offers few links to trade articles without registration, the site does provide access to content posted on the individual publications' Web sites. Registration is free, however, and provides access to archives of articles published by Cahners Business publications.

MapQuest
Popular map creation database.

Thomas Register
Online version of the reference standard. Registration is required.

Universal Currency Converter

U.S. Patent and Trademark Office
Contains databases to search patents and trademarks.

Yahoo!.com Historical Stock Quotes
Users can obtain daily stock prices for many stocks back to the early 1980s.

Yahoo!.com Stock Splits
Web site containing a calendar of stock splits.

ZIPFIND
An excellent tool for marketing information. The free portion of this product allows the user to enter a zip code as a search query and then set a radius parameter; the database produces a list of the zip codes within that radius. It also calculates distances between different zip codes.


This page was originally prepared by Gary White and Glenn S. McGuigan.


Disclaimer: This publication has been placed on the web for the convenience of BRASS members. Information and links will not be updated. Posted 19 August 2002.