The Internet, particularly the World Wide Web, has experienced considerable
growth during the 1990s. This expansion, fueled by a combination of recreational
and job-related factors, has led to concerns about information overload. The
web comprised an estimated 50 to 120 million pages by 1997, with
no end in sight. Search engines have provided a relatively effective means
of navigating this vast data domain, yet even the most successful of these
engines have proven unable, thus far, to cover the entire web.
Search engines can structure queries in a user friendly fashion; however,
they inevitably place demands on the cognitive capabilities of information
seekers.
The Intranet Phenomenon
White collar professionals and support staff have become increasingly
familiar with Local Area Networks (or LANs) which employ in-house technologies
to link computers in close proximity, typically in the same building or
in adjacent buildings on a campus, commercial site, etc. Wide Area Networks
(or WANs) connect computers over a wide geographic area, linking LANs together
by means of special hardware. TCP/IP protocols can run over both LANs and
WANs; two or more such networks, when connected, in effect become part of
the Internet (which might best be defined as a broad-based network of networks
that is international in scope). An intranet is a network established
using TCP/IP that operates independently of the Internet. Such a network
might be established, for example, at a corporate site to connect employees
and provide services like e-mail and file transfer. This network could
be connected to the broader Internet; however, for security reasons many
companies have decided to exploit the capabilities of TCP/IP while maintaining
proprietary control over their operations.
Prognosis for Future Internet Use
The success of the Internet within the United States (where a substantial
portion of the network's users are located) has been due in no small part
to a regulatory anomaly: since most residential telephone customers do
not pay for local calls, they can connect to Internet service providers
for free. These providers find it cost effective to offer flat rate pricing,
often at a rate of less than twenty dollars per month. Users generally
have no incentive to limit time online. Telephone companies have been perhaps
hardest hit by the Internet boom. Their customers are greeted with busy
signals when lines are tied up by web surfing, and the companies lose
long-distance traffic as e-mail messages replace faxes. Perhaps ironically,
the potential for profits is so great that telephone companies are poised
to take a leadership role in the future provision of information highway
services.
The Rise of Browsers
While the Internet as we know it today has necessitated a revolution
in the configuration of personal computers--most notably, the presence
of increasingly efficient modems and greater processing power--the
key to greater web awareness has been the growth of browser software. The
first widely popular graphical browser was Mosaic. Born in a government-supported research
environment, Mosaic inspired many offspring, including the present-day
leaders, Netscape Navigator and Microsoft Internet Explorer. Locked in a battle
for market dominance, both companies issue new versions laden with additional features
every few months or so, and these can be downloaded from the Net.
Recent browser releases have expanded beyond merely navigating static web pages to functioning as the centerpiece of a suite of Internet client applications that offer increasingly rich tools for communicating, sharing information, and experiencing interactive content. Because both browsers are readily available at virtually no cost, many users are likely to keep both on their systems until either Microsoft or Netscape emerges with a clear edge in overall performance.
Intelligent agents have been developed in recent years to directly facilitate browsing by ascertaining the user's interests and providing a guided tour of the Internet. Two of the more widely known research prototypes are Web Watcher, developed at Carnegie Mellon University, and Letizia, developed at the MIT Media Lab. Web Watcher is a server-based interface agent that resides between the user and the Web. A user running any browser can enter the system simply by typing a topic of interest on Web Watcher's FrontDoor page. Web Watcher accepts the request and replaces the current page with a modified page that superimposes the system's command menus, thereby enabling Web Watcher to follow the user as he or she browses. It ultimately provides the user with a highlighted list of recommended hyperlinks. Because Web Watcher is a server-based system, it can log data from thousands of users to "train" itself (i.e., refine its search knowledge). If a user signals that a particular search was successful, Web Watcher learns from that session using information retrieval techniques based on the frequency of weighted terms and hyperlinks on a page, as well as user statistics associated with those links. Web Watcher can implement one of four learning methods:
annotation - relevance based on previous user interest;
match - metric analysis of underlined anchor text; and
q-learning - reinforcement learning that determines the value of downstream pages.
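As a rough illustration of the "match" idea above, the following Python sketch scores each hyperlink's anchor text against a user's stated interest using TF-IDF weights and cosine similarity. It is not Web Watcher's actual implementation: the function names, the smoothed weighting formula, and the sample links are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return cleaned.split()


def tfidf_vectors(texts):
    """Return one {term: weight} dict per text (smoothed TF-IDF, an assumed formula)."""
    token_lists = [tokenize(t) for t in texts]
    n = len(token_lists)
    # Number of texts in which each term appears at least once.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_links(interest, anchors):
    """Rank (anchor_text, url) pairs by similarity to the stated interest."""
    texts = [interest] + [text for text, _ in anchors]
    vectors = tfidf_vectors(texts)
    query_vec = vectors[0]
    scored = [
        (cosine(query_vec, vec), text, url)
        for vec, (text, url) in zip(vectors[1:], anchors)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical links on a page the user is viewing.
links = [
    ("Machine learning courses", "courses.html"),
    ("Campus parking map", "parking.html"),
    ("Reinforcement learning papers", "papers.html"),
]
for score, text, url in rank_links("machine learning", links):
    print(f"{score:.2f}  {text}  ({url})")
```

An agent built along these lines would highlight the top-scoring links; the q-learning method would go further by also estimating the value of the pages those links lead to.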
In general, search engines function to assist users in locating resources. Internet
search engines tend to employ minimal domain knowledge and a very general
user model. On the other hand, Intranet search engines can better anticipate
user needs and effectively delimit connotations of key words possessing
multiple meanings. In such an environment the cost of false hits or missed
documents tends to be higher. Internet search engines are a sophisticated
blend of powerful computer networks and specialized software, usually
developed by either large corporations or universities, and they are freely
available to anyone with Internet access. In addition to reflecting an
institution's public relations perspective, search engines can represent
good business. Because their usefulness can attract many users, they can
serve such practical functions as showcasing a host institution's
hardware/software expertise or generating revenue through the advertising
of another organization's products.
To use a search engine, you must enter a string of key words and/or
topics in accordance with the required query structure, click on the search/find
icon, and wait for feedback. With more than 150 major search engines available,
selecting the best one for your needs can represent a complex undertaking.
It can be safely assumed that no single engine will work best for a given
user in all instances.
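To make the idea of a "required query structure" concrete, here is a minimal Python sketch of a keyword index that accepts plain words plus a `+word` (must appear) and `-word` (must not appear) syntax. The syntax, the function names, and the sample pages are assumptions chosen for illustration; each real engine defines its own query rules.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?\"'()")].add(page_id)
    return index


def search(index, query):
    """Return page ids matching the query, best matches first.

    '+word' requires the word, '-word' excludes it, and plain words are
    optional terms used both for matching and for ranking.
    """
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        else:
            optional.append(term)

    if required:
        # Start from the first required term and intersect with the rest.
        hits = set(index.get(required[0], set()))
        for term in required[1:]:
            hits &= index.get(term, set())
    else:
        hits = set()
        for term in optional:
            hits |= index.get(term, set())

    for term in excluded:
        hits -= index.get(term, set())

    def score(page_id):
        # Count how many optional terms the page contains.
        return sum(page_id in index.get(t, set()) for t in optional)

    return sorted(hits, key=lambda pid: (-score(pid), pid))


# Hypothetical three-page collection.
pages = {
    "a": "Jaguar cars and dealers",
    "b": "The jaguar is a big cat.",
    "c": "Used cars for sale",
}
index = build_index(pages)
print(search(index, "+jaguar -cars"))  # → ['b']
print(search(index, "jaguar cars"))    # → ['a', 'b', 'c']
```

The second query shows why key words with multiple meanings produce false hits: without the exclusion, pages about the automobile and the animal are returned together.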
A. Major Search Engines
Alta Vista. Owned and operated by Digital Equipment Corp., it was performing more than 475,000 searches daily as of January 1997. It indexes 31+ million web pages as well as almost every current
post to Usenet newsgroups. Its search capabilities make possible filtering
a query by the presence or absence of key words, date ranges, domain, or
newsgroup. You can also search for embedded images by using "image:caption"
or determine linkages to a particular page by deploying "link:address."
URL: http://altavista.digital.com/
Deja News. An online archive of all Usenet groups dating back
to March 1995, it plans to eventually save most Usenet posts back to 1979.
Like most search engines, it permits query filtering by date, Usenet group,
author, subject, or the appearance of keywords. In addition, it provides
an author profile with statistics covering how many posts were made--and
to which groups--from a particular e-mail address.
Excite. Claiming to be the world's most comprehensive and flexible
search engine, it is unique in permitting queries in everyday English.
To optimize search efficiency, it features intelligent concept extraction
as opposed to simple text matching. URL: http://www.excite.com/
Yahoo. Focusing on key word and topic searching, it hierarchically
organizes matches by category and includes concise summaries. Although
generally returning far fewer matches per query than other engines, these
matches are usually of very high relevance. If they do not suffice, a single
mouse click provides a link to Alta Vista's matches for the same query.
URL: http://search.yahoo.com/
InfoSeek. One of the most popular search services due to its
presence on Netscape's default home pages as well as a design geared to
enabling inexperienced searchers to achieve high-precision searches. The
latter is accomplished by defaulting to a case-sensitive phrase search,
without truncation. URL: http://guide.infoseek.com/
Lycos. As of 1996, it covered an estimated 92% of the Web (with
plans to eventually index 100%). However, the engine doesn't index the
full text of most documents; therefore, it may not retrieve those resources
that mention your search topics only peripherally. URL: http://www.lycos.com/
Electric Library. Retrieves the full text of articles in more
than 1000 magazines, articles, books, TV transcripts, newspapers, and press
releases. The material tends to be academic in treatment. URL: http://www.elibrary.com/
Inktomi. Possesses a relatively large database and full-text
indexing, which translates into high-recall retrieval lists. However, the
service is handicapped to an extent by the simplicity of the search engine
and the paucity of the retrieval list. URL: http://inktomi.berkeley.edu/
Magellan. Like InfoSeek Guide and Excite, it melds a Web search
engine and a subject tree of reviewed documents. Effective at finding a
few really good sites on a general topic. URL: http://www.mckinley.com/
PathFinder. Covers most Time-Warner magazines. Good for current
events, market research, etc. URL: http://www.pathfinder.com/
Hotbot. A fairly recent entry. Ranks hits according
to popularity of sites. URL: http://www.hotbot.com
A unified search engine is a web page that brings together a number
of search services. It is particularly useful for querying more than
one search engine in a single search session.
All-in-One Search List. URL: http://www.albany.net/allinone
E-Z Find. URL: http://www.theriver.com
Find-It! URL: http://www.itools.com/find-it/find-it.html
Internet Sleuth. Far more comprehensive than Find-It! URL: http://www.intbc.com/sleuth/
SavvySearch. Makes possible searching all the Web's search engines
with just one query. Often busy; poor recall. URL: http://www.cs.colostate.edu/~dreiling/smartform.html
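The core task of a unified search service, sending one query to several engines and combining the answers, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the engines are stand-in functions returning canned lists rather than real network calls, the names are hypothetical, and the reciprocal-rank merging rule is one simple choice, not the method used by any service listed above.

```python
from collections import defaultdict


def merge_results(result_lists):
    """Combine ranked URL lists from several engines into one ranked list.

    Each URL earns 1/rank from every engine that returned it, so pages
    found by several engines, or ranked near the top, rise in the merged list.
    """
    scores = defaultdict(float)
    for urls in result_lists.values():
        seen = set()
        for rank, url in enumerate(urls, start=1):
            if url in seen:
                continue  # count a URL only once per engine
            seen.add(url)
            scores[url] += 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))


def unified_search(engines, query):
    """Send one query to every engine callable and merge what comes back."""
    return merge_results({name: engine(query) for name, engine in engines.items()})


# Stand-in engines (hypothetical): each takes a query and returns ranked URLs.
engines = {
    "engine_one": lambda q: ["http://example.com/x", "http://example.com/y"],
    "engine_two": lambda q: ["http://example.com/y", "http://example.com/z"],
}
print(unified_search(engines, "web search tools"))
# → ['http://example.com/y', 'http://example.com/x', 'http://example.com/z']
```

A real service would also have to translate the query into each engine's own syntax and cope with engines that are slow or unavailable.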
Reviews of Web Search Services
Current: Search Me! URL: http://watt.seas.virginia.edu/~bp/searchme/welcome.html
Retrospective: Chris Tweney's "Internet Search Tools" URL: http://www.zdnet.com/zdi/tblazer/iserch.html
[Freely integrates material from an unpublished research paper by Skafkat Ahmed Chowdbury and Bryan Pfaffenberger's World Wide Web Bible. 2nd ed., 1996.]