Searching the Internet
 
 
Introduction
 
 

The Internet, particularly the World Wide Web, has experienced considerable growth during the 1990s. This expansion, fueled by a combination of recreational and job-related factors, has led to concerns about information overload. By 1997 the web comprised an estimated 50 to 120 million pages, with no end in sight. Search engines have provided a relatively effective means of navigating this vast data domain, yet even the most successful of them have so far proven unable to cover the entire web. Search engines can structure queries in a user-friendly fashion; however, they inevitably place demands on the cognitive capabilities of information seekers.
 
 

The Intranet Phenomenon
 
 

White-collar professionals and support staff have become increasingly familiar with Local Area Networks (LANs), which employ in-house technologies to link computers in close proximity, typically in the same building or in adjacent buildings on a campus or commercial site. Wide Area Networks (WANs) connect computers over a wide geographic area, linking LANs together by means of special hardware. TCP/IP protocols can run over both LANs and WANs; two or more such networks, when connected, in effect become part of the Internet (which might best be defined as a broad-based network of networks that is international in scope). An Intranet is a network built on TCP/IP that operates independently of the Internet. Such a network might be established, for example, at a corporate site to connect employees and provide services such as e-mail and file transfer. The network could be connected to the broader Internet; however, for security reasons many companies have chosen to exploit the capabilities of TCP/IP while keeping proprietary control over their operations.
 
 

Prognosis for Future Internet Use
 
 

The success of the Internet within the United States (where a substantial portion of the network's users are located) has been due in no small part to a regulatory anomaly: because most residential telephone customers pay a flat monthly rate rather than per-call charges for local service, they can connect to a local Internet service provider at no extra cost. These providers find it cost-effective to offer flat-rate pricing, often at less than twenty dollars per month, so users generally have no incentive to limit their time online. Telephone companies have perhaps been hardest hit by the Internet boom: callers are greeted with busy signals when the lines they dial are tied up with web surfing, and the companies lose long-distance traffic as e-mail messages replace faxes. Perhaps ironically, the potential for profits is so great that telephone companies are poised to take a leadership role in the future provision of information highway services.
 
 

The Rise of Browsers
 
 

While the Internet as we know it today has necessitated a revolution in the configuration of personal computers (most notably, increasingly efficient modems and greater processing power), the key to greater web awareness has been the growth of browser software. The first widely used graphical browser was Mosaic. Born in a government-supported research environment, Mosaic inspired many offspring, including the present-day leaders, Netscape Navigator and Microsoft Internet Explorer. Locked in a battle for market dominance, both vendors release new versions laden with additional features every few months or so, and these can be downloaded from the Net.
 
 

  1. Recent Developments

Recent browser releases have expanded beyond navigating static web pages; they now function as the centerpiece of a suite of Internet client applications offering increasingly rich tools for communicating, sharing information, and experiencing interactive content. Both browsers are readily available at virtually no cost, so until either Microsoft or Netscape gains a clear edge in overall performance, many users are likely to keep both on their systems.

  2. Intelligent Browsing

Intelligent agents have been developed in recent years to facilitate browsing directly by ascertaining the user's interests and providing a guided tour of the Internet. Two of the more widely known research prototypes are Web Watcher, developed at Carnegie Mellon University, and Letizia, developed at the MIT Media Lab. Web Watcher is a server-based interface agent that resides between the user and the Web. A user running any browser can enter the system simply by typing a topic of interest on Web Watcher's FrontDoor page. Web Watcher accepts the request and replaces the current page with a modified copy that superimposes the system's command menus, which enables Web Watcher to follow the user while he or she browses. It ultimately provides the user with a highlighted list of recommended hyperlinks. Because Web Watcher is a server-based system, it can log data from thousands of users to "train" itself (i.e., refine its search knowledge). When a user signals that a particular search was successful, Web Watcher refines its recommendations using information retrieval techniques based on the frequency of weighted terms and hyperlinks on a page, as well as the user statistics associated with those links. Web Watcher can implement any of four learning methods:

  popularity - frequency of previous link traversal;

  annotation - relevance based on previous user interest;

  match - metric analysis of underlined anchor text; and

  q-learning - reinforcement learning that determines the value of downstream pages.
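
To make the popularity and q-learning methods more concrete, here is a minimal Python sketch. It is not Web Watcher's code: the traversal log, the link graph, the reward values, and the function names rank_by_popularity and q_values are all hypothetical, and the q-learning part uses a simple deterministic, value-iteration form of the Q update rather than the sampled updates a real agent would make.

```python
from collections import Counter

# Hypothetical log of (page, link followed) pairs, as a server-based agent
# might record them across many users.
TRAVERSALS = [
    ("home", "courses"), ("home", "courses"), ("home", "people"),
    ("courses", "ml"), ("people", "contact"),
]

# Hypothetical link graph: page -> pages it links to.
LINKS = {"home": ["courses", "people"], "courses": ["ml"], "people": ["contact"],
         "ml": [], "contact": []}


def rank_by_popularity(page, log):
    """'popularity': order a page's outgoing links by past traversal counts."""
    counts = Counter(dst for src, dst in log if src == page)
    return [link for link, _ in counts.most_common()]


def q_values(links, reward, gamma=0.5, iterations=20):
    """'q-learning' sketch: the value of following src -> dst is the reward
    for reaching dst plus the discounted value of the best link out of dst."""
    q = {(src, dst): 0.0 for src, dsts in links.items() for dst in dsts}
    for _ in range(iterations):
        for src, dst in q:
            best_next = max((q[(dst, nxt)] for nxt in links.get(dst, [])), default=0.0)
            q[(src, dst)] = reward.get(dst, 0.0) + gamma * best_next
    return q


if __name__ == "__main__":
    print(rank_by_popularity("home", TRAVERSALS))  # ['courses', 'people']
    q = q_values(LINKS, reward={"ml": 1.0})  # suppose 'ml' was the user's goal
    print(q[("home", "courses")], q[("home", "people")])  # 0.5 0.0
```

With these toy numbers both methods favor the courses link from home, but for different reasons: popularity simply counts past clicks, while the q-learning sketch propagates value back from a goal page through the links that lead to it.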
 
 

Search Engines
 
 

In general, search engines help users locate resources. Internet search engines tend to employ minimal domain knowledge and a very general user model. Intranet search engines, by contrast, can better anticipate user needs and narrow the sense of key words that have multiple meanings, which matters because in such an environment the cost of false hits or missed documents tends to be higher. Public Internet search engines, a sophisticated blend of powerful computer networks and specialized software, are usually developed by large corporations or universities and are freely available to anyone with Internet access. Beyond reflecting an institution's public relations perspective, search engines can represent good business: because their usefulness attracts many users, they can showcase the host institution's hardware and software expertise or generate revenue by advertising other organizations' products.
 
 

To use a search engine, you enter a string of key words and/or topics in the query format the engine requires, click the search (or find) button, and wait for results. With more than 150 major search engines available, selecting the best one for your needs can be a complex undertaking. It is safe to assume that no single engine will work best for a given user in every instance.
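
Behind the query box, most engines consult an inverted index that maps each word to the documents containing it. The Python sketch below is a toy rather than any particular engine's design: the three sample documents are made up, and treating every key word as required (Boolean AND) is an assumption, since engines differ in how they combine terms.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every key word in the query (AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {  # hypothetical sample documents
        1: "Mosaic web browser",
        2: "Netscape Navigator web browser",
        3: "Usenet news archive",
    }
    index = build_index(docs)
    print(sorted(search(index, "web browser")))       # [1, 2]
    print(sorted(search(index, "Netscape browser")))  # [2]
```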
 
 

A. Major Search Engines
 
 

Alta Vista. Owned and operated by Digital Equipment Corp., it was performing more than 475,000 searches daily as of January 1997. It indexes more than 31 million web pages as well as almost every current post to Usenet newsgroups. Its search options let you filter a query by the presence or absence of key words, date range, domain, or newsgroup. You can also search for embedded images by using "image:caption" or find the pages that link to a particular page by using "link:address". URL: http://altavista.digital.com/
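
Presence/absence filtering and the link: operator can be sketched in Python as follows. The +word (must appear) and -word (must not appear) prefixes are an assumed convention, as are the sample pages and the helper names parse_query and filter_pages; this is an illustration, not Alta Vista's implementation.

```python
def parse_query(query):
    """Split a query into required (+word), excluded (-word), link: targets, and plain words."""
    required, excluded, links, plain = [], [], [], []
    for token in query.split():
        if token.startswith("+"):
            required.append(token[1:].lower())
        elif token.startswith("-"):
            excluded.append(token[1:].lower())
        elif token.startswith("link:"):
            links.append(token[len("link:"):])
        else:
            plain.append(token.lower())
    return required, excluded, links, plain


def filter_pages(pages, query):
    """Return URLs of pages that satisfy every filter in the query."""
    required, excluded, links, plain = parse_query(query)
    hits = []
    for page in pages:
        words = set(page["text"].lower().split())
        if any(w not in words for w in required):
            continue
        if any(w in words for w in excluded):
            continue
        if any(target not in page["links"] for target in links):
            continue
        # Simplification: plain words only need one match among them.
        if plain and not any(w in words for w in plain):
            continue
        hits.append(page["url"])
    return hits


# Hypothetical pages: their text and the URLs they link to.
PAGES = [
    {"url": "http://example.com/a", "text": "comet photos and star charts",
     "links": ["http://altavista.digital.com/"]},
    {"url": "http://example.com/b", "text": "comet orbit data", "links": []},
]

if __name__ == "__main__":
    print(filter_pages(PAGES, "+comet -orbit"))
    print(filter_pages(PAGES, "link:http://altavista.digital.com/"))
```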
 
 

Deja News. An online archive of all Usenet groups dating back to March 1995, it plans eventually to save most Usenet posts back to 1979. Like most search engines, it permits query filtering by date, Usenet group, author, subject, or the appearance of key words. In addition, it provides an author profile with statistics showing how many posts were made from a particular e-mail address and to which groups.
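
An author profile of this kind is essentially a count over the archive. The Python sketch below assumes that each archived post records its sender address and the list of groups it was posted to, so a cross-posted message counts once toward the total but once for each group. The sample posts and the author_profile name are hypothetical.

```python
from collections import Counter

# Hypothetical archived posts; a cross-posted message lists several groups.
POSTS = [
    {"author": "jdoe@example.com", "groups": ["comp.lang.c"]},
    {"author": "JDoe@Example.com", "groups": ["comp.lang.c", "alt.folklore.computers"]},
    {"author": "other@example.com", "groups": ["rec.arts.sf.written"]},
]


def author_profile(posts, address):
    """Count posts from one address and how many went to each group."""
    # Simplification: compare whole addresses case-insensitively.
    mine = [p for p in posts if p["author"].lower() == address.lower()]
    by_group = Counter(group for p in mine for group in p["groups"])
    return {"total_posts": len(mine), "posts_by_group": dict(by_group)}


if __name__ == "__main__":
    print(author_profile(POSTS, "jdoe@example.com"))
```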
 
 

Excite. Claiming to be the world's most comprehensive and flexible search engine, it is unique in permitting queries in everyday English. To optimize search efficiency, it features intelligent concept extraction as opposed to simple text matching. URL: http://www.excite.com/
 
 

Yahoo. Focusing on key word and topic searching, it organizes matches hierarchically by category and includes concise summaries. Although it generally returns far fewer matches per query than other engines, those matches are usually highly relevant. If they do not suffice, a single mouse click retrieves Alta Vista's matches for the same query. URL: http://search.yahoo.com/
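
Organizing matches hierarchically by category amounts to nesting each hit under its category path. Here is a small Python sketch, assuming each match carries a path string such as "Computers > Software > Browsers". The paths, titles, and function names are illustrative, not Yahoo's actual categories or code.

```python
def group_by_category(matches):
    """Nest (title, category path) matches into a tree of dictionaries.
    Sites are stored under the reserved key '_sites' at each category."""
    tree = {}
    for title, path in matches:
        node = tree
        for part in path.split(" > "):
            node = node.setdefault(part, {})
        node.setdefault("_sites", []).append(title)
    return tree


def print_tree(node, depth=0):
    """Print categories indented by depth, with their sites listed beneath."""
    for key, child in node.items():
        if key == "_sites":
            for site in child:
                print("  " * depth + "- " + site)
        else:
            print("  " * depth + key)
            print_tree(child, depth + 1)


if __name__ == "__main__":
    hits = [  # hypothetical matches for the query "browser"
        ("NCSA Mosaic", "Computers > Software > Browsers"),
        ("Netscape Navigator", "Computers > Software > Browsers"),
        ("Browser usage surveys", "Computers > Internet > Statistics"),
    ]
    print_tree(group_by_category(hits))
```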
 
 

InfoSeek. One of the most popular search services, owing to its presence on Netscape's default home pages and to a design that helps inexperienced searchers achieve high-precision searches. It achieves the latter by defaulting to a case-sensitive phrase search without truncation. URL: http://guide.infoseek.com/
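
The difference between a case-sensitive phrase search without truncation and a looser truncated search can be sketched with Python's re module. These helpers are illustrative only and do not reproduce InfoSeek's matching rules; truncation is modeled here, as an assumption, as case-insensitive prefix matching.

```python
import re


def phrase_match(text, phrase):
    """Case-sensitive phrase search without truncation: the words must appear
    consecutively, in the same case, and as whole words."""
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return False
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text) is not None


def truncated_match(text, stem):
    """Looser contrast case: any word that starts with the stem, ignoring case."""
    return any(w.lower().startswith(stem.lower()) for w in re.findall(r"\w+", text))


if __name__ == "__main__":
    text = "Digital Equipment Corp. operates Alta Vista."
    print(phrase_match(text, "Alta Vista"))  # True
    print(phrase_match(text, "alta vista"))  # False: case differs
    print(phrase_match(text, "Alt"))         # False: no truncation
    print(truncated_match(text, "alt"))      # True
```

The strict version trades recall for precision: it rejects near misses such as a lowercase or truncated form, which is what keeps an inexperienced searcher's result list short and on target.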
 
 

Lycos. As of 1996, it covered an estimated 92% of the Web (with plans to eventually index 100%). However, the engine doesn't index the full text of most documents; therefore, it may not retrieve those resources that mention your search topics only peripherally. URL: http://www.lycos.com/
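
Why partial indexing misses peripheral mentions is easy to see in a toy comparison. The Python sketch below assumes, purely for illustration, a partial index that keeps a document's title plus the first n words of its body; it does not describe which portions of a document Lycos actually indexed.

```python
def full_terms(doc):
    """Full-text indexing: every word in the title and body."""
    return set(doc["title"].lower().split()) | set(doc["body"].lower().split())


def partial_terms(doc, n=20):
    """Hypothetical partial indexing: title words plus the first n body words."""
    return set(doc["title"].lower().split()) | set(doc["body"].lower().split()[:n])


# A made-up page that mentions Usenet only once, near the end.
DOC = {
    "title": "Browser history",
    "body": "Mosaic inspired Netscape Navigator " + "filler " * 30 + "see also Usenet",
}

if __name__ == "__main__":
    print("usenet" in full_terms(DOC))     # True
    print("usenet" in partial_terms(DOC))  # False
```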
 
 

Electric Library. Retrieves the full text of items from more than 1,000 magazines, books, TV transcripts, newspapers, and press releases. The material tends to be academic in treatment. URL: http://www.elibrary.com/
 
 

Inktomi. Possesses a relatively large database and full-text indexing, which translate into high-recall retrieval lists. However, the service is handicapped to an extent by the simplicity of its search engine and the sparse information in its retrieval list. URL: http://inktomi.berkeley.edu/
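
Recall, mentioned in the Inktomi entry, and its counterpart precision are standard retrieval measures: recall is the share of all relevant documents that an engine retrieves, and precision is the share of retrieved documents that are relevant. Here is a minimal Python sketch with made-up document ids:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's retrieval list."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical: an engine returns four pages, two of the three relevant ones.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

A longer retrieval list tends to raise recall while lowering precision, which is the trade-off behind the contrast these entries draw between high-recall services and high-precision ones.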
 
 

Magellan. Like InfoSeek Guide and Excite, it melds a Web search engine and a subject tree of reviewed documents. Effective at finding a few really good sites on a general topic. URL: http://www.mckinley.com/
 
 

PathFinder. Covers most Time Warner magazines. Good for current events, market research, etc. URL: http://www.pathfinder.com/
 

HotBot. A fairly recent entry. Ranks hits according to the popularity of sites. URL: http://www.hotbot.com
 
 

  1. Unified Search Engines

A unified search engine is a web page that brings together a number of search services. It is particularly useful if you wish to query more than one search engine in a single search session.
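
A unified page that queries several engines at once must merge their ranked lists. One simple approach, sketched below in Python, interleaves the lists round-robin and drops duplicate URLs. The engines here are stand-in functions returning canned results, so no real service is contacted. Round-robin merging is an assumption and is not presented as the method of any service listed here.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Interleave ranked URL lists (one per engine), keeping the first copy of each URL."""
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def metasearch(query, engines):
    """Send one query to every engine and merge the answers."""
    return merge_results([engine(query) for engine in engines])


# Hypothetical stand-in engines with canned, ranked results.
def engine_a(query):
    return ["http://example.com/1", "http://example.com/2", "http://example.com/3"]


def engine_b(query):
    return ["http://example.com/2", "http://example.com/4"]


if __name__ == "__main__":
    for url in metasearch("web browsers", [engine_a, engine_b]):
        print(url)
```

Round-robin merging gives each engine's top hits equal standing. A page that several engines agree on is listed only once, at its earliest position.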
 
 

All-in-One Search List. URL: http://www.albany.net/allinone
 
 

E-Z Find. URL: http://www.theriver.com
 
 

Find-It! URL: http://www.itools.com/find-it/find-it.html
 
 

Internet Sleuth. Far more comprehensive than Find-It! URL: http://www.intbc.com/sleuth/
 
 

SavvySearch. Makes it possible to search all the Web's search engines with just one query. Often busy; poor recall. URL: http://www.cs.colostate.edu/~dreiling/smartform.html
 
 

Reviews of Web Search Services
 
 

Current: Search Me! URL: http://watt.seas.virginia.edu/~bp/searchme/welcome.html
 
 

Retrospective: Chris Tweney's "Internet Search Tools" URL: http://www.zdnet.com/zdi/tblazer/iserch.html
 
 

[Freely integrates material from an unpublished research paper by Skafkat Ahmed Chowdbury and from Bryan Pfaffenberger's World Wide Web Bible, 2nd ed., 1996.]