Other sources from Google and elsewhere indicate that the
search engine's database of pages is stored on over 10,000 PC
servers that are running the Linux operating system.
Why store so much information?
The quick answer as to why search engines store millions of
Web pages is speed. If you have had experience loading Web pages
into your browser, you probably have found that some pages take a
long time to load. Sometimes the pages are very large, and it takes
time to transmit all that material. Sometimes servers or communications are heavily loaded with traffic, and the material must wait its
turn for transmission. Sometimes transmission lines to a server may
operate only at slow speeds. Search engines would have similar
troubles in scanning Web pages if all those pages had to be loaded
from their sources for each search. And, if loading each page takes a
long time, then a search engine would take at least that long to obtain the pages and scan them in response to your query. Storing pages within a search engine's database eliminates the need to consult the actual sites when responding to your query. In addition, as you will see shortly, this advance downloading of pages also allows the preparation of various indexes that facilitate the search process.
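To make this concrete, here is a minimal Python sketch of the idea (the class, URLs, and sample text are invented for illustration; real engines use distributed storage rather than an in-memory dictionary). Pages are stored once, indexed word by word in advance, and queries are then answered entirely from the local index:

    from collections import defaultdict

    class PageStore:
        """Caches downloaded pages and builds a word -> pages index."""

        def __init__(self):
            self.pages = {}                # url -> stored copy of the page text
            self.index = defaultdict(set)  # word -> set of urls containing it

        def store(self, url, text):
            # Save the downloaded page and index its words ahead of time.
            self.pages[url] = text
            for word in text.lower().split():
                self.index[word].add(url)

        def search(self, word):
            # Answer a query from the local index; no site is contacted.
            return sorted(self.index[word.lower()])

    store = PageStore()
    store.store("http://example.com/a", "Linux servers power the search engine")
    store.store("http://example.com/b", "the engine stores pages for speed")
    print(store.search("engine"))   # both stored pages match

Because the index is built when pages are stored, a query costs one dictionary lookup instead of re-downloading and re-scanning every page.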
How are the pages obtained for a search engine?
Historically, search engine companies have used two basic approaches for obtaining and indexing Web pages. The first involves
reviews by humans, whereas the second uses automated programs.
When people maintain the databases for a search engine, the company employs a team of folks to locate Web pages and organize
those materials in its database. This basic approach was employed
for the Yahoo! search engine, from its beginnings until October
2002. Ask Jeeves also used this technique when it started, at one
time employing about 100 editors.
The second approach to organizing Web pages uses computer
programs to methodically surf the Web. These programs, called
Web crawlers, may start on one page and then systematically follow
all its Web links to identify additional pages. Each new page gives
rise to another collection of links and new pages. Repeating this process allows a crawler to discover ever more of the Web.
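The following Python sketch illustrates that loop under simplifying assumptions: the "Web" here is a small hypothetical dictionary mapping each page to its links, and the crawler is single-threaded with none of the politeness rules (robots.txt, rate limiting) a real crawler must observe:

    from collections import deque

    # A toy stand-in for the Web: each page maps to the links it contains
    # (all URLs are invented for illustration).
    WEB = {
        "http://example.com/":  ["http://example.com/a", "http://example.com/b"],
        "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
        "http://example.com/b": [],
        "http://example.com/c": ["http://example.com/"],
    }

    def crawl(start_url):
        # Start on one page, then systematically follow every link found.
        seen = {start_url}
        frontier = deque([start_url])   # pages discovered but not yet visited
        visited = []
        while frontier:
            url = frontier.popleft()
            visited.append(url)         # a real crawler would fetch and store here
            for link in WEB.get(url, []):   # each page yields a new set of links
                if link not in seen:        # skip pages already discovered
                    seen.add(link)
                    frontier.append(link)
        return visited

    print(crawl("http://example.com/"))

The seen set is what keeps the repetition from looping forever: a page reached by many links is still visited only once.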