Other sources from Google and elsewhere indicate that the
search engine's database of pages is stored on over 10,000 PC
servers that are running the Linux operating system.
Why store so much information?
The quick answer as to why search engines store millions of
Web pages is speed. If you have had experience loading Web pages
into your browser, you probably have found that some pages take a
long time to load. Sometimes the pages are very large, and it takes
time to transmit all that material. Sometimes servers or communications are heavily loaded with traffic, and the material must wait its
turn for transmission. Sometimes transmission lines to a server may
operate only at slow speeds. Search engines would have similar
troubles in scanning Web pages if all those pages had to be loaded
from their sources for each search. And, if loading each page takes a
long time, then a search engine would take at least that long to obtain the pages and scan them in response to your query. Storing pages within a search engine's database eliminates the need to consult the actual sites when responding to your query. In addition, as you will see shortly, this advance downloading of pages also allows the preparation of various indexes that facilitate the search process.
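To make this concrete, here is a minimal Python sketch of the idea (the class, URLs, and sample text are invented for illustration; real engines use distributed storage rather than an in-memory dictionary). Pages are stored once, indexed word by word in advance, and queries are then answered entirely from the local index:

    from collections import defaultdict

    class PageStore:
        """Caches downloaded pages and builds a word -> pages index."""

        def __init__(self):
            self.pages = {}                # url -> stored copy of the page text
            self.index = defaultdict(set)  # word -> set of urls containing it

        def store(self, url, text):
            # Save the downloaded page and index its words ahead of time.
            self.pages[url] = text
            for word in text.lower().split():
                self.index[word].add(url)

        def search(self, word):
            # Answer a query from the local index; no site is contacted.
            return sorted(self.index[word.lower()])

    store = PageStore()
    store.store("http://example.com/a", "Linux servers power the search engine")
    store.store("http://example.com/b", "the engine stores pages for speed")
    print(store.search("engine"))   # both stored pages match

Because the index is built when pages are stored, a query costs one dictionary lookup instead of re-downloading and re-scanning every page.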
How are the pages obtained for a search engine?
Historically, search engine companies have used two basic approaches for obtaining and indexing Web pages. The first involves
reviews by humans, whereas the second uses automated programs.
When people maintain the databases for a search engine, the company employs a team of folks to locate Web pages and organize
those materials in its database. This basic approach was employed
for the Yahoo! search engine, from its beginnings until October
2002. Ask Jeeves also used this technique when it started, at one
time employing about 100 editors.
The second approach to organizing Web pages uses computer
programs to methodically surf the Web. These programs, called
Web crawlers, may start on one page and then systematically follow
all its Web links to identify additional pages. Each new page gives
rise to another collection of links and new pages. Repeating this process allows a crawler to discover ever more of the Web.
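The following Python sketch illustrates that loop under simplifying assumptions: the "Web" here is a small hypothetical dictionary mapping each page to its links, and the crawler is single-threaded with none of the politeness rules (robots.txt, rate limiting) a real crawler must observe:

    from collections import deque

    # A toy stand-in for the Web: each page maps to the links it contains
    # (all URLs are invented for illustration).
    WEB = {
        "http://example.com/":  ["http://example.com/a", "http://example.com/b"],
        "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
        "http://example.com/b": [],
        "http://example.com/c": ["http://example.com/"],
    }

    def crawl(start_url):
        # Start on one page, then systematically follow every link found.
        seen = {start_url}
        frontier = deque([start_url])   # pages discovered but not yet visited
        visited = []
        while frontier:
            url = frontier.popleft()
            visited.append(url)         # a real crawler would fetch and store here
            for link in WEB.get(url, []):   # each page yields a new set of links
                if link not in seen:        # skip pages already discovered
                    seen.add(link)
                    frontier.append(link)
        return visited

    print(crawl("http://example.com/"))

The seen set is what keeps the repetition from looping forever: a page reached by many links is still visited only once.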