process over time allows the computer program to cover a vast
number of Web pages, and each can be stored in a search engine's
database. Many search engines now use Web crawlers to locate
pages; these include Google, Ask Jeeves (which runs the Teoma search
engine), All The Web (also used by Lycos), and Inktomi (purchased
by Yahoo! in March 2003).
Although Web crawlers' process of finding links works well,
one complication arises when pages reference each other. For example,
suppose my page (A) has a link to your page (B), and your page
also has a link back to mine. If a Web crawler were not careful, it
might start at my page (A), then go to yours (B), then go back to
mine (A), then back to yours (B), and so on. Such a pattern is an example
of an infinite loop and could allow processing to continue
forever. To prevent being caught in such fruitless loops, Web
crawlers must keep track of what pages they have already visited. If
a link would cause them to revisit a page, then that link may be ignored
as not providing any new information.
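The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code of any actual search engine: the function name crawl and the dictionary links are hypothetical, and the dictionary stands in for the Web by listing, for each page, the pages it links to.

```python
from collections import deque

def crawl(start, links):
    """Visit every page reachable from `start`, each exactly once.

    `links` is a hypothetical in-memory stand-in for the Web: it maps a
    page name to the list of pages that page links to.
    """
    visited = {start}        # pages the crawler has already seen
    queue = deque([start])   # pages waiting to be processed
    order = []               # order in which pages were processed
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target in visited:
                continue     # link leads to a known page, so ignore it
            visited.add(target)
            queue.append(target)
    return order

# Pages A and B link to each other, as in the example above.
links = {"A": ["B"], "B": ["A", "C"], "C": []}
print(crawl("A", links))     # prints ['A', 'B', 'C']
```

Because the link from B back to A points to a page already in the visited set, it is skipped, and the crawl ends once every reachable page has been processed.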
Although Web-crawler technology predominates in the development
of databases for search engines, a combined approach is still
used by a few companies, such as Yahoo! and MSN Search. These
companies employ human editors to help guide common search
queries, but the search engines utilize Web-crawler technology for
relatively uncommon queries.
Whatever technology is used to locate and store Web pages,
note that the pages in a search engine's database reflect only material
from the time the pages were stored. If pages change, those
changes are not immediately reflected in a search engine's database,
and the results of your search may show out-of-date information.
You cannot assume the references you get from a search apply to
current pages; instead, the search results are based on past pages.
For this reason, search engines not only have to frequently download
Web pages for their databases, but also must regularly update
those pages to reflect any revisions made to them.
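One way to picture this updating is to record when each page was stored and to download again any page whose stored copy has grown too old. The sketch below assumes that simple rule for illustration; the names pages_to_refresh, stored_at, and max_age are hypothetical, times are plain numbers of seconds, and a real search engine may well decide what to refresh in other ways.

```python
def pages_to_refresh(stored_at, now, max_age):
    """Return the pages whose stored copy is older than `max_age`.

    `stored_at` is a hypothetical record mapping each page name to the
    time (in seconds) at which its copy was last stored.
    """
    return sorted(page for page, when in stored_at.items()
                  if now - when > max_age)

# Page A was stored 1000 seconds ago, B 500 seconds ago, C 100 seconds ago.
stored_at = {"A": 0, "B": 500, "C": 900}
print(pages_to_refresh(stored_at, now=1000, max_age=600))  # prints ['A']
```

Only page A is older than the 600-second limit, so it alone would be downloaded again; until that happens, a search could still show the old version of A.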
How do searches locate information for my specific search requests?
Although a search engine's database resolves time delays of accessing
Web pages, storage alone is not adequate to provide a
quick response to your queries. Scanning billions of documents—
 