How do Web applications work? - The Tao of Computing

process over time allows the computer program to cover a vast number of Web pages, and each can be stored in a search engine's database. Many search engines now use Web crawlers to locate pages, including Google, Ask Jeeves (which runs the Teoma search engine), All The Web (also used by Lycos), and Inktomi (purchased by Yahoo! in March 2003).

Although Web crawlers' process of finding links works well, one complication arises when pages reference each other. For example, suppose my page (A) has a link to your page (B), and your page also has a link back to mine. If a Web crawler were not careful, it might start at my page (A), then go to yours (B), then go back to mine (A), then back to yours (B), and so on. Such a pattern is an example of an infinite loop and would cause processing to continue forever. To avoid being caught in such fruitless loops, Web crawlers must keep track of the pages they have already visited. If a link would cause a crawler to revisit a page, that link can be ignored because it provides no new information.
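The bookkeeping described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a tiny in-memory link graph: the dictionary `LINKS` and the function `crawl` are hypothetical illustrations, not any search engine's actual code, and a real crawler would fetch pages over the network and extract links from their HTML. Pages A and B link to each other, yet the crawl still ends because each page is recorded in `visited` the first time it is processed.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
# "A" and "B" link to each other, forming the cycle described above.
LINKS = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": [],
}


def crawl(start, links):
    """Breadth-first crawl that remembers visited pages so cycles end."""
    visited = set()
    order = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in visited:
            # Revisiting a page adds no new information, so skip it.
            continue
        visited.add(page)
        order.append(page)
        for target in links.get(page, []):
            if target not in visited:
                queue.append(target)
    return order


print(crawl("A", LINKS))  # ['A', 'B', 'C']
```

Without the `visited` set, the loop would keep bouncing between A and B and never terminate.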

Although Web-crawler technology predominates in building databases for search engines, a few companies, such as Yahoo! and MSN Search, still use a combined approach. These companies employ human editors to help guide common search queries, but their search engines rely on Web-crawler technology for relatively uncommon queries.
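One way to picture the combined approach is as a two-step lookup. The sketch below is illustrative only, assuming two hypothetical tables: `EDITED_RESULTS` stands in for editor-chosen answers to common queries, and `CRAWLED_INDEX` stands in for data gathered by a Web crawler. The URLs use the reserved example domains, and nothing here describes how Yahoo! or MSN Search actually organized their systems.

```python
# Hypothetical editor-curated results for common queries.
EDITED_RESULTS = {
    "weather": ["http://example.com/weather"],
}

# Hypothetical crawler-built data: query -> matching pages.
CRAWLED_INDEX = {
    "weather": ["http://example.org/forecast"],
    "tessellation": ["http://example.org/tiles"],
}


def search(query):
    """Prefer human-edited results; fall back to crawled data otherwise."""
    key = query.strip().lower()
    if key in EDITED_RESULTS:
        return EDITED_RESULTS[key]
    return CRAWLED_INDEX.get(key, [])


print(search("Weather"))       # editor-chosen result
print(search("tessellation"))  # crawler result for an uncommon query
```

The design choice is simply ordering: human judgment is applied where it pays off most (frequent queries), and automated crawling covers the long tail.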

Whatever technology is used to locate and store Web pages, note that the pages in a search engine's database reflect only material from the time the pages were stored. If pages change, those changes are not immediately reflected in the search engine's database, and the results of your search may show out-of-date information. You cannot assume that the references you get from a search apply to current pages; instead, the search results are based on past versions of those pages. For this reason, search engines must not only download Web pages for their databases but also regularly revisit those pages to capture any revisions made to them.
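The need to revisit stored pages can be sketched as a staleness check. This is a minimal sketch, assuming each stored copy records when it was fetched and that any copy older than a fixed `max_age` should be downloaded again. The `StoredPage` class, the `refresh` function, and the fixed-age policy are all hypothetical simplifications chosen for illustration; the `fetch` argument stands in for whatever code actually downloads a page.

```python
from dataclasses import dataclass


@dataclass
class StoredPage:
    url: str
    content: str
    fetched_at: float  # time (in seconds) when this copy was stored


def pages_to_refresh(pages, now, max_age):
    """Return URLs whose stored copy is older than max_age seconds."""
    return [p.url for p in pages if now - p.fetched_at > max_age]


def refresh(pages, now, max_age, fetch):
    """Re-download stale pages using the supplied fetch(url) function."""
    for page in pages:
        if now - page.fetched_at > max_age:
            page.content = fetch(page.url)
            page.fetched_at = now


# Usage with a stand-in fetch function instead of a real download.
db = [
    StoredPage("http://example.com/old", "old text", fetched_at=0.0),
    StoredPage("http://example.com/new", "recent text", fetched_at=90_000.0),
]
refresh(db, now=100_000.0, max_age=86_400.0, fetch=lambda url: "updated text")
```

Between refreshes, any search answered from `db` still reflects the older copies, which is exactly why results can lag behind the live Web.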

search requests?

Although a search engine's database avoids the time delays of fetching Web pages, storage alone is not enough to respond quickly to your queries. Scanning billions of documents—
