Fig. 1.1 Number of websites
Fig. 1.2 Search engine
indexer, link analyzer, query processor, and ranker. The crawler collects webpages
and other documents from the Web according to a prioritization strategy. The
parser analyzes these documents and generates index terms and a hyperlink graph
for them. The indexer takes the output of the parser and builds the indexes, or other
data structures, that enable fast search of the documents. The link analyzer takes the Web
graph as input and determines the importance of each page. This importance can
be used to prioritize the recrawling of a page, to determine the tiering, and to serve
as a feature for ranking. The query processor provides the interface between users
and the search engine. Input queries are processed (e.g., by removing stop words and
stemming) and transformed into index terms that the search engine can understand.
The ranker, which is a central component, is responsible for matching
processed queries against indexed documents. The ranker can directly take
the queries and documents as inputs and compute a matching score using
heuristic formulas, or it can extract features for each query-document pair
and combine these features to produce the matching score.
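
To make the indexer, query processor, and ranker roles concrete, here is a minimal Python sketch. It is illustrative only: the names `process`, `build_index`, `rank`, and `STOP_WORDS`, the toy suffix-stripping stemmer, and the TF-IDF sum used as the "heuristic formula" are all assumptions chosen for the example, not the method of any particular search engine.

```python
import math
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption, not a standard list).
STOP_WORDS = {"a", "an", "and", "for", "from", "in", "of", "the", "to"}


def process(text):
    """Query processor / parser step: lowercase, drop stop words, crude stemming."""
    terms = []
    for token in text.lower().split():
        token = token.strip(".,;:!?()")
        if not token or token in STOP_WORDS:
            continue
        # Toy stemmer (assumption): strip one trailing "s".
        # Real systems use a proper stemming algorithm.
        if token.endswith("s") and len(token) > 3:
            token = token[:-1]
        terms.append(token)
    return terms


def build_index(docs):
    """Indexer: build an inverted index mapping term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(process(text)).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, num_docs):
    """Ranker: score documents with a TF-IDF sum, one possible heuristic formula."""
    scores = defaultdict(float)
    for term in process(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical three-document collection.
DOCS = {
    "d1": "The crawler collects webpages from the Web.",
    "d2": "The indexer creates indexes for fast search of documents.",
    "d3": "The ranker matches queries and documents.",
}
INDEX = build_index(DOCS)
RESULTS = rank("fast search of documents", INDEX, len(DOCS))
for doc_id, score in RESULTS:
    print(doc_id, round(score, 3))
```

The same `process` function is applied to documents and queries so that both end up in the same index-term vocabulary. A feature-based ranker would replace the single TF-IDF sum in `rank` with several per-pair features (the link-analysis importance could be one of them) that are then combined into the final score.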