Databases Reference
In-Depth Information
! Exercise 5.5.2 : Suppose our graph is a chain of n nodes, as was suggested by
Fig. 5.9. Compute the hubs and authorities vectors, as a function of n.
5.6
Summary of Chapter 5
F
Term Spam: Early search engines were unable to deliver relevant results
because they were vulnerable to term spam - the introduction into Web
pages of words that misrepresented what the page was about.
F
The Google Solution to Term Spam: Google was able to counteract term
spam by two techniques. First was the PageRank algorithm for deter-
mining the relative importance of pages on the Web. The second was a
strategy of believing what other pages said about a given page, in or near
their links to that page, rather than believing only what the page said
about itself.
F
PageRank : PageRank is an algorithm that assigns a real number, called
its PageRank, to each page on the Web. The PageRank of a page is a
measure of how important the page is, or how likely it is to be a good
response to a search query. In its simplest form, PageRank is a solution
to the recursive equation “a page is important if important pages link to
it.”
F
Transition Matrix of the Web: We represent links in the Web by a matrix
whose ith row and ith column represent the ith page of the Web. If there
are one or more links from page j to page i, then the entry in row i and
column j is 1/k, where k is the number of pages to which page j links.
Other entries of the transition matrix are 0.
F
Computing PageRank on Strongly Connected Web Graphs: For strongly
connected Web graphs (those where any node can reach any other node),
PageRank is the principal eigenvector of the transition matrix. We can
compute PageRank by starting with any nonzero vector and repeatedly
multiplying the current vector by the transition matrix, to get a better
estimate. 7 After about 50 iterations, the estimate will be very close to
the limit, which is the true PageRank.
F
The Random Surfer Model : Calculation of PageRank can be thought of
as simulating the behavior of many random surfers, who each start at a
random page and at any step move, at random, to one of the pages to
which their current page links. The limiting probability of a surfer being
at a given page is the PageRank of that page. The intuition is that people
tend to create links to the pages they think are useful, so random surfers
will tend to be at a useful page.
7 Technically, the condition for this method to work is more restricted than simply “strongly
connected.” However, the other necessary conditions will surely be met by any large strongly
connected component of the Web that was not artificially constructed.
Search WWH ::




Custom Search