Database Reference
In-Depth Information
EXERCISE 5.2.3 Using the method of Section 5.2.4 , represent the transition matrices of the
graph of Fig. 5.3 , assuming blocks have side 2.
EXERCISE 5.2.4 Consider a Web graph that is a chain, like Fig. 5.9 , with n nodes. As a
function of k , which you may assume divides n , describe the representation of the transition
matrix for this graph, using the method of Section 5.2.4
5.3 Topic-Sensitive PageRank
There are several improvements we can make to PageRank. One, to be studied in this sec-
tion, is that we can weight certain pages more heavily because of their topic. The mech-
anism for enforcing this weighting is to alter the way random surfers behave, having them
prefer to land on a page that is known to cover the chosen topic. In the next section, we
shall see how the topic-sensitive idea can also be applied to negate the effects of a new kind
of spam, called “'link spam,” that has developed to try to fool the PageRank algorithm.
5.3.1
Motivation for Topic-Sensitive Page Rank
Different people have different interests, and sometimes distinct interests are expressed us-
ing the same term in a query. The canonical example is the search query jaguar , which
might refer to the animal, the automobile, a version of the MAC operating system, or even
an ancient game console. If a search engine can deduce that the user is interested in auto-
mobiles, for example, then it can do a better job of returning relevant pages to the user.
Ideally, each user would have a private PageRank vector that gives the importance of
each page to that user. It is not feasible to store a vector of length many billions for each
of a billion users, so we need to do something simpler. The topic-sensitive PageRank ap-
proach creates one vector for each of some small number of topics, biasing the PageRank
to favor pages of that topic. We then endeavour to classify users according to the degree of
their interest in each of the selected topics. While we surely lose some accuracy, the benefit
is that we store only a short vector for each user, rather than an enormous vector for each
user.
EXAMPLE 5.9 One useful topic set is the 16 top-level categories (sports, medicine, etc.) of
the Open Directory (DMOZ). 6 We could create 16 PageRank vectors, one for each topic. If
we could determine that the user is interested in one of these topics, perhaps by the content
of the pages they have recently viewed, then we could use the PageRank vector for that
topic when deciding on the ranking of pages.
Search WWH ::




Custom Search