Databases Reference
In-Depth Information
5.2.6 Exercises for Section 5.2
Exercise 5.2.1 : Suppose we wish to store an n×n boolean matrix (0 and
1 elements only). We could represent it by the bits themselves, or we could
represent the matrix by listing the positions of the 1's as pairs of integers, each
integer requiring⌈log 2 n⌉bits. The former is suitable for dense matrices; the
latter is suitable for sparse matrices. How sparse must the matrix be (i.e., what
fraction of the elements should be 1's) for the sparse representation to save
space?
Exercise 5.2.2 : Using the method of Section 5.2.1, represent the transition
matrices of the following graphs:
(a) Figure 5.4.
(b) Figure 5.7.
Exercise 5.2.3 : Using the method of Section 5.2.4, represent the transition
matrices of the graph of Fig. 5.3, assuming blocks have side 2.
Exercise 5.2.4 : Consider a Web graph that is a chain, like Fig. 5.9, with
n nodes. As a function of k, which you may assume divides n, describe the
representation of the transition matrix for this graph, using the method of
Section 5.2.4
5.3
Topic-Sensitive PageRank
There are several improvements we can make to PageRank. One, to be studied
in this section, is that we can weight certain pages more heavily because of their
topic. The mechanism for enforcing this weighting is to alter the way random
surfers behave, having them prefer to land on a page that is known to cover the
chosen topic. In the next section, we shall see how the topic-sensitive idea can
also be applied to negate the effects of a new kind of spam, called “'link spam,”
that has developed to try to fool the PageRank algorithm.
5.3.1 Motivation for Topic-Sensitive Page Rank
Different people have different interests, and sometimes distinct interests are
expressed using the same term in a query. The canonical example is the search
query jaguar , which might refer to the animal, the automobile, a version of the
MAC operating system, or even an ancient game console. If a search engine
can deduce that the user is interested in automobiles, for example, then it can
do a better job of returning relevant pages to the user.
Ideally, each user would have a private PageRank vector that gives the
importance of each page to that user. It is not feasible to store a vector of
length many billions for each of a billion users, so we need to do something
Search WWH ::




Custom Search