Databases Reference
In-Depth Information
F
Dead Ends: A dead end is a Web page with no links out. The presence of
dead ends will cause the PageRank of some or all of the pages to go to 0
in the iterative computation, including pages that are not dead ends. We
can eliminate all dead ends before undertaking a PageRank calculation
by recursively dropping nodes with no arcs out. Note that dropping one
node can cause another, which linked only to it, to become a dead end,
so the process must be recursive.
F
Spider Traps: A spider trap is a set of nodes that, while they may link to
each other, have no links out to other nodes. In an iterative calculation
of PageRank, the presence of spider traps cause all the PageRank to be
captured within that set of nodes.
F
Taxation Schemes: To counter the effect of spider traps (and of dead ends,
if we do not eliminate them), PageRank is normally computed in a way
that modifies the simple iterative multiplication by the transition matrix.
A parameter β is chosen, typically around 0.85. Given an estimate of the
PageRank, the next estimate is computed by multiplying the estimate by
β times the transition matrix, and then adding (1−β)/n to the estimate
for each page, where n is the total number of pages.
F
Taxation and Random Surfers: The calculation of PageRank using taxa-
tion parameter β can be thought of as giving each random surfer a prob-
ability 1−β of leaving the Web, and introducing an equivalent number
of surfers randomly throughout the Web.
F
E cient Representation of Transition Matrices: Since a transition matrix
is very sparse (almost all entries are 0), it saves both time and space
to represent it by listing its nonzero entries. However, in addition to
being sparse, the nonzero entries have a special property: they are all the
same in any given column; the value of each nonzero entry is the inverse
of the number of nonzero entries in that column. Thus, the preferred
representation is column-by-column, where the representation of a column
is the number of nonzero entries, followed by a list of the rows where those
entries occur.
F
Very Large-Scale Matrix-Vector Multiplication: For Web-sized graphs, it
may not be feasible to store the entire PageRank estimate vector in the
main memory of one machine. Thus, we can break the vector into k
segments and break the transition matrix into k 2 squares, called blocks,
assigning each square to one machine. The vector segments are each sent
to k machines, so there is a small additional cost in replicating the vector.
F
Representing Blocks of a Transition Matrix : When we divide a transition
matrix into square blocks, the columns are divided into k segments. To
represent a segment of a column, nothing is needed if there are no nonzero
entries in that segment. However, if there are one or more nonzero entries,
Search WWH ::




Custom Search