Databases Reference
In-Depth Information
Example 5.13 : A typical department at a university maintains a Web page
listing all the courses offered by the department, with links to a page for each
course, telling about the course - the instructor, the text, an outline of the
course content, and so on. If you want to know about a certain course, you
need the page for that course; the departmental course list will not do. The
course page is an authority for that course. However, if you want to find out
what courses the department is offering, it is not helpful to search for each
courses' page; you need the page with the course list first. This page is a hub
for information about courses.
2
Just as PageRank uses the recursive definition of importance that “a page
is important if important pages link to it,” HITS uses a mutually recursive
definition of two concepts: “a page is a good hub if it links to good authorities,
and a page is a good authority if it is linked to by good hubs.”
5.5.2
Formalizing Hubbiness and Authority
To formalize the above intuition, we shall assign two scores to each Web page.
One score represents the hubbiness of a page - that is, the degree to which it
is a good hub, and the second score represents the degree to which the page
is a good authority. Assuming that pages are enumerated, we represent these
scores by vectors h and a. The ith component of h gives the hubbiness of the
ith page, and the ith component of a gives the authority of the same page.
While importance is divided among the successors of a page, as expressed by
the transition matrix of the Web, the normal way to describe the computation
of hubbiness and authority is to add the authority of successors to estimate
hubbiness and to add hubbiness of predecessors to estimate authority. If that
is all we did, then the hubbiness and authority values would typically grow
beyond bounds. Thus, we normally scale the values of the vectors h and a so
that the largest component is 1. An alternative is to scale so that the sum of
components is 1.
To describe the iterative computation of h and a formally, we use the link
matrix of the Web, L. If we have n pages, then L is an n×n matrix, and L ij = 1
if there is a link from page i to page j, and L ij = 0 if not. We shall also have
need for L T , the transpose of L. That is, L ij = 1 if there is a link from page
j to page i, and L ij = 0 otherwise. Notice that L T is similar to the matrix M
that we used for PageRank, but where L T has 1, M has a fraction - 1 divided
by the number of out-links from the page represented by that column.
Example 5.14 : For a running example, we shall use the Web of Fig. 5.4,
which we reproduce here as Fig. 5.18. An important observation is that dead
ends or spider traps do not prevent the HITS iteration from converging to a
meaningful pair of vectors. Thus, we can work with Fig. 5.18 directly, with
no “taxation” or alteration of the graph needed.
The link matrix L and its
transpose are shown in Fig. 5.19.
2
Search WWH ::




Custom Search