5.5 Hubs and Authorities
An idea called “hubs and authorities” was proposed shortly after PageRank was first implemented. The algorithm for computing hubs and authorities bears some resemblance to the computation of PageRank, since it also computes a fixed point iteratively through repeated matrix-vector multiplication. However, there are also significant differences between the two ideas, and neither can substitute for the other.
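To make the comparison concrete, one standard way of writing the two iterations is shown below. The notation is assumed here rather than defined in this section: $M$ is PageRank's transition matrix (taxation ignored), $L$ is a link matrix with $L_{ij} = 1$ when page $i$ links to page $j$ and $0$ otherwise, $\mathbf{h}$ and $\mathbf{a}$ are the hub and authority vectors, and $\lambda$, $\mu$ are scaling constants that keep the vectors bounded.

```latex
\begin{aligned}
\text{PageRank (one vector):}\quad & \mathbf{v}' = M\,\mathbf{v} \\
\text{HITS (two vectors):}\quad & \mathbf{a} = \mu\, L^{\mathsf{T}}\mathbf{h}, \qquad \mathbf{h} = \lambda\, L\,\mathbf{a}
\end{aligned}
```

The structural difference is visible at once: PageRank repeatedly multiplies a single vector by one matrix, while HITS alternates between two vectors, using $L$ and its transpose.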
This hubs-and-authorities algorithm, sometimes called HITS (hyperlink-induced topic search), was originally intended not as a preprocessing step before handling search queries, as PageRank is, but as a step performed while processing a search query, to rank only the responses to that query. We shall, however, describe it as a technique for analyzing the entire Web, or the portion crawled by a search engine. There is reason to believe that something like this approach is, in fact, used by the Ask search engine.
5.5.1 The Intuition Behind HITS
While PageRank assumes a one-dimensional notion of importance for pages, HITS views
important pages as having two flavors of importance.
(1) Certain pages are valuable because they provide information about a topic. These pages are called authorities.
(2) Other pages are valuable not because they provide information about any topic, but because they tell you where to go to find out about that topic. These pages are called hubs.
EXAMPLE 5.13 A typical department at a university maintains a Web page listing all the courses offered by the department, with links to a page for each course describing that course: the instructor, the text, an outline of the course content, and so on. If you want to know about a certain course, you need the page for that course; the departmental course list will not do. The course page is an authority for that course. However, if you want to find out what courses the department is offering, it is not helpful to search for each course's page; you need the page with the course list first. This page is a hub for information about courses.
Just as PageRank uses the recursive definition of importance that “a page is important
if important pages link to it,” HITS uses a mutually recursive definition of two concepts:
“a page is a good hub if it links to good authorities, and a page is a good authority if it is
linked to by good hubs.”
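The mutually recursive definition above can be turned into an iteration: compute authority scores from the current hub scores, then hub scores from the new authority scores, and repeat. The sketch below is a minimal illustration under stated assumptions, not the book's implementation. It assumes the graph is a dictionary mapping each page to a list of the distinct pages it links to; the function name `hits` and the page names are hypothetical; every score starts at 1; and after each half-step the scores are rescaled so the largest is 1 (one common normalization; other versions scale to unit length).

```python
def hits(links, iterations=50):
    """Iterate hub and authority scores (hypothetical helper).

    links maps each page to a list of the distinct pages it links to.
    Returns (hub, auth) dictionaries scaled so the largest score is 1.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page is a good authority if it is linked to by good hubs.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        scale = max(auth.values()) or 1.0  # avoid dividing by zero
        auth = {p: v / scale for p, v in auth.items()}

        # A page is a good hub if it links to good authorities.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        scale = max(hub.values()) or 1.0
        hub = {p: v / scale for p, v in hub.items()}

    return hub, auth


# Example 5.13 in miniature: a department page listing three course pages.
links = {"dept": ["cs101", "cs102", "cs103"]}
hub, auth = hits(links)
print(hub)
print(auth)
```

On this toy graph the department page ends up with hub score 1 and authority score 0, while each course page gets authority score 1 and hub score 0, matching the roles described in Example 5.13.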