(1) summarizing the data succinctly and approximately, or
(2) extracting the most prominent features of the data and ignoring the rest.
We shall explore these two approaches in the following sections.
1.1.4 Summarization
One of the most interesting forms of summarization is the PageRank idea, which made
Google successful and which we shall cover in Chapter 5. In this form of Web mining, the
entire complex structure of the Web is summarized by a single number for each page. This
number, the “PageRank” of the page, is (oversimplifying somewhat) the probability that a
random walker on the graph would be at that page at any given time. The remarkable
property this ranking has is that it reflects very well the “importance” of the page: the degree
to which typical searchers would like that page returned as an answer to their search query.
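The random-walker description can be made concrete with a minimal sketch. It is not the full PageRank computation of Chapter 5; it only follows the oversimplified description above, and it assumes every page has at least one out-link and that the walker can eventually reach any page from any other. The function name random_walk_distribution and the three-page graph toy_web are hypothetical, introduced only for illustration.

```python
def random_walk_distribution(links, steps=100):
    """Approximate the long-run probability that a random walker is at each page.

    `links` maps each page to the list of pages it links to.
    Assumption: every page has at least one out-link and the graph is
    strongly connected and aperiodic, so the probabilities settle down.
    """
    pages = list(links)
    # Start the walker at a page chosen uniformly at random.
    prob = {page: 1.0 / len(pages) for page in pages}
    for _ in range(steps):
        nxt = {page: 0.0 for page in pages}
        for page, outs in links.items():
            # The walker follows each out-link with equal probability.
            share = prob[page] / len(outs)
            for dest in outs:
                nxt[dest] += share
        prob = nxt
    return prob


# Hypothetical three-page "Web": A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = random_walk_distribution(toy_web)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this toy graph, pages A and C end up with a higher probability than B, since B receives only half of the walker's traffic out of A. Each page is thus summarized by a single number.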
Another important form of summary, clustering, will be covered in Chapter 7. Here,
data is viewed as points in a multidimensional space. Points that are “close” in this space
are assigned to the same cluster. The clusters themselves are summarized, perhaps by
giving the centroid of the cluster and the average distance from the centroid of points in the
cluster. These cluster summaries become the summary of the entire data set.
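As a small illustration of such a cluster summary, the sketch below computes the centroid of one cluster and the average distance of its points from that centroid. It assumes the points have already been assigned to the cluster and that “close” means Euclidean distance; the function name summarize_cluster and the sample points are hypothetical.

```python
import math


def summarize_cluster(points):
    """Return (centroid, average Euclidean distance of the points from it).

    `points` is a non-empty list of equal-length coordinate tuples that
    are assumed to belong to the same cluster already.
    """
    dims = len(points[0])
    # The centroid is the coordinate-wise mean of the points.
    centroid = tuple(sum(p[d] for p in points) / len(points) for d in range(dims))
    # Average distance from the centroid measures how spread out the cluster is.
    avg_dist = sum(math.dist(p, centroid) for p in points) / len(points)
    return centroid, avg_dist


# Hypothetical cluster: the four corners of a 2-by-2 square.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
centroid, avg_dist = summarize_cluster(cluster)
print(centroid, avg_dist)
```

Whatever the number of points in the cluster, the summary consists of one point and one number, which is what makes it a succinct, approximate description of the data.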
EXAMPLE 1.2 A famous instance of clustering to solve a problem took place long ago in
London, and it was done entirely without computers. The physician John Snow, dealing
with a cholera outbreak, plotted the cases on a map of the city. A small illustration
suggesting the process is shown in Fig. 1.1.
Figure 1.1 Plotting cholera cases on a map of London
The cases clustered around some of the intersections of roads. These intersections were
the locations of wells that had become contaminated; people who lived nearest these wells
got sick, while people who lived nearer to wells that had not been contaminated did not.
Without the ability to cluster the data, Snow would not have discovered the cause of
cholera.