web page to another location. It can also mine the document structure within a page
(e.g., analyze the treelike structure of page structures to describe HTML or XML tag
usage). Both kinds of web structure mining help us understand web contents and may
also help transform web contents into relatively structured data sets.
Web usage mining is the process of extracting useful information (e.g., user click
streams) from server logs. It finds patterns related to general or particular groups of
users; understands users' search patterns, trends, and associations; and predicts what
users are looking for on the Internet. It helps improve search efficiency and effectiveness
and helps promote products or related information to different groups of users at the
right time. Web search companies routinely conduct web usage mining to improve their
quality of service.
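As a rough illustration of turning server-log records into click streams, the sketch below groups requests into per-user sessions. It assumes the log has already been parsed into (user, timestamp, url) tuples; the function name `sessionize` and the 30-minute inactivity timeout are illustrative choices, not something prescribed by the text.

```python
from collections import defaultdict


def sessionize(records, timeout=1800):
    """Group (user, timestamp_seconds, url) records into per-user sessions.

    A new session starts whenever the gap between two consecutive requests
    from the same user exceeds `timeout` seconds (an assumed threshold).
    Returns a list of (user, [url, ...]) pairs.
    """
    by_user = defaultdict(list)
    # Sort by user, then by time, so each user's hits are in order.
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Long pause: close the current session and start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions
```

The resulting sessions could then be fed to a pattern-mining step (for example, counting frequent page sequences), which is where the usage patterns described above would actually be found.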
Mining Data Streams
Stream data refer to data that flow into a system in vast volumes, change dynamically,
are possibly infinite, and contain multidimensional features. Such data cannot be stored
in traditional database systems. Moreover, most systems may only be able to read the
stream once in sequential order. This poses great challenges for the effective mining
of stream data. Substantial research has led to progress in the development of effi-
cient methods for mining data streams, in the areas of mining frequent and sequential
patterns, multidimensional analysis (e.g., the construction of stream cubes), classifica-
tion, clustering, outlier analysis, and the online detection of rare events in data streams.
The general philosophy is to develop single-scan or a-few-scan algorithms using limited
computing and storage capabilities.
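To make the single-scan, limited-storage idea concrete, here is a minimal sketch of the Misra-Gries frequent-items summary. This particular algorithm is chosen here only as an illustration; the text does not name it. It reads the stream once and keeps at most k - 1 counters, so the counts it returns are approximate (they undercount), but any item occurring more than n/k times in a stream of length n is guaranteed to remain in the summary.

```python
def misra_gries(stream, k):
    """One-pass approximate frequent-item summary using at most k - 1 counters.

    `stream` is any iterable of hashable items; `k` must be at least 2.
    Returned counts are lower bounds on the true frequencies.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

Because the summary size depends only on k and not on the stream length, memory stays bounded even for an unbounded stream. If exact counts are required, a second pass over the surviving candidates would be needed, which is only possible when the data can be replayed.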
This philosophy includes collecting information about stream data in sliding windows or tilted
time windows (where the most recent data are registered at the finest granularity and
the more distant data are registered at a coarser granularity), and exploring techniques
like microclustering, limited aggregation, and approximation. Many applications of
stream data mining can be explored—for example, real-time detection of anomalies in
computer network traffic, botnets, text streams, video streams, power-grid flows, web
searches, sensor networks, and cyber-physical systems.
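The tilted time window idea can be sketched as follows. This is a simplified, logarithmic-style variant written for illustration: the class name `TiltedTimeWindow` and the parameters `per_level` and `levels` are hypothetical. Each level stores a few buckets of counts; when a level overflows, its two oldest buckets are merged into a single bucket at the next, coarser level, so a bucket at level i summarizes 2^i time units.

```python
class TiltedTimeWindow:
    """Keep recent counts at fine granularity and older counts at coarser ones."""

    def __init__(self, per_level=4, levels=3):
        self.per_level = per_level
        # levels[0] is the finest granularity; within a level, oldest bucket first.
        self.levels = [[] for _ in range(levels)]

    def add(self, count):
        """Register the count observed in the newest time unit."""
        self._push(0, count)

    def _push(self, level, count):
        buckets = self.levels[level]
        buckets.append(count)
        if len(buckets) > self.per_level:
            # Merge the two oldest buckets into one coarser bucket.
            merged = buckets.pop(0) + buckets.pop(0)
            if level + 1 < len(self.levels):
                self._push(level + 1, merged)
            # Otherwise the data is older than the coarsest level and is dropped.

    def total(self):
        """Sum of all counts still retained in the window."""
        return sum(sum(buckets) for buckets in self.levels)
```

With `per_level=2` and `levels=3`, adding a count of 1 for seven consecutive time units leaves one bucket per level, covering 1, 2, and 4 time units respectively. The number of stored buckets is bounded by `per_level * levels` regardless of how long the stream runs.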
13.2 Other Methodologies of Data Mining
Due to the broad scope of data mining and the large variety of data mining methodologies,
not all methodologies of data mining can be thoroughly covered in this book.
In this section, we briefly discuss several interesting methodologies that were not fully
addressed in the previous chapters. These methodologies are listed in Figure 13.3.
13.2.1 Statistical Data Mining
The data mining techniques described in this book are primarily drawn from computer
science disciplines, including data mining, machine learning, data warehousing, and
algorithms. They are designed for the efficient handling of huge amounts of data that are
 