Database Reference
In-Depth Information
￿ Since data in databases are constantly being modified, discovery methods
should be incremental, to be able to update results as data change, without
needing to rerun the algorithms from scratch.
9.1.1 Data Mining Tasks
Data mining can be categorized into types of tasks, which correspond to
various different objectives of the data analyst. Data mining tasks aim at
discovering models and patterns. A model is a global summary of a data
set. A simple model can be represented by a linear equation like
Y = aX + b
where X and Y are variables and a and b are parameters. Opposite to the
global nature of a model, patterns make statements about restricted regions
of space spanned by the variables. An example is the simple probabilistic
statement
if X>x 1 then prob( Y>y 1 )= p 1 .
Thus, a pattern describes a structure of a relatively small part of the data
space. We next discuss the main data mining tasks.
Exploratory data analysis is an approach for data analysis that uses a
variety of (mostly graphical) techniques to get insight into a data set, aimed
at exploring the data without a clear idea of what we are looking for. Thus,
these techniques are typically visual and interactive. A common example of
an exploratory data analysis technique is to perform a scatterplot of the data
set in the plane and visually analyze the characteristics of such data set. For
example, if we can approximate this set of points using a line, we say that
the data set is linear. The same plot can also be used to visually look for
outliers.
The goal of descriptive modeling is to describe the data or the process
that generates such data. A typical descriptive technique is clustering .This
technique aims at putting together similar records based on the values of
their attributes. For example, in commercial databases, we may want to split
the records into homogeneous groups so that similar people (e.g., customers)
fall in the same group. There are two possibilities in this respect. We can
either define the number of groups in advance or let the algorithm discover
natural groups of data.
Predictive modeling aims at building a model that predicts the value
of one variable from the values of other ones. Typical techniques are
classification and regression . In the former, the variable being predicted
is categorical, while in the latter, the variable to be predicted is quantitative.
For example, in classification, we may want to categorize insurance customers
Search WWH ::




Custom Search