steps. However, there are people (mainly those with a strong statistical
background) who consider DM a branch of statistics, because many DM tasks
can be represented perfectly in statistical terms.
The Data Compression Paradigm
The data compression approach to DM can be stated in the following way:
compress the dataset by finding some structure or knowledge within it, where
knowledge is interpreted as a representation that allows the data to be coded
with fewer bits. For example, the minimum description length (MDL) principle
[32] can be used to select among different encodings, accounting for both the
complexity of a model and its predictive accuracy.
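
To make this concrete, the following is a minimal Python sketch of two-part
MDL selection of a polynomial degree, assuming Gaussian residuals and a crude
cost of 32 bits per coefficient; the synthetic data and these constants are
illustrative choices, not part of the principle as stated in [32].

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 40)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

    def description_length(degree):
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        sigma2 = max(residuals.var(), 1e-12)
        # L(data | model): Gaussian code length of the residuals, in bits
        data_bits = 0.5 * x.size * np.log2(2 * np.pi * np.e * sigma2)
        # L(model): a flat 32 bits per coefficient (a rough convention)
        model_bits = 32 * (degree + 1)
        return data_bits + model_bits

    best = min(range(1, 10), key=description_length)
    print("degree minimizing the total description length:", best)

Raising the degree always shrinks the residual term, but beyond some point
the parameter cost dominates, so the total length selects a model of moderate
complexity.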
Machine learning practitioners have used the MDL principle, under various
interpretations, to recommend that a hypothesis which is not the most
empirically successful among those available may still be the one to choose
if it is simple enough. The idea is to balance consistency with the training
examples against predictive success on new data, as happens, for example, in
decision tree pruning (illustrated in the sketch below). Bensusan [2] connects
this to another methodological issue, namely that theories should not be ad
hoc, that is, they should not simply overfit the examples used to build them.
Simplicity is the remedy for being ad hoc both in the recommendations of the
philosophy of science and in the practice of machine learning.
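
The trade-off can be seen directly with scikit-learn's cost-complexity
pruning, where larger values of ccp_alpha buy simplicity at the price of
training accuracy; the dataset and the alpha values below are arbitrary
illustrative choices.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for alpha in (0.0, 0.01, 0.05):
        tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
        tree.fit(X_tr, y_tr)
        # More pruning -> fewer leaves and lower training accuracy,
        # often with comparable or better test accuracy
        print(f"alpha={alpha}: {tree.get_n_leaves()} leaves, "
              f"train={tree.score(X_tr, y_tr):.3f}, "
              f"test={tree.score(X_te, y_te):.3f}")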
The data compression approach also has connections with the much older
Occam's razor principle, introduced in the fourteenth century. The most
commonly used formulation of this principle in DM is: “when you have two
competing models which make exactly the same predictions, the simpler one
is better”.
Many (if not all) DM techniques can be viewed in terms of the data
compression approach. Association rules and pruned decision trees, for
example, can be seen as ways of compressing parts of the data, and clustering
can be considered a way of compressing the dataset as a whole. There is also
a connection with Bayesian theory for modeling the joint distribution: any
compression scheme can be viewed as providing a distribution on the set of
possible instances of the data.
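
The correspondence runs through Shannon coding: a model that assigns
probability P(x) to an instance x implicitly gives it a code of about
-log2 P(x) bits, so better-fitting distributions compress better. The sketch
below scores a toy symbol sequence (an illustrative assumption) under its
empirical distribution against a uniform baseline.

    from collections import Counter
    from math import log2

    data = list("aaaabbbccd")
    counts = Counter(data)
    model = {s: c / len(data) for s, c in counts.items()}  # empirical distribution

    # Shannon code length under the model versus a uniform code
    model_bits = sum(-log2(model[s]) for s in data)
    uniform_bits = len(data) * log2(len(counts))
    print(f"model: {model_bits:.1f} bits, uniform baseline: {uniform_bits:.1f} bits")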
The Machine Learning Paradigm
The machine learning (ML) paradigm, “let the data suggest a model”, can be
seen as a practical alternative to the statistical paradigm, “fit a model to
the data”. It is certainly reasonable in many situations to fit a parametric
model, based on a series of assumptions, to a small dataset. However, for
applications that analyze large volumes of data, the ML paradigm may be
beneficial because of the flexibility of its nonparametric, assumption-free
nature.
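
The contrast can be sketched by fitting a parametric linear model and a
nonparametric k-nearest-neighbors model to the same nonlinear data; the
data-generating process and the choice of k are assumptions made for the
example.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, size=(200, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.1, size=200)

    # "Fit a model to the data": a fixed functional form chosen in advance
    linear = LinearRegression().fit(X, y)
    # "Let the data suggest a model": predictions shaped by stored examples
    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
    print("linear R^2:", round(linear.score(X, y), 3))
    print("5-NN R^2:  ", round(knn.score(X, y), 3))

The linear model cannot capture the sinusoid no matter how much data arrives,
while the neighbors-based predictor adapts its shape to whatever the data
contain.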
We would like to focus here on the constructive induction approach. Con-
structive induction is a learning process that consists of two intertwined
searches: one for the best representation space and another for the best
hypothesis within that space.
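
As a rough illustration of these intertwined searches, the sketch below
extends the representation space of an XOR-like problem with one constructed
feature so that a simple linear hypothesis becomes adequate; the data and the
choice of the product feature are hypothetical, made only for the example.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.uniform(-1, 1, size=(400, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like concept

    plain = LogisticRegression().fit(X, y)  # original representation space
    X_new = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])  # constructed feature
    enriched = LogisticRegression().fit(X_new, y)

    print("original space accuracy:   ", round(plain.score(X, y), 2))
    print("constructed space accuracy:", round(enriched.score(X_new, y), 2))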