steps. However, there are people (mainly those with a strong statistical
background) who consider DM a branch of statistics, because many DM tasks
can be represented perfectly in statistical terms.
The Data Compression Paradigm
The data compression approach to DM can be stated in the following way:
compress the dataset by finding some structure or knowledge within it, where
knowledge is interpreted as a representation that allows the data to be coded
with fewer bits. For example, the minimum description length (MDL) principle
[32] can be used to select among different encodings, accounting for both the
complexity of a model and its predictive accuracy.
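
To make this concrete, the following is a minimal Python sketch of two-part
MDL selection of a polynomial degree, assuming Gaussian residuals and a crude
cost of 32 bits per coefficient; the synthetic data and these constants are
illustrative choices, not part of the principle as stated in [32].

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 40)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

    def description_length(degree):
        coeffs = np.polyfit(x, y, degree)
        residuals = y - np.polyval(coeffs, x)
        sigma2 = max(residuals.var(), 1e-12)
        # L(data | model): Gaussian code length of the residuals, in bits
        data_bits = 0.5 * x.size * np.log2(2 * np.pi * np.e * sigma2)
        # L(model): a flat 32 bits per coefficient (a rough convention)
        model_bits = 32 * (degree + 1)
        return data_bits + model_bits

    best = min(range(1, 10), key=description_length)
    print("degree minimizing the total description length:", best)

Raising the degree always shrinks the residual term, but beyond some point
the parameter cost dominates, so the total length selects a model of moderate
complexity.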
Machine learning practitioners have used the MDL principle, under various
interpretations, to recommend that a hypothesis which is not the most
empirically successful among those available may still be the one to choose
if it is simple enough. The idea is to balance consistency with the training
examples against predictive success on new data, as happens, for example, in
decision tree pruning (illustrated in the sketch below). Bensusan [2] connects
this to another methodological issue, namely that theories should not be ad
hoc, that is, they should not simply overfit the examples used to build them.
Simplicity is the remedy for being ad hoc both in the recommendations of the
philosophy of science and in the practice of machine learning.
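
The trade-off can be seen directly with scikit-learn's cost-complexity
pruning, where larger values of ccp_alpha buy simplicity at the price of
training accuracy; the dataset and the alpha values below are arbitrary
illustrative choices.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for alpha in (0.0, 0.01, 0.05):
        tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
        tree.fit(X_tr, y_tr)
        # More pruning -> fewer leaves and lower training accuracy,
        # often with comparable or better test accuracy
        print(f"alpha={alpha}: {tree.get_n_leaves()} leaves, "
              f"train={tree.score(X_tr, y_tr):.3f}, "
              f"test={tree.score(X_te, y_te):.3f}")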
The data compression approach also has connections with the much older
Occam's razor principle, introduced in the fourteenth century. The most
commonly used formulation of this principle in DM is: “when you have two
competing models which make exactly the same predictions, the simpler one
is better”.
Many (if not all) DM techniques can be viewed in terms of the data
compression approach. Association rules and pruned decision trees, for
example, can be seen as ways of compressing parts of the data, and clustering
can be considered a way of compressing the dataset as a whole. There is also
a connection with Bayesian theory for modeling the joint distribution: any
compression scheme can be viewed as providing a distribution on the set of
possible instances of the data.
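
The correspondence runs through Shannon coding: a model that assigns
probability P(x) to an instance x implicitly gives it a code of about
-log2 P(x) bits, so better-fitting distributions compress better. The sketch
below scores a toy symbol sequence (an illustrative assumption) under its
empirical distribution against a uniform baseline.

    from collections import Counter
    from math import log2

    data = list("aaaabbbccd")
    counts = Counter(data)
    model = {s: c / len(data) for s, c in counts.items()}  # empirical distribution

    # Shannon code length under the model versus a uniform code
    model_bits = sum(-log2(model[s]) for s in data)
    uniform_bits = len(data) * log2(len(counts))
    print(f"model: {model_bits:.1f} bits, uniform baseline: {uniform_bits:.1f} bits")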
The Machine Learning Paradigm
The machine learning (ML) paradigm, “let the data suggest a model”, can be
seen as a practical alternative to the statistical paradigm, “fit a model to
the data”. It is certainly reasonable in many situations to fit a parametric
model, based on a series of assumptions, to a small dataset. However, for
applications that analyze large volumes of data, the ML paradigm may be
beneficial because of the flexibility of its nonparametric, assumption-free
nature.
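
The contrast can be sketched by fitting a parametric linear model and a
nonparametric k-nearest-neighbors model to the same nonlinear data; the
data-generating process and the choice of k are assumptions made for the
example.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor

    rng = np.random.default_rng(1)
    X = rng.uniform(0, 1, size=(200, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.1, size=200)

    # "Fit a model to the data": a fixed functional form chosen in advance
    linear = LinearRegression().fit(X, y)
    # "Let the data suggest a model": predictions shaped by stored examples
    knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
    print("linear R^2:", round(linear.score(X, y), 3))
    print("5-NN R^2:  ", round(knn.score(X, y), 3))

The linear model cannot capture the sinusoid no matter how much data arrives,
while the neighbors-based predictor adapts its shape to whatever the data
contain.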
We would like to focus here on the constructive induction approach. Con-
structive induction is a learning process that consists of two intertwined
searches: one for the best representation space and another for the best
hypothesis within that space.
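
As a rough illustration of these intertwined searches, the sketch below
extends the representation space of an XOR-like problem with one constructed
feature so that a simple linear hypothesis becomes adequate; the data and the
choice of the product feature are hypothetical, made only for the example.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.uniform(-1, 1, size=(400, 2))
    y = (X[:, 0] * X[:, 1] > 0).astype(int)  # XOR-like concept

    plain = LogisticRegression().fit(X, y)  # original representation space
    X_new = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])  # constructed feature
    enriched = LogisticRegression().fit(X_new, y)

    print("original space accuracy:   ", round(plain.score(X, y), 2))
    print("constructed space accuracy:", round(enriched.score(X_new, y), 2))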