Introduction to Decision Trees - Data Mining with Decision Trees: Theory and Applications


right transformation at the beginning, we may obtain a surprising effect
that hints to us about the transformation needed. Thus, the process
reflects upon itself and leads to an understanding of the transformation
needed. With the above four steps completed, the following four steps
relate to the Data Mining part, where the focus is on the algorithmic
aspects employed for each project.

5. Choosing the appropriate Data Mining task. We are now ready
to decide which Data Mining task best fits our needs, i.e.,
classification, regression, or clustering. This mostly depends on the
goals and on the previous steps. There are two major goals in Data
Mining: prediction and description. Prediction is often referred to as
supervised Data Mining, while descriptive Data Mining includes the
unsupervised classification and visualization aspects of Data Mining.
Most Data Mining techniques are based on inductive learning, where
a model is constructed explicitly or implicitly by generalizing from a
sufficient number of training examples. The underlying assumption of
the inductive approach is that the trained model is applicable to future
cases. The strategy also takes into account the level of meta-learning
for the particular set of available data.

6. Choosing the Data Mining algorithm. Having mastered the strategy,
we are able to decide on the tactics. This stage includes selecting
the specific method to be used for searching for patterns. For example,
when weighing precision against understandability, neural networks tend
to do better on the former, while decision trees do better on the latter.
Meta-learning focuses on explaining what causes a Data Mining algorithm
to be successful or unsuccessful when facing a particular problem. Thus,
this approach attempts to understand the conditions under which a
Data Mining algorithm is most appropriate.

7. Employing the Data Mining algorithm. In this step, we might
need to employ the algorithm several times until a satisfactory result
is obtained. In particular, we may have to tune the algorithm's control
parameters, such as the minimum number of instances in a single leaf
of a decision tree.

8. Evaluation. In this stage, we evaluate and interpret the extracted
patterns (rules, reliability, etc.) with respect to the goals defined in the
first step. This step focuses on the comprehensibility and usefulness
of the induced model. At this point, we document the discovered
knowledge for further use.
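
Steps 7 and 8 can be illustrated with a minimal sketch. It assumes a toy
decision tree written from scratch that uses Gini impurity as its splitting
criterion; the source does not prescribe either choice. The function names
(gini, build_tree, predict, accuracy) and the small data set are hypothetical
and were made up for this illustration. The tree is rebuilt with several values
of the minimum number of instances per leaf, and each result is scored on
held-out instances that were not used for training.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, min_leaf):
    """Grow a binary tree greedily; each leaf keeps >= min_leaf instances.

    min_leaf must be at least 1. A leaf is a class label (string); an
    internal node is a tuple (feature index, threshold, left, right).
    """
    majority = Counter(labels).most_common(1)[0][0]
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # this split would violate the control parameter
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    # Stop when no admissible split reduces the impurity.
    if best is None or best[0] >= gini(labels):
        return majority
    _, f, t = best
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in lo], [labels[i] for i in lo], min_leaf),
            build_tree([rows[i] for i in hi], [labels[i] for i in hi], min_leaf))

def predict(tree, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def accuracy(tree, rows, labels):
    """Fraction of instances whose predicted label matches the true one."""
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Hypothetical toy data: one numeric feature; the instance x = 3 is mislabeled noise.
train_X = [[x] for x in range(1, 11)]
train_y = ["no", "no", "yes", "no", "no", "yes", "yes", "yes", "yes", "yes"]
valid_X = [[1.5], [2.5], [3.5], [4.5], [5.5], [6.5], [7.5], [8.5]]
valid_y = ["no"] * 4 + ["yes"] * 4

# Step 7: employ the algorithm several times, varying the control parameter.
scores = {}
for min_leaf in (1, 2, 6):
    tree = build_tree(train_X, train_y, min_leaf)
    scores[min_leaf] = accuracy(tree, valid_X, valid_y)

# Step 8: evaluate on held-out instances and keep the best setting.
best_min_leaf = max(scores, key=scores.get)
final_tree = build_tree(train_X, train_y, best_min_leaf)
print(scores, best_min_leaf)  # {1: 0.875, 2: 1.0, 6: 0.5} 2
```

On this toy data, a minimum of one instance per leaf lets the tree memorize the
noisy instance, while a minimum of six prevents any split at all; the
intermediate value scores best on the held-out instances. In practice the final
model would also be checked on a separate test set that played no part in
choosing the parameter.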
