Introduction to Decision Trees - Data Mining with Decision Trees: Theory and Applications

(2) Fraud Detection: The Oxford English Dictionary defines fraud as “An

act or instance of deception, an artifice by which the right or interest

of another is injured, a dishonest trick or stratagem.” Fraud detection

aims to identify fraud as quickly as possible once it has been

perpetrated.

(3) Churn Detection: This application helps sellers identify customers

with a higher probability of leaving and potentially moving to a

competitor. By identifying these customers in advance, the company

can act to prevent churning (for example, by offering a better deal to the

consumer).

Each application is built by accomplishing one or more machine

learning tasks. The second layer in our four-layer model is dedicated to

the machine learning tasks, such as Classification, Clustering, Anomaly

Detection, and Regression. Each machine learning task can be accomplished

by various machine learning models, as indicated in the third layer. For

example, the classification task can be accomplished by the following two

models: Decision Trees or Artificial Neural Networks. In turn, each model

can be induced from the training data using various learning algorithms.

For example, a decision tree can be built using either the C4.5 algorithm or

the CART algorithm, both of which are described in the following chapters.
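
To make the relation between task, model, and learning algorithm concrete, the following minimal sketch induces a one-level decision tree (a "stump") for a churn classification task. It chooses the split by Gini impurity, the criterion associated with CART, but it is not a full CART implementation: there is no recursion and no pruning. The feature names (months as a customer, support calls), the toy records, and the helper names gini, best_split, fit_stump, and predict are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) of the best split.

    A candidate split sends rows with value <= threshold to the left branch
    and all other rows to the right branch. Returns None if no split
    separates the rows into two non-empty groups.
    """
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def majority(labels):
    """Most frequent label in the list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Induce a one-level tree: one test node with two leaf labels."""
    split = best_split(rows, labels)
    if split is None:
        raise ValueError("no usable split found")
    _, f, t = split
    left = [y for row, y in zip(rows, labels) if row[f] <= t]
    right = [y for row, y in zip(rows, labels) if row[f] > t]
    return {"feature": f, "threshold": t,
            "left": majority(left), "right": majority(right)}


def predict(stump, row):
    """Classify one record by following the single test in the stump."""
    if row[stump["feature"]] <= stump["threshold"]:
        return stump["left"]
    return stump["right"]


# Hypothetical toy training data: (months as customer, support calls).
rows = [(2, 5), (3, 4), (5, 6), (24, 1), (36, 0), (18, 2), (30, 4), (12, 3)]
labels = ["churn", "churn", "churn", "stay", "stay", "stay", "stay", "stay"]

stump = fit_stump(rows, labels)
print(stump)
print(predict(stump, (1, 0)), predict(stump, (40, 7)))
```

On this toy data the stump tests the first feature (months as a customer), because a threshold on it separates the two classes perfectly, whereas no threshold on the number of support calls does. A different learning algorithm, for instance one based on information gain as in C4.5, would induce the same kind of model but could select its splits differently.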

1.4 Knowledge Discovery in Databases (KDD)

The KDD process was defined by Fayyad et al. (1996) as “the nontrivial process

of identifying valid, novel, potentially useful, and ultimately understandable

patterns in data.” Friedman (1997a) considers the KDD process as an

automatic exploratory data analysis of large databases. Hand (1998) views

it as a secondary data analysis of large databases. The term “Secondary”

emphasizes the fact that the primary purpose of the database was not data

analysis. Data Mining can be considered the central step of the overall

KDD process. Because of the centrality of data mining to

the KDD process, some researchers and practitioners use the

term “data mining” as synonymous with the complete KDD process.

Several researchers, such as Brachman and Anand (1994), Fayyad

et al. (1996), and Reinartz (2002), have proposed different ways of dividing

the KDD process into phases. This book adopts a hybridization of these

proposals and suggests breaking the KDD process into nine steps as

presented in Figure 1.2. Note that the process is iterative at each step,

which means that going back to adjust previous steps may be necessary. The
