Intelligent E-marketing with Web Mining, Personalization, and User-Adapted Interfaces - Advances in Data Mining

For that kind of data mining, we need to know the classes or goals our system should predict. In most cases these goals are known a priori. However, there are other tasks where the goals are not known a priori. In that case, we first have to discover the classes with methods such as clustering before we can proceed to predictive mining. Furthermore, prediction methods can be divided into classification and regression, while knowledge discovery can be divided into deviation detection, clustering, mining of association rules, and visualization. Categorizing the actual problem into one of these problem types is the first necessary step when dealing with data mining.
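
To illustrate how classes might be discovered when they are not known a priori, here is a minimal sketch of k-means clustering on one-dimensional data. The function name `kmeans_1d`, the variable `spend`, and the spending figures are hypothetical and chosen only for illustration; k-means is just one of many possible clustering methods.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group 1-D values into k clusters with plain k-means."""
    # Deterministic start: pick initial centres at evenly spaced quantiles.
    ordered = sorted(values)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: put every value into the cluster of its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its cluster.
        new_centres = [
            sum(members) / len(members) if members else centres[i]
            for i, members in enumerate(clusters)
        ]
        if new_centres == centres:
            break  # assignments no longer change
        centres = new_centres

    labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
    return centres, labels


# Hypothetical yearly spend per customer (made-up numbers).
spend = [12, 15, 11, 95, 102, 98]
centres, labels = kmeans_1d(spend, k=2)
print(centres)
print(labels)
```

The cluster indices in `labels` could then serve as class labels for a subsequent predictive mining step.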

Note that Figure 4 only describes the basic types of data mining methods. We consider, e.g., text mining, web mining, or image mining only as variants of the basic types of data mining that require special data preparation.

4.2 Prediction

4.2.1 Classification

Assume there is a set of observations from a particular domain. Within this set, one subset is labeled as class 1 and another subset as class 2. Each data entry is described by some descriptive domain variables and the class label. To give the reader an idea, let us say we have collected information about customers, such as marital status, sex, and number of children. The class label records whether the customer has purchased a certain product or not. Now we want to know how the groups of buyers and non-buyers are characterized.

The task is now to find a mapping function that allows us to separate samples belonging to class 1 (e.g. the group of internet users) from those belonging to class 2 (e.g. the group of people who do not use the internet). Furthermore, this function should allow us to predict the class membership of new, previously unseen samples.
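
As a rough illustration of such a mapping function, the sketch below classifies a new sample by looking up the most similar labeled sample (a 1-nearest-neighbour rule). This is only one simple way to obtain a mapping. The customer records, the attribute order, and the names `training`, `distance`, and `classify` are hypothetical.

```python
# Hypothetical labeled customers:
# (marital_status, sex, number_of_children) -> class label
training = [
    (("single", "f", 0), "buyer"),
    (("single", "m", 0), "buyer"),
    (("married", "f", 2), "non-buyer"),
    (("married", "m", 0), "non-buyer"),
    (("single", "f", 1), "non-buyer"),
]


def distance(a, b):
    """Count the attributes on which two customers differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def classify(sample, cases=training):
    """Return the label of the most similar stored case."""
    _, label = min(cases, key=lambda case: distance(sample, case[0]))
    return label


# A previously unseen customer: married, female, one child.
print(classify(("married", "f", 1)))
```

Here the mapping is not an explicit formula; it is defined implicitly by the stored samples and the similarity measure.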

Problems of this kind belong to the problem type "classification". There can be more than two classes, but for simplicity we consider only the two-class problem.

The mapping function can be learnt by decision tree or rule induction, neural networks, discriminant analysis, or case-based reasoning. In this paper we concentrate on symbolic learning methods such as decision tree induction. The decision tree learnt from the data of our small example described above is shown in Figure 5. The profile of the buyers is: marital_status = single and number_of_children = 0. The profile of the non-buyers is: marital_status = married, or marital_status = single and number_of_children > 0. This information can be used to target promotions at potential customers.
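
The following is a minimal sketch of how a tree with these profiles could be induced from labeled data. It assumes an ID3-style procedure that picks the attribute with the highest information gain at each node; the text does not state which induction algorithm was actually used. The nine customer records are made up, the number of children is discretised into "0" and ">0" for simplicity, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(rows):
    """Shannon entropy of the class labels in rows."""
    counts = Counter(label for _, label in rows)
    return -sum(n / len(rows) * log2(n / len(rows)) for n in counts.values())


def split(rows, attribute):
    """Partition rows by their value of the given attribute."""
    groups = {}
    for features, label in rows:
        groups.setdefault(features[attribute], []).append((features, label))
    return groups


def information_gain(rows, attribute):
    """Entropy reduction achieved by splitting rows on attribute."""
    groups = split(rows, attribute).values()
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups)
    return entropy(rows) - remainder


def induce_tree(rows, attributes):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [label for _, label in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a))
    remaining = [a for a in attributes if a != best]
    return {best: {value: induce_tree(group, remaining)
                   for value, group in split(rows, best).items()}}


def predict(tree, features):
    """Follow the branches until a leaf (class label) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][features[attribute]]
    return tree


def customer(marital_status, sex, children, label):
    features = {"marital_status": marital_status, "sex": sex, "children": children}
    return features, label


# Hypothetical training data (made up for illustration).
data = [
    customer("single", "f", "0", "buyer"),
    customer("single", "m", "0", "buyer"),
    customer("single", "f", ">0", "non-buyer"),
    customer("single", "m", ">0", "non-buyer"),
    customer("married", "f", "0", "non-buyer"),
    customer("married", "m", ">0", "non-buyer"),
    customer("married", "f", ">0", "non-buyer"),
    customer("married", "m", "0", "non-buyer"),
    customer("married", "f", "0", "non-buyer"),
]

tree = induce_tree(data, ["marital_status", "sex", "children"])
print(tree)
print(predict(tree, {"marital_status": "single", "sex": "f", "children": "0"}))
```

On this toy data the induced tree splits first on marital_status and then, for single customers, on the number of children, which matches the two profiles given in the text. Note that `predict` raises a `KeyError` for attribute values that never occurred in the training data; a fuller implementation would need a fallback.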

4.2.2 Regression

Whereas classification determines the set membership of the samples, the answer of regression is numerical. Suppose we have a CCD sensor and expose it to light of a certain luminous intensity. The sensor transforms this light into a gray value according to a transformation function. If we change the luminous intensity, the gray value changes as well. That is, the variability of the output variable is explained by the variability of one or more input variables.
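
As a small illustration, the sketch below fits a straight line to pairs of luminous intensity and gray value by ordinary least squares. It assumes that the transformation function is linear, which the text does not claim and which a real sensor may not satisfy. The measurements and the names `fit_line`, `intensity`, and `gray` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements (made-up numbers, arbitrary units).
intensity = [0, 10, 20, 30, 40]   # input variable: luminous intensity
gray = [5, 30, 55, 80, 105]       # output variable: gray value

slope, intercept = fit_line(intensity, gray)
predicted = slope * 25 + intercept  # gray value expected at intensity 25
print(slope, intercept, predicted)
```

The fitted line plays the same role as the mapping function in classification, except that its output is a number rather than a class label.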
