sub-regions based on the sample set. In this way the decision tree recursively breaks
down the complexity of the decision space. The outcome has a format that naturally
reflects the cognitive strategy of the human decision-making process. This satisfies
our need to visualize the results and report them to the marketing people.
A decision tree consists of nodes and branches. Each node represents a single test or
decision. In the case of a binary tree, the decision is either true or false.
Geometrically, the test describes a partition orthogonal to one of the coordinates of
the decision space. The starting node is usually referred to as the root node.
Depending on whether the result of a test is true or false, the tree will branch right or
left to another node. Finally, a terminal node is reached (sometimes referred to as a
leaf), and a decision is made on the class assignment. Non-binary decision trees are
also used. In these trees more than two branches may leave a node, but again only one
branch may enter a node. In any tree, every path leads to a terminal node and
corresponds to a decision rule of the “IF-THEN” form that is a conjunction (AND) of
the tests along that path.
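The structure described above can be illustrated with a minimal Python sketch. It assumes a binary tree whose tests have the form "attribute <= threshold"; the class names, the attribute names, and the example tree are hypothetical and serve only to show how each root-to-leaf path turns into one IF-THEN rule.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class assigned at this terminal node


@dataclass
class Node:
    attribute: str    # coordinate of the decision space that is tested
    threshold: float  # the test is: sample[attribute] <= threshold
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def classify(tree, sample):
    """Follow a single path from the root node to a leaf and return its class."""
    while isinstance(tree, Node):
        passed = sample[tree.attribute] <= tree.threshold
        tree = tree.if_true if passed else tree.if_false
    return tree.label


def extract_rules(tree, conditions=()):
    """Return one IF-THEN rule per leaf; each rule ANDs the tests on its path."""
    if isinstance(tree, Leaf):
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN class = {tree.label}"]
    test = f"{tree.attribute} <= {tree.threshold}"
    negated = f"{tree.attribute} > {tree.threshold}"
    return (extract_rules(tree.if_true, conditions + (test,))
            + extract_rules(tree.if_false, conditions + (negated,)))


# Hypothetical tree with two tests and three leaves.
example_tree = Node(
    "visits_per_week", 3.0,
    Leaf("occasional"),
    Node("avg_basket_value", 50.0, Leaf("regular"), Leaf("premium")),
)

for rule in extract_rules(example_tree):
    print(rule)
```

Each printed rule is readable on its own, which is the property that makes the tree suitable for reporting.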
The main tasks during decision tree learning can be summarized as follows: attribute
selection, attribute discretization, splitting, and pruning. We will develop special
methods for attribute discretization [6] that make it possible to discretize numerical
attributes into more than two intervals during decision tree learning and to
agglomerate categorical attribute values into supergroups. This leads to more compact
trees with better accuracy. In addition, we will develop special pruning methods. Both
techniques are necessary for this special kind of data and will be tailored to the
needs of learning the user model.
To understand concept drift, we will develop a method to compare the outcomes
of the decision tree induction process and to derive conclusions from them. This will
give us a special technique to control the user model.
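The text does not specify how the multi-interval discretization is carried out. As a rough illustration of why more than two intervals can help, the following Python sketch scores a given set of cut points by information gain, a common splitting criterion. The criterion, the function names, and the sample data are assumptions made for this example and are not taken from the proposed method.

```python
import math
from bisect import bisect_left
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def interval_index(value, cut_points):
    """Index of the interval holding value; interval i is (cut[i-1], cut[i]]."""
    return bisect_left(cut_points, value)


def information_gain(values, labels, cut_points):
    """Entropy reduction obtained by splitting at the sorted cut points."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(interval_index(v, cut_points), []).append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical attribute in which one class occupies the middle range only.
values = [1, 2, 3, 11, 12, 13, 21, 22, 23]
labels = ["a", "a", "a", "b", "b", "b", "a", "a", "a"]

gain_two_intervals = information_gain(values, labels, [10])
gain_three_intervals = information_gain(values, labels, [10, 20])
```

With this data a single cut cannot isolate the middle class, so the three-interval split yields the higher gain and would need only one node instead of two nested binary tests.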
5.2.4 Web Usage Mining
Analyzing the server logs and the history list can help to understand user behavior
and the web structure, thereby improving the design of the website. Applying data
mining techniques to access logs unveils interesting access patterns that can be used
to restructure sites into more efficient groupings, pinpoint effective advertising
locations, and target specific users with specific sales ads.
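As a simple illustration of an access pattern, the following Python sketch counts how often one page is requested directly after another within a session. It assumes that the access log has already been split into per-user sessions, each an ordered list of page paths; the session data and the function name are hypothetical.

```python
from collections import Counter


def count_transitions(sessions):
    """Count consecutive page pairs (from_page, to_page) over all sessions."""
    counts = Counter()
    for pages in sessions:
        # zip pairs every page with its successor in the same session
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical sessions reconstructed from a server log.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/products/42"],
    ["/home", "/about"],
]

transitions = count_transitions(sessions)
most_common_pair, frequency = transitions.most_common(1)[0]
```

Frequent pairs of this kind are one possible input for deciding which pages to link more directly or where to place an advertisement.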
Methods for web usage analysis based on sequence analysis are described in
.
We intend to develop a conceptual clustering technique to understand user access
patterns. Classical clustering methods only create clusters but do not explain why a
cluster has been established. Conceptual clustering methods build clusters and explain
why a set of objects forms a cluster. Thus, conceptual clustering is a type of
learning by observation and a way of summarizing data in an understandable
manner [20]. In contrast to hierarchical clustering methods, conceptual clustering
methods do not build the classification hierarchy solely by merging two groups. The
algorithmic properties are flexible enough to fit the hierarchy to the data
dynamically. This allows new instances to be incorporated incrementally into the
existing hierarchy, which is updated according to each new instance.
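The incremental step can be sketched in Python as follows. The sketch uses category utility, the criterion known from COBWEB-style conceptual clustering, to decide whether a new instance joins an existing cluster or starts a new one. The text does not state which criterion the proposed algorithm uses, so this choice is an assumption, and the sketch is a flat, single-level simplification without a hierarchy or merge and split operators. Instances are assumed to be dictionaries of nominal attributes; all names and data are hypothetical.

```python
from collections import Counter


def expected_correct_guesses(instances):
    """Sum over attributes and values of P(attribute = value)^2 in instances."""
    n = len(instances)
    total = 0.0
    for attribute in instances[0]:
        counts = Counter(inst[attribute] for inst in instances)
        total += sum((c / n) ** 2 for c in counts.values())
    return total


def category_utility(clusters):
    """Average gain in predictability of attribute values due to the partition."""
    everything = [inst for cluster in clusters for inst in cluster]
    n = len(everything)
    baseline = expected_correct_guesses(everything)
    gain = sum(len(c) / n * (expected_correct_guesses(c) - baseline)
               for c in clusters)
    return gain / len(clusters)


def place_instance(clusters, instance):
    """Try the instance in every cluster and in a new one; keep the best partition."""
    candidates = [
        [c + [instance] if j == i else c for j, c in enumerate(clusters)]
        for i in range(len(clusters))
    ]
    candidates.append(clusters + [[instance]])
    return max(candidates, key=category_utility)


# Hypothetical user-access instances.
clusters = [
    [{"section": "sports", "device": "mobile"}],
    [{"section": "finance", "device": "desktop"}],
]
after_similar = place_instance(clusters, {"section": "sports", "device": "mobile"})
after_novel = place_instance(clusters, {"section": "travel", "device": "tablet"})
```

In this example the instance that matches the first cluster is added to it, whereas the instance with unseen values forms a cluster of its own. The attribute-value frequencies inside each cluster act as the description that explains why its members belong together.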
We propose an algorithm that incrementally learns the organizational structure [7].
This organization scheme is based on a hierarchy and can be updated incrementally [8]