7.8 Bias Shift Based Decision Tree Algorithm
The key to constructing a good decision tree is selecting good attributes. In general, among the many decision trees that can fit a given set of training examples, the smaller the tree, the greater its predictive power. The key to constructing a tree that is as small as possible is therefore to select appropriate attributes. Since constructing a minimum tree is an NP-complete problem, most research can only employ heuristics to select good attributes. Attribute selection depends on impurity measures computed over the various example subsets. Impurity measures include information gain, information gain ratio, the Gini index, distance measures, the J-measure, G-statistics, χ² statistics, the zero-probability (P₀) assumption, weight of evidence, minimum description length (MDL), orthogonality measures, degree of correlation, and Relief, among others. Different measures have different effects, which is how the distinction between univariate and multivariate decision trees arose. In-depth studies of these measures have not reached consistent conclusions: no single algorithm is clearly superior on the problems of attribute selection, data noise, growing data, pre-pruning and post-pruning, stopping criteria for pruning, and so on. Empirical results and intuition have taken the place of rigorous and complete theoretical proof.
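To make the impurity measures concrete, the following Python sketch computes two of those just listed, entropy-based information gain and the Gini index, and uses information gain to rank candidate attributes. It is illustrative only: the dataset layout and the names entropy, gini, and information_gain are our own assumptions, not part of the algorithm described in this chapter.

# Illustrative sketch (not from the text): two common impurity measures
# used to rank candidate attributes when growing a decision tree.
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini index of a list of class labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(examples, attribute, labels):
    # Entropy reduction obtained by partitioning the examples
    # on the values of one attribute.
    n = len(examples)
    gain = entropy(labels)
    for value in {ex[attribute] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels)
                  if ex[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Toy usage: choose the attribute with the largest information gain.
examples = [{"outlook": "sunny", "windy": "no"},
            {"outlook": "sunny", "windy": "yes"},
            {"outlook": "rain",  "windy": "yes"},
            {"outlook": "rain",  "windy": "no"}]
labels = ["no", "no", "yes", "yes"]
best = max(("outlook", "windy"),
           key=lambda a: information_gain(examples, a, labels))
print(best, gini(labels))   # -> outlook 0.5

An ID3-style algorithm applies such a measure greedily at each node when choosing the attribute to split on; the other measures in the list above simply substitute different scoring functions at this step.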
In fact, the problems above are bias problems in decision tree learning. Bias plays an important role in concept learning. Utgoff points out that inductive learning does not exist without bias. Inductive bias refers to all factors other than the primitive training instances, including the language in which hypotheses are described, the hypothesis space the program considers, the order in which hypotheses are examined, the acceptance criteria, and so on. Bias has two features. First, a strong bias focuses concept learning on a relatively small set of hypotheses, whereas a weak bias lets concept learning range over relatively many hypotheses. Second, a correct bias permits concept learning to select the target concept, whereas an incorrect bias cannot. When the bias is strong and correct, concept learning can settle on a useful target concept quickly; when the bias is weak and incorrect, the task of concept learning is very difficult.
Bias can be divided into two categories: representation bias and procedural bias. Since the ID3 family of algorithms lacks the support of background knowledge, it is an inductive learning algorithm with a relatively weak bias. We strengthen the bias of decision tree learning through shifts of both representation bias and procedural bias.