7.8 Bias Shift Based Decision Tree Algorithm
The key to constructing a good decision tree is selecting good attributes. In general, among the many decision trees that can fit a given set of training examples, the smaller the tree, the greater its predictive power. The key to constructing a tree that is as small as possible is therefore to select appropriate attributes. Since constructing a minimum tree is an NP-complete problem, most research can only employ heuristics to select good attributes. Attribute selection depends on impurity measures computed over the various example subsets. Impurity measures include information gain, information gain ratio, the Gini index, distance measures, the J-measure, G-statistics, χ² statistics, the zero-probability (P₀) assumption, weight of evidence, minimum description length (MDL), orthogonality measures, degree of correlation, and Relief, among others. Different measures have different effects, which is how the distinction between univariate and multivariate decision trees arose. In-depth studies of these measures have not reached consistent conclusions: no single algorithm is clearly superior on the problems of attribute selection, data noise, growing data, pre-pruning and post-pruning, stopping criteria for pruning, and so on. Empirical results and intuition have taken the place of rigorous and complete theoretical proof.
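To make the impurity measures concrete, the following Python sketch computes two of those just listed, entropy-based information gain and the Gini index, and uses information gain to rank candidate attributes. It is illustrative only: the dataset layout and the names entropy, gini, and information_gain are our own assumptions, not part of the algorithm described in this chapter.

# Illustrative sketch (not from the text): two common impurity measures
# used to rank candidate attributes when growing a decision tree.
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels.
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    # Gini index of a list of class labels.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(examples, attribute, labels):
    # Entropy reduction obtained by partitioning the examples
    # on the values of one attribute.
    n = len(examples)
    gain = entropy(labels)
    for value in {ex[attribute] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels)
                  if ex[attribute] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Toy usage: choose the attribute with the largest information gain.
examples = [{"outlook": "sunny", "windy": "no"},
            {"outlook": "sunny", "windy": "yes"},
            {"outlook": "rain",  "windy": "yes"},
            {"outlook": "rain",  "windy": "no"}]
labels = ["no", "no", "yes", "yes"]
best = max(("outlook", "windy"),
           key=lambda a: information_gain(examples, a, labels))
print(best, gini(labels))   # -> outlook 0.5

An ID3-style algorithm applies such a measure greedily at each node when choosing the attribute to split on; the other measures in the list above simply substitute different scoring functions at this step.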
In fact, the problems above are bias problems in decision tree learning. Bias plays an important role in concept learning. Utgoff points out that inductive learning does not exist without bias. Inductive bias refers to all factors other than the primitive training instances, including the language in which hypotheses are described, the hypothesis space the program considers, the order in which hypotheses are examined, the acceptance criteria, and so on. Bias has two features. First, a strong bias focuses concept learning on a relatively small set of hypotheses, whereas a weak bias lets concept learning range over relatively many hypotheses. Second, a correct bias permits concept learning to select the target concept, whereas an incorrect bias cannot. When the bias is strong and correct, concept learning can settle on a useful target concept quickly; when the bias is weak and incorrect, the task of concept learning is very difficult.
Bias can be divided into two categories: representation bias and procedural bias. Since the ID3 family of algorithms lacks the support of background knowledge, it is an inductive learning algorithm with a relatively weak bias. We strengthen the bias of decision tree learning through shifts of both representation bias and procedural bias.