discovered in related ancillary databases is summarized in the decision tree structure. Classification with the decision tree technique can be performed without complicated computation, and the method can be used for both continuous and categorical variables. We find that the C4.5 classifier achieves the highest accuracy among these methods for land cover identification. The classifier is developed on the basis of decision tree learning, which is a heuristic, one-step look-ahead (hill-climbing), non-backtracking search through the space of all possible decision trees.
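As a rough illustration of this greedy, non-backtracking search, the sketch below recursively partitions categorical training records, committing to one attribute per node according to a pluggable scoring function. The record layout (dicts with a "label" key) and the helper names are assumptions made for illustration, not part of the original description; a gain-based score matching Eqs. (3.14)-(3.16) is sketched after the numbered list below.

```python
# Minimal sketch of one-step look-ahead (hill-climbing), non-backtracking
# decision tree induction on categorical attributes. Assumed layout:
# each record is a dict of attribute values plus a "label" key.
from collections import Counter

def choose_attribute(records, attributes, score):
    """Greedy step: pick the attribute with the highest score (e.g., gain)."""
    return max(attributes, key=lambda a: score(records, a))

def build_tree(records, attributes, score):
    labels = [r["label"] for r in records]
    # Leaf node: all samples share one class, or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(records, attributes, score)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # One branch per outcome of the logical test on `best`; the search
    # never backtracks to reconsider this choice at a parent node.
    for value in {r[best] for r in records}:
        subset = [r for r in records if r[best] == value]
        tree[best][value] = build_tree(subset, remaining, score)
    return tree
```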
The specific principles of this classifier are as follows. First, the initial sample data are recursively partitioned into subgroups. Then the gain values of all attributes of the sample data are calculated, and the attribute on which to classify is selected according to these values. Next, the attribute with the largest gain value is used in a logical test; each test outcome forms a branch, and the subsets of samples (training data) satisfying each outcome are moved to the corresponding child nodes. Thereafter, this process runs recursively on each child node until the needed leaf nodes are obtained. Finally, the decision tree is modified according to relevant empirical knowledge.

The C4.5 classifier is a member of the decision tree family that can produce both decision trees and rule sets. It uses two heuristic criteria to rank candidate tests: the information gain, an attribute selection measure that minimizes the total entropy of the resulting subsets, and the default gain ratio, which divides the information gain by the information provided by the test outcomes. The information gain algorithm is described by the function Gain(A) as follows (a code sketch of these quantities appears after the list):
i. The attribute with the highest information gain is selected.
ii. $S$ contains $S_i$ tuples of class $C_i$ ($i = 1, \ldots, m$), where $m$ is the number of classes.
iii. The information measure, or expected information needed to classify an arbitrary tuple, is:
$$I(S_1, \ldots, S_m) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2 \frac{S_i}{S} \qquad (3.14)$$
iv. The entropy of attribute $A$ with values $\{a_1, a_2, \ldots, a_v\}$ is calculated:

$$E(A) = \sum_{j=1}^{v} \frac{S_{1j} + \cdots + S_{mj}}{S}\, I(S_{1j}, \ldots, S_{mj}) \qquad (3.15)$$
v. The information gain measures how much can be gained by branching on attribute $A$:

$$\operatorname{Gain}(A) = I(S_1, \ldots, S_m) - E(A) \qquad (3.16)$$
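The quantities in Eqs. (3.14)-(3.16), together with the gain ratio mentioned above, can be computed directly, as in the sketch below. It is illustrative only: the toy land cover records and attribute names (ndvi, slope) are invented for the example, not taken from the text.

```python
# Sketch of Eqs. (3.14)-(3.16) on categorical data; toy records invented.
from collections import Counter
from math import log2

def info(labels):
    """Eq. (3.14): I(S1,...,Sm) = -sum_i (Si/S) * log2(Si/S)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def entropy(records, attr):
    """Eq. (3.15): E(A) = sum_j ((S1j+...+Smj)/S) * I(S1j,...,Smj)."""
    total = len(records)
    return sum(
        (len(subset) / total) * info(subset)
        for subset in (
            [r["label"] for r in records if r[attr] == v]
            for v in {r[attr] for r in records}
        )
    )

def gain(records, attr):
    """Eq. (3.16): Gain(A) = I(S1,...,Sm) - E(A)."""
    return info([r["label"] for r in records]) - entropy(records, attr)

def gain_ratio(records, attr):
    """C4.5's default criterion: Gain(A) divided by the split information,
    i.e., the information provided by the test outcomes themselves."""
    split = info([r[attr] for r in records])
    return gain(records, attr) / split if split else 0.0

# Toy example: which attribute better separates the land cover classes?
records = [
    {"ndvi": "high", "slope": "flat",  "label": "forest"},
    {"ndvi": "high", "slope": "steep", "label": "forest"},
    {"ndvi": "low",  "slope": "flat",  "label": "cropland"},
    {"ndvi": "low",  "slope": "steep", "label": "bare"},
]
for a in ("ndvi", "slope"):
    print(a, round(gain(records, a), 3))   # ndvi: 1.0, slope: 0.5
```

Paired with the build_tree sketch above (passing score=gain or score=gain_ratio), this reproduces the greedy selection of step i at each node.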