discovered in related ancillary databases is summarized in the decision tree structure. Classification with the decision tree technique can be performed without complicated computation, and the method can be used for both continuous and categorical variables. We find that the C4.5 classifier achieves the highest accuracy among these methods for land cover identification. The classifier is developed on the basis of decision tree learning, which is a heuristic, one-step look-ahead (hill-climbing), non-backtracking search through the space of all possible decision trees.
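As a rough illustration of this greedy, non-backtracking search, the sketch below recursively partitions categorical training records, committing to one attribute per node according to a pluggable scoring function. The record layout (dicts with a "label" key) and the helper names are assumptions made for illustration, not part of the original description; a gain-based score matching Eqs. (3.14)-(3.16) is sketched after the numbered list below.

```python
# Minimal sketch of one-step look-ahead (hill-climbing), non-backtracking
# decision tree induction on categorical attributes. Assumed layout:
# each record is a dict of attribute values plus a "label" key.
from collections import Counter

def choose_attribute(records, attributes, score):
    """Greedy step: pick the attribute with the highest score (e.g., gain)."""
    return max(attributes, key=lambda a: score(records, a))

def build_tree(records, attributes, score):
    labels = [r["label"] for r in records]
    # Leaf node: all samples share one class, or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(records, attributes, score)
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # One branch per outcome of the logical test on `best`; the search
    # never backtracks to reconsider this choice at a parent node.
    for value in {r[best] for r in records}:
        subset = [r for r in records if r[best] == value]
        tree[best][value] = build_tree(subset, remaining, score)
    return tree
```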
The specific principles of this classifier are as follows. First, the initial sample data are recursively partitioned into subgroups. Then the gain values of all attributes of the sample data are calculated, and the attribute on which to classify is selected according to these values. Next, the attribute with the largest gain value is used in a logical test; each test outcome forms a branch, and the subsets of samples (training data) satisfying each outcome are moved to the corresponding child nodes. Thereafter, this process runs recursively on each child node until the needed leaf nodes are obtained. Finally, the decision tree is modified according to relevant empirical knowledge.

The C4.5 classifier is a member of the decision tree family that can produce both decision trees and rule sets. It uses two heuristic criteria to rank candidate tests: the information gain, an attribute selection measure that minimizes the total entropy of the resulting subsets, and the default gain ratio, which divides the information gain by the information provided by the test outcomes. The information gain algorithm is described by the function Gain(A) as follows (a code sketch of these quantities appears after the list):
i. The attribute with the highest information gain is selected.
ii. $S$ contains $S_i$ tuples of class $C_i$ ($i = 1, \ldots, m$), where $m$ is the number of classes.
iii. The information measure, or expected information needed to classify an arbitrary tuple, is:
$$I(S_1, \ldots, S_m) = -\sum_{i=1}^{m} \frac{S_i}{S} \log_2 \frac{S_i}{S} \qquad (3.14)$$
iv. The entropy of attribute $A$ with values $\{a_1, a_2, \ldots, a_v\}$ is calculated:

$$E(A) = \sum_{j=1}^{v} \frac{S_{1j} + \cdots + S_{mj}}{S}\, I(S_{1j}, \ldots, S_{mj}) \qquad (3.15)$$
v. The information gain measures how much can be gained by branching on attribute $A$:

$$\operatorname{Gain}(A) = I(S_1, \ldots, S_m) - E(A) \qquad (3.16)$$
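The quantities in Eqs. (3.14)-(3.16), together with the gain ratio mentioned above, can be computed directly, as in the sketch below. It is illustrative only: the toy land cover records and attribute names (ndvi, slope) are invented for the example, not taken from the text.

```python
# Sketch of Eqs. (3.14)-(3.16) on categorical data; toy records invented.
from collections import Counter
from math import log2

def info(labels):
    """Eq. (3.14): I(S1,...,Sm) = -sum_i (Si/S) * log2(Si/S)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def entropy(records, attr):
    """Eq. (3.15): E(A) = sum_j ((S1j+...+Smj)/S) * I(S1j,...,Smj)."""
    total = len(records)
    return sum(
        (len(subset) / total) * info(subset)
        for subset in (
            [r["label"] for r in records if r[attr] == v]
            for v in {r[attr] for r in records}
        )
    )

def gain(records, attr):
    """Eq. (3.16): Gain(A) = I(S1,...,Sm) - E(A)."""
    return info([r["label"] for r in records]) - entropy(records, attr)

def gain_ratio(records, attr):
    """C4.5's default criterion: Gain(A) divided by the split information,
    i.e., the information provided by the test outcomes themselves."""
    split = info([r[attr] for r in records])
    return gain(records, attr) / split if split else 0.0

# Toy example: which attribute better separates the land cover classes?
records = [
    {"ndvi": "high", "slope": "flat",  "label": "forest"},
    {"ndvi": "high", "slope": "steep", "label": "forest"},
    {"ndvi": "low",  "slope": "flat",  "label": "cropland"},
    {"ndvi": "low",  "slope": "steep", "label": "bare"},
]
for a in ("ndvi", "slope"):
    print(a, round(gain(records, a), 3))   # ndvi: 1.0, slope: 0.5
```

Paired with the build_tree sketch above (passing score=gain or score=gain_ratio), this reproduces the greedy selection of step i at each node.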