Algorithm 2 Classification And Regression Tree (CART) algorithm for a single variable

Require:
  x_app: values of the variable for the training set examples,
  x_val: values of the variable for the validation set examples,
  x_test: values of the variable for the test set examples,
  y_app: true labels of the training set examples,
  y_val: true labels of the validation set examples,
  y_test: true labels of the test set examples,
  nb_test: number of individuals in the test set
Ensure: a binary tree T and its correct classification rate (CCR).

Initialisation step
  T ← ∅                                  {the tree is initialised to the empty set}
  Continue ← True
  j ← 0
Tree growing step
  while Continue do
    if the current node is terminal then
      T ← AssignNode(T, x_app, y_app)    {assigns a modality to each leaf of T using a majority vote}
      Continue ← False
    else
      v_j ← FindThreshold(x_app, y_app)  {FindThreshold finds the threshold on the variable x_app that best separates the individuals of the two classes}
      t_j ← ConstructNode(T, x_app, v_j) {constructs the node using the threshold value v_j; the individuals of the training sample are split by comparing x_app^i and v_j}
    end if
    j ← j + 1
  end while
  n ← j
Tree pruning step
  [e_1, e_2, ..., e_n] ← CalculateError([t_1, t_2, ..., t_n], x_val, y_val)
                                         {computes the classification error of each subtree using the individuals of the validation sample}
  T ← Pruning([e_1, e_2, ..., e_n], T)   {prunes the tree T by keeping the subtree that gives the lowest classification error e_i}
  T ← AssignNodes(T, x_app, y_app)
Prediction and correct classification rate
  for i = 1 to nb_test do
    Pred_i ← Predict(T, x_test^i)        {predicts the class of the i-th individual of the test set using the tree T}
  end for
  CCR ← CalculateCCR(Pred, y_test)       {calculates the correct classification rate by comparing the predictions and the true labels}
  return T, CCR
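The procedure above can be sketched in Python for the single-variable case. This is a minimal illustration, not the book's implementation: on one variable, a binary tree reduces to a sorted list of thresholds partitioning the real line, and the split criterion below simply minimises training error (a simplification standing in for FindThreshold). All function names (`cart_1d`, `predict_one`, `majority`) are hypothetical; only the three phases — growing, pruning against the validation set, and computing the CCR on the test set — follow the algorithm.

```python
import bisect
from collections import Counter

def majority(labels, default=None):
    """Majority vote over a list of labels (the leaf rule of AssignNode)."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def predict_one(cuts, x_app, y_app, x):
    """Predict the label of x as the majority label of the training
    points that fall in the same interval of the threshold list `cuts`."""
    k = bisect.bisect_right(cuts, x)
    labels = [yi for xi, yi in zip(x_app, y_app)
              if bisect.bisect_right(cuts, xi) == k]
    return majority(labels, default=majority(y_app))

def cart_1d(x_app, y_app, x_val, y_val, x_test, y_test, max_splits=4):
    def error(cuts, x, y):
        wrong = sum(predict_one(cuts, x_app, y_app, xi) != yi
                    for xi, yi in zip(x, y))
        return wrong / len(y)

    # Candidate thresholds: midpoints between consecutive training values.
    xs = sorted(set(x_app))
    candidates = [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]

    # Tree growing step: greedily add thresholds, recording each subtree t_j.
    cuts, subtrees = [], [[]]            # t_0 is the root-only tree
    for _ in range(max_splits):
        remaining = [c for c in candidates if c not in cuts]
        if not remaining:
            break
        _, c = min((error(sorted(cuts + [c]), x_app, y_app), c)
                   for c in remaining)
        cuts.append(c)
        subtrees.append(sorted(cuts))

    # Tree pruning step: keep the subtree with the lowest validation error.
    errors = [error(t, x_val, y_val) for t in subtrees]
    best = subtrees[errors.index(min(errors))]

    # Prediction and correct classification rate on the test set.
    pred = [predict_one(best, x_app, y_app, xi) for xi in x_test]
    ccr = sum(p == yi for p, yi in zip(pred, y_test)) / len(y_test)
    return best, ccr

# Toy example: the labels switch from 0 to 1 between x = 4 and x = 5,
# so a single threshold at 4.5 separates the two classes perfectly.
tree, ccr = cart_1d([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 1, 1, 1, 1],
                    x_val=[1.5, 7.5], y_val=[0, 1],
                    x_test=[2.5, 6.5], y_test=[0, 1])
print(tree, ccr)   # pruning keeps the single-threshold subtree; CCR = 1.0
```

Note that pruning here compares whole recorded subtrees t_0, ..., t_n on the validation sample, exactly as CalculateError and Pruning do in the pseudocode; richer criteria (e.g. Gini impurity for the split, cost-complexity pruning) would slot into `error` and `min` without changing the structure.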