The same idea is used for classifying a new instance with missing
attribute values. When an instance reaches a node whose splitting
criterion cannot be evaluated because of a missing attribute value, it is passed down
all outgoing edges. The predicted class is the class with the highest
probability in the weighted union of all the leaf nodes at which this instance
ends up.
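This strategy can be sketched as follows. The tree structure, attribute names, and edge weights below are hypothetical; the point is that a missing value sends the instance down every branch, weighted by the fraction of training instances that took each branch, and the weighted leaf distributions are then summed.

```python
# A minimal sketch (not the book's code): classify an instance with a
# missing attribute by passing it down every outgoing edge with a weight
# proportional to the training instances that took that edge, then summing
# the weighted class distributions of the leaves it reaches.

def classify(node, instance, weight=1.0):
    """Return a dict mapping class label -> accumulated weight."""
    if node["leaf"]:
        # Leaf: contribute its class distribution scaled by the path weight.
        return {c: weight * p for c, p in node["dist"].items()}
    value = instance.get(node["attr"])  # None if the value is missing
    totals = {}
    if value is None:
        # Missing value: follow all edges, weighted by training frequency.
        for edge_value, (child, freq) in node["children"].items():
            for c, w in classify(child, instance, weight * freq).items():
                totals[c] = totals.get(c, 0.0) + w
    else:
        child, _ = node["children"][value]
        totals = classify(child, instance, weight)
    return totals

# Hypothetical tree: split on "outlook"; 60% of training cases went "sunny".
tree = {
    "leaf": False, "attr": "outlook",
    "children": {
        "sunny": ({"leaf": True, "dist": {"yes": 0.2, "no": 0.8}}, 0.6),
        "rain":  ({"leaf": True, "dist": {"yes": 0.9, "no": 0.1}}, 0.4),
    },
}
dist = classify(tree, {})          # "outlook" is missing in this instance
predicted = max(dist, key=dist.get)
```

Here the instance reaches both leaves, contributing 0.6 and 0.4 of its weight respectively, and the class with the larger weighted sum is predicted.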
Another approach known as surrogate splits is implemented in the
CART algorithm. The idea is to find for each split in the tree a surrogate
split which uses a different input attribute and which most resembles the
original split. If the value of the input attribute used in the original split
is missing, then it is possible to use the surrogate split. The resemblance
between two binary splits over sample S is formally defined as:
$$\mathrm{res}\left(a_i, dom_1(a_i), dom_2(a_i), a_j, dom_1(a_j), dom_2(a_j), S\right) = \frac{\left|\sigma_{a_i \in dom_1(a_i)\ \mathrm{AND}\ a_j \in dom_1(a_j)} S\right| + \left|\sigma_{a_i \in dom_2(a_i)\ \mathrm{AND}\ a_j \in dom_2(a_j)} S\right|}{|S|} \tag{5.18}$$
where the first split refers to attribute $a_i$ and splits $dom(a_i)$ into $dom_1(a_i)$
and $dom_2(a_i)$. The alternative split refers to attribute $a_j$ and splits its
domain into $dom_1(a_j)$ and $dom_2(a_j)$.
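Equation (5.18) simply counts the fraction of instances in $S$ that both binary splits route to the same side. A minimal sketch, with a hypothetical sample and attribute names:

```python
# A minimal sketch of Eq. (5.18): the resemblance of a candidate surrogate
# split on a_j to the original split on a_i is the fraction of instances in S
# that both splits send to the same side.

def resemblance(sample, a_i, dom1_i, a_j, dom1_j):
    """Fraction of instances routed the same way by both binary splits.

    dom1_i / dom1_j are the value sets defining the 'left' side of each
    split; every other value goes to the 'right' side (dom2).
    """
    agree = sum(
        1 for inst in sample
        if (inst[a_i] in dom1_i) == (inst[a_j] in dom1_j)
    )
    return agree / len(sample)

# Hypothetical sample: original split on "color", surrogate on "size".
S = [
    {"color": "red",  "size": "small"},
    {"color": "red",  "size": "small"},
    {"color": "blue", "size": "large"},
    {"color": "blue", "size": "small"},
]
res = resemblance(S, "color", {"red"}, "size", {"small"})  # 3 of 4 agree
```

CART computes this score for every candidate surrogate and keeps the attribute whose split agrees with the original one most often.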
The missing value can also be estimated from the other instances. In the
learning phase, if the value of a nominal attribute $a_i$ in tuple $q$ is missing,
then it is estimated by its mode over all instances having the same target
attribute value. Formally,
$$\mathrm{estimate}(a_i, y_q, S) = \underset{v_{i,j} \in dom(a_i)}{\operatorname{argmax}} \left|\sigma_{a_i = v_{i,j}\ \mathrm{AND}\ y = y_q} S\right| \tag{5.19}$$
where $y_q$ denotes the value of the target attribute in the tuple $q$. If the
missing attribute $a_i$ is numeric, then, instead of the mode of $a_i$, it is
more appropriate to use its mean.
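This estimation rule can be sketched directly from Eq. (5.19). The sample and attribute names below are hypothetical; the mode is used for a nominal attribute and the mean for a numeric one, in both cases restricted to instances sharing the tuple's class label:

```python
# A minimal sketch of Eq. (5.19): impute a missing attribute value from the
# training instances whose class label equals that of the query tuple.
from collections import Counter
from statistics import mean

def estimate(attr, target_value, sample, numeric=False):
    """Mode (nominal) or mean (numeric) of attr over instances of one class."""
    values = [inst[attr] for inst in sample
              if inst["class"] == target_value and inst.get(attr) is not None]
    if numeric:
        return mean(values)
    return Counter(values).most_common(1)[0][0]  # the mode

# Hypothetical training sample.
S = [
    {"humidity": 70, "wind": "weak",   "class": "yes"},
    {"humidity": 80, "wind": "weak",   "class": "yes"},
    {"humidity": 90, "wind": "strong", "class": "no"},
]
est_nominal = estimate("wind", "yes", S)                    # mode over class "yes"
est_numeric = estimate("humidity", "yes", S, numeric=True)  # mean over class "yes"
```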