MLEM2 Rule Induction Algorithms: With and Without Merging Intervals - Data Mining: Foundations and Practice


        G := B − ∪_{T ∈ 𝒯} [T];
    end {while};
    for each T ∈ 𝒯 do
        if ∪_{S ∈ 𝒯 − {T}} [S] = B then 𝒯 := 𝒯 − {T};
end {procedure}.
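The final loop of the procedure, which drops every T whose removal still leaves B exactly covered, can be sketched in Python. This is a minimal illustration, not the LERS implementation: the name `drop_redundant`, the representation of each T as a frozenset of attribute-value pairs, and the precomputed `blocks` dictionary mapping each T to its block [T] are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation: a minimal complex T is a frozenset of
# (attribute, value) pairs, and blocks[T] is the set of case ids in [T].
Complex = FrozenSet[Tuple[str, object]]


def drop_redundant(covering: List[Complex],
                   blocks: Dict[Complex, Set[int]],
                   B: Set[int]) -> List[Complex]:
    """Remove each T for which the blocks of the other members still cover B."""
    kept = list(covering)
    for T in covering:
        rest = [S for S in kept if S != T]
        # Union of [S] over all S remaining after T is taken out.
        covered = set().union(*(blocks[S] for S in rest))
        if covered == B:
            kept = rest  # T is redundant, so it is dropped.
    return kept


T1 = frozenset({("a", 1)})
T2 = frozenset({("b", 2)})
T3 = frozenset({("c", 3)})
example_blocks = {T1: {1, 2}, T2: {2, 3}, T3: {3, 4}}
example_result = drop_redundant([T1, T2, T3], example_blocks, {1, 2, 3, 4})
```

In this toy input T2 is removed, because [T1] and [T3] together already equal B, while T1 and T3 are each needed.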

For a set X, |X| denotes the cardinality of X.

MLEM2, a modified version of LEM2, processes numerical attributes differently than symbolic attributes. For each numerical attribute, MLEM2 sorts all values of the attribute and then computes cutpoints as averages of any two consecutive values of the sorted list. For each cutpoint q, MLEM2 creates two blocks: the first contains all cases for which the value of the numerical attribute is smaller than q, and the second contains the remaining cases, i.e., all cases for which the value of the numerical attribute is larger than q. The search space of MLEM2 is the set of all blocks computed this way, together with the blocks defined by symbolic attributes. From that point on, rule induction in MLEM2 is conducted the same way as in LEM2.

Additionally, the newest version of MLEM2, with merging intervals, simplifies rules at the very end by, as its name indicates, merging intervals for numerical attributes.
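The cutpoint construction for a numerical attribute can be sketched as follows. This is an illustrative sketch only: the function names `cutpoints` and `numeric_blocks` and the dictionary-based case representation are hypothetical, and the code assumes that repeated values are collapsed before averaging, so that no cutpoint coincides with an actual attribute value.

```python
from typing import Dict, Iterable, List, Set, Tuple


def cutpoints(values: Iterable[float]) -> List[float]:
    """Averages of consecutive values in the sorted list of distinct values."""
    distinct = sorted(set(values))  # assumption: duplicates are collapsed
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def numeric_blocks(cases: Dict[int, float]) -> Dict[float, Tuple[Set[int], Set[int]]]:
    """For each cutpoint q, return (cases with value < q, cases with value > q)."""
    result = {}
    for q in cutpoints(cases.values()):
        below = {c for c, v in cases.items() if v < q}
        above = {c for c, v in cases.items() if v > q}
        result[q] = (below, above)
    return result


# Four cases, identified by id, with their values of one numerical attribute.
example_cases = {1: 4.0, 2: 6.0, 3: 4.0, 4: 9.0}
example_cutpoints = cutpoints(example_cases.values())
example_numeric_blocks = numeric_blocks(example_cases)
```

For the example values 4, 6, 4, 9 the distinct sorted values are 4, 6, 9, giving the cutpoints 5.0 and 7.5 and two blocks per cutpoint.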

4 Classification System

Rules induced from raw training data are used for classification of unseen testing data. The classification system of LERS is a modification of the bucket brigade algorithm [1, 12]. The decision as to which concept a case belongs is made on the basis of three factors: strength, specificity, and support. They are defined as follows. Strength is the total number of cases correctly classified by the rule during training. Specificity is the total number of attribute-value pairs on the left-hand side of the rule; matching rules with a larger number of attribute-value pairs are considered more specific. The third factor, support, is defined as follows:

$$\sum_{\text{matching rules } R \text{ describing } C} \text{Strength factor}(R) \ast \text{Specificity factor}(R).$$

The concept C for which the support is the largest is the winner, and the case is classified as a member of C.
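The voting scheme for completely matching rules can be sketched as below. This is a hedged illustration rather than the LERS code: the `Rule` class, its field names, and the `classify` function are hypothetical, conditions are assumed to be symbolic attribute-value pairs compared by equality, and only complete matching is handled (the function returns `None` when no rule matches).

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Tuple[str, str], ...]  # attribute-value pairs (left-hand side)
    concept: str                             # concept indicated by the rule
    strength: int                            # cases correctly classified in training

    @property
    def specificity(self) -> int:
        # Number of attribute-value pairs on the left-hand side.
        return len(self.conditions)

    def matches(self, case: Dict[str, str]) -> bool:
        # Complete matching: every condition agrees with the case.
        return all(case.get(a) == v for a, v in self.conditions)


def classify(case: Dict[str, str], rules: Iterable[Rule]) -> Optional[str]:
    """Return the concept with the largest support among matching rules."""
    support: Dict[str, int] = defaultdict(int)
    for rule in rules:
        if rule.matches(case):
            support[rule.concept] += rule.strength * rule.specificity
    if not support:
        return None  # no complete match; partial matching is not sketched here
    return max(support, key=support.get)


example_rules = [
    Rule((("temp", "high"),), "flu", 3),
    Rule((("temp", "high"), ("cough", "yes")), "flu", 2),
    Rule((("cough", "yes"),), "cold", 5),
]
example_winner = classify({"temp": "high", "cough": "yes"}, example_rules)
```

With these made-up rules, the case matches all three: the support for "flu" is 3·1 + 2·2 = 7 and for "cold" is 5·1 = 5, so "flu" wins.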

In the classification system of LERS, if complete matching is impossible, all partially matching rules are identified. These are rules with at least one attribute-value pair matching the corresponding attribute-value pair of a case. For any partially matching rule R, an additional factor, called Matching factor(R), is computed. Matching factor(R) is defined as the ratio of the number of matched attribute-value pairs of R with a case to the total number
