A Comparison of Rule Induction Using Feature Selection and the LEM2 Algorithm - Feature Selection for Data and Pattern Recognition

8.5 LERS Classification System

There are a few existing classification systems, e.g., those associated with the rule induction systems LERS or AQ [25]. The classification system used in LERS is a modification of the well-known bucket brigade algorithm [3, 19, 30]. In the LERS classification system, the decision to which concept a case belongs is made on the basis of three factors: strength, specificity, and support. These factors are defined as follows. Strength is the total number of cases correctly classified by the rule during training. Specificity is the total number of attribute-value pairs on the left-hand side of the rule. Matching rules with a larger number of attribute-value pairs are considered more specific. The third factor, support, is defined as the sum of products of strength and specificity for all matching rules indicating the same concept. The concept C for which the support, i.e., the following expression

\[
\sum_{\text{matching rules } r \text{ describing } C} \mathit{Strength}(r) \ast \mathit{Specificity}(r)
\]

is the largest is the winner, and the case is classified as being a member of C.
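The complete-matching vote described above can be sketched in Python as follows. The `Rule` class, the `classify_complete` function, and the toy rules and case are hypothetical names and data introduced here for illustration; this is a minimal sketch of the scoring, not the LERS implementation. Tie-breaking is not specified in the text, so the sketch simply keeps the first concept that reaches the maximum.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule representation (not the LERS data structure)."""
    conditions: dict  # attribute -> required value (left-hand side)
    concept: str      # decision value on the right-hand side
    strength: int     # training cases correctly classified by the rule

    @property
    def specificity(self) -> int:
        # Number of attribute-value pairs on the left-hand side.
        return len(self.conditions)

    def matches(self, case: dict) -> bool:
        # Complete matching: every condition must agree with the case.
        return all(case.get(a) == v for a, v in self.conditions.items())


def classify_complete(rules, case):
    """Return (winning concept, support per concept); (None, {}) if no rule matches."""
    support = defaultdict(int)
    for r in rules:
        if r.matches(case):
            # Support of a concept: sum of Strength(r) * Specificity(r).
            support[r.concept] += r.strength * r.specificity
    if not support:
        return None, {}
    # Assumption: ties go to the first concept that reached the maximum.
    return max(support, key=support.get), dict(support)


# Toy rules and case, invented for illustration only.
rules = [
    Rule({"temperature": "high", "headache": "yes"}, "flu", strength=3),
    Rule({"temperature": "high"}, "flu", strength=2),
    Rule({"headache": "yes"}, "healthy", strength=4),
]
case = {"temperature": "high", "headache": "yes", "cough": "no"}
winner, support = classify_complete(rules, case)
print(winner, support)
```

In this toy example all three rules match the case, so the support for `flu` is 3 * 2 + 2 * 1 = 8 and the support for `healthy` is 4 * 1 = 4; the case is assigned to `flu`.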

In the classification system of LERS, if complete matching is impossible, all partially matching rules are identified. These are rules with at least one attribute-value pair matching the corresponding attribute-value pair of a case. For any partially matching rule r, an additional factor, called Matching_factor(r), is computed. Matching_factor(r) is defined as the ratio of the number of attribute-value pairs of r matched with a case to the total number of attribute-value pairs of r. In partial matching, the concept C for which the following expression is the largest

\[
\sum_{\text{partially matching rules } r \text{ describing } C} \mathit{Matching\_factor}(r) \ast \mathit{Strength}(r) \ast \mathit{Specificity}(r)
\]

is the winner, and the case is classified as being a member of C.
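The partial-matching fallback can be sketched in the same spirit. The tuple layout `(conditions, concept, strength)`, the function names, and the toy data below are assumptions made for illustration, not part of LERS; the sketch only covers the partial-matching score and presumes that complete matching has already failed.

```python
from collections import defaultdict


def matching_factor(conditions, case):
    """Fraction of the rule's attribute-value pairs that agree with the case."""
    matched = sum(1 for a, v in conditions.items() if case.get(a) == v)
    return matched / len(conditions)


def classify_partial(rules, case):
    """rules: iterable of (conditions, concept, strength) tuples (hypothetical layout)."""
    score = defaultdict(float)
    for conditions, concept, strength in rules:
        mf = matching_factor(conditions, case)
        if mf > 0:  # partially matching: at least one pair agrees
            specificity = len(conditions)
            score[concept] += mf * strength * specificity
    if not score:
        return None, {}
    # Assumption: ties go to the first concept that reached the maximum.
    return max(score, key=score.get), dict(score)


# Toy rules and case, invented for illustration; no rule matches completely.
rules = [
    ({"temperature": "high", "headache": "yes"}, "flu", 3),
    ({"temperature": "normal", "cough": "no",
      "headache": "no", "nausea": "no"}, "healthy", 4),
]
case = {"temperature": "high", "headache": "no", "cough": "no", "nausea": "yes"}
winner, score = classify_partial(rules, case)
print(winner, score)
```

Here the first rule matches one of its two pairs (0.5 * 3 * 2 = 3) and the second matches two of its four pairs (0.5 * 4 * 4 = 8), so the case is assigned to `healthy`. Since Matching_factor(r) times Specificity(r) equals the number of matched pairs, each term amounts to the rule's strength weighted by how many of its conditions the case satisfies.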

8.6 Experiments

In our experiments, we used 14 data sets that are available from the Machine Learning Repository at the University of California at Irvine, see Table 8.6. Some of these data sets were incomplete (Breast Cancer-Slovenia, Soybean, Postoperative Patient, and Primary Tumor).

For incomplete data sets, missing attribute values were replaced by specified attribute values using an imputation method called the most common value of an attribute restricted to a concept [16].
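One plausible reading of this imputation method is sketched below: a missing value is replaced by the most frequent known value of the same attribute among cases belonging to the same concept. The `"?"` marker, the function name `impute_by_concept`, and the toy table are assumptions for illustration; the details of the method in [16] may differ, and the sketch presumes that decision values themselves are never missing.

```python
from collections import Counter, defaultdict

MISSING = "?"  # assumed marker for a missing attribute value


def impute_by_concept(cases, decision):
    """Fill each missing value with the most common known value of that
    attribute among cases with the same decision value (concept)."""
    # Count known values per (concept, attribute).
    counts = defaultdict(Counter)
    for case in cases:
        for attr, value in case.items():
            if attr != decision and value != MISSING:
                counts[(case[decision], attr)][value] += 1

    completed = []
    for case in cases:
        filled = dict(case)  # copy, so the input cases stay unchanged
        for attr, value in case.items():
            if attr != decision and value == MISSING:
                known = counts[(case[decision], attr)]
                if known:  # keep the marker if the concept has no known value
                    filled[attr] = known.most_common(1)[0][0]
        completed.append(filled)
    return completed


# Toy decision table, invented for illustration only.
cases = [
    {"temperature": "high", "headache": "yes", "flu": "yes"},
    {"temperature": "high", "headache": "?", "flu": "yes"},
    {"temperature": "?", "headache": "no", "flu": "no"},
    {"temperature": "normal", "headache": "no", "flu": "no"},
]
completed = impute_by_concept(cases, decision="flu")
print(completed)
```

In the toy table, the missing `headache` value in the second case is filled with `yes` (the only known value among `flu = yes` cases), and the missing `temperature` in the third case is filled with `normal`.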
