A Comparison of Rule Induction Using Feature Selection and the LEM2 Algorithm - Feature Selection for Data and Pattern Recognition - page 163

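Table 8.3 An inconsistent data set

| Case | Temperature | Headache | Nausea | Flu (decision) |
|------|-------------|----------|--------|----------------|
| 1    | High        | Yes      | No     | Yes            |
| 2    | Very_high   | No       | No     | Yes            |
| 3    | Very_high   | Yes      | No     | No             |
| 4    | Normal      | No       | No     | No             |
| 5    | High        | No       | Yes    | Maybe          |
| 6    | Normal      | Yes      | Yes    | Maybe          |
| 7    | High        | Yes      | No     | No             |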

The LERS data mining system uses a rough set approach to inconsistent data: it computes lower and upper approximations for all concepts before applying the LEM1 or LEM2 algorithm. Let X be a concept. In general, X is not definable in A. However, X may be approximated by two sets that are definable in A. The first is called the lower approximation of X, denoted by $\underline{appr}(X)$ and defined as

$$\underline{appr}(X) = \bigcup \{[x] \mid x \in U,\ [x] \subseteq X\}.$$

The second set is called the upper approximation of X, denoted by $\overline{appr}(X)$ and defined as

$$\overline{appr}(X) = \bigcup \{[x] \mid x \in U,\ [x] \cap X \neq \emptyset\}.$$

For example, for the concept [(Flu, yes)] = {1, 2},

$$\underline{appr}(\{1, 2\}) = \{2\} \quad \text{and} \quad \overline{appr}(\{1, 2\}) = \{1, 2, 7\}.$$

Rules induced from lower approximations are called certain; rules induced from upper approximations are called possible.
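The two approximations can be checked mechanically against Table 8.3. The following is a minimal sketch, not the LERS implementation: it groups cases into elementary sets [x] (cases with identical values on all attributes) and then takes the union of the sets contained in, or intersecting, the concept. The names `TABLE`, `elementary_sets`, `lower_approximation`, and `upper_approximation` are illustrative assumptions.

```python
from collections import defaultdict

# Table 8.3: case -> (Temperature, Headache, Nausea, Flu)
TABLE = {
    1: ("High", "Yes", "No", "Yes"),
    2: ("Very_high", "No", "No", "Yes"),
    3: ("Very_high", "Yes", "No", "No"),
    4: ("Normal", "No", "No", "No"),
    5: ("High", "No", "Yes", "Maybe"),
    6: ("Normal", "Yes", "Yes", "Maybe"),
    7: ("High", "Yes", "No", "No"),
}


def elementary_sets(table, attribute_indices):
    """Group cases that agree on every selected attribute."""
    blocks = defaultdict(set)
    for case, row in table.items():
        key = tuple(row[i] for i in attribute_indices)
        blocks[key].add(case)
    return list(blocks.values())


def lower_approximation(blocks, concept):
    """Union of the elementary sets that are subsets of the concept."""
    result = set()
    for block in blocks:
        if block <= concept:
            result |= block
    return result


def upper_approximation(blocks, concept):
    """Union of the elementary sets that intersect the concept."""
    result = set()
    for block in blocks:
        if block & concept:
            result |= block
    return result


blocks = elementary_sets(TABLE, (0, 1, 2))  # all three attributes
flu_yes = {case for case, row in TABLE.items() if row[3] == "Yes"}

lower = lower_approximation(blocks, flu_yes)
upper = upper_approximation(blocks, flu_yes)
print(sorted(lower))  # [2]
print(sorted(upper))  # [1, 2, 7]
```

Cases 1 and 7 have identical attribute values but different decisions, so the elementary set {1, 7} is excluded from the lower approximation and included in the upper one.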

Note that even though the data set from Table 8.3 is inconsistent, the attribute Nausea is still redundant (irrelevant), since

$$\{\text{Temperature}, \text{Headache}\}^* = \{\text{Temperature}, \text{Headache}, \text{Nausea}\}^* = \{\{1, 7\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}\}.$$
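The redundancy of Nausea can be verified by comparing the two partitions directly. This is a small sketch under the assumption that Table 8.3 is encoded as below; `partition` is a hypothetical helper, not part of LERS.

```python
# Table 8.3, attributes only: case -> (Temperature, Headache, Nausea)
TABLE = {
    1: ("High", "Yes", "No"),
    2: ("Very_high", "No", "No"),
    3: ("Very_high", "Yes", "No"),
    4: ("Normal", "No", "No"),
    5: ("High", "No", "Yes"),
    6: ("Normal", "Yes", "Yes"),
    7: ("High", "Yes", "No"),
}


def partition(table, attribute_indices):
    """Return the partition of the cases induced by the selected attributes."""
    blocks = {}
    for case, row in table.items():
        key = tuple(row[i] for i in attribute_indices)
        blocks.setdefault(key, set()).add(case)
    # frozensets make the partition comparable as a set of blocks
    return {frozenset(block) for block in blocks.values()}


without_nausea = partition(TABLE, (0, 1))    # {Temperature, Headache}*
with_nausea = partition(TABLE, (0, 1, 2))    # {Temperature, Headache, Nausea}*

# Nausea is redundant when dropping it leaves the partition unchanged.
nausea_redundant = without_nausea == with_nausea
print(nausea_redundant)  # True
```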

The LERS system computes, for every concept, a pair of data sets, based on lower and upper approximations, to induce certain and possible rule sets, respectively. For example, for the concept {1, 2}, certain rule sets are induced from the data set presented in Table 8.4 and possible rule sets from Table 8.5.

Obviously, the final rule set, certain or possible, is a union of rule sets induced for all concepts, from data sets based on lower or upper approximations, respectively, with all rules for SPECIAL values removed.
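How such a pair of data sets might be built can be sketched as follows. Tables 8.4 and 8.5 are not reproduced on this page, so the sketch rests on an assumption: cases inside the chosen approximation receive the concept's decision value, and all other cases receive the placeholder value SPECIAL. The rule representation (a pair of conditions and a decision) and the names `relabel` and `drop_special_rules` are hypothetical as well.

```python
SPECIAL = "SPECIAL"

# Table 8.3: case -> (Temperature, Headache, Nausea, Flu)
TABLE = {
    1: ("High", "Yes", "No", "Yes"),
    2: ("Very_high", "No", "No", "Yes"),
    3: ("Very_high", "Yes", "No", "No"),
    4: ("Normal", "No", "No", "No"),
    5: ("High", "No", "Yes", "Maybe"),
    6: ("Normal", "Yes", "Yes", "Maybe"),
    7: ("High", "Yes", "No", "No"),
}


def relabel(table, approximation, concept_value):
    """Assumed construction: keep the attribute values, set the decision to
    concept_value inside the approximation and to SPECIAL elsewhere."""
    return {
        case: row[:-1] + ((concept_value if case in approximation else SPECIAL),)
        for case, row in table.items()
    }


def drop_special_rules(rules):
    """Remove rules whose decision is SPECIAL; a rule is (conditions, decision)."""
    return [rule for rule in rules if rule[1] != SPECIAL]


# Approximations of the concept [(Flu, yes)] = {1, 2}, as given in the text.
certain_data = relabel(TABLE, {2}, "Yes")         # based on the lower approximation
possible_data = relabel(TABLE, {1, 2, 7}, "Yes")  # based on the upper approximation

for case in sorted(TABLE):
    print(case, certain_data[case][-1], possible_data[case][-1])
```

Under this assumption, a rule induction algorithm run on either data set sees a consistent decision, and the rules it produces for SPECIAL are discarded before the per-concept rule sets are merged.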
