A Comparison of Rule Induction Using Feature Selection and the LEM2 Algorithm - Feature Selection for Data and Pattern Recognition - page 159

Information Technology Reference

In-Depth Information

8.2 Rule Induction Based on Feature Selection

We will discuss rule induction which belongs to supervised learning , i.e., we will

assume that all cases are preclassified by an expert. In the data set one of variables

is called a decision and the decision value is assigned by an expert to each case. A

very simple example of such a table is presented as Table 8.2 , in which attributes

are: Temperature , Headache , Nausea , and the decision is Flu . The set of all cases

labeled by the same decision value is called a concept . For Table 8.2 , case set {1, 2}

is a concept of all cases affected by flu (for each case from this set the corresponding

value of Flu is yes ).

Note that in Table 8.2 the attribute Nausea is redundant (irrelevant). Remaining

two attributes ( Temperature and Headache ) distinguish all six cases. Let us make it

more precise using fundamental definitions of rough set theory [ 26 - 28 ]. Let B be

a nonempty subset of the set A of all attributes. Let U denote the set of all cases.

The indiscernibility relation IND

(

B

)

is a relation on U defined for x

,

y

∈

U by

(

x

,

y

) ∈

IND

(

B

)

if and only if for both x and y the values for all attributes from B

are identical.

The indiscernibility relation IND

(

B

)

is an equivalence relation. Equivalence

classes of IND

are called elementary sets of B . Any union of elementary sets

of B is called a definable set in B . For Table 8.2 , and B ={ Temperature , Headache },

elementary sets of IND

(

B

)

are {1}, {2}, {3}, {4}, {5}, and {6}.

The family of all B -elementary sets is a partition on U . This set will be denoted

by B ∗ , for example, in Table 8.2 ,

(

B

)

} ∗ ={{

{

Temperature

,

Headache

1

} , {

2

} , {

3

} , {

4

} , {

5

} , {

6

}} .

depends on B if and only if B ∗ ≤{

} ∗ , i.e., for

For a decision d we say that

{

d

}

d

} ∗ such that X

any elementary set X in B there exists a concept C from

{

d

ↆ

C .A

global covering (or relative reduct )of

depends

on B and B is minimal in A . The algorithm to compute a single global covering is

presented below.

{

d

}

is a subset B of A such that

{

d

}

Table 8.2 A consistent data

set

Case

Attributes

Decision

Temperature Headache

Nausea Flu

1

High

Ye s

No

Ye s

2

Very_high

No

No

Ye s

3

Very_high

Ye s

No

No

4

Normal

No

No

No

5

High

No

Ye s

Maybe

6

Normal

Ye s

Ye s

Maybe

Next Page

Feature Selection for Data and Pattern Recognition

Search WWH ::

Custom Search

Home