information-preserving reduct with respect to this dependency. A couple of efficient,
linear-time algorithms for computing single-attribute reducts, in either classification
tables or probabilistic decision tables, are presented. The ability to compute reducts
also allows us to determine the importance, or significance, of attributes. This is the
subject of Sect. 6.7. Finally, in Sect. 6.8, we discuss the concept of generalized core
attributes, the extension of the original core attributes introduced by Pawlak [10, 11].
The core attributes are the fundamental ones, which are preserved in every attribute
reduction.
6.2 Variable Precision Rough Sets
In the rough set approach to data analysis, the crucial aspect is the existence of
an ability, or knowledge, to form a prior classification of the universe of objects
of interest into distinct classes. This ability, or classification knowledge, is usually
associated with an external agent, such as a medical professional, who is assumed
to know how to classify objects (for example, patients) into categories (for example,
into health condition groups). However, in automated systems such an expert is
typically not available. Instead, the system has to rely on measurements taken by
its sensors (for example, temperature, blood pressure, etc.) to perform the
classification. In the rough set approach, the measurements are converted into
discrete features called attribute values, which are then used to classify objects. We
elaborate on attribute value-based classifications in detail in Sect. 6.4.
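As an illustration of this conversion step, the following sketch maps raw sensor readings onto discrete attribute values. The attribute names and cut-off thresholds are hypothetical, chosen only for the example; they are not part of the model itself.

# A minimal sketch of converting raw measurements into discrete attribute
# values. The attribute names and thresholds are hypothetical.

def discretize_temperature(temp_celsius: float) -> str:
    """Map a temperature reading onto a discrete attribute value."""
    if temp_celsius < 36.0:
        return "low"
    if temp_celsius <= 37.5:
        return "normal"
    return "high"

def discretize_blood_pressure(systolic_mm_hg: float) -> str:
    """Map a systolic blood pressure reading onto a discrete attribute value."""
    if systolic_mm_hg < 120:
        return "normal"
    if systolic_mm_hg < 140:
        return "elevated"
    return "high"

def to_attribute_values(measurements: dict) -> dict:
    """Convert one record of sensor measurements into attribute values."""
    return {
        "Temperature": discretize_temperature(measurements["temperature"]),
        "BloodPressure": discretize_blood_pressure(measurements["systolic"]),
    }

# Example: a single patient record.
print(to_attribute_values({"temperature": 38.2, "systolic": 150}))
# -> {'Temperature': 'high', 'BloodPressure': 'high'}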
The general variable precision rough set (VPRS) model does not make any
assumptions about how the prior classification was performed. It merely assumes that
some kind of prior knowledge exists and is represented in mathematical form by an
equivalence relation, referred to as an indiscernibility relation IND on the universe U,
IND ⊆ U × U. The relation is assumed to have a finite number of equivalence classes,
i.e. classification categories, called elementary sets. It should be noted that the
assumption of a finite number of classes may not be satisfied in general, but in
attribute-value systems, which are the focus of this chapter, it is always the case.
The collection of elementary sets of the IND relation will be denoted as IND*.
The pair (U, IND) is called an approximation space.
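To make the notion of elementary sets concrete, the sketch below groups objects into equivalence classes of an indiscernibility relation induced by attribute values: two objects are indiscernible when all of their attribute values coincide. The sample universe and attributes are assumptions made for illustration only.

# A minimal sketch of computing the elementary sets of an indiscernibility
# relation IND induced by attribute values. The sample data are hypothetical.

from collections import defaultdict

def elementary_sets(objects: dict) -> list:
    """Group object identifiers into equivalence classes (elementary sets)."""
    classes = defaultdict(set)
    for obj_id, attribute_values in objects.items():
        # Objects with identical attribute-value vectors fall into one class.
        key = tuple(sorted(attribute_values.items()))
        classes[key].add(obj_id)
    return list(classes.values())

# Hypothetical universe U of patients described by two attributes.
U = {
    "p1": {"Temperature": "high", "BloodPressure": "high"},
    "p2": {"Temperature": "high", "BloodPressure": "high"},
    "p3": {"Temperature": "normal", "BloodPressure": "normal"},
    "p4": {"Temperature": "normal", "BloodPressure": "elevated"},
}

# The collection IND* of elementary sets; (U, IND) is the approximation space.
print(elementary_sets(U))
# -> [{'p1', 'p2'}, {'p3'}, {'p4'}]  (element order within each set may vary)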
Let X be an arbitrary subset, referred to as the target set, of the universe U, X ⊆ U.
In practice, the universe is a finite non-empty collection of objects of interest, such
as medical patients, and the target set is our “goal” class, for example, representing
the class of patients suffering from a specific disease. Our objective is to create a
system that would allow us to classify arbitrary objects into the “goal” class, or its
complement, with an error rate that we would consider acceptable in the context
of our criteria (which are domain-specific and, consequently, outside of the rough set
model), but lower, on average, than in the case of random classification. For exam-
ple, the objective may be to predict (diagnose) the presence, or absence, of a specific
disease based on the results of medical tests, which are supposed to increase the accu-
racy of such predictions (if the tests are properly designed) in comparison to predictions
based solely on the frequency of occurrence of the disease in the population.
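A rough sketch of this objective is given below. It assumes a simple majority rule within each elementary set (not the VPRS criterion developed later in the chapter) and hypothetical data; it only illustrates how elementary-set-based classification can achieve a lower expected error than predictions based solely on the overall class frequency.

# A minimal sketch comparing classification guided by elementary sets with a
# baseline that relies only on the overall frequency of the target class.
# The data and the simple majority rule are assumptions made for illustration.

def expected_errors(elementary_sets, target_set, universe_size):
    """Return (baseline_error, elementary_set_error) as fractions of U."""
    # Baseline: always predict the more frequent of X and its complement.
    p_x = len(target_set) / universe_size
    baseline_error = min(p_x, 1.0 - p_x)

    # Elementary-set rule: within each class, predict the locally more
    # frequent outcome; the misclassified objects form the local minority.
    misclassified = 0
    for E in elementary_sets:
        in_x = len(E & target_set)
        misclassified += min(in_x, len(E) - in_x)
    return baseline_error, misclassified / universe_size

# Hypothetical data: 6 patients, target set X of diseased patients.
classes = [{"p1", "p2", "p3"}, {"p4", "p5"}, {"p6"}]
X = {"p1", "p2", "p6"}

print(expected_errors(classes, X, universe_size=6))
# -> (0.5, 0.16666666666666666): a lower error than the frequency baseline.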
 