initial preprocessing via a discretization procedure to make it applicable to rough
set methodology. This pre-processing, however, leads to a loss of information and
introduces a subjective factor into the method.
The variable precision and Bayesian rough set models are focused on the recognition
and modelling of set overlap-based, also referred to as probabilistic, relationships
between sets, which are most useful when dealing with noisy data. In this approach,
the set-overlap relationships are used to construct approximations of undefinable sets
[11]. The primary application of the approach is to the analysis of data co-occurrence-based
dependencies in classification tables and probabilistic decision tables derived
from data, as discussed in the following sections. Both the probabilistic decision
tables and the classification tables are normally "learned" from data to represent
inter-data-item connections, typically for the purpose of their analysis or data value
prediction. The probabilistic decision tables can also be used as a basis of generalized
probabilistic rule induction algorithms [29], but this topic is outside the scope of this
chapter.
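To make the set-overlap idea concrete, the following is a minimal, hypothetical sketch of how variable precision rough set approximations of an undefinable target set could be computed from equivalence classes. The universe, the classes, and the threshold values `l` and `u` are invented for illustration; they are not taken from the chapter.

```python
# Hypothetical VPRS illustration: an equivalence class whose overlap with the
# target set X (measured as the conditional probability P(X | E)) is at least
# the upper threshold u joins the positive region; overlap at most the lower
# threshold l puts it in the negative region; anything in between is boundary.

def vprs_regions(classes, target, l=0.2, u=0.8):
    """Classify each equivalence class by its overlap with the target set."""
    pos, neg, bnd = [], [], []
    for cls in classes:
        p = len(cls & target) / len(cls)  # P(X | E): fraction of E inside X
        if p >= u:
            pos.append(cls)
        elif p <= l:
            neg.append(cls)
        else:
            bnd.append(cls)
    return pos, neg, bnd

# Toy universe of 8 objects partitioned into three equivalence classes.
classes = [{1, 2, 3}, {4, 5, 6}, {7, 8}]
target = {1, 2, 3, 4, 7}  # an "undefinable" target set X

pos, neg, bnd = vprs_regions(classes, target, l=0.2, u=0.8)
print(pos)  # [{1, 2, 3}]         P(X|E) = 1.0 >= 0.8
print(bnd)  # [{4, 5, 6}, {7, 8}] overlaps of 1/3 and 1/2 fall between l and u
```

With `l = 0` and `u = 1` this degenerates to Pawlak's original lower approximation and boundary, which is why the variable precision model is described as a generalization suited to noisy data.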
In practical applications of the data-acquired decision tables, one of the main
issues is the identification of a minimal subset of attributes, which are discrete
functions of measured features, to represent an identified data dependency without any
loss, or with minimal loss, of information. The original general idea of an attribute
reduct, as introduced by Pawlak [10, 11], is applicable here. However, the original
specific notion of reduct applies only to functional, or partial functional, data
dependencies. In this chapter, we discuss an extended notion of reduct, as defined
in the contexts of the variable precision and Bayesian rough set models. The notion of
reduct in these contexts allows for the information-preserving identification of minimal
subsets of attributes in the presence of probabilistic dependencies between attributes.
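The flavour of Pawlak-style attribute reduction can be sketched as follows: drop attributes one at a time, keeping a drop only if the dependency of the decision on the remaining attributes is preserved. This is a simplified, hypothetical sketch using plain functional dependency as the preserved measure, with invented attribute names and data; the chapter's extended reducts replace this measure with a probabilistic one.

```python
# Hypothetical reduct sketch: an attribute is superfluous if removing it
# leaves the decision still a function of the remaining condition attributes.

def is_functional(rows, attrs, decision):
    """True if the decision value is a function of the chosen attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False  # same condition values, different decisions
    return True

def greedy_reduct(rows, attrs, decision):
    reduct = list(attrs)
    for a in attrs:
        trial = [x for x in reduct if x != a]
        if is_functional(rows, trial, decision):
            reduct = trial  # attribute a carried no extra information
    return reduct

# Invented table: d = a XOR b, and c happens to duplicate a (c = 1 - a),
# so either of {a, b} or {b, c} fully determines d.
rows = [
    {"a": 0, "b": 0, "c": 1, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
print(greedy_reduct(rows, ["a", "b", "c"], "d"))  # -> ['b', 'c']
```

Note that a reduct need not be unique: the greedy order decides which of the two equivalent minimal subsets is found here.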
The chapter is organized as follows. In the next section, we review the fundamentals
of the variable precision rough set approach, including the introduction of set
approximations and the basics of the related Bayesian rough set model. In Sect. 6.3,
we discuss different kinds of probabilistic dependencies occurring between a
"target set" and a partition of the universe of interest. The partition
is assumed to represent our classification knowledge. The target set is our learning
goal, whose approximate classification, in terms of the classification knowledge, we
are trying to learn. The dependencies in question reflect our overall ability to
create such a classification. In Sect. 6.4, the probabilistic attribute-value-based decision
tables are introduced, along with the related classification tables. Both kinds of
tables represent our classification knowledge with respect to the target set.
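As a rough sketch of what "learning" such a table from data could look like: rows sharing the same condition-attribute values form one equivalence class, and for each class one records its probability and the conditional probability of the target set. The column names, data sample, and tuple layout below are invented for illustration only.

```python
# Hypothetical construction of a probabilistic decision table: for every
# combination of condition-attribute values (an equivalence class E), record
# P(E) and P(X | E), where X is the target set of rows with a given decision.

from collections import defaultdict

def decision_table(rows, attrs, target_attr, target_value):
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row)
    n = len(rows)
    table = {}
    for key, members in groups.items():
        hits = sum(1 for r in members if r[target_attr] == target_value)
        table[key] = (len(members) / n,     # P(E): class probability
                      hits / len(members))  # P(X | E): target overlap
    return table

rows = [
    {"size": "big",   "colour": "red",  "cls": "yes"},
    {"size": "big",   "colour": "red",  "cls": "no"},
    {"size": "big",   "colour": "blue", "cls": "yes"},
    {"size": "small", "colour": "red",  "cls": "no"},
]
tbl = decision_table(rows, ["size", "colour"], "cls", "yes")
print(tbl[("big", "red")])  # (0.5, 0.5): half the data, target in half of it
```

Comparing each class's `P(X | E)` against precision thresholds then yields the rough-approximation labels that distinguish a probabilistic decision table from a plain classification table.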
The probabilistic decision tables additionally represent rough approximations of
the target set, as defined in the framework of the variable precision rough set theory.
The inter-attribute dependencies occurring in both the probabilistic decision tables
and the classification tables are the subject of Sect. 6.5. All the discussed dependencies
are probabilistic in nature and are defined in the context of either the variable precision
or the Bayesian rough set model. They generalize and expand the attribute dependencies
introduced by Pawlak in the original rough set theory [11]. Attribute reduction with
respect to the introduced dependencies is the subject of Sect. 6.6. The monotonicity
property of the introduced λ-dependency measure allows for a definition of the notion of
reduct with respect to this measure.