A Bipolar Interpretation of Fuzzy Decision Trees - Data Mining: Foundations and Practice


instances, it provides interesting alternatives to the variety of decision trees

proposed in the literature.

The rest of this paper is organized as follows. In Sects. 2 and 3, we present

the data format and rule form of the classification problem respectively. In

particular, we propose fuzzy data tables for data representation, and use fuzzy

decision logic as the rule representation language. In Sect. 4, we introduce a

uniform framework, called general fuzzy decision trees. The edges of a general

fuzzy decision tree are labeled by fuzzy decision logic formulas and the nodes

are split according to the satisfaction of these formulas in the data records (or

objects). We also present a construction algorithm for general fuzzy decision

trees. In Sect. 5, we show the application of our framework to different types

of training data by instantiating it to some specific cases. In particular, the

bipolar interpretation of general fuzzy decision trees results in ordinary fuzzy

decision trees [6] and multi-valued decision trees [2]. Finally, in Sect. 6, we

briefly conclude this paper and indicate some further research directions.

2 Data Representation

A data table is normally used as a means of storing data. A formal definition

of a data table is given in [12].

Definition 1. A data table$^1$ is a pair $S = (U, A)$ such that

• $U = \{x_1, x_2, \cdots, x_n\}$ is a nonempty finite set, called the universe,

• $A = \{f_1, f_2, \cdots, f_m\}$ is a nonempty finite set of primitive attributes,

• for $1 \le i \le m$, $f_i : U \to V_i$ is a total function, where $V_i$ is the set of values for $f_i$, called the domain of values of $f_i$.

To distinguish data tables from fuzzy data tables, we call them precise data tables. Hereafter, when we mention a data table $S = (U, A)$, we assume that the cardinalities of $U$ and $A$ are respectively $n$ and $m$, $f_i$ denotes the $i$th attribute in $A$, and $V_i$ is its domain of values. Each element in $U$ represents a data record. Since each data record describes the attributes of an object, we identify a data record with the object described by the data record. Thus, the elements of $U$ are also called objects. In the following presentation, we treat the terms “data records” and “objects” interchangeably.
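As an illustration only, a precise data table in the sense of Definition 1 can be sketched in Python. The class name `DataTable`, its method names, and the toy attributes `outlook` and `windy` are hypothetical and do not come from the original text; each attribute $f_i$ is assumed to be stored as a dictionary from objects to values, and totality is checked by requiring that dictionary to be defined on exactly the objects of $U$.

```python
from typing import Dict, Hashable, List, Set


class DataTable:
    """Sketch of a precise data table S = (U, A) (hypothetical helper class)."""

    def __init__(self, universe: List[Hashable],
                 attributes: Dict[str, Dict[Hashable, Hashable]]) -> None:
        # U and A must both be nonempty finite sets.
        if not universe or not attributes:
            raise ValueError("U and A must both be nonempty")
        if len(set(universe)) != len(universe):
            raise ValueError("U must not contain duplicate objects")
        for name, f in attributes.items():
            # Each f_i must be a total function on U: defined for every object.
            if set(f) != set(universe):
                raise ValueError(f"attribute {name!r} is not total on U")
        self.universe = list(universe)
        self.attributes = attributes

    def value(self, name: str, x: Hashable) -> Hashable:
        """Return f_i(x) for the attribute called `name`."""
        return self.attributes[name][x]

    def observed_values(self, name: str) -> Set[Hashable]:
        """Values the attribute actually takes on U (a subset of V_i)."""
        return set(self.attributes[name].values())


# Toy table with n = 3 objects and m = 2 attributes (made-up data).
table = DataTable(
    universe=["x1", "x2", "x3"],
    attributes={
        "outlook": {"x1": "sunny", "x2": "rain", "x3": "sunny"},
        "windy": {"x1": False, "x2": True, "x3": True},
    },
)
n, m = len(table.universe), len(table.attributes)
```

Note that `observed_values` only returns the values that occur in the table; the domain $V_i$ itself may be larger, and the sketch does not model it separately.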

In a precise data table, it is assumed that $f_i(x)$ is exactly known for each object $x$ and attribute $f_i$. However, in some practical situations, we have only incomplete information about $f_i(x)$ for some $f_i$ and $x$. To accommodate such situations, incomplete information systems have been proposed [8-10, 16, 17]. Furthermore, many practical data mining problems need to deal with multi-valued data [2].
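One simple way to picture incomplete information, offered here only as a sketch and not necessarily the formulation used in the cited works, is to let each cell hold a set of candidate values instead of a single value. The names `is_precise` and `uncertain_objects` and the sample column are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, List

# Assumed encoding: each object maps to the set of values f_i(x) might take.
Column = Dict[Hashable, FrozenSet[Hashable]]


def is_precise(column: Column) -> bool:
    """True when every cell narrows f_i(x) down to exactly one value."""
    return all(len(candidates) == 1 for candidates in column.values())


def uncertain_objects(column: Column) -> List[Hashable]:
    """Objects whose value for this attribute is not exactly known."""
    return sorted(x for x, candidates in column.items() if len(candidates) > 1)


# Made-up column: the value for x2 is known only to be one of two options.
outlook: Column = {
    "x1": frozenset({"sunny"}),
    "x2": frozenset({"rain", "overcast"}),
    "x3": frozenset({"sunny"}),
}
precise = is_precise(outlook)
unknown = uncertain_objects(outlook)
```

Under this encoding a precise data table is the special case in which every candidate set is a singleton.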

1 Also called knowledge representation system, information system, or attribute-value system.
