An Algorithm for Mining Weighted Dense Maximal 1-Complete Regions - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

in interpreting the relationships between two possibly overlapping patterns.

Our algorithm treats the objects and dimensions (attributes) equally, and all

patterns are reported with their associated dimensions and subsets of objects.

Another advantage of our algorithm lies in its step-wise characteristic, that

is, the computation of the next pattern depends only on the current pattern.

Our algorithm is memory e cient due to this property, since there is no need

to keep all previously generated patterns in the memory.

In the rest of the chapter, we present our algorithm in the context of the

subspace clustering problem, but the algorithm and the theorem can also be

applied to other closed set mining problems such as frequent closed item-

sets [7] and maximal biclique [8]. We first present in Sect. 2 the definition of

maximal 1-complete region, where we also introduce the terms and notations

used in this chapter. Section 3 presents our algorithm. Section 4 presents the

experimental results. Finally, we make the conclusion.

2 Problem Statement

Adataspace

is characterized by a set of attributes

(attribute space)

and a population of objects

(object space). Each object o i ∈O

has a

value assigned for each attributes a j ∈A

, denoted as d ij . We consider only

binary valued datasets in this chapter, that is, d i,j ∈

[0 , 1]. However, real

valued datasets can be quantized into binary values, and different quantization

methods lead to clusters of different semantics [6]. A subspace S is a subset

We call O and A the object set and the attribute set of the subspace cluster

respectively. Subspace clustering is a search for subsets of

. A subspace cluster C is defined as <O,A> ,where O

⊆O

and A

⊆A

(

) (the power

set of

) where interesting clusters exist.

2.1 The Prefix Tree of Subspaces

Let “ < L ” be a lexicographic order on the attributes in

, and we use a i < L

a j to indicate that attribute a i is lexicographically smaller than attribute

a j . Each subspace is represented as the set of attributes contained in it in

the lexicographically increasing order. For example, a subspace containing

attribute a 1 ,a 2 ,a 3 ( a 1 < L a 2 < L a 3 ) is labeled as

{

a 1 a 2 a 3 }

. And we arrange

all subspaces into a prefix-based tree structure

T DS as follows:

1. Each node in the tree corresponds to one subspace, and the tree is rooted

at the node corresponding to the empty subspace that contains no at-

tributes.

2. For a node with label S = {a 1 ,...,a k− 1 ,a k } , its parent is the node whose

label is S =

{

a 1 ,...,a k− 1 }

Table 2 shows an example dataset, and Fig. 1 shows its prefix tree of sub-

spaces.

Data Mining: Foundations and Practice

Search WWH ::

Custom Search

Home