Databases Reference
In-Depth Information
in interpreting the relationships between two possibly overlapping patterns.
Our algorithm treats the objects and dimensions (attributes) equally, and all
patterns are reported with their associated dimensions and subsets of objects.
Another advantage of our algorithm lies in its step-wise characteristic, that
is, the computation of the next pattern depends only on the current pattern.
Our algorithm is memory e cient due to this property, since there is no need
to keep all previously generated patterns in the memory.
In the rest of the chapter, we present our algorithm in the context of the
subspace clustering problem, but the algorithm and the theorem can also be
applied to other closed set mining problems such as frequent closed item-
sets [7] and maximal biclique [8]. We first present in Sect. 2 the definition of
maximal 1-complete region, where we also introduce the terms and notations
used in this chapter. Section 3 presents our algorithm. Section 4 presents the
experimental results. Finally, we make the conclusion.
2 Problem Statement
Adataspace
DS
is characterized by a set of attributes
A
(attribute space)
and a population of objects
O
(object space). Each object o i ∈O
has a
value assigned for each attributes a j ∈A
, denoted as d ij . We consider only
binary valued datasets in this chapter, that is, d i,j
[0 , 1]. However, real
valued datasets can be quantized into binary values, and different quantization
methods lead to clusters of different semantics [6]. A subspace S is a subset
of
.
We call O and A the object set and the attribute set of the subspace cluster
respectively. Subspace clustering is a search for subsets of
A
. A subspace cluster C is defined as <O,A> ,where O
⊆O
and A
⊆A
P
(
A
) (the power
set of
A
) where interesting clusters exist.
2.1 The Prefix Tree of Subspaces
Let “ < L ” be a lexicographic order on the attributes in
, and we use a i < L
a j to indicate that attribute a i is lexicographically smaller than attribute
a j . Each subspace is represented as the set of attributes contained in it in
the lexicographically increasing order. For example, a subspace containing
attribute a 1 ,a 2 ,a 3 ( a 1 < L a 2 < L a 3 ) is labeled as
A
{
a 1 a 2 a 3 }
. And we arrange
all subspaces into a prefix-based tree structure
T DS as follows:
1. Each node in the tree corresponds to one subspace, and the tree is rooted
at the node corresponding to the empty subspace that contains no at-
tributes.
2. For a node with label S = {a 1 ,...,a k− 1 ,a k } , its parent is the node whose
label is S =
{
a 1 ,...,a k− 1 }
.
Table 2 shows an example dataset, and Fig. 1 shows its prefix tree of sub-
spaces.
Search WWH ::




Custom Search