Databases Reference
In-Depth Information
us to obtain a succinct representation of nonredundant
-pCluster is
maximal if no more rows or columns can be added into the cluster while maintaining the
-pClusters. A
-pCluster property. To avoid redundancy, instead of finding all
-pClusters, we only
need to compute all maximal
-pClusters.
MaPle is an algorithm that enumerates all maximal
-pClusters. It systematically
enumerates every combination of conditions using a set enumeration tree and a depth-
first search. This enumeration framework is the same as the pattern-growth methods
for frequent pattern mining (Chapter 6). Consider gene expression data. For each con-
dition combination, J , MaPle finds the maximal subsets of genes, I , such that I J is
a
-pCluster. If I J is not a submatrix of another
-pCluster, then I J is a maximal
-pCluster.
There may be a huge number of condition combinations. MaPle prunes many
unfruitful combinations using the monotonicity of
-pClusters. For a condition com-
bination, J , if there does not exist a set of genes, I , such that I J is a
-pCluster, then
we do not need to consider any superset of J . Moreover, we should consider I J as a
candidate of a
-subset J 0 of J , I J 0 is a
-pCluster.
MaPle also employs several pruning techniques to speed up the search while retaining
the completeness of returning all maximal
-pCluster only if for every
.j J j1
/
-pClusters. For example, when examining a
current
-pCluster, I J , MaPle collects all the genes and conditions that may be added
to expand the cluster. If these candidate genes and conditions together with I and J form
a submatrix of a
-pCluster that has already been found, then the search of I J and any
superset of J can be pruned. Interested readers may refer to the bibliographic notes for
additional information on the MaPle algorithm (Section 11.7).
An interesting observation here is that the search for maximal
-pClusters in MaPle is
somewhat similar to mining frequent closed itemsets. Consequently, MaPle borrows the
depth-first search framework and ideas from the pruning techniques of pattern-growth
methods for frequent pattern mining. This is an example where frequent pattern mining
and cluster analysis may share similar techniques and ideas.
An advantage of MaPle and the other algorithms that enumerate all biclusters is that
they guarantee the completeness of the results and do not miss any overlapping biclus-
ters. However, a challenge for such enumeration algorithms is that they may become very
time consuming if a matrix becomes very large, such as a customer-purchase matrix of
hundreds of thousands of customers and millions of products.
11.2.4 DimensionalityReductionMethodsandSpectral
Clustering
Subspace clustering methods try to find clusters in subspaces of the original data
space. In some situations, it is more effective to construct a new space instead of using
subspaces of the original data. This is the motivation behind dimensionality reduction
methods for clustering high-dimensional data.
Example11.14 Clustering in a derived space. Consider the three clusters of points in Figure 11.9. It is
not possible to cluster these points in any subspace of the original space, X Y , because
 
Search WWH ::




Custom Search