applications such as grouping Web transactions [135]. Subsequently, a very large
number of methods have been designed for clustering high-dimensional data with
the use of pattern-based methods. A detailed discussion of the connections between
such high-dimensional clustering algorithms and the frequent pattern mining problem
may be found in the survey article [106] and in the chapter on high-dimensional data in [4].
A second problem is the use of pattern mining methods for clustering discrete
attributes, as in the case of biological data. Clusters can be considered an
orthogonal representation of the localized associations, as is the case for all subspace
clustering methods. A technique for finding localized associations and clusters
simultaneously is discussed in [9]. That work shows that localized associations
can be enhanced when local regions of the data are explored simultaneously
with the association analysis process. At the same time, the clustering process is
enhanced as well. This is also the general principle in many clustering methods such
as matrix factorization and co-clustering [4]. Biological data is often represented as
a sequence of discrete values corresponding to the amino acids or the DNA/RNA
bases. The sequences are usually too long to be clustered by similarity
computations alone. Therefore, pattern mining or motif mining can be very useful in
these cases. An example of a sequence-based clustering approach is the CLUSEQ
method [136]. A common class of algorithms in this context is biclustering,
in which clusters are constructed from frequent patterns in biological data [93, 99].
An excellent survey on biclustering methods may be found in [93]. The problem
of motif discovery is very closely related to that of clustering in such domains. A
discussion of different methods that connect the frequent pattern mining problem
to the clustering problem in the context of biological data may be found in [4].
4 Frequent Patterns for Classification
The problem of data classification is closely related to that of frequent pattern mining,
particularly in the context of rule-based methods. A classification rule is a condition
of the form:
$$A_1 = a_1,\ A_2 = a_2 \Rightarrow C = c$$
In this case, the left-hand side of the rule implies that attributes $A_1$ and $A_2$ should take
on values $a_1$ and $a_2$ respectively, and the right-hand side implies that the class value
should be $c$. The training phase creates a set of rules from the labeled data, whereas
the testing phase determines the relevant (or fired) rules, for which the left-hand side
of the rule matches the test instance. The final class label for the test instance is
determined by a carefully designed combination of the class labels on the right-hand
side of the fired rules. In addition, a default (or catch-all) label may be defined for
test instances that fire no rules, in order to ensure full coverage.
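The testing phase described above can be illustrated with a minimal sketch. The rule representation (a dictionary of attribute-value conditions, a class label, and a confidence score), the function names, and the confidence-weighted vote used to combine fired rules are all assumptions made for illustration; the text does not prescribe a particular combination scheme.

```python
from collections import Counter

# Hypothetical rule format: (antecedent, class_label, confidence), where the
# antecedent maps each attribute name to the value it must take.

def fired_rules(rules, instance):
    """Return the rules whose left-hand side matches the test instance."""
    return [
        rule for rule in rules
        if all(instance.get(attr) == value for attr, value in rule[0].items())
    ]

def classify(rules, instance, default_label):
    """Combine the labels of the fired rules; use the default if none fire."""
    fired = fired_rules(rules, instance)
    if not fired:
        # Catch-all label that ensures full coverage.
        return default_label
    # Assumed combination scheme: sum the confidences per class label.
    votes = Counter()
    for _antecedent, label, confidence in fired:
        votes[label] += confidence
    return votes.most_common(1)[0][0]

# Illustrative rules; the first encodes A1 = a1, A2 = a2 => C = c.
rules = [
    ({"A1": "a1", "A2": "a2"}, "c", 0.9),
    ({"A1": "a1"}, "d", 0.6),
    ({"A2": "b2"}, "d", 0.8),
]
```

For the instance {"A1": "a1", "A2": "a2"}, the first two rules fire and the weighted vote favors class c. An instance that matches no rule receives the default label.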
Since classification rules have a form very similar to that of association rules, it is
possible to determine relevant patterns from the data with the use of association
rule mining techniques. The main goal is to ensure that the patterns are sufficiently