of scalability is particularly important in the context of pattern mining of biological
sequences, because such sequences are typically very long [ 127 ]. As discussed ear-
lier, frequent pattern mining algorithms are used for biclustering of biological data
[ 93 ]. It should be pointed out that many of the sequential pattern mining algorithms,
which were originally designed for temporal data, can also be applied directly to
biological sequences.
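
As a rough illustration of applying a support-based pattern miner to biological sequences, the sketch below counts contiguous substrings of length k (k-mers) and keeps those that occur in at least a minimum number of sequences. It is a deliberately simplified stand-in for a full sequential pattern mining algorithm, which would also allow gaps between pattern elements. The function name `frequent_kmers`, its parameters, and the toy sequences are assumptions made for this example and do not come from the cited work.

```python
from collections import Counter


def frequent_kmers(sequences, k, min_support):
    """Return the k-mers that occur in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        # A k-mer counts once per sequence, so support is the number
        # of sequences that contain it, not the number of occurrences.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(seen)
    return {kmer: n for kmer, n in support.items() if n >= min_support}


# Toy data (hypothetical sequences, chosen only for illustration).
sequences = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
result = frequent_kmers(sequences, k=4, min_support=3)
print(result)
```

Because the number of windows grows linearly with sequence length, even this simple counter shows why scalability matters for very long sequences: every additional pattern length k requires another pass over all of the data.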
In the context of graph structured biological data, a key problem is that of clus-
tering protein-protein interaction networks. Such networks can be rather large, and
the problem shares a number of similarities with that of community detection in
social networks. Since frequent pattern mining is closely related to clustering,
frequent pattern mining methods can also be used for community detection in
interaction networks. One such method, which uses frequent subgraph mining for
community detection in interaction networks, is discussed in [ 26 ].
Trees are often used to represent many biological structures such as glycans, RNA,
and phylogenies. Frequent subgraph mining is often used on all of these biological
structures in the context of different kinds of applications. In many cases, when
phylogenies are inferred with the use of different techniques, many different trees
are produced for a given set of input genes. As a result, it becomes hard to assimilate
and understand the relationships between such trees. Typically, while the goal is to
understand evolutionary relationships between entities, the large number of possible
trees makes this very difficult. Therefore, it is often desirable to find the broader
patterns in these trees, a problem which is closely related to that of frequent subtree
mining. Such trees are also referred to as consensus trees or supertrees [ 96 , 120 ].
The common relations between the different trees provide an idea of the commonly
occurring patterns in the underlying data. For example, pairs (or groups) of nodes
which share the same ancestral node are useful in discovering common patterns in
multiple phylogenies. Methods for finding such frequent patterns are discussed in
[ 113 ]. Frequent subtree mining algorithms are useful for extending such methods to
more complex data [ 139 , 142 ], which are not necessarily represented as trees.
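
The idea of finding groups of nodes that share the same ancestral node across many phylogenies can be sketched as follows. The example treats each internal node as defining a clade (the set of leaves below it) and counts in how many trees each clade appears. This is only a minimal sketch under stated assumptions, not the method of [ 113 ]: trees are encoded as nested tuples with string leaves, and the names `clades` and `frequent_clades`, as well as the toy trees, are hypothetical.

```python
from collections import Counter


def clades(tree):
    """Return (leaf set of tree, set of clades defined by its internal nodes)."""
    if isinstance(tree, str):
        # A leaf contributes its own label and no clades.
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    # The leaves below this internal node share it as a common ancestor.
    found.add(leaves)
    return leaves, found


def frequent_clades(trees, min_support):
    """Return clades that appear in at least min_support of the trees."""
    support = Counter()
    for tree in trees:
        _, found = clades(tree)
        support.update(found)
    return {c: n for c, n in support.items() if n >= min_support}


# Three hypothetical phylogenies over the same four taxa.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    (("A", "C"), ("B", "D")),
]
common = frequent_clades(trees, min_support=2)
for clade, n in sorted(common.items(), key=lambda item: len(item[0])):
    print(sorted(clade), n)
```

Here the grouping of A with B is supported by two of the three trees, while the full taxon set appears in all of them because it is the root clade of every tree and is therefore uninformative.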
Frequent pattern mining is also used for mining different kinds of RNA data.
Multiple species often have common substructures due to common evolutionary ori-
gins [ 98 ]. These similarities are often expressed in the form of functional similarities
among RNAs. Therefore, it is useful to apply frequent pattern mining algorithms
for predictive mining. In particular, the discovery of common RNA substructures
has been used for prediction of RNA folding and processing mechanisms [ 71 , 112 ].
Note that such predictive learning methods are closely related to the classification
problem, which is commonly solved by frequent pattern mining in the context of
rule-based methods.
Frequent subtree mining methods are also very useful for mining glycan databases
[ 61 ]. These methods can be used to build pattern-based classifiers for glycan
databases. As in all pattern-based classification methods, rules are constructed
to determine whether or not a particular glycan belongs to a given class. In this
case, the left-hand sides of the rules correspond to subtrees in the glycan
database.
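
A minimal sketch of such a rule-based classifier is given below. To keep it short, the only patterns considered are parent-child label pairs, which are the smallest non-trivial subtrees; a real frequent subtree miner would enumerate larger subtrees as well. Glycans are encoded as (label, children) tuples. The function names, the thresholds, the class labels, and the toy structures are all assumptions made for illustration and are not taken from [ 61 ] or from real glycan data.

```python
from collections import Counter, defaultdict


def edge_patterns(tree):
    """Collect (parent label, child label) pairs occurring in the tree."""
    label, children = tree
    patterns = set()
    for child in children:
        patterns.add((label, child[0]))
        patterns |= edge_patterns(child)
    return patterns


def mine_rules(labeled_trees, min_support, min_confidence):
    """Build rules 'pattern => class' that meet both thresholds."""
    pattern_count = Counter()
    pattern_class_count = defaultdict(Counter)
    for tree, cls in labeled_trees:
        for p in edge_patterns(tree):
            pattern_count[p] += 1
            pattern_class_count[p][cls] += 1
    rules = []
    for p, total in pattern_count.items():
        cls, hits = pattern_class_count[p].most_common(1)[0]
        confidence = hits / total
        if hits >= min_support and confidence >= min_confidence:
            rules.append((p, cls, confidence))
    # Rules with higher confidence are tried first.
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules


def classify(tree, rules, default):
    """Return the class of the first rule whose left-hand side occurs in tree."""
    present = edge_patterns(tree)
    for pattern, cls, _ in rules:
        if pattern in present:
            return cls
    return default


# Toy training data: hypothetical structures and hypothetical class labels.
training = [
    (("Man", [("GlcNAc", [("Gal", [])]), ("Man", [])]), "class_1"),
    (("Man", [("GlcNAc", []), ("Man", [("Man", [])])]), "class_1"),
    (("GalNAc", [("Gal", [("Neu5Ac", [])])]), "class_2"),
    (("GalNAc", [("Gal", []), ("GlcNAc", [])]), "class_2"),
]
rules = mine_rules(training, min_support=2, min_confidence=1.0)
prediction = classify(("GalNAc", [("Gal", [])]), rules, default="unknown")
print(prediction)
```

Each rule has a subtree pattern on its left-hand side and a class on its right-hand side. A glycan that matches no rule falls back to the default label.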