Applications of Frequent Pattern Mining - Frequent Pattern Mining

of scalability is particularly important in the context of pattern mining of biological

sequences, because such sequences are typically very long [ 127 ]. As discussed ear-

lier, frequent pattern mining algorithms are used for biclustering of biological data

[ 93 ]. It should be pointed out that many of the sequential pattern mining algorithms,

which were originally designed for temporal data, can also be applied directly to

biological sequences.
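
As a rough illustration of support counting on biological sequences, the sketch below counts contiguous k-mers that occur in at least a minimum number of sequences. It is a deliberately simplified stand-in for a sequential pattern mining algorithm (it ignores gaps and does not prune candidates), and the function name, parameters, and toy sequences are assumptions made for this example.

```python
from collections import Counter

def frequent_kmers(sequences, k, min_support):
    """Return the k-mers that occur in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        # Use a set so each k-mer counts at most once per sequence
        # (support is the number of sequences, not of occurrences).
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return {kmer: n for kmer, n in support.items() if n >= min_support}

# Hypothetical toy input, not taken from any real dataset.
toy_sequences = ["ACGTAC", "TACGTT", "GGACGA"]
print(frequent_kmers(toy_sequences, k=3, min_support=3))
```

Because real biological sequences are very long, a practical implementation would need the scalability techniques the text refers to; this sketch only shows what "frequent" means in this setting.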

In the context of graph structured biological data, a key problem is that of clus-

tering protein-protein interaction networks. Such networks can be rather large, and

the problem shares a number of similarities with that of community detection in

social networks. Since frequent pattern mining is closely related to clustering, such

methods can also be used for community detection in interaction networks. One such

method, based on frequent subgraph mining, is discussed in [ 26 ].

Trees are often used to represent many biological structures such as glycans, RNA,

and phylogenies. Frequent subgraph mining is often used on all of these biological

structures in the context of different kinds of applications. In many cases, when

phylogenies are inferred with the use of different techniques, many different trees

are produced for a given set of input genes. As a result, it becomes hard to assimilate

and understand the relationships between such trees. Typically, while the goal is to

understand evolutionary relationships between entities, the large number of possible

trees makes this very difficult. Therefore, it is often desirable to find the broader

patterns in these trees, a problem which is closely related to that of frequent subtree

mining. Such trees are also referred to as consensus trees or supertrees [ 96 , 120 ].

The common relations between the different trees provide an idea of the commonly

occurring patterns in the underlying data. For example, pairs (or groups) of nodes

which share the same ancestral node are useful in discovering common patterns in

multiple phylogenies. Methods for finding such frequent patterns are discussed in

[ 113 ]. Frequent subtree mining algorithms are useful for extending such methods to

more complex data [ 139 , 142 ], which are not necessarily represented as trees.
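
The idea of counting node pairs that share an ancestor across several phylogenies can be sketched as follows. This is a minimal illustration and not the method of [ 113 ]: it assumes trees are given as nested tuples with string leaves, and it only considers pairs of leaves that have the same parent. All names and the toy trees are hypothetical.

```python
from collections import Counter
from itertools import combinations

def sibling_leaf_pairs(tree):
    """Collect unordered pairs of leaves that share the same parent node."""
    pairs = set()
    if isinstance(tree, tuple):
        # Leaves directly attached to this internal node.
        leaves = sorted(child for child in tree if isinstance(child, str))
        pairs.update(combinations(leaves, 2))
        # Recurse into children; leaf children contribute nothing.
        for child in tree:
            pairs |= sibling_leaf_pairs(child)
    return pairs

def frequent_sibling_pairs(trees, min_support):
    """Return the sibling pairs found in at least min_support trees."""
    support = Counter()
    for tree in trees:
        support.update(sibling_leaf_pairs(tree))
    return {pair: n for pair, n in support.items() if n >= min_support}

# Three hypothetical phylogenies over the same four taxa.
toy_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), "C", "D"),
    ((("A", "B"), "C"), "D"),
]
print(frequent_sibling_pairs(toy_trees, min_support=3))
```

Pairs that survive a high support threshold are the relationships on which the alternative trees agree, which is the kind of information a consensus tree summarizes.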

Frequent pattern mining is also used for mining different kinds of RNA data.

Multiple species often have common substructures due to common evolutionary ori-

gins [ 98 ]. These similarities are often expressed in the form of functional similarities

among RNAs. Therefore, it is useful to apply frequent pattern mining algorithms

for predictive mining. In particular, the discovery of common RNA substructures

has been used for prediction of RNA folding and processing mechanisms [ 71 , 112 ].

Note that such predictive learning methods are closely related to the classification

problem, which is commonly solved by frequent pattern mining in the context of

rule-based methods.

Frequent subtree mining methods are also very useful for mining glycan databases

[ 61 ]. These methods can be used to develop a classification method for glycan

databases by using pattern-based classification methods. As in all pattern-based

classification methods, rules can be constructed in order to determine whether or not

a particular glycan belongs to a given class. In this case, the left-hand sides of the

rules correspond to the subtrees in the glycan database.
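
A rule-based classifier of this kind might be sketched as below. To keep the example short, each glycan and each rule pattern is reduced to its set of parent-child label edges, which is a much weaker test than true subtree containment; the tree encoding, the rule list, and the class names are all assumptions made for illustration and are not taken from [ 61 ].

```python
def edges(tree):
    """Return the set of (parent_label, child_label) edges of a tree.

    A tree is encoded as (label, [children]), with leaves as (label, []).
    """
    label, children = tree
    found = set()
    for child in children:
        found.add((label, child[0]))
        found |= edges(child)
    return found

def classify(glycan, rules, default="unknown"):
    """Return the class of the first rule whose left-hand side matches."""
    glycan_edges = edges(glycan)
    for pattern_edges, label in rules:
        # Simplified match: every edge of the pattern occurs in the glycan.
        if pattern_edges <= glycan_edges:
            return label
    return default

# Hypothetical rules, assumed to be ordered from most to least reliable.
toy_rules = [
    (frozenset({("Man", "Gal")}), "class_a"),
    (frozenset({("GlcNAc", "Man"), ("Man", "Man")}), "class_b"),
]

# Hypothetical glycan with a GlcNAc-GlcNAc-Man core and two Man branches.
toy_glycan = ("GlcNAc", [("GlcNAc", [("Man", [("Man", []), ("Man", [])])])])
print(classify(toy_glycan, toy_rules))
```

In a fuller treatment, the left-hand sides would be frequent subtrees mined from the glycan database and matched with a proper subtree-inclusion test, with rules ranked by a measure such as confidence.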
