Databases Reference
In-Depth Information
The dynamic programming approach is commonly used for sequence alignments.
Among many available analysis packages, BLAST (Basic Local Alignment Search Tool)
is one of the most popular tools in biosequence analysis.
Hidden Markov Model for Biological Sequence Analysis
Given a biological sequence, biologists would like to analyze what that sequence repre-
sents. To represent the structure or statistical regularities of sequence classes, biologists
construct various probabilistic models such as Markov chains and hidden Markov
models . In both models, the probability of a state depends only on that of the previous
state; therefore, they are particularly useful for the analysis of biological sequence data.
The most common methods for constructing hidden Markov models are the forward
algorithm, the Viterbi algorithm, and the Baum-Welch algorithm. Given a sequence of
symbols, x , the forward algorithm finds the probability of obtaining x in the model; the
Viterbi algorithm finds the most probable path (corresponding to x ) through the model,
whereas the Baum-Welch algorithm learns or adjusts the model parameters so as to best
explain a set of training sequences.
13.1.2 Mining Graphs and Networks
Graphs represents a more general class of structures than sets, sequences, lattices, and
trees. There is a broad range of graph applications on the Web and in social networks,
information networks, biological networks, bioinformatics, chemical informatics, com-
puter vision, and multimedia and text retrieval. Hence, graph and network mining
have become increasingly important and heavily researched. We overview the follow-
ing major themes: (1) graph pattern mining; (2) statistical modeling of networks;
(3) data cleaning, integration, and validation by network analysis; (4) clustering and
classification of graphs and homogeneous networks; (5) clustering, ranking, and classifi-
cation of heterogeneous networks; (6) role discovery and link prediction in information
networks; (7) similarity search and OLAP in information networks; and (8) evolution
of information networks.
Graph Pattern Mining
Graph pattern mining is the mining of frequent subgraphs (also called (sub)graph pat-
terns ) in one or a set of graphs. Methods for mining graph patterns can be categorized
into Apriori-based and pattern growth-based approaches. Alternatively, we can mine
the set of closed graphs where a graph g is closed if there exists no proper supergraph
g 0 that carries the same support count as g . Moreover, there are many variant graph
patterns , including approximate frequent graphs, coherent graphs, and dense graphs.
User-specified constraints can be pushed deep into the graph pattern mining process to
improve mining efficiency.
Graph pattern mining has many interesting applications. For example, it can be
used to generate compact and effective graph index structures based on the concept of
 
Search WWH ::




Custom Search