Biology Reference
In-Depth Information
authors observe that the maximal clique in an N -partite graph is much smaller
than the size of the graph, and use a divide-and-conquer approach to solve the
problem. They apply a branch-and-bound algorithm to each subproblem, check-
ing along the way whether a string, close to the current set of possible motif in-
stances, exists.
The main idea in the clique-finding part of the algorithm is to subdivide the
problem into a number of independent N 0 -partite subgraphs with N 0 <N , iden-
tifying all the cliques in each of the smaller subgraphs, and combine the results.
Cliques, found in the smaller subgraphs, form vertices in a new graph, with two
cliques from the same subproblem corresponding to vertices in the same part of
the new graph. Two vertices in different parts are connected by and edge if their
corresponding cliques combine to form a larger clique in the original graph. The
authors show how to choose an appropriate subdivision parameter N 0 , so that the
number of cliques in each subproblem does not grow too high, and the second
graph problem based on sub-cliques is not more complex than the original one.
Deciding whether a string, close to a given set of strings of the same length,
exists is an NP-hard problem. Sze and coauthors [42] circumvent this difficulty
by noting that the close-string problem need not be solved every time a clique
of size j is expanded to size j +1, but rather when a clique potentially larger
than before is found. Weaker necessary conditions are used instead during other
clique expansion steps to prune impossible branches. These conditions constrain
existence of a close string based on pair-wise distances between strings in the set.
The authors further generalize their approach for problems such that not every
input sequence contains a motif instance, and demonstrate an optimized branch-
and-bound algorithm for clique finding that is faster than the standard methods
and is applicable to relatively large graphs.
4.6. Discussion
We have presented an overview of graph-based methods for motif discovery. The
authors of the respective methods, the LP-based methods and MotifCut in particu-
lar, thoroughly test the performance of the algorithms and show improved results
in biological and/or simulated data over a number of standard and well-regarded
approaches. The methods are based on motif models substantially different from
the common PSSM model, and allow for the relaxation of the positional indepen-
dence assumption within motifs. This line of research provides a successful and
powerful alternative to traditional stochastic and enumerative techniques.
Search WWH ::




Custom Search