Graph-based Approaches for Motif Discovery - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

relating average degree of vertices and current graph density, then one of the sub-

graphs induced by the minimum cut is guaranteed to have higher density than the

original graph. A new flow network is constructed from the subgraph with higher

density and the process is repeated, converging to the maximum density subgraph

after a polynomial number of iterations.

To achieve a speed-up over the push/relabel algorithm for the maximum flow

computations, Fratkin and colleagues [8] propose a heuristic that looks for the

maximum density subgraph in local, high average degree neighborhoods around

vertices. Finally, the authors use a refinement step to eliminate the high number of

false positives in the obtained set of candidate motif instances by greedily finding

a subset of minimum entropy. The overall approach is interesting and novel in that

it poses motif finding as an optimization problem that is solved by a polynomial-

time algorithm and seems to scale well with longer input sequences.

4.5. Subtle Motif Algorithms

The subtle motif finding formulation of Pevzner and Sze [35] has generated a host

of algorithms to address the (15, 4) challenge problem issued in the paper, which

was to find implanted motifs of length 15 with 4 random mutations, and many of

the subsequent algorithms solve this problem successfully (see [36] for compara-

tive analysis). Additionally, they also address more difficult variants, in which a

higher percentage of motif positions is corrupted and the motif is implanted into

longer background sequences. Here we discuss a few of these methods, which

focus on looking for structure in the subtle motif finding graphs G =( V,E ).

Note that these differ from the standard motif finding graphs in that they aren't

complete, as some pairs of substrings are not close enough in terms of their Ham-

ming distances and can't simultaneously be instances of the same implanted motif.

Finding subtle motifs reduces to identifying cliques of size N in these N -partite

graphs.

First, we note that the linear programming formulations and graph pruning

approaches above are also applicable to the subtle motifs problem, and indeed

solve many variants successfully with excellent performance statistics in iden-

tifying motifs. The LP formulations remain largely unchanged except that the

variables corresponding to non-existent edges are removed, and summations in

the corresponding constraints are taken only over existing edges.

Search WWH ::

Custom Search

Home