Biology Reference
In-Depth Information
time algorithm for the problem, the authors find that while it is weaker than the
alternative LP formulation for motif finding [47], an exponentially-sized class of
constraints can be added to make the two LP formulations equivalent. However,
they then show that it is not necessary to explicitly add all these constraints by
giving a separation algorithm, based on the Max-flow min-cut theorem. In the
presence of a separation algorithm the ellipsoid method [9] can be used to find a
solution to the tightened LP in time polynomial in the number of variables of the
mathematical programming problem. In practice, the authors use a polynomially-
sized subset of constraints in a cutting-planes approach to find solutions to the LP
problem. Finally, it is interesting to note that the LP relaxations often have integral
optimal solutions, making solving the LP sufficient in many cases for solving
the original ILP, similar to what was observed above with the edge-modeling LP
formulation.
4.4. Maximum Density Subgraph-based Algorithm
Looking for sequence motifs by identifying maximum density subgraphs (MDS)
was suggested by Matsuda [28], as applied to the problem of detection of con-
served domains in protein sequences. Fratkin and coauthors [8] employ maximum
density subgraphs at the core of their MotifCut algorithm when searching for DNA
motifs using motif finding graphs. Given the construction of such graphs in that
the edges with higher weights correspond to pairs of substrings that are more sim-
ilar and more likely to constitute motif instances, graph structures with greatest
edge-weight density are likely to represent motif occurrences. Note that the Mo-
tifCut algorithm operates on motif finding graphs such that all pairs of vertices are
connected by edges.
While the notion of subgraph density can be characterized in a number of
ways, the authors use the most common tractable definition, and define density for
agraph G =( V,E ) as
||
is the total weight of all the edges. They also provide some empirical evidence that
their choice is a good one for biological data by evaluating which definitions lead
to densities of motifs being most different from densities of subgraphs in the back-
ground. Building on this definition, [8] formulate motif finding as the problem of
search for the maximum density subgraph G = argmax G ⊆G (
||
E
||
/
||
V
||
,where
||
V
||
is the number of vertices and
||
E
).
The solution to the maximum density subgraph problem is based on the Max-
flow min-cut theorem. The graph is augmented by a source and sink nodes to
become a flow network, with appropriate capacities connecting the source/sink to
the rest of the graph, and the maximum flow is computed while also finding the
minimum cut.
||
E ||
/
||
V ||
If the maximum flow in the network satisfies certain conditions
Search WWH ::




Custom Search