Biology Reference
In-Depth Information
In fact, the values of F i are further constrained by requiring a connection
to z when z is under consideration. That is, when considering a node z
V k
to eliminate, and calculating F i according to Equation 4.2 among all possible
u
V i ,the F u of Equation 4.1 is instead computed as:
F u = w zu +
j = i,k
max
v∈V j
w uv
(4.4)
The value of C can be computed from any “good” alignment, such as the weight
of the clique imposed by the best overall star .
More complex graph-pruning techniques, based on three-way, as op-
posed to pairwise alignments, and divide-and-conquer graph decomposition ap-
proaches [47], are more effective in reducing graph size, but are also significantly
less efficient. The overall algorithm applies the various pruning techniques in or-
der of increasing complexity, and stops when the resulting graph has reached the
desired size, and can be modeled and successfully solved as a linear program.
Interestingly, the vast majority of motif finding instances are not only effec-
tively pruned by the optimality-preserving DEE methods, but also lead to linear
programs whose optimal solutions are integral, and do not require the invocation
of an ILP solver. These two conditions together guarantee optimality of the final
solution for the original SP-based motif finding problem, making the method a
polynomial-time algorithm for many practical instances of the problem.
4.3.2. Cost-Aggregating Formulation
In the context of a metric, such as the Hamming distance or 1/0 match/mismatch
similarity score, which, when imposed on pairs of substrings allows for only a
discrete set of possible values, Kingsford and coauthors [17] introduce an alterna-
tive integer linear program that better exploits the structure of the graph problem
by allowing aggregation of edges of the same weight. The previous ILP formula-
tion modeled edges in the graph G =( V,E ),where V = V 1
V N explicitly,
while they are only used to ensure that if two nodes u and v are chosen in the
optimal solution then the edge weight w uv is added to the cost of the clique. In
this formulation no edge variables exist; instead, in addition to the node variables
X u , there's a variable Y ujc for each node u , each graph part j such that u/
...
V j ,
and each edge weight c . The intuition is that Y ujc is 1 if node u and some node
v ∈ V j are chosen such that the edge weight w uv = c .These Y variables model
groupings of the edges by weight into cost bins , as shown in Fig. 4.2, and selec-
tion of a cost bin requires the “payment” of the appropriate cost by the objective
function.
Search WWH ::




Custom Search