Biology Reference
In-Depth Information
4.3. Linear Programming-based Algorithms
Utilizing a linear programming (LP) framework for motif finding [17, 47] allows
for a very flexible formulation of the problem that is applicable to a number of
variants. This includes the standard motif finding formulation as well as the phy-
logenetic footprinting and subtle motifs problems. Underlying the approach, motif
discovery is cast as the problem of finding the best gapless local multiple sequence
alignment using the sum-of-pairs (SP) scoring scheme, which seeks a l -long sub-
string from every input sequence such that the sum of all their pair-wise simi-
larities is maximized. In the graph G =( V,E ) the equivalent is a selection of
vertices, one from each graph part, such that the weight of the induced clique is
maximized. We refer to this problem as finding the highest weight N -clique.
4.3.1. Edge-Modeling Formulation
For a graph G =( V,E ),where V = V 1
...
V N and E =
{
( u,v ): u
V i ,v∈ V j ,i = j}
, Zaslavsky and Singh [47] introduce a formulation that explic-
itly models both vertices and edges in the graph. They define a binary decision
variable X u for every vertex u , and a binary decision variable Y uv for every edge
( u,v ). Setting X u to 1 corresponds to selecting vertex u for the N -clique and
thus choosing the sequence position corresponding to u in the alignment. Setting
variable Y uv to 1 corresponds to choosing the edge ( u,v ) for the N -clique.
The following integer linear program (ILP) models the motif finding problem
formulated above:
( u,v ) ∈E w uv ·
Maximize
Y uv
subject to
u∈V j X u =1
for 1
j
N
u∈V j Y uv = X v
j
N,v
V
\
V j
for 1
X u ,Y uv ∈{
0 , 1
}
for u
V, ( u,v )
E
The first set of constraints ensures that exactly one vertex is picked from every
graph part, corresponding to one position being chosen from every input sequence.
The second set of constraints relates vertex variables to edge variables, allowing
the objective function to be expressed in terms of finding a maximum edge-weight
clique. An edge is chosen only if it connects two chosen vertices.
ILP itself is NP-hard, but replacing the integrality constraints on the X and
Y variables with 0
1 allows for a polynomial-time heuristic for
the problem. It is important to note that should a linear programming solution
happen to be integral, it is guaranteed to be optimal for the original ILP and motif
X u ,Y uv
Search WWH ::




Custom Search