Graph-based Approaches for Motif Discovery - Clustering Challenges in Biological Network

Biology Reference

In-Depth Information

In fact, the values of F i are further constrained by requiring a connection

to z when z is under consideration. That is, when considering a node z

V k

to eliminate, and calculating F i according to Equation 4.2 among all possible

u

∈

V i ,the F u of Equation 4.1 is instead computed as:

F u = w zu +

j = i,k

max

v∈V j

w uv

(4.4)

The value of C ∗ can be computed from any “good” alignment, such as the weight

of the clique imposed by the best overall star .

More complex graph-pruning techniques, based on three-way, as op-

posed to pairwise alignments, and divide-and-conquer graph decomposition ap-

proaches [47], are more effective in reducing graph size, but are also significantly

less efficient. The overall algorithm applies the various pruning techniques in or-

der of increasing complexity, and stops when the resulting graph has reached the

desired size, and can be modeled and successfully solved as a linear program.

Interestingly, the vast majority of motif finding instances are not only effec-

tively pruned by the optimality-preserving DEE methods, but also lead to linear

programs whose optimal solutions are integral, and do not require the invocation

of an ILP solver. These two conditions together guarantee optimality of the final

solution for the original SP-based motif finding problem, making the method a

polynomial-time algorithm for many practical instances of the problem.

4.3.2. Cost-Aggregating Formulation

In the context of a metric, such as the Hamming distance or 1/0 match/mismatch

similarity score, which, when imposed on pairs of substrings allows for only a

discrete set of possible values, Kingsford and coauthors [17] introduce an alterna-

tive integer linear program that better exploits the structure of the graph problem

by allowing aggregation of edges of the same weight. The previous ILP formula-

tion modeled edges in the graph G =( V,E ),where V = V 1 ∪

V N explicitly,

while they are only used to ensure that if two nodes u and v are chosen in the

optimal solution then the edge weight w uv is added to the cost of the clique. In

this formulation no edge variables exist; instead, in addition to the node variables

X u , there's a variable Y ujc for each node u , each graph part j such that u/

...

∪

V j ,

and each edge weight c . The intuition is that Y ujc is 1 if node u and some node

v ∈ V j are chosen such that the edge weight w uv = c .These Y variables model

groupings of the edges by weight into cost bins , as shown in Fig. 4.2, and selec-

tion of a cost bin requires the “payment” of the appropriate cost by the objective

function.

∈

Clustering Challenges in Biological Network

Search WWH ::

Custom Search

Home