Biology Reference
In-Depth Information
4.5.1. Winnowing Techniques
In their paper proposing the subtle motifs problem, Pevzner and Sze [35] make the
observation that the vast majority of edges in G appear due to spurious similari-
ties between pairs of substrings in the input that are not instances of the implanted
motif. They suggest a winnowing approach, which attempts to identify these spu-
rious edges and remove them from the graph, so that the remaining edges are
signal edges, connecting the instances of the motif.
The Winnower algorithm utilizes the notion of an extendable clique. This
notion captures the ability to grow a clique of size smaller than N ,removing
spurious edges along the way.
More formally, let vertex u be a neighbor of a
clique C =
is also a clique in the graph. A clique
with vertices in graph parts V 1 ,...,V k wlog is called extendable if it has at least
one neighbor vertex in every other part of the graph. Certainly, edges that are not
part of any extendable cliques can be removed, and these are the edges defined as
spurious .For k =1,avertex u is a neighbor of clique C =
{
v 1 ,...,v k }
if
{
v 1 ,...,v k ,u
}
of size one, if u
and v are connected by an edge in the graph. The winnowing condition amounts
to removing vertices that do not have a connection to every graph part. For k =2,
avertex u is a neighbor of clique C =
{
v
}
form a triangle in the
graph. The winnowing strategy for this case deletes edges that do not form three-
way connections to every other part of the graph. For k
{
v,w
}
if
{
u,v,w
}
3, the authors make a
stronger observation by noting that not only does an edge that is part of a maximal
N -clique have to belong to one extendable clique, but rather it has to belong to
ā‰„
at least Nāˆ’ 2
k
2 extendable cliques of size k .For k =3this condition removed
edges that belong to fewer than N
āˆ’
2 extendable cliques.
The general complexity of the winnowing approach for cliques of size k is
O ( n k +1 ),where n is the total number of vertices in the motif finding graph. The
authors demonstrate in a probabilistic analysis, based on the probability of an edge
existing between two random vertices, that the expected runtime of the algorithm
is of lower complexity, making it practical for many instances of the problem.
In follow-up work, Liang and coauthors [23] improve sensitivity of the
method, enabling it to detect weaker motifs present in longer input sequences.
āˆ’
4.5.2. Clique Finding with Consensus Constraint
Sze and colleagues [42] proposed to formulate motif finding as the problem of
looking for large cliques in motif finding graphs with the additional constraint
that there must exist a string s that is close to every motif instance. The existence
of such a string ensures that motif instances are derived from a single pattern. The
Search WWH ::




Custom Search