Biology Reference
In-Depth Information
Chapter 4
Graph-based Approaches for Motif Discovery
Elena Zaslavsky
Department of Computer Science
Princeton University, Princeton, NJ 08544, USA
elenaz@cs.princeton.edu
Sequence motif finding is a very important and long-studied problem in com-
putational molecular biology. While various motif representations and discov-
ery methods exist, a recent development of graph-based algorithms has allowed
practical concerns, such as positional correlations within motifs, to be taken into
account. This survey provides an overview of the multi-partite graph formulation
of motif finding, and focuses on algorithmic aspects of various motif discovery
methodologies.
Motif finding has been recast as a number of different graph substructure
identification problems. First we review a formulation as a maximum-weight
clique finding problem, and examine two different integer linear programs to
model it. The motif finding algorithms use graph pruning techniques and a cut-
ting planes approach in conjunction with linear programming relaxations. Sec-
ondly, we discuss a formulation of motif discovery as that of maximum density
subgraph finding, and review a maximum flow based algorithm in an appropri-
ately augmented flow network. Finally, we mention the 'subtle' motifs formu-
lation, and define its corresponding graph problem of maximal clique identifi-
cation. We discuss two different approaches to tackle this problem, one based
on winnowing spurious edges and the other on divide-and-conquer sub-clique
finding.
4.1. Introduction
With the advent of the post-genomic era when scores of genomes have been se-
quenced and many protein structures solved, functionally important elements can
be mined from vast amounts of genomic data through identification of common
patterns or motifs . Existence of such shared sequence or structure elements allows
one to draw links between various genes or proteins in an organism, creating a net-
work of biological associations. The ability to analyze such networks has shown
83
Search WWH ::




Custom Search