Chapter 1
Methods for Discovery and Characterization of DNA Sequence Motifs
Philipp Bucher
1. Introduction
Motif discovery is widely regarded as an important problem in
bioinformatics, as documented by a large number of papers. It is also
believed to be a hard and still partly unsolved problem, despite
considerable efforts by many distinguished researchers. Finally, it is an
old problem with a long tradition in bioinformatics. An early example is
the discovery of the Pribnow box in E. coli promoters [1]. Although this
motif was found by visual inspection of DNA sequences, it was probably
instrumental in defining the paradigm that subsequently led to the
formalization of the motif discovery problem in its modern form and to
the development of algorithms to solve it.
A DNA motif, such as the Pribnow box shown in Fig. 1, is defined by a
set of short, highly similar subsequences drawn from longer sequences.
The subsequences share common features, which are typically described by
a consensus sequence or a weight matrix. A motif must be overrepresented
in a biologically defined collection of genome sequences, i.e. it must
occur more frequently than