[Figure 1, panels (a) to (f). Panels (a) and (c) list the six input sequences Seq1 to Seq6; in (c) the motif instances are highlighted.]

(b) Consensus sequence: TATRATG

(d) Base count frequency matrix

      1   2   3   4   5   6   7
A     0   6   0   3   4   0   1
C     0   0   1   0   1   0   0
G     1   0   0   3   0   0   5
T     5   0   5   0   1   6   0

(e) Base probability matrix (%)

      1   2   3   4   5   6   7
A    10  70  10  40  50  10  20
C    10  10  20  10  20  10  10
G    20  10  10  40  10  10  60
T    60  10  60  10  20  70  10

(f) Weight matrix

      1   2   3   4   5   6   7
A    -9  10  -9   5   7  -9  -2
C    -9  -9  -2  -9  -2  -9  -9
G    -2  -9  -9   5  -9  -9   9
T     9  -9   9  -9  -2  10  -9
Fig. 1. The Pribnow box, an early example of a DNA motif. The Pribnow box is
an element of E. coli promoters, originally discovered by visual inspection
of six experimentally characterized promoter sequences.¹ (a) Input sequence set.
(b) Consensus sequence proposed in the original paper (R is the IUPAC
(International Union of Pure and Applied Chemistry) code for A or G). (c) Input
sequence set with highlighted motif instances. The set of motif instances is also
referred to as “motif annotation” in this chapter. (d) Base count frequency matrix.
(e) Base probability matrix estimated by adding one pseudocount to each element of
the base count frequency matrix (probabilities are given as percentages). (f) Weight
matrix. The position-specific weights of the corresponding bases are summed to
compute a score for a DNA sequence of the same length as the motif. The weights
were computed as natural log-likelihood ratios from the base probabilities,
multiplied by 10, and rounded to the nearest integer (see Chapter 2).
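
The chain from panel (d) to panel (f) can be retraced with a short Python sketch. It takes the base counts of panel (d), adds one pseudocount per element to obtain probabilities, and converts them into integer weights. The caption does not state the background model used for the likelihood ratio; the sketch assumes a uniform background of 0.25 per base. The function and constant names are illustrative only and do not come from the original text.

```python
import math

# Base counts from panel (d): one row per base, one entry per motif position.
COUNTS = {
    "A": [0, 6, 0, 3, 4, 0, 1],
    "C": [0, 0, 1, 0, 1, 0, 0],
    "G": [1, 0, 0, 3, 0, 0, 5],
    "T": [5, 0, 5, 0, 1, 6, 0],
}

BACKGROUND = 0.25  # assumption: uniform background frequency for every base


def probability_matrix(counts, pseudocount=1):
    """Estimate base probabilities by adding a pseudocount to every count."""
    length = len(next(iter(counts.values())))
    probs = {base: [] for base in counts}
    for i in range(length):
        # Column total after the pseudocount has been added to each base.
        total = sum(counts[base][i] + pseudocount for base in counts)
        for base in counts:
            probs[base].append((counts[base][i] + pseudocount) / total)
    return probs


def weight_matrix(probs, background=BACKGROUND, scale=10):
    """Natural log-likelihood ratio, multiplied by `scale` and rounded."""
    return {
        base: [round(scale * math.log(p / background)) for p in row]
        for base, row in probs.items()
    }


def score(sequence, weights):
    """Sum the position-specific weights for a sequence of motif length."""
    if len(sequence) != len(weights["A"]):
        raise ValueError("sequence must have the same length as the motif")
    return sum(weights[base][i] for i, base in enumerate(sequence))


probs = probability_matrix(COUNTS)
weights = weight_matrix(probs)
print([round(100 * p) for p in probs["A"]])  # percentages, as in panel (e)
print(weights["A"])
print(score("TATAATG", weights))
```

Under the uniform-background assumption, the percentages match panel (e), and the magnitudes of the computed weights match the values printed in panel (f). Bases that are rarer in the motif than in the background receive negative weights, so a sequence resembling the consensus scores high and an unrelated sequence scores low.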
one would expect by chance. The DNA motif has become a central
concept of molecular biology, a research field which has its roots in
biology as well as in physics. In order to understand why motifs are
of interest, a brief look at the leading paradigms of both disciplines
will be useful.