The base probabilities are often estimated by adding one pseudocount to the observed count c(i,b) of base b at position i:
$$p(i,b) = \frac{c(i,b) + 1}{4 + \sum_{b'=\mathrm{A}}^{\mathrm{T}} c(i,b')} \qquad (1)$$
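As a minimal sketch, equation (1) can be computed from a set of aligned binding sites as shown below. It assumes the sites are equal-length uppercase DNA strings over A, C, G, T; the function name `probability_matrix` and the example sites are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def probability_matrix(sites: list[str]) -> list[dict[str, float]]:
    """Estimate p(i,b) from aligned sites with one pseudocount per base (Eq. 1)."""
    k = len(sites[0])
    if any(len(s) != k for s in sites):
        raise ValueError("all sites must have the same length k")
    matrix = []
    for i in range(k):
        # c(i,b): observed count of base b at position i
        counts = Counter(s[i] for s in sites)
        # Denominator: 4 pseudocounts plus the sum of c(i,b') over b' = A..T
        total = sum(counts[b] for b in BASES)
        matrix.append({b: (counts[b] + 1) / (4 + total) for b in BASES})
    return matrix

# Hypothetical aligned sites of length k = 4
m = probability_matrix(["ACGT", "ACGA", "ACGT"])
```

With three sites that all carry A at the first position, p(0,A) = (3 + 1)/(4 + 3) = 4/7, while the unobserved bases still receive 1/7 each. The pseudocount keeps every probability strictly positive, which matters once the logarithm in the weight computation is applied.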
The elements of a weight matrix may be computed from a probability
matrix as a log-likelihood ratio:
$$w(i,b) = \ln \frac{p(i,b)}{p_0(b)} \qquad (2)$$
Here, w(i,b) and p(i,b) are the weight and probability of base b at position i of the motif, respectively, and p_0(b) is the background probability of base b. Note, however, that log-likelihood ratios are not universally used in the field. One of the best-known motif search programs, MATINSPECTOR, scores transcription factor binding sites with a base probability matrix in a different way.[13]
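A minimal sketch of the log-likelihood conversion in equation (2) follows. It implements only the ln(p/p_0) scheme, not the MATINSPECTOR scoring. The two-position probability matrix and the uniform background p_0(b) = 0.25 are hypothetical inputs, and `weight_matrix` is an illustrative name.

```python
import math

def weight_matrix(prob, background):
    """Convert probabilities to weights: w(i,b) = ln(p(i,b) / p0(b)) (Eq. 2)."""
    return [{b: math.log(row[b] / background[b]) for b in row} for row in prob]

# Hypothetical 2-position probability matrix (each row sums to 1)
prob = [
    {"A": 4 / 7, "C": 1 / 7, "G": 1 / 7, "T": 1 / 7},
    {"A": 1 / 7, "C": 4 / 7, "G": 1 / 7, "T": 1 / 7},
]
# Assumed uniform background base probabilities p0(b)
background = {b: 0.25 for b in "ACGT"}
w = weight_matrix(prob, background)
```

A base that is more frequent at a position than in the background gets a positive weight (here ln(16/7), about 0.83). A less frequent base gets a negative one (ln(4/7), about -0.56). The weights are only finite because the pseudocounts keep every p(i,b) above zero.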
Like a consensus sequence, a weight matrix combined with a cut-off value defines a subset of k-letter words that qualify as motif instances. However, the power of the matrix representation lies in the quantitative evaluation of candidate k-letter words, which can be exploited, for instance, to predict transcription factor binding site affinity. On the other hand, a base probability matrix defines a motif in a probabilistic manner, a property exploited by probabilistic motif optimization methods such as expectation maximization.
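The following is a minimal sketch of how a weight matrix plus a cut-off selects motif instances in a sequence. Each k-letter window is scored by summing w(i,b) over its positions, and windows scoring at or above the cut-off are reported. The additive window score is the usual way log-likelihood weights are combined. The toy matrix, the cut-off of 2.0, the example sequence, and the names `score_word` and `scan` are illustrative assumptions.

```python
import math

BASES = "ACGT"

def score_word(w, word):
    """Sum of w(i,b) over the positions of a k-letter word."""
    return sum(w[i][b] for i, b in enumerate(word))

def scan(w, seq, cutoff):
    """Return (start, word, score) for each k-letter window scoring >= cutoff."""
    k = len(w)
    hits = []
    for start in range(len(seq) - k + 1):
        word = seq[start:start + k]
        if any(b not in BASES for b in word):
            continue  # skip windows containing N or other non-ACGT symbols
        score = score_word(w, word)
        if score >= cutoff:
            hits.append((start, word, score))
    return hits

# Hypothetical 3-position motif (consensus ACG) against a uniform background p0 = 0.25
probs = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
w = [{b: math.log(p[b] / 0.25) for b in BASES} for p in probs]
hits = scan(w, "TTACGTTACTT", cutoff=2.0)
```

Only the window ACG at position 2 passes the cut-off (score about 3.09). The near-miss ACT at position 7 scores about 1.14 and is rejected. Its graded score remains available, which is the quantitative information a plain consensus match would discard.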
The goal of the motif search problem is to find the best motif for a given set of input sequences. A complete statement of the problem requires a quality criterion (objective function) related to the overrepresentation of the motif in the input sequences. A specific motif discovery method is thus characterized by three components: (a) the motif descriptor, (b) the objective function, and (c) the algorithm used to search the space of possible motifs.
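To make the three components concrete, here is a deliberately simple toy method, not any published tool. Each choice is an illustrative assumption. (a) The descriptor is an exact k-letter consensus word. (b) The objective is the observed minus the expected number of sequences containing the word, with the expectation taken under a uniform, independent background and overlapping occurrences ignored. (c) The search algorithm is exhaustive enumeration of all 4^k words. The names `matches`, `objective`, and `best_motif` and the example sequences are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# (a) Motif descriptor: an exact k-letter consensus word (toy choice).
def matches(word, seq):
    return word in seq

# (b) Objective function: observed minus expected number of sequences that contain
# the word; the expectation assumes a uniform i.i.d. background and ignores overlaps.
def objective(word, seqs):
    k = len(word)
    observed = sum(matches(word, s) for s in seqs)
    expected = sum(1 - (1 - 0.25 ** k) ** max(len(s) - k + 1, 0) for s in seqs)
    return observed - expected

# (c) Search algorithm: exhaustive enumeration of all 4**k candidate words.
def best_motif(seqs, k):
    candidates = ("".join(p) for p in product(BASES, repeat=k))
    return max(candidates, key=lambda word: objective(word, seqs))

# Hypothetical input sequences sharing the word TATAAT
seqs = ["GGTATAATCC", "ATTATAATGG", "CCCTATAATA"]
motif = best_motif(seqs, 6)
```

TATAAT is the only 6-letter word present in all three sequences, so it maximizes the objective. Exhaustive enumeration is feasible only for small k. Swapping in a weight-matrix descriptor, a different overrepresentation score, or a heuristic search changes one component while leaving the other two intact.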