Methods for Discovery and Characterization of DNA Sequence Motifs - Bioinformatics: A Swiss Perspective

The assumption underlying this formula is that all bases occur with an equal probability of 0.25 in random sequences. This is the simplest background (null) model that can be used in this context. Markov chains, which allow unequal probabilities for different bases and dependencies between consecutive bases, are more realistic background models for genomic DNA sequences. Algorithms have been presented for computing p_i under such a model, as well as for consensus sequences that include ambiguous positions represented by IUPAC codes [18] and for weight matrices [19].
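The difference between the two background models can be made concrete with a minimal Python sketch. It computes the probability of a single word under the uniform model (every base has probability 0.25) and under a first-order Markov chain. The function names and all numerical parameters (`INITIAL`, `TRANSITION`) are hypothetical values chosen for illustration, not estimates from any real genome.

```python
def uniform_word_prob(word):
    """Probability of a word when each base independently has probability 0.25."""
    return 0.25 ** len(word)


def markov_word_prob(word, initial, transition):
    """Probability of a word under a first-order Markov chain.

    initial[b]       : probability that the first base is b
    transition[a][b] : probability that base b follows base a
    """
    prob = initial[word[0]]
    for prev, cur in zip(word, word[1:]):
        prob *= transition[prev][cur]
    return prob


# Hypothetical, made-up parameters for a mildly AT-rich background.
# Each row of TRANSITION sums to 1.
INITIAL = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
TRANSITION = {
    "A": {"A": 0.35, "C": 0.20, "G": 0.20, "T": 0.25},
    "C": {"A": 0.30, "C": 0.25, "G": 0.10, "T": 0.35},
    "G": {"A": 0.30, "C": 0.20, "G": 0.25, "T": 0.25},
    "T": {"A": 0.25, "C": 0.20, "G": 0.20, "T": 0.35},
}

if __name__ == "__main__":
    word = "TATAAA"
    print("uniform:", uniform_word_prob(word))
    print("markov: ", markov_word_prob(word, INITIAL, TRANSITION))
```

Under the uniform model every k-letter word has the same probability, whereas the Markov model assigns different probabilities to words of equal length, which is what makes it a more realistic null model for biased genomic sequence.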

The Bayesian approach will be illustrated with the mixture model used by the program MEME. Again, we assume the “arn” search mode. To circumvent the mathematical difficulties of overlapping-word statistics, the input sequence set is usually evaluated as if it consisted of N nonoverlapping k-letter subsequences (N is the search space defined before). In the simplest case, the mixture model consists of two components: a motif model given by a probability matrix and a background model given by a base probability distribution. The probability of the sequences given the model is then computed as

$$
\mathrm{Prob}(x, M, M_0, q) = \prod_{j} \left[ q\,P(x_j \mid M) + (1 - q)\,P(x_j \mid M_0) \right]. \tag{5}
$$

In this notation, x denotes the total set of overlapping k-letter subsequences contained in the input sequences, and x_j is an individual member of it. P(x_j | M) and P(x_j | M_0) are the probabilities of subsequence x_j given the motif model and the background model, respectively. q is the mixture coefficient, indicating the sequence-independent probability that a given subsequence constitutes a motif. The models M and M_0 both define probability distributions over all k-letter words. The probabilities of subsequence x_j under the motif and background models, respectively, are defined as follows:

$$
P(x_j \mid M) = \prod_{i=1}^{k} p(i, x_{j,i}) \tag{6}
$$
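As a rough illustration of how Eqs. (5) and (6) fit together, the following Python sketch evaluates the logarithm of the mixture probability over all overlapping k-letter subsequences. It is a sketch under stated assumptions, not MEME's actual implementation: the function names are hypothetical, the motif matrix is represented as a list of per-position dictionaries, and the background probability of a word is assumed to be the product of independent base probabilities, which is one reading of "a background model given by a base probability distribution".

```python
import math


def motif_prob(word, matrix):
    """Eq. (6): product over positions i of the matrix entry for the base at i."""
    prob = 1.0
    for i, base in enumerate(word):
        prob *= matrix[i][base]
    return prob


def background_prob(word, base_probs):
    """Assumed background model: independent bases drawn from base_probs."""
    prob = 1.0
    for base in word:
        prob *= base_probs[base]
    return prob


def mixture_log_likelihood(sequences, matrix, base_probs, q):
    """Logarithm of Eq. (5), summed over all overlapping k-letter subsequences.

    Working in log space avoids numerical underflow when the product
    in Eq. (5) runs over many subsequences. Sequences are expected to
    contain only the letters A, C, G and T.
    """
    k = len(matrix)
    total = 0.0
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            word = seq[start:start + k]
            p_mix = (q * motif_prob(word, matrix)
                     + (1.0 - q) * background_prob(word, base_probs))
            total += math.log(p_mix)
    return total


def consensus_matrix(consensus, hit=0.85):
    """Hypothetical helper: a matrix favouring one consensus word."""
    miss = (1.0 - hit) / 3.0
    return [{b: (hit if b == c else miss) for b in "ACGT"} for c in consensus]


if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGT"}
    matrix = consensus_matrix("TATA")
    seqs = ["GCTATAGC"]
    for q in (0.0, 0.2):
        print(q, mixture_log_likelihood(seqs, matrix, uniform, q))
```

With q = 0 the expression reduces to the pure background model, so comparing the value at q = 0 with the value at a small positive q shows how much a candidate motif matrix improves the fit to the input sequences.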
