Biology Reference
In-Depth Information
{A, G, C, T}; for example: x = GATTACATTC, y = GCCATACTTC. Compute the Hamming distance $d_H(x, y)$ between x, y.
2. The Jukes-Cantor correction $d_{JC}$ to the Hamming distance is defined as
$$d_{JC}(x, y) = -\frac{3}{4} \log\left(1 - \frac{4}{3} f\right),$$
where f is the frequency of sites that differ between the two sequences. For example, suppose we have two sequences x, y of length 10, for which the Hamming distance between x, y is $d_H(x, y) = 6$. Then $f = 6/10 = 0.6$.
a. Compute the Jukes-Cantor correction for the sequences x, y given in part (1) of this exercise.
b. Observe that for any two sequences x, y of the same length, $d_{JC}(x, y) = d_{JC}(y, x)$ (and the same is true for $d_H$). What can you say about $d_{JC}(x, y)$ (respectively, $d_H(x, y)$) if x and y are the same?
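The two quantities in this exercise can be checked with a short script. The following is a minimal Python sketch, not a reference implementation: the function names `hamming_distance` and `jukes_cantor` are illustrative, the logarithm is assumed to be the natural logarithm, and returning infinity when f is at least 3/4 (where the logarithm is undefined) is a convention chosen here rather than something stated in the text.

```python
import math


def hamming_distance(x: str, y: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


def jukes_cantor(x: str, y: str) -> float:
    """Jukes-Cantor correction -(3/4) * log(1 - (4/3) * f).

    Assumption: 'log' is the natural logarithm.
    """
    if not x:
        raise ValueError("sequences must be non-empty")
    f = hamming_distance(x, y) / len(x)  # frequency of differing sites
    if f >= 0.75:
        # Assumption: the formula is undefined here, so report infinity.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * f)


x = "GATTACATTC"
y = "GCCATACTTC"
d_h = hamming_distance(x, y)   # 6, so f = 6/10 = 0.6
d_jc = jukes_cantor(x, y)      # approximately 1.2071
```

Swapping the arguments, or passing the same sequence twice, gives a quick numerical check of the observations in part (b).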
The Hamming distance is an easy but rather crude measure of difference between sequences. For example, it fails to take into account the possibility that characters in a sequence could change over time and then change back. Nor does it account for well-known biochemical phenomena, such as the fact that the probability that one DNA character changes to another is generally not uniform, but likely depends on the particular DNA bases themselves, or on how they are arranged or grouped along the sequence. So-called “evolutionary models” describe special sets of additional assumptions made to account for these kinds of issues, and methods for determining the evolutionary distance between any two given leaves x, y, represented by two aligned sequences (of DNA, RNA, proteins, etc.) $s_x$, $s_y$, generally depend on the choice of a particular model of evolution. For example, the Jukes-Cantor correction in Exercise 10.8 above is obtained from the Jukes-Cantor model of evolution; see, e.g., [4] or [1] for more details on this model, and [3] for a derivation of this distance from the model by an algebraic geometry approach.
The evolutionary model for Hamming distance is very simple: it assumes that all bases have equal probability of changing into one another, that all sites in the string of DNA are independent, and that the only change that has occurred over time is that which is observed. As with the Hamming distance, or the Jukes-Cantor model and the Jukes-Cantor correction, evolutionary distances in general incorporate information about differences or distinctions between sequences $s_x$ and $s_y$, and hence give rise to a so-called dissimilarity map, to be defined precisely further below. Sequences that are the same provide no new information; it is the sense in which they are dissimilar that shows change and hence evolution.
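To make the idea of tabulating pairwise differences concrete, here is a small Python sketch that records the Hamming distance between every pair of leaves. It is only an illustration under stated assumptions: the precise definition of a dissimilarity map comes later in the text, the helper name `dissimilarity_table` is hypothetical, and the third sequence `z` is made up for the example.

```python
from itertools import combinations
from typing import Dict, Tuple


def hamming_distance(x: str, y: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


def dissimilarity_table(seqs: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Tabulate pairwise Hamming distances between named, aligned sequences."""
    # Identical sequences carry no new information: the diagonal is zero.
    table = {(name, name): 0 for name in seqs}
    for a, b in combinations(seqs, 2):
        d = hamming_distance(seqs[a], seqs[b])
        table[(a, b)] = d
        table[(b, a)] = d  # the distance is symmetric
    return table


# x and y come from the exercise above; z is a hypothetical third leaf.
leaves = {"x": "GATTACATTC", "y": "GCCATACTTC", "z": "GATTACATTG"}
table = dissimilarity_table(leaves)
# table[("x", "y")] == 6, table[("x", "z")] == 1, table[("y", "z")] == 7
```

Any other evolutionary distance, such as the Jukes-Cantor correction, could be substituted for `hamming_distance` in the same loop.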