Biology Reference
In-Depth Information
phylogenetic X -tree T with edge-weighting
and D will be “close”
in some specified way, and in this sense, say a tree with a “best fit” has been found. To
discuss “closeness,” it would be helpful to have a notion of distance that could be used
to compare dissimilaritymaps D coming from sequence data with induced treemetrics
D T
ω
for which D T
. Having discussed previously how to find distances between two points on a line
2 , or in three space
3 , and more generally in
n for n
1 by means
of a standard distance (via Pythagorean theorem analog), we might be motivated to
seek a way to regard both D and D T
, in the plane
R
R
R
R
as points, and then seek a distance between
those points that we would then try to minimize in order to find an edge-weighted
tree T that “best fits” the original data. Taking that step is our next focal point.
Since any matrix M
is really just a nicely ordered list of elements (i.e.,
listed by row and cross-listed by column), there is already an immediate way to
regard matrices as points (vectors): just start writing out the entries of the matrix
across row 1; when you get to the end, start adding the elements of row 2, and so
on, until all elements are listed. For example, if M is a 4
=[
m i , j ]
×
4 matrix, then there is a
correspondence of M and a vector with 16
=
4
·
4 entries:
M
=[
m i , j ]←→ (
m 1 , 1 ,
m 1 , 2 ,
m 1 , 3 ,
m 1 , 4 ,
m 2 , 1 ,
m 2 , 2 ,
m 2 , 3 ,
m 2 , 4 ,
m 3 , 1 ,...,
m 3 , 4 ,
m 4 , 1 ,...,
m 4 , 4 ).
(10.2)
Other orderings of the elements of M to create a 16-tuple could be taken, of course,
but as long as we are consistent this has no effect on our definition of distance.
Given the symmetry and existence of all zero-diagonal elements of any dissimi-
larity map D
=[
d
(
x
,
y
) ] x , y
X , there is a lot of redundancy in the entries; in fact, if
1
n 2
|
X
|=
n , then for the non-zero entries of D , there are at most m
=
2 (
n
)
distinct
entries d
in the n by n matrix D . Starting with D , it is straightforward to create a
streamlined vector
(
i
,
j
)
m carrying the same information as D , by choosing an order-
ing for reading off, and then listing in that order, the elements of D above the diagonal.
This is illustrated for one choice of such an ordering below, for D as in Eq. ( 10.1 ):
v D in
R
Associated vector
v D = (
2
.
3
,
1
.
6
,
3
.
6
,
1
.
5
,
2
.
9
,
2
.
8
)
←→ (
d T ,w (
a
,
b
),
d T ,w (
a
,
c
),
d T ,w (
a
,
d
),
d T ,w (
b
,
c
),
d T ,w (
b
,
d
),
d T ,w (
c
,
d
)).
Exercise 10.16. We have just seen how to take any n by n dissimilarity map D and,
having fixed an ordering of elements in D , turn it into a point (vector)
m ,
v D R
1
n 2
m , explain how to reverse this
for m
=
2 (
n
)
. Starting from any vector
v
in
R
process to obtain an n by n dissimilarity map D
. Apply this to the particular case
v
v = (
With this adjustment, it is now possible to represent any two n by n dissimilarity
maps D
1
.
1
,
0
,
3
.
3
,
2
.
5
,
1
.
4
,
5
)
.Is D v atreemetric?
n
2
D by vectors
m , where m
1
n 2
,
v D ,v D R
=
=
2 (
n
)
. One can use the
D ) =
d (
2
standard Euclidean distance measure
δ(
D
,
1 i < j n (
d
(
i
,
j
)
i
,
j
))
to measure a distance between D and D in
m . In particular, suppose D is a dis-
similarity map coming from evolutionary distances between elements in a set X (or
R
 
Search WWH ::




Custom Search