Biology Reference
In-Depth Information
phylogenetic
X
-tree
T
with edge-weighting
and
D
will be “close”
in some specified way, and in this sense, say a tree with a “best fit” has been found. To
discuss “closeness,” it would be helpful to have a notion of distance that could be used
to compare dissimilaritymaps
D
coming from sequence data with induced treemetrics
D
T
,ω
ω
for which
D
T
,ω
. Having discussed previously how to find distances between two points on a line
2
, or in three space
3
, and more generally in
n
for
n
1 by means
of a standard distance (via Pythagorean theorem analog), we might be motivated to
seek a way to regard both
D
and
D
T
,ω
, in the plane
R
R
R
R
as points, and then seek a distance between
those points that we would then try to minimize in order to find an edge-weighted
tree
T
that “best fits” the original data. Taking that step is our next focal point.
Since any matrix
M
is really just a nicely ordered list of elements (i.e.,
listed by row and cross-listed by column), there is already an immediate way to
regard matrices as points (vectors): just start writing out the entries of the matrix
across row 1; when you get to the end, start adding the elements of row 2, and so
on, until all elements are listed. For example, if
M
is a 4
=[
m
i
,
j
]
×
4 matrix, then there is a
correspondence of
M
and a vector with 16
=
4
·
4 entries:
M
=[
m
i
,
j
]←→
(
m
1
,
1
,
m
1
,
2
,
m
1
,
3
,
m
1
,
4
,
m
2
,
1
,
m
2
,
2
,
m
2
,
3
,
m
2
,
4
,
m
3
,
1
,...,
m
3
,
4
,
m
4
,
1
,...,
m
4
,
4
).
(10.2)
Other orderings of the elements of
M
to create a 16-tuple could be taken, of course,
but as long as we are consistent this has no effect on our definition of distance.
Given the symmetry and existence of all zero-diagonal elements of any dissimi-
larity map
D
=[
d
(
x
,
y
)
]
x
,
y
∈
X
, there is a lot of redundancy in the entries; in fact, if
1
n
2
|
X
|=
n
, then for the non-zero entries of
D
, there are at most
m
=
2
(
−
n
)
distinct
entries
d
in the
n
by
n
matrix
D
. Starting with
D
, it is straightforward to create a
streamlined vector
(
i
,
j
)
m
carrying the same information as
D
, by choosing an order-
ing for reading off, and then listing in that order, the elements of
D
above the diagonal.
This is illustrated for one choice of such an ordering below, for
D
as in Eq. (
10.1
):
v
D
in
R
Associated vector
v
D
=
(
2
.
3
,
1
.
6
,
3
.
6
,
1
.
5
,
2
.
9
,
2
.
8
)
←→
(
d
T
,w
(
a
,
b
),
d
T
,w
(
a
,
c
),
d
T
,w
(
a
,
d
),
d
T
,w
(
b
,
c
),
d
T
,w
(
b
,
d
),
d
T
,w
(
c
,
d
)).
Exercise 10.16.
We have just seen how to take any
n
by
n
dissimilarity map
D
and,
having fixed an ordering of elements in
D
, turn it into a point (vector)
m
,
v
D
∈
R
1
n
2
m
, explain how to reverse this
for
m
=
2
(
−
n
)
. Starting from any vector
v
in
R
process to obtain an
n
by
n
dissimilarity map
D
. Apply this to the particular case
v
v
=
(
With this adjustment, it is now possible to represent any two
n
by
n
dissimilarity
maps
D
1
.
1
,
0
,
3
.
3
,
2
.
5
,
1
.
4
,
5
)
.Is
D
v
atreemetric?
n
2
D
by vectors
m
, where
m
1
n
2
,
v
D
,v
D
∈
R
=
=
2
(
−
n
)
. One can use the
D
)
=
d
(
2
standard Euclidean distance measure
δ(
D
,
1
i
<
j
n
(
d
(
i
,
j
)
−
i
,
j
))
to measure a distance between
D
and
D
in
m
. In particular, suppose
D
is a dis-
similarity map coming from evolutionary distances between elements in a set
X
(or
R
Search WWH ::
Custom Search