The second part measures how well leaves $X$ outside $S$ fit with respect to all of the leaves in $S$. This is

$$E_2(X) = \sum_{A \in S} \frac{(T_{AR} + T_{RX} - d_{AX})^2}{\sigma_{AX}^2}.$$
For each $X$, we will assume that we are free to choose its optimal distance to $R$. (If this is not the case, it is the “fault” of some other part of the tree, not the “fault” of $S$.) $T_{RX}$ is then set so that $E_2(X)$ is minimal (which is equivalent to setting $T_{RX}$ by LS for each $S$ and each $X$).
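The minimization over $T_{RX}$ has a simple closed form, because $E_2(X)$ is quadratic in $T_{RX}$: setting the derivative to zero gives the weighted mean of the residual distances $d_{AX} - T_{AR}$, with weights $1/\sigma_{AX}^2$. The following is a minimal sketch of that step, not the authors' implementation. The function name `fit_outside_leaf` and its list-based arguments are hypothetical, and the sketch assumes $T_{RX}$ is unconstrained (it does not force a non-negative branch length).

```python
from typing import Sequence, Tuple


def fit_outside_leaf(
    t_ar: Sequence[float],
    d_ax: Sequence[float],
    var_ax: Sequence[float],
) -> Tuple[float, float]:
    """Return (T_RX, E_2(X)) for one leaf X outside the subtree S.

    t_ar[i]   -- tree distance T_AR from leaf A_i in S to the root R of S
    d_ax[i]   -- measured distance d_AX between A_i and X
    var_ax[i] -- variance sigma_AX^2 of that measured distance
    (Hypothetical argument layout: one entry per leaf A in S.)
    """
    if len(t_ar) == 0 or not (len(t_ar) == len(d_ax) == len(var_ax)):
        raise ValueError("need equally long, non-empty inputs")

    weights = [1.0 / v for v in var_ax]

    # dE_2/dT_RX = 0  =>  T_RX is the weighted mean of (d_AX - T_AR).
    # Assumption: T_RX is left unconstrained (it may come out negative).
    t_rx = sum(w * (d - t) for w, d, t in zip(weights, d_ax, t_ar)) / sum(weights)

    # Weighted sum of squared errors at the optimal T_RX.
    e2 = sum(w * (t + t_rx - d) ** 2 for w, d, t in zip(weights, d_ax, t_ar))
    return t_rx, e2
```

For example, with two leaves in $S$ at distances 1 and 2 from $R$, measured distances 4 and 6 to $X$, and unit variances, the residual distances are 3 and 4, so the sketch picks $T_{RX} = 3.5$.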
Combining the two parts (thereby considering all leaves $X$ outside $S$), we get as an intermediate result the total sum of the weighted squared errors:

$$E'_S = E_1 + \sum_{X \notin S} E_2(X).$$
The number of degrees of freedom in the computation of $E'_S$ is

$$p = \underbrace{\binom{k}{2} + k(n-k)}_{\text{\# distance relations}} - \underbrace{\bigl(2k - 3 + 1 + (n-k)\bigr)}_{\text{\# branches to be set}},$$
where $k = |S|$ is the number of leaves in the subtree $S$. Clearly, $k > 1$ for this to make sense. Normalizing $E'_S$ by the degrees of freedom, we obtain the final index

$$E_S = \frac{E'_S}{p} = \frac{2\left(E_1 + \sum_{X \notin S} E_2(X)\right)}{(2n - k - 4)(k - 1)}.$$
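The denominator of the index comes from expanding the degrees of freedom: the number of distance relations minus the number of branches to be set simplifies to $(2n - k - 4)(k - 1)/2$. The sketch below checks that identity numerically and computes the normalized index. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, $n$ is assumed to be the total number of leaves (as the $k(n-k)$ term suggests), and `e1` and `e2_values` are assumed to be precomputed values of $E_1$ and of $E_2(X)$ for each leaf $X$ outside $S$.

```python
from math import comb
from typing import Iterable


def degrees_of_freedom(n: int, k: int) -> int:
    """Distance relations minus branches to be set, for a subtree with k leaves."""
    relations = comb(k, 2) + k * (n - k)   # pairs inside S + pairs (A in S, X outside S)
    branches = (2 * k - 3) + 1 + (n - k)   # branch count as given in the text
    p = relations - branches
    # Sanity check against the closed form (2n - k - 4)(k - 1) / 2.
    assert 2 * p == (2 * n - k - 4) * (k - 1)
    return p


def subtree_index(e1: float, e2_values: Iterable[float], n: int, k: int) -> float:
    """Return E_S = (E_1 + sum of E_2(X)) / p."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    p = degrees_of_freedom(n, k)
    if p <= 0:
        raise ValueError("no positive degrees of freedom for this (n, k)")
    return (e1 + sum(e2_values)) / p
```

With $n = 10$ and $k = 3$, there are $3 + 21 = 24$ distance relations and $3 + 1 + 7 = 11$ branches, so $p = 13$, which agrees with $(20 - 3 - 4)(3 - 1)/2$.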
For each subtree, we evaluate $E_S$ and choose the one with the smallest $E_S$. This is our best-fitting subtree, and we will now assume that it can