Biomedical Engineering Reference
In the past decade, taxonomists have acquired access to the full genome
sequences of many different organisms. The genome size of most bacteria falls
in the range of 0.5 million base pairs up to about 10 million base pairs. This is
a tiny fraction of the size of the human genome, which is about 3 billion base
pairs in length. The organism with the largest genome is currently thought to
be Polychaos dubium (Class Amoebozoa, Chapter 22), with a genome length
of 670 billion base pairs. Because the bacterial genome is small, many of the
first genomes to be fully sequenced belong to bacterial species, and dozens of
full-length sequences are currently available to taxonomists.
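To make the size comparison concrete, the short sketch below works out the ratios from the figures quoted above. The constant and function names are illustrative assumptions, and the genome lengths are the approximate values given in the text, not precise measurements.

```python
# Approximate genome lengths in base pairs, taken from the text above.
BACTERIA_MIN_BP = 500_000            # small bacterial genome
BACTERIA_MAX_BP = 10_000_000         # large bacterial genome
HUMAN_BP = 3_000_000_000             # human genome
POLYCHAOS_BP = 670_000_000_000       # Polychaos dubium, as quoted in the text


def fraction_of_human(genome_bp):
    """Return a genome length as a fraction of the human genome length."""
    return genome_bp / HUMAN_BP


if __name__ == "__main__":
    print(f"Small bacterial genome: {fraction_of_human(BACTERIA_MIN_BP):.4%} of human")
    print(f"Large bacterial genome: {fraction_of_human(BACTERIA_MAX_BP):.4%} of human")
    print(f"Polychaos dubium: {POLYCHAOS_BP / HUMAN_BP:.0f} times human")
```

On these figures, even the largest bacterial genomes come to roughly a third of one percent of the human genome, while the Polychaos dubium genome is more than 200 times the length of the human genome.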
It was hoped that comparisons between whole-genome sequences, from many
different bacterial species, would solve many of the mysteries and
controversies of bacterial taxonomy. These expectations were overly optimistic, due,
in no small part, to an analytic phenomenon now known as “non-phylogenetic
signal” [23]. When gene sequence data are analyzed, and two organisms share
the same sequence in a stretch of DNA, it can be very tempting to infer that
the two organisms belong to the same class (i.e., that they inherited the identi-
cal sequence from a common ancestor). This inference is not necessarily
correct. Because DNA mutations arise stochastically over time (i.e., at random
locations in the gene, and at random times), two organisms having different
ancestors may achieve the same sequence in a chosen stretch of DNA.
Conversely, if two organisms are closely related, an identifiable ancestor
may have a DNA sequence that is preserved in one of the two organisms but
is not found in the other. When mathematical phylogeneticists
began modeling inferences drawn from analyses of genomic data, they
assumed that most class assignment errors would occur when the branches
between sister taxa were long (i.e., when a long time elapsed between evolu-
tionary divergences, allowing for many random substitutions in base pairs).
They called this phenomenon, wherein non-sister taxa were assigned the
same ancient ancestor class, “long branch attraction.” In practice, errors of
this type can occur whether the branches are long, or short, or in-between.
Over the years, the accepted usage of the term “long branch attraction” has
been extended to just about any error in phylogenetic grouping due to gene
similarities acquired through any mechanism other than inheritance from
a shared ancestor. This would include random mutations and adaptive
convergences [24]. The moral here is that powerful data-intensive analytic
techniques are sometimes more confusing than they are clarifying.
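The claim that random substitutions can, by themselves, produce shared bases that were not inherited can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions: substitutions are uniform across sites and bases, the two lineages evolve independently from one ancestor, and the function names, sequence length, and substitution counts are all hypothetical choices made for illustration. It is not the model used in the cited references.

```python
import random

BASES = "ACGT"


def mutate(seq, n_subs, rng):
    """Return a copy of seq after n_subs random single-base substitutions."""
    seq = list(seq)
    for _ in range(n_subs):
        pos = rng.randrange(len(seq))
        # Always substitute a different base at the chosen site.
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def parallel_sites(ancestor, a, b):
    """Count sites where a and b agree with each other but not with the ancestor."""
    return sum(1 for x, y, z in zip(ancestor, a, b) if y == z and y != x)


def simulate(length=1000, n_subs=300, trials=200, seed=0):
    """Average number of independently acquired shared bases per trial."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        ancestor = "".join(rng.choice(BASES) for _ in range(length))
        a = mutate(ancestor, n_subs, rng)  # lineage A evolves on its own
        b = mutate(ancestor, n_subs, rng)  # lineage B evolves on its own
        total += parallel_sites(ancestor, a, b)
    return total / trials


if __name__ == "__main__":
    for n_subs in (30, 300):
        print(n_subs, "substitutions per lineage:", simulate(n_subs=n_subs))
```

Every site counted by `parallel_sites` is a match between the two descendants that neither inherited from the ancestor. In this toy model the count should grow as `n_subs` grows, which is the intuition behind the original "long branch" assumption, though a small number of such sites can arise on short branches as well.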
Though the field of computational taxonomy is flawed, readers must also
understand that the field of classical taxonomy suffers from a self-referential
paradox known as bootstrapping. Classical taxonomists need to have a classi-
fication of organisms before they can clearly see the relationships among
classes (that is the purpose of a classification). Furthermore, taxonomists
must see the relationships among classes before they can create the classifi-
cation. Basically, a classification cannot be built without the assistance of a