After a recent gene duplication event, even if one copy has degenerated and lost its function, it will still have a gene-like structure. Gene identification algorithms will classify it as a novel gene, and automatic systems will attempt to determine its function. Only by building a phylogenetic tree of the gene family, including its paralogs, can the error be detected. Furthermore, because the nonfunctional copy is under virtually no selective constraint after the duplication, it will evolve more rapidly than a functional one. It is therefore important to use robust tree reconstruction methods; a simple NTP clustering method will not give a useful result.
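Why a fast-evolving copy misleads simple clustering can be sketched with a small, hypothetical example. As a stand-in for a simple clustering method, the code uses UPGMA (average-linkage clustering), which implicitly assumes that all lineages evolve at the same rate. The distances are invented for illustration and are derived from a true tree ((P,F),(G1,G2)), where P is the nonfunctional copy with a long branch (length 5), F is its functional sister, and G1 and G2 are more distant family members (all other branches have length 1).

```python
from itertools import combinations


def upgma(dist, labels):
    """UPGMA clustering; returns a Newick-like string without branch lengths."""
    sizes = {lab: 1 for lab in labels}  # cluster name -> number of leaves in it
    d = {frozenset(pair): v for pair, v in dist.items()}
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        na, nb = sizes.pop(a), sizes.pop(b)
        # Distance to the merged cluster is the size-weighted mean of its parts.
        for c in sizes:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))


# Hypothetical additive distances read off the true tree ((P,F),(G1,G2)).
dist = {
    ("P", "F"): 6, ("P", "G1"): 7, ("P", "G2"): 7,
    ("F", "G1"): 3, ("F", "G2"): 3, ("G1", "G2"): 2,
}
tree = upgma(dist, ["P", "F", "G1", "G2"])
print(tree)  # (P,(F,(G1,G2)))
```

UPGMA joins F with G1 and G2 and attaches the long-branched copy P last, instead of pairing it with its true sister F. Because these toy distances are additive, a method that does not assume equal rates, such as neighbor joining, would recover the correct grouping: d(P,F) + d(G1,G2) = 8 is smaller than d(P,G1) + d(F,G2) = 10.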
If, on the other hand, an attempt is made to determine the function of a gene by analogy with genes in another organism without first building a detailed phylogenetic tree, problems will also arise. One such method of function attribution, COG (Clusters of Orthologous Groups) [18], proceeds as follows. Gene X from species 1 and gene Y from species 2 are considered orthologous if X is more similar to Y than to any other gene from species 2, and vice versa. In other words, a BLAST search of X against all known sequences from species 2 returns Y as the top-scoring hit, and a BLAST search of Y against all known sequences from species 1 returns X as the top-scoring hit. Once the genes are considered orthologous, their functions are deemed to be similar, and the function of the characterized gene is attributed to the uncharacterized one as a putative function.
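The reciprocal best-hit criterion described above can be sketched as follows. This is a minimal illustration, not the COG implementation: the score tables are hypothetical and stand in for similarity scores that would, in practice, be parsed from BLAST searches run in each direction.

```python
def best_hit(query, hits):
    """Return the subject with the highest score for `query`, or None if it has no hits."""
    row = hits.get(query, {})
    return max(row, key=row.get) if row else None


def reciprocal_best_hits(hits_1to2, hits_2to1):
    """Pairs (x, y) where y is x's top hit in species 2 and x is y's top hit in species 1."""
    pairs = set()
    for x in hits_1to2:
        y = best_hit(x, hits_1to2)
        if y is not None and best_hit(y, hits_2to1) == x:
            pairs.add((x, y))
    return pairs


# Hypothetical scores: query gene -> {subject gene: score}.
hits_1to2 = {"X": {"Y": 250.0, "Z": 180.0}, "W": {"Y": 200.0}}
hits_2to1 = {"Y": {"X": 250.0, "W": 200.0}, "Z": {"X": 180.0}}
print(reciprocal_best_hits(hits_1to2, hits_2to1))  # {('X', 'Y')}
```

Only X and Y qualify: W's top hit is Y, but Y's top hit is X, so W is left unpaired, and Z is excluded for the same reason in the other direction.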
This process is fundamentally flawed: orthology cannot be inferred from highest similarity alone. If two paralogous genes are present in an ancestor and a different member of the pair is deleted in each of two descendants, the COG method will likely report the two surviving genes as orthologs even though they are paralogs. Moreover, this is a best-case scenario, in the sense that the datasets are assumed to be complete. If one of the genomes has not been completely sequenced, matters are even worse, since the true ortholog may be present in the genome but absent from the database. The only way to resolve the question of orthology is to build a phylogenetic tree of the gene family that includes as many different species as possible. While two descendants of a given ancestor may each retain only one member of a pair of paralogs, it is likely that both paralogs will have survived in a third or fourth species.
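The differential-loss scenario can be illustrated with a toy model. Everything here is hypothetical: each gene name encodes its paralog group (A or B) and its species (1 to 3), and the similarity score is simply 90 within a paralog group and 60 between groups, standing in for real BLAST scores. Species 1 has lost B and species 2 has lost A; a gene that exists but is missing from an incomplete database would behave the same way.

```python
# Hypothetical genomes after differential loss of the ancestral paralogs A and B.
genomes = {
    "sp1": ["A1"],        # B1 lost (or absent from the database)
    "sp2": ["B2"],        # A2 lost (or absent from the database)
    "sp3": ["A3", "B3"],  # both paralogs retained
}


def score(g, h):
    """Toy similarity: genes of the same paralog group (first letter) score higher."""
    return 90 if g[0] == h[0] else 60


def best_hit(gene, genome):
    """Highest-scoring gene in `genome` for `gene`."""
    return max(genome, key=lambda h: score(gene, h))


def rbh(sp_x, sp_y):
    """Reciprocal best hits between the genomes of two species."""
    pairs = set()
    for g in genomes[sp_x]:
        h = best_hit(g, genomes[sp_y])
        if best_hit(h, genomes[sp_x]) == g:
            pairs.add((g, h))
    return pairs


print(rbh("sp1", "sp2"))  # {('A1', 'B2')}: two paralogs reported as orthologs
print(rbh("sp1", "sp3"))  # {('A1', 'A3')}
print(rbh("sp2", "sp3"))  # {('B2', 'B3')}
```

Species 3, which kept both copies, exposes the problem: A1 pairs with A3 while B2 pairs with B3, so A1 and B2 cannot belong to the same orthologous group. This pairwise check is only a hint; resolving orthology still requires a phylogenetic tree of the whole family.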