sequence comparisons in complete genomes, and (b) using a phylogenetic framework to construct trees of all homologous genes and then reconcile these with the species phylogeny.
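Approach (b) depends on labelling each internal node of a gene tree as either a speciation or a duplication. The sketch below is a minimal illustration that uses the species-overlap rule, a simplified stand-in for full gene-tree/species-tree reconciliation. It is not claimed to be the exact procedure of any tool named in this section. Under this rule, a node whose two child subtrees share no species is treated as a speciation, and every pair of genes split by that node is orthologous. The nested-tuple tree encoding, the gene names, and the `species_of` naming convention are all hypothetical.

```python
from itertools import product
from typing import Union

# A gene tree is either a leaf (gene ID string) or a pair (left, right) of subtrees.
Tree = Union[str, tuple]


def species_of(gene: str) -> str:
    """Hypothetical convention: the species name is the prefix before the first '_'."""
    return gene.split("_", 1)[0]


def leaves(tree: Tree) -> list[str]:
    """Return all gene IDs below a node."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def orthologous_pairs(tree: Tree) -> set[frozenset[str]]:
    """Species-overlap rule: if a node's child subtrees share no species, the node
    is a speciation and every gene pair split by it is orthologous; otherwise the
    node is a duplication and the pairs it splits are paralogs (not reported)."""
    pairs: set[frozenset[str]] = set()
    if isinstance(tree, str):
        return pairs
    left, right = tree
    pairs |= orthologous_pairs(left)
    pairs |= orthologous_pairs(right)
    left_genes, right_genes = leaves(left), leaves(right)
    left_species = {species_of(g) for g in left_genes}
    right_species = {species_of(g) for g in right_genes}
    if left_species.isdisjoint(right_species):  # speciation node
        for a, b in product(left_genes, right_genes):
            pairs.add(frozenset((a, b)))
    return pairs


# human_1 and human_2 arose by a duplication after the human-mouse split,
# so both are co-orthologs of mouse_1.
tree = (("human_1", "human_2"), "mouse_1")
for pair in sorted(tuple(sorted(p)) for p in orthologous_pairs(tree)):
    print(pair)
# ('human_1', 'mouse_1')
# ('human_2', 'mouse_1')
```

A full reconciliation would also map each gene-tree node onto the species tree, which lets it handle gene losses and incomplete species sampling that the overlap rule ignores.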
Identifying orthologs from gene sequence comparisons usually relies on clustering genes around reciprocal best hits (RBHs, also known as SymBets or best-best hits). An RBH is a pair of genes from two genomes in which each gene is the other's most similar match in between-genome comparisons. This approach was first introduced by the database of Clusters of Orthologous Groups (COGs). 27 The database was built initially from the much smaller and simpler bacterial genomes that became available first. It quickly gained wide recognition and has since been extended to eukaryotic (KOGs) and archaeal (arCOGs) genomes. 28 In phylogenetic terms, an RBH can be interpreted as a pair of genes from different species that are connected by the shortest path in a distance-based tree. RBH identification is now widely adopted in comparative genomics because it is simple and feasible to apply to large-scale data. However, RBH analysis in its simplest, BLAST-based form suffers from inaccurate sequence distance estimates. It also misses many genes that were duplicated after speciation and are, in fact, co-orthologs. Such co-orthologs can be included through an additional step that identifies genes that, in within-genome comparisons, are more similar to members of the RBH set than to any gene in the other genomes. This step is used, for example, in InParanoid/MultiParanoid, 29,30 OrthoDB, 31 and eggNOG. 32
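The two steps described above can be sketched as follows: first find RBHs, then extend each RBH with same-genome co-orthologs (in-paralogs). The sketch assumes that pairwise similarity scores, such as BLAST bit scores, have already been computed. The score tables and gene names are hypothetical. The inclusion rule is a deliberately simplified, InParanoid-style criterion: a gene joins a seed's group if it scores higher against the seed than the two seeds score against each other. It is much cruder than the procedures of the tools cited above.

```python
# Hypothetical bit scores between genes of genome A and genome B.
cross = {
    ("A1", "B1"): 250.0, ("A2", "B1"): 240.0, ("A1", "B2"): 60.0,
    ("A3", "B2"): 180.0, ("A3", "B1"): 50.0, ("A2", "B2"): 55.0,
}
# Hypothetical within-genome scores; A1 and A2 are recent duplicates in genome A.
within = {
    frozenset({"A1", "A2"}): 300.0,
    frozenset({"A1", "A3"}): 70.0,
    frozenset({"B1", "B2"}): 65.0,
}


def best_hits(scores, query_index):
    """Map each query gene to its highest-scoring target (ties: first seen wins)."""
    best = {}
    for pair, score in scores.items():
        query, target = pair[query_index], pair[1 - query_index]
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(cross):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    a_to_b = best_hits(cross, 0)
    b_to_a = best_hits(cross, 1)
    return [(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a]


def add_inparalogs(rbh, cross, within):
    """Simplified in-paralog step: add a same-genome gene to an RBH seed's group
    if it scores higher against the seed than the seed pair scores together."""
    clusters = []
    for a, b in rbh:
        seed_score = cross[(a, b)]
        group_a, group_b = {a}, {b}
        for pair, score in within.items():
            if score <= seed_score:
                continue
            if a in pair:
                group_a |= pair
            if b in pair:
                group_b |= pair
        clusters.append((sorted(group_a), sorted(group_b)))
    return clusters


rbh = reciprocal_best_hits(cross)
print(rbh)                                # [('A1', 'B1'), ('A3', 'B2')]
print(add_inparalogs(rbh, cross, within))  # [(['A1', 'A2'], ['B1']), (['A3'], ['B2'])]
```

Plain RBH analysis drops A2 because B1's best hit is A1. The in-paralog step recovers A2 as a co-ortholog of B1. This sketch does not resolve overlaps when a gene could join several groups.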
Notable alternative methodologies include the probabilistic clustering approach of OrthoMCL. 33 Another alternative, used in BUS 34 and SYNERGY, 35 adds orthology evidence from orthologous chromosomal regions (synteny). Synteny evidence is substantial in yeast and slowly evolving vertebrates but offers little help in, for example, comparisons of distantly related insect species.
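OrthoMCL's clustering step is based on the Markov Cluster (MCL) algorithm. MCL simulates random walks on a similarity graph. It alternates expansion, which takes matrix powers and so spreads flow along paths, with inflation, which raises entries to a power and renormalizes so that strong flows are boosted and weak ones pruned. The sketch below is a minimal, unoptimized pure-Python version of MCL itself. It does not reproduce OrthoMCL's score normalization or thresholds. The toy graph, the inflation value, and the tolerances are illustrative assumptions.

```python
def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def mat_mult(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(adj, inflation=2.0, max_iter=100, tol=1e-6):
    """Return clusters (frozensets of node indices) of a symmetric weighted graph."""
    n = len(adj)
    # Add self-loops, as is customary in MCL, then make the matrix column-stochastic.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = mat_mult(m, m)  # expansion: two-step random walk
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Attractors keep mass on their own diagonal; each attractor's row lists its cluster.
    clusters = []
    for i in range(n):
        if m[i][i] > 1e-6:
            members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
            if members not in clusters:
                clusters.append(members)
    return clusters


# Hypothetical similarity graph: two tight gene families (0-2 and 3-5)
# joined by one weak cross-family similarity (2-3).
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
         (2, 3): 0.1}
n = 6
adj = [[0.0] * n for _ in range(n)]
for (i, j), w in edges.items():
    adj[i][j] = adj[j][i] = w

print([sorted(c) for c in mcl(adj)])
```

Higher inflation values generally yield smaller, tighter clusters. Production tools use sparse matrices and pruning rather than dense lists.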
The phylogenetic framework approach takes advantage of well-quantified models of amino acid substitution in the conserved cores of globular proteins to estimate evolutionary distances among genes and to reconcile gene trees with the species phylogeny. A notable example of this tree-based approach to delineating orthologous genes is TreeFam, 36,37 which was recently adopted by Ensembl. 38 There are also several hybrid methods that rely on phylogenetic methods to estimate pairwise evolutionary distances between genes and then cluster the genes using methods similar to those