as statistical error expectation values, raw similarity scores, and
percentage of amino acid identities are often used to define the family
cut-off thresholds that affect the granularity (tightness) of the resulting
clusters. A range of cut-offs may be used to build a hierarchy of clusters
so as to delegate the problem of choosing a biologically relevant cut-off
to follow-up examinations. In practice, the BLASTClust utility from the
BLAST package offers a straightforward solution by applying single-linkage
(nearest-neighbor) clustering to the BLAST scores from all-against-all
sequence comparisons. More complex, derived comparison scores and different
clustering techniques have been applied in projects like SYSTERS [19] and
CluSTr [20], and alternative clustering algorithms are available through
specialized software like the OC program [21] or more general packages like
MATLAB®. An important distinction between domain-based and unsupervised
clustering techniques of gene family definition is that the former allow
one gene to be classified into several groups, while the latter usually
restrict gene membership to only one group.
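
The single-linkage idea and the use of a range of cut-offs described above
can be illustrated with a minimal sketch. It is not BLASTClust's actual
implementation: the gene names, the pairwise scores (imagined here as
percent identities), the cut-off values, and the function name
single_linkage_clusters are all hypothetical and chosen only for
illustration. Two genes end up in the same cluster whenever a chain of
pairwise scores at or above the cut-off connects them.

```python
from collections import defaultdict


def single_linkage_clusters(genes, scores, cutoff):
    """Single-linkage clustering: link every pair scoring >= cutoff.

    genes  -- list of gene identifiers (hypothetical names)
    scores -- dict mapping (gene_a, gene_b) to a similarity score
    cutoff -- minimum score for two genes to be linked
    """
    # Union-find structure: each gene starts as its own cluster.
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), score in scores.items():
        if score >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for g in genes:
        clusters[find(g)].add(g)
    # Sort for a stable, readable result.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical all-against-all scores (only pairs with a hit are listed).
genes = ["g1", "g2", "g3", "g4", "g5"]
scores = {
    ("g1", "g2"): 90,
    ("g2", "g3"): 60,
    ("g4", "g5"): 75,
    ("g3", "g4"): 30,
}

# A range of cut-offs yields a hierarchy, from tight to loose clusters.
hierarchy = {c: single_linkage_clusters(genes, scores, c) for c in (80, 50, 20)}
for cutoff, clusters in hierarchy.items():
    print(cutoff, clusters)
```

With these made-up scores, the strictest cut-off joins only g1 and g2,
the intermediate one produces two families, and the loosest one merges
all five genes through the weak g3-g4 link. This chaining behavior is
characteristic of single-linkage clustering, and each gene belongs to
exactly one cluster at any given cut-off.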
From a comparative genomics perspective, gene family novelties and
extinctions as well as significant size differences resulting from
expansions or contractions of gene copy numbers can point to interesting
lineage-specific biology. Indeed, comparative analyses of olfactory and
other chemosensory receptors in humans and mice revealed large variations
in gene family sizes that might be associated with the different
lifestyles of the two species [22]. Changes in gene family sizes over
evolutionary time are affected by the essentially random processes of gene
duplication and loss, e.g. through pseudogenization. The development of
statistical models that consider phylogenetic relationships and population
genetics is essential to confidently identify any deviations from a
stochastic background. Analysis of gene family dynamics in terms of
duplication and pseudogenization frequencies shows that functional
characteristics, such as essentiality, can be predictive of evolutionary
features and vice versa [23]. Gene families containing at least one
essential gene are subject to stronger purifying selection than those
without any essential genes; they survive longer and consequently may
become more divergent in terms of sequence and upstream regulatory
regions. Families without essential genes appear more dynamic, with higher
rates of both fixation and pseudogenization