task requires adding distantly related sequences to capture broad diversity, while ensuring that the sequences share a minimum level of sequence identity so that an accurate multiple sequence alignment can be generated.
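To illustrate the minimum-identity criterion, the sketch below computes percent identity between two already-aligned sequences and keeps only the sequences that reach a chosen threshold relative to a reference. It is a minimal sketch under stated assumptions. Gaps are written as '-'. Identity is counted only over columns where both sequences have a residue, although other denominators, such as the length of the shorter sequence, are also in use. The function names, toy sequences and 25% threshold are hypothetical and are not values taken from this analysis.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Only columns where neither sequence has a gap ('-') are compared.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_by_identity(
    aligned: dict[str, str], reference: str, min_identity: float
) -> dict[str, str]:
    """Keep the reference plus sequences with >= min_identity % identity to it."""
    ref = aligned[reference]
    return {
        name: seq
        for name, seq in aligned.items()
        if name == reference or percent_identity(ref, seq) >= min_identity
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not real enzyme sequences.
    aln = {"ref": "MKV-LAGT", "seqA": "MKVALAGS", "seqB": "WRPQ-NDE"}
    print(percent_identity(aln["ref"], aln["seqA"]))
    print(sorted(filter_by_identity(aln, "ref", 25.0)))
```

Here seqA shares 6 of 7 gap-free columns with the reference and is kept, while seqB shares none and is discarded.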
Various algorithms have been developed to construct high-quality multiple alignments within a reasonable time, allowing high-throughput processing of large sequence sets. Protein sequences should be prioritized, as they have been shown to be more useful than nucleotide sequences for recovering the true tree topology; however, this advantage can vary with the sequence similarity of the datasets (Russo et al., 1996). Modern alignment programmes make it possible to combine the advantages of local and global alignment algorithms and to incorporate an iterative refinement strategy. Among these methods are DbClustal (Thompson et al., 2000), developed to align sets of sequences detected by a BlastP homology search, T-Coffee (Notredame et al., 2000), MAFFT (Katoh et al., 2002), MUSCLE (Edgar, 2004) and ProbCons (Do et al., 2005). The reliability of particular alignment regions is difficult to estimate, and doubtful regions should preferably be excluded from subsequent phylogenetic analysis.
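One simple way to discard doubtful regions is to drop alignment columns that consist mostly of gaps. The sketch below does only that. The 50% gap cut-off and the function name are hypothetical, and gap content is only a crude proxy for unreliable alignment. Dedicated tools such as Gblocks or trimAl apply more elaborate criteria.

```python
def trim_gappy_columns(
    aligned: dict[str, str], max_gap_fraction: float = 0.5
) -> dict[str, str]:
    """Remove columns whose fraction of gaps ('-') exceeds max_gap_fraction."""
    names = list(aligned)
    seqs = [aligned[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    # Indices of columns that are gapped in at most max_gap_fraction of sequences.
    keep = [
        i
        for i in range(length)
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


if __name__ == "__main__":
    # Hypothetical toy alignment.
    aln = {"a": "MK-VL--G", "b": "MKAV---G", "c": "MR-VLS-G"}
    print(trim_gappy_columns(aln))
```

With the default cut-off, the three columns that are gapped in two or three of the three sequences are removed, and the remaining five columns are retained.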
B. CONSTRUCTION OF A RELIABLE PHYLOGENETIC TREE
Although both gene trees and species trees can be generated, we focus here only on the reconstruction of gene trees, since these are mostly used in functional inference. Ligninolytic enzymes are commonly modular, and their modules can have different evolutionary histories. Therefore, the phylogenetic history has to be reconstructed at the module level rather than from whole enzymes.
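Building module-level trees starts by cutting the alignment into column blocks, one per module. In the sketch below, the module boundaries are supplied by hand as 1-based, inclusive alignment columns. The module names, coordinates and toy sequences are hypothetical. Sequences that consist entirely of gaps within a block (that is, enzymes lacking that module) are left out of that module's sub-alignment.

```python
def split_into_modules(
    aligned: dict[str, str], boundaries: dict[str, tuple[int, int]]
) -> dict[str, dict[str, str]]:
    """Cut an alignment into one sub-alignment per module.

    boundaries maps a module name to (start, end): 1-based, inclusive
    alignment columns.
    """
    length = len(next(iter(aligned.values())))
    if any(len(s) != length for s in aligned.values()):
        raise ValueError("all aligned sequences must have the same length")
    modules = {}
    for module, (start, end) in boundaries.items():
        if not 1 <= start <= end <= length:
            raise ValueError(f"invalid boundaries for {module}: {start}-{end}")
        block = {name: seq[start - 1:end] for name, seq in aligned.items()}
        # Drop sequences with no residues in this block (module absent).
        modules[module] = {name: s for name, s in block.items() if s.strip("-")}
    return modules


if __name__ == "__main__":
    # Hypothetical alignment of three modular enzymes; e3 lacks module_A.
    aln = {"e1": "MKVLAGTDEW", "e2": "MKVL--TDEW", "e3": "----AGTDQW"}
    mods = split_into_modules(aln, {"module_A": (1, 4), "module_B": (5, 10)})
    for name, sub in mods.items():
        print(name, sub)
```

Each sub-alignment can then be passed separately to tree reconstruction.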
Module boundaries can easily be defined from the alignment, and phylogenetic analysis at the level of individual domains is critical. There are basically two types of phylogenetic tree construction methods: distance-based methods (neighbour joining) and character-based methods (maximum parsimony, maximum likelihood and Bayesian methods; Brocchieri, 2001). Because these reconstruction methods rest on different principles, it is wise to combine them when calculating the final trees and to assess branch support by bootstrap analysis. However, a fusion tree requires congruent topologies, so the maximum likelihood tree is given preference in our analysis.
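To make the distance-based approach concrete, the sketch below implements the standard neighbour-joining algorithm on a distance matrix and returns an unrooted tree in Newick format. It is illustrative only and is not the pipeline used in this analysis, which gives preference to maximum likelihood. The distances from the helper are uncorrected p-distances (the proportion of differing residues over gap-free columns). Real protein analyses would normally use model-corrected distances instead. Ties in the Q criterion are broken by input order, and the taxa and distances in the example are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues over columns where neither sequence has a gap."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("sequences share no gap-free columns")
    return diffs / compared


def neighbour_joining(dist: dict[str, dict[str, float]]) -> str:
    """Neighbour joining on a symmetric distance matrix given as nested dicts.

    Returns an unrooted tree as a Newick string with branch lengths.
    """
    if len(dist) < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    d = {a: dict(row) for a, row in dist.items()}  # working copy
    while len(d) > 3:
        n = len(d)
        nodes = list(d)
        # Net divergence r_i: sum of distances from node i to all other nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        dij = d[i][j]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch i -> new node
        lj = dij - li                                  # branch j -> new node
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"             # new node, labelled by its subtree
        # Distance from the new node to every remaining node.
        new_row = {k: (d[i][k] + d[j][k] - dij) / 2 for k in nodes if k not in (i, j)}
        del d[i]
        del d[j]
        for k in d:
            del d[k][i]
            del d[k][j]
            d[k][u] = new_row[k]
        d[u] = new_row
    # Three nodes left: connect them to a single central node.
    a, b, c = d
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


if __name__ == "__main__":
    # Hypothetical additive distances generated from the tree ((A:1,B:2):3,C:1,D:2).
    pairs = {("A", "B"): 3, ("A", "C"): 5, ("A", "D"): 6,
             ("B", "C"): 6, ("B", "D"): 7, ("C", "D"): 3}
    dist = {x: {} for x in "ABCD"}
    for (x, y), v in pairs.items():
        dist[x][y] = dist[y][x] = float(v)
    print(neighbour_joining(dist))
    print(p_distance("MKV-L", "MRVAL"))
```

Because the example distances are exactly additive, neighbour joining recovers the generating tree, (C:1.0000,D:2.0000,(A:1.0000,B:2.0000):3.0000);. With real, non-additive distances, bootstrap resampling of alignment columns would be used to assess how well each grouping is supported.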
C. DETECTION OF EVOLUTIONARY EVENTS
Genome evolution is shaped by various genetic events, including gene duplication, gene loss, horizontal gene transfer and chromosomal rearrangements. Gene duplication is of particular interest for functional inference, as this major genetic event can lead to several different evolutionary outcomes. Theoretically, after duplication one of the copies is lost, or both duplicates undergo