consumption of the tree-building phase via the ktuple alignment
distance calculation amongst pairs of sequences. The length of
the final alignment, and therefore by extension the lengths of the
input sequences, however, mostly affects the profile-profile
alignment phase. For every profile-profile alignment the MAC
algorithm constructs six $L_1 \times L_2$ matrices of double variables
(8 bytes each), where $L_1$ and $L_2$ are the lengths of the two
profiles to be aligned.
An alignment of two profiles, each 100 residues in length, will
therefore require $6 \times 100 \times 100 \times 8 = 480{,}000$ bytes.
The maximum alignment length for a machine with 2 GB (taking
1 GB as $2^{30}$ bytes) would therefore be two profiles of
$\sqrt{2\,\mathrm{GB} / (6 \times 8\,\mathrm{bytes})} \approx 6{,}688$ positions
in length each, or equivalent (see Note 1). The number of sequences affects
the resource requirements of the profile-profile alignment stage
only indirectly, in that it influences how much the lengths of the
intermediary profiles grow from the lengths of the individual
sequences. This growth is difficult to predict and depends, amongst
other factors, on the similarity of the sequences.
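The memory arithmetic above can be checked with a short Python sketch. It assumes, as the text does, six $L_1 \times L_2$ matrices of 8-byte doubles per profile-profile alignment, and it ignores every other use of memory; reading 2 GB as $2^{31}$ bytes reproduces the 6,688 figure. The constant and function names are illustrative only and are not part of Clustal Omega.

```python
import math

BYTES_PER_DOUBLE = 8  # size of a double-precision float
NUM_MATRICES = 6      # matrices per profile-profile alignment, as stated in the text


def dp_memory_bytes(l1: int, l2: int) -> int:
    """Bytes for six L1 x L2 matrices of doubles (all other memory ignored)."""
    return NUM_MATRICES * l1 * l2 * BYTES_PER_DOUBLE


def max_equal_length(memory_bytes: int) -> int:
    """Largest L such that two profiles of length L fit in memory_bytes."""
    # Integer square root avoids floating-point rounding at the boundary.
    return math.isqrt(memory_bytes // (NUM_MATRICES * BYTES_PER_DOUBLE))


if __name__ == "__main__":
    print(dp_memory_bytes(100, 100))       # 480000
    print(max_equal_length(2 * 1024**3))   # 6688
```

With a decimal reading of 2 GB ($2 \times 10^9$ bytes) the same function returns 6,454, so the figure in the text implies binary gigabytes.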
The time required for the profile-profile alignment stage is a
function of the number of sequences $N$, the lengths $L$ of the
intermediary profiles and the shape of the guide-tree. An MSA of $N$
sequences requires $N - 1$ profile-profile alignments; increasing the
number of sequences increases the number of profile-profile
alignments linearly, and therefore the alignment time also grows
linearly, at least in simple cases. Increasing the lengths of the
input sequences clearly will increase the lengths of the intermediate
profiles. Building up the HMM matrices requires a multiple of
$L_1 \times L_2$ operations, so increasing the lengths of the sequences
will increase the matrix construction times in a quadratic fashion.
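To make the linear and quadratic growth concrete, the following sketch walks a guide tree written as nested tuples, counts the profile-profile alignments and adds up the $L_1 \times L_2$ cells filled at each merge. It is a toy model resting on a stated assumption: a merged profile is taken to be exactly as long as its longer child (no gap columns are inserted), which understates the profile growth described above. The tree representation and function name are hypothetical.

```python
from typing import Tuple, Union

# A guide tree is either a leaf (the length of one input sequence)
# or a pair (left_subtree, right_subtree).
Tree = Union[int, Tuple["Tree", "Tree"]]


def progressive_cost(tree: Tree) -> Tuple[int, int, int]:
    """Return (profile_length, num_alignments, total_cells) for a guide tree.

    Toy assumption (not Clustal Omega's behaviour): a merged profile is as
    long as its longer child, i.e. no gap columns are ever inserted.
    """
    if isinstance(tree, int):
        # A single sequence is a profile of its own length; no alignment yet.
        return tree, 0, 0
    left, right = tree
    l1, n1, c1 = progressive_cost(left)
    l2, n2, c2 = progressive_cost(right)
    # Merging two profiles is one profile-profile alignment of L1 x L2 cells.
    return max(l1, l2), n1 + n2 + 1, c1 + c2 + l1 * l2


if __name__ == "__main__":
    short = ((100, 100), (100, 100))  # four sequences of length 100
    long_ = ((200, 200), (200, 200))  # same tree, sequences twice as long
    print(progressive_cost(short))    # (100, 3, 30000)
    print(progressive_cost(long_))    # (200, 3, 120000)
```

Doubling the sequence length quadruples the cell count, while the number of alignments stays at $N - 1 = 3$.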
The guide-tree topology affects the profile-profile alignment times
in a subtle way. Roughly speaking, alignments generated using a
balanced tree will require less time than those using an imbalanced
(chained) tree. As an indication of overall run times, on a single core
of a 64-bit 3.0 GHz machine with 4 GB of RAM it takes just over 5 min
to construct the tree and align 50,000 zinc-finger sequences of average
length 23 residues. It takes 25 min for 20,000 and 68 min for 50,000 sdr
sequences of average length 163 residues, and 106 min for
20,000 p450 sequences of average length 331 amino acids.
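The text does not spell out why a chained tree is slower. One plausible reading, offered here only as a hypothetical cost model, is that some per-merge work (for example, converting a profile into an HMM) scales with the number of sequences in the two profiles being aligned. Under that assumption the sketch below compares a chained tree with a balanced one: both need $N - 1$ merges, but the summed profile sizes grow roughly as $N^2/2$ for the chained tree and as $N \log_2 N$ for the balanced tree. All function names are illustrative.

```python
from typing import List, Tuple, Union

# A guide tree is either a leaf (a sequence name) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def chained_tree(names: List[str]) -> Tree:
    """Fully imbalanced tree: add one sequence at a time to a growing profile."""
    tree: Tree = names[0]
    for name in names[1:]:
        tree = (tree, name)
    return tree


def balanced_tree(names: List[str]) -> Tree:
    """Balanced tree: recursively split the sequences into two halves."""
    if len(names) == 1:
        return names[0]
    mid = len(names) // 2
    return (balanced_tree(names[:mid]), balanced_tree(names[mid:]))


def merge_work(tree: Tree) -> Tuple[int, int, int]:
    """Return (num_sequences, num_merges, work) for a guide tree.

    Hypothetical cost model: each merge costs the total number of
    sequences in the two profiles being aligned.
    """
    if isinstance(tree, str):
        return 1, 0, 0
    n1, m1, w1 = merge_work(tree[0])
    n2, m2, w2 = merge_work(tree[1])
    return n1 + n2, m1 + m2 + 1, w1 + w2 + n1 + n2


if __name__ == "__main__":
    # N is kept small so the recursive chained tree stays within
    # Python's default recursion limit.
    names = [f"seq{i}" for i in range(256)]
    print(merge_work(chained_tree(names)))   # (256, 255, 32895)
    print(merge_work(balanced_tree(names)))  # (256, 255, 2048)
```

Both shapes need the same $N - 1$ merges, consistent with the text; only the assumed per-merge work differs.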
The current implementation of Clustal Omega is command-line
driven. There is as yet no GUI and no interactive menu, but it
is hoped to have one in place during 2013. A list of all permissible
command-line arguments is printed by typing -h (or --help) on
the command line. There is an exhaustive help file explaining all
command-line arguments and their usage in detail. The help file
also contains many examples, illustrating the use of all individual
command-line arguments and a range of typical combinations
of them.
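For scripted use, a thin Python wrapper can assemble a Clustal Omega command line. This sketch assumes the executable is installed as `clustalo` on the PATH and uses only the `-i` (input sequence file) and `-o` (output file) options plus `-h`; the `-h` output of the installed version remains the authoritative list of arguments. The helper names and file names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def clustalo_command(infile: str, outfile: str, exe: str = "clustalo") -> List[str]:
    """Build an argument list that aligns `infile` and writes to `outfile`."""
    return [exe, "-i", infile, "-o", outfile]


def run_clustalo(args: List[str]) -> int:
    """Run a command built above and return the process exit status."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]!r} not found on PATH")
    return subprocess.run(args).returncode


if __name__ == "__main__":
    cmd = clustalo_command("seqs.fa", "aln.fa")
    print(" ".join(cmd))  # clustalo -i seqs.fa -o aln.fa
    # To print the full list of arguments instead: run_clustalo(["clustalo", "-h"])
```

Building the argument list separately from running it keeps the command easy to log or test without Clustal Omega installed.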