Chapter 6
Clustal Omega, Accurate Alignment of Very Large
Numbers of Sequences
Fabian Sievers and Desmond G. Higgins
Abstract
Clustal Omega is a completely rewritten and revised version of the widely used Clustal series of programs for
multiple sequence alignment. It can deal with very large numbers (many tens of thousands) of DNA/RNA
or protein sequences due to its use of the mBed algorithm for calculating guide trees. This algorithm
allows very large alignment problems to be tackled very quickly, even on personal computers. The accuracy
of the program has been considerably improved over earlier Clustal programs, through the use of the
HHalign method for aligning profile hidden Markov models. The program is currently run from the
command line or online.
Key words: Multiple sequence alignment, Progressive alignment, Protein sequences, Clustal
1 Introduction
Clustal Omega [ 1 ] is a package for performing fast and accurate
multiple sequence alignments (MSAs) of potentially large numbers
of protein or DNA/RNA sequences. It is the latest version of the
popular and widely used Clustal MSA package [ 2 , 3 ]. Clustal
Omega retains the basic progressive alignment MSA approach of
the older ClustalX and ClustalW implementations, where the order
of alignments is determined by a so-called guide tree, which in turn
is constructed from pairwise distances amongst the sequences.
The main improvements over ClustalW2 are (1) use of the mBed
algorithm for creating guide trees of any size [ 4 ] and (2) a very
accurate profile-profile aligner, based on the HHalign package [ 5 ].
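
To make the guide-tree idea concrete, here is a minimal sketch in Python of how a tree can be built from pairwise distances. It is an illustration only and not Clustal's implementation: the function names, the toy k-mer distance, and the plain average-linkage clustering are all assumptions chosen for brevity.

```python
from itertools import combinations


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Toy distance (an assumption, not Clustal's measure):
    1 minus the Jaccard similarity of the two k-mer sets."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def guide_tree(seqs: dict):
    """Build a guide tree by average-linkage clustering.

    Returns (tree, number_of_pairwise_distances), where the tree is a
    nested tuple of sequence names.
    """
    names = list(seqs)
    # Every unordered pair of sequences gets one distance.
    dist = {
        frozenset((a, b)): kmer_distance(seqs[a], seqs[b])
        for a, b in combinations(names, 2)
    }
    # Each cluster is (subtree, list of member sequence names).
    clusters = [(name, [name]) for name in names]

    def average_distance(pair):
        (_, m1), (_, m2) = pair
        total = sum(dist[frozenset((x, y))] for x in m1 for y in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Merge the two closest clusters into a new internal node.
        c1, c2 = min(combinations(clusters, 2), key=average_distance)
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0], len(dist)


if __name__ == "__main__":
    example = {
        "a": "ACGTACGT",
        "b": "ACGTACGA",
        "c": "TTTTGGGG",
        "d": "TTTTGGGC",
    }
    tree, n_pairs = guide_tree(example)
    print(tree, n_pairs)
```

In a progressive aligner the nested tuples would be read from the leaves upward, so the most similar sequences are aligned first. Note that the distance table holds one entry per pair of sequences, which is what becomes costly as the number of sequences grows.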
As a first step, a traditional progressive aligner calculates all
$N(N-1)/2$ pairwise distances amongst all N input sequences.
This may be computationally too demanding for much more than
10,000 sequences. The mBed algorithm, as implemented in Clustal
Omega, reduces the time and memory complexity for guide tree
calculation from $O(N^2)$ to $O(N(\log N)^2)$. This is achieved by
calculating the pairwise distances of all N sequences with respect