Individual sequences and small profiles are most vulnerable to
misalignment. Large profiles, on the other hand, have already built
up more pseudo-counts and are more likely to resemble the final
alignment. Clustal Omega therefore up-weights the pseudo-count
transfer for single sequences and for intermediate alignments with
small numbers of sequences, and reduces the transfer to larger
intermediate alignments. Pseudo-count transfer to alignments of
more than about 10 sequences is negligible. Using EPA increases the
profile-profile alignment time approximately threefold: first, each
of the two profiles is aligned to the external HMM, and then the two
pre-aligned profiles have to be aligned to each other. At present
Clustal Omega only accepts one external profile-HMM. This is not
a problem for an HMM that extends over the entire length of the
final alignment. However, if the unaligned sequences extend over
multiple domains, and HMMs are only known for the individual
domains, then only one of these HMMs can be submitted. This is a
current limitation, which will hopefully be rectified in a future
version of Clustal Omega.
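The size-dependent weighting of pseudo-count transfer can be illustrated with a small Python sketch. The linear ramp used here, which gives a single sequence full weight and falls to zero at 10 sequences, is a hypothetical choice made for illustration only; it is not the formula Clustal Omega uses internally, and the names transfer_weight and blend_column are likewise invented for this example.

```python
def transfer_weight(n_seqs: int, cutoff: int = 10) -> float:
    """Hypothetical HMM pseudo-count transfer weight (NOT Clustal Omega's formula):
    1.0 for a single sequence, falling linearly to 0.0 at `cutoff` sequences or more."""
    if n_seqs < 1:
        raise ValueError("a profile contains at least one sequence")
    return max(0.0, (cutoff - n_seqs) / (cutoff - 1))


def blend_column(profile_freqs: dict[str, float],
                 hmm_probs: dict[str, float],
                 n_seqs: int) -> dict[str, float]:
    """Mix one profile column with the matching HMM emission probabilities.
    Small profiles lean on the HMM; large profiles keep their own frequencies."""
    w = transfer_weight(n_seqs)
    residues = set(profile_freqs) | set(hmm_probs)
    return {aa: (1 - w) * profile_freqs.get(aa, 0.0) + w * hmm_probs.get(aa, 0.0)
            for aa in residues}
```

For example, blend_column({"L": 1.0}, {"L": 0.6, "I": 0.4}, n_seqs=1) returns the HMM probabilities unchanged, while the same call with n_seqs=10 returns the profile's own frequencies, mirroring the behaviour described above for single sequences and larger intermediate alignments.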
To use an HMM in conjunction with unaligned sequences, first
determine the appropriate HMM. For example, a search of the
PfamWeb site for “globin” finds PF00042, which is the Pfam family
for globins. Go to “Curation & model” on the PF00042 page and
download the HMM called PF00042.hmm. Then type:
$ clustalo -i globin.fa --hmm-in PF00042.hmm
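For scripted use, the same command can be assembled and launched from Python. This is a minimal sketch using only the standard library; the helper names clustalo_command and run_clustalo are hypothetical, and the only Clustal Omega options it uses are -i, --hmm-in and --iter from this chapter plus the standard -o output-file option.

```python
import shutil
import subprocess
from typing import Optional


def clustalo_command(infile: str,
                     hmm_in: Optional[str] = None,
                     iterations: int = 0,
                     outfile: Optional[str] = None) -> list[str]:
    """Assemble a Clustal Omega argument list (hypothetical helper)."""
    cmd = ["clustalo", "-i", infile]
    if hmm_in is not None:
        cmd += ["--hmm-in", hmm_in]          # external profile-HMM for EPA
    if iterations > 0:
        cmd += ["--iter", str(iterations)]   # combined guide-tree/HMM iterations
    if outfile is not None:
        cmd += ["-o", outfile]               # otherwise the alignment goes to stdout
    return cmd


def run_clustalo(cmd: list[str]) -> None:
    """Run the command, failing early if clustalo is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("clustalo executable not found on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Calling run_clustalo(clustalo_command("globin.fa", hmm_in="PF00042.hmm")) corresponds to typing the command above, provided clustalo is on the PATH.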
3.3 Iteration
A useful means of refining an alignment is by “iterating” the
alignment process. The guide-tree used for performing the initial
alignment is based on pairwise distances between unaligned
sequences. This may not be a reliable distance measure, and the
guide-tree derived from these distances may not be ideal. A better
distance measure between sequences is one based on a full multiple
alignment [9]. In Clustal Omega these distances are calculated
from the initial alignment and are used to calculate a new, hopefully
better, guide-tree. Any subsequent guide-tree refinement will again
use the full alignment distances between sequences. These distances
are expected to become more accurate as the alignments they are
based upon become more accurate, leading in turn to better
guide-trees and, by extension, to better alignments.
EPA requires an externally computed HMM, but the same machinery
can also be used to create a simple iteration scheme. In a first step,
the unaligned sequences are aligned without any external profile.
This produces an alignment which can be internally converted into
an HMM and used in a second round of aligning in the same way as
in EPA. Both of these steps, the initial unassisted alignment and the
second alignment using an HMM and a new guide-tree derived from
the first alignment, can be performed with one invocation of
Clustal Omega:
$ clustalo -i globin.fa --iter 1
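The idea of deriving guide-tree distances from a full alignment rather than from unaligned sequences can be sketched in Python. Here the distance is the uncorrected fraction of mismatching residues over gap-free aligned columns, and the tree is built with UPGMA; both are simple stand-ins chosen for illustration, and Clustal Omega's own distance correction and tree construction may differ. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatches over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no shared residues: treat as maximally distant (an assumption)
    return sum(x != y for x, y in pairs) / len(pairs)


def alignment_distances(aligned: list[str]) -> dict[tuple[int, int], float]:
    """Pairwise distances taken from a full multiple alignment, keyed by (i, j), i < j."""
    return {(i, j): p_distance(aligned[i], aligned[j])
            for i, j in combinations(range(len(aligned)), 2)}


def upgma(labels: list[str], dist: dict[tuple[int, int], float]) -> str:
    """Build a guide-tree with UPGMA and return its topology in Newick format."""
    clusters = {i: (name, 1) for i, name in enumerate(labels)}  # id -> (subtree, size)
    d = dict(dist)
    next_id = len(labels)
    while len(clusters) > 1:
        # Closest pair; ties are broken by the smaller (i, j) key, so output is deterministic.
        (i, j), _ = min(d.items(), key=lambda kv: (kv[1], kv[0]))
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        # Distance from the merged cluster to each remaining cluster: size-weighted average.
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = (f"({ti},{tj})", ni + nj)
        next_id += 1
    (tree, _), = clusters.values()
    return tree + ";"


aligned = ["ACDEFG", "ACDEFH", "ACDKLM", "ACDKLN"]
print(upgma(["A", "B", "C", "D"], alignment_distances(aligned)))  # ((A,B),(C,D));
```

In an iterative scheme, the alignment produced with one tree would be fed back into alignment_distances to obtain the next tree, so the distances improve as the alignment does.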