3.5 Distance Matrix Options
The following options adjust how GRAMALIGN creates the distance
matrix that guides the order in which sequences are progressively
aligned.
Option -C : Force GRAMALIGN to generate a complete distance
matrix prior to determining the alignment order. By default,
GRAMALIGN generates a partial distance matrix with a time
complexity on the order of N log(N), where N is the number of
input sequences. Using this option ensures the most accurate
grammar-based alignment order, but requires a time complexity on
the order of N^2. In creating the partial distance matrix, one
initial column is completely filled in and divided into two
clusters: one with the smallest distances and the other with the
largest distances. Each cluster is then processed recursively: one
sequence is compared to all others in the cluster, and that subset
is further divided into two clusters, and so on. This works because
of the transitivity of grammars: if a sequence has a short grammar
distance to two other sequences, then those two sequences are
likely to have a short grammar distance to each other. Suggestion:
If you are using GRAMALIGN to output a distance matrix (say, for
studying phylogeny), you should enable this option. Otherwise, omit
this option to greatly decrease computation time, especially for
many input sequences.
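
The recursive partial-matrix idea described above can be sketched roughly as follows. This is only an illustration, not GRAMALIGN's actual code: the function names are hypothetical, the k-mer Jaccard distance is a toy stand-in for the grammar-based distance, and splitting each cluster at the median distance is an assumption, since the text does not say how the two clusters are chosen.

```python
from typing import Callable, Dict, List, Tuple


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """Toy stand-in distance (NOT the grammar-based distance):
    Jaccard distance between the sets of k-mers of two sequences."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = ka | kb
    return 1.0 - len(ka & kb) / len(union) if union else 0.0


def partial_distance_matrix(
    seqs: List[str],
    dist: Callable[[str, str], float] = kmer_distance,
) -> Dict[Tuple[int, int], float]:
    """Return a sparse table {(i, j): distance} with i < j.

    Only a subset of all pairs is evaluated: one sequence per cluster
    is compared to the rest, and the rest is split into a 'near' and a
    'far' half that are processed recursively.
    """
    table: Dict[Tuple[int, int], float] = {}

    def process(cluster: List[int]) -> None:
        if len(cluster) < 2:
            return
        pivot, others = cluster[0], cluster[1:]
        scored = []
        for j in others:
            d = dist(seqs[pivot], seqs[j])
            table[(min(pivot, j), max(pivot, j))] = d
            scored.append((d, j))
        scored.sort()
        half = len(scored) // 2  # assumed split rule: median of distances
        process([j for _, j in scored[:half]])  # smallest distances
        process([j for _, j in scored[half:]])  # largest distances

    process(list(range(len(seqs))))
    return table
```

With this split rule, each level of recursion evaluates at most N distances and there are about log(N) levels, which is where the N log(N) behavior comes from. For eight sequences the sketch evaluates 13 pairs instead of the 28 needed for a complete matrix.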
Option -M : Disable use of the merged amino acid alphabet.
As discussed in Subheading 2.4, we developed a merged alphabet
by combining amino acid characters that were found to have similar
row scores within the substitution matrices. This reduces the
original 23 characters to a set of 11 characters, which is
particularly useful for the grammar-based distance calculation.
This option is ignored for nucleotide sequences. Suggestion:
Because this option only affects the distance matrix for amino
acids, you should not use it unless you have a good reason to
believe the grammar present in the original alphabet is significant
to the alignment order. This option does not directly affect the
pairwise alignment scoring.
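
As a rough illustration of what merging an alphabet means, the sketch below maps every character in a group to a single representative before any distance is computed. The groups shown are hypothetical and chosen only for the example; they are not the 11-character alphabet GRAMALIGN actually uses, and the function names are likewise assumptions.

```python
from typing import Dict, List

# HYPOTHETICAL groups for illustration only; the actual merged
# alphabet is defined by the authors and is not reproduced here.
EXAMPLE_GROUPS: List[str] = ["ILMV", "FWY", "KR", "DE", "NQ", "ST"]


def build_merge_map(groups: List[str]) -> Dict[str, str]:
    """Map each character to the first character of its group."""
    mapping: Dict[str, str] = {}
    for group in groups:
        representative = group[0]
        for ch in group:
            mapping[ch] = representative
    return mapping


def merge_sequence(seq: str, mapping: Dict[str, str]) -> str:
    """Rewrite a sequence in the merged alphabet.

    Characters that belong to no group are left unchanged.
    """
    return "".join(mapping.get(ch, ch) for ch in seq.upper())


merge_map = build_merge_map(EXAMPLE_GROUPS)
merged = merge_sequence("MKVLDE", merge_map)  # "IKIIDD"
```

Because several residues collapse to one symbol, repeated patterns become easier to detect in the merged sequence. Only the distance calculation would see the merged form; the original sequences are still the ones being aligned.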
Option -T <value> : Specify the relative grammar-based similarity
threshold. Referring to the left half of Fig. 1, all sequence pairs that
have a relative complexity measure below this threshold will be
grouped together prior to alignment. Sequences within each
group will be aligned to each other first. Then a consensus sequence
for each group will be aligned to the overall alignment ensemble.
Lower thresholds require sequences to be more similar before
they are grouped together. If this option is not specified, the
default value is 0.10. Suggestion: The default value is quite low,
thereby ensuring that sequences need to be very similar before
being grouped together. We have performed a series of classification
comparisons on known 16S ribosomal RNA sequences, the