Biology Reference
In-Depth Information
sequenced, approaches that require MSA of the same gene in
different organisms now find a more populated data set. In both
cases computational time in MSA is becoming an important issue
that needs to be addressed.
Given a scoring scheme to evaluate the fitness of an MSA,
calculating the best MSA is an NP-complete problem [ 1 ]. Variances
in scoring schemes, need for expert-hand analysis in most applica-
tions, and many-to-one mappings governing elements-to-func-
tionality (codon mapping and function) make MSA a more
challenging problem when considered from a biological context
as well [ 4 ].
Generally, three approaches are used to automate the genera-
tion of MSAs. The first uses a brute-force method of multidimen-
sional dynamic programming [ 5 ], which may result in a good
alignment but is generally computationally expensive and, there-
fore, usable only for a small number of sequences. Another method
uses a probabilistic approach where Hidden Markov Models
(HMMs) are approximated from unaligned sequences. The final
method, progressive alignment, is possibly the most commonly
used approach for obtaining MSAs [ 6 ].
A progressive alignment algorithm begins with an optimal
alignment of two of the N sequences. Then, each of the remaining
N sequences are aligned to the current MSA, either via a consensus
sequence or one of the sequences already in the MSA. Variations on
the progressive alignment method are presented throughout the
chapters in this text. In most cases, the algorithms attempt to
generate accurate alignments while minimizing computational
time or space. This chapter presents GRAMALIGN , a computationally
efficient progressive alignment method. In particular, the natural
grammar inherent in biological sequences is estimated to determine
the order in which sequences are progressively merged into the
ongoing MSA. The following sections focus on the somewhat
obscure idea of using a grammar-based relative complexity measure
to estimate the distance matrix which is used to guide the progres-
sive alignment process. This is followed by a section containing
several details regarding the usage of GRAMALIGN .
2 Distance Matrix Calculation
As shown in Fig. 1 , the first significant step in GRAMALIGN , and
progressive alignment in general, is to determine the order in
which sequences are added to the ongoing alignment. To do so it
is typical for an algorithm to generate a distance matrix whose
elements are inter-sequence distances which often reflect phylog-
eny. Unfortunately, many of these methods are known to be com-
putationally expensive and sometimes impose a paradoxical
situation in which an alignment is required prior to estimating the
Search WWH ::




Custom Search