GramAlign: Fast alignment driven by grammar-based phylogeny - Multiple Sequence Alignment Methods

Biology Reference

In-Depth Information

sequenced, approaches that require MSA of the same gene in

different organisms now find a more populated data set. In both

cases computational time in MSA is becoming an important issue

that needs to be addressed.

Given a scoring scheme to evaluate the fitness of an MSA,

calculating the best MSA is an NP-complete problem [ 1 ]. Variances

in scoring schemes, need for expert-hand analysis in most applica-

tions, and many-to-one mappings governing elements-to-func-

tionality (codon mapping and function) make MSA a more

challenging problem when considered from a biological context

as well [ 4 ].

Generally, three approaches are used to automate the genera-

tion of MSAs. The first uses a brute-force method of multidimen-

sional dynamic programming [ 5 ], which may result in a good

alignment but is generally computationally expensive and, there-

fore, usable only for a small number of sequences. Another method

uses a probabilistic approach where Hidden Markov Models

(HMMs) are approximated from unaligned sequences. The final

method, progressive alignment, is possibly the most commonly

used approach for obtaining MSAs [ 6 ].

A progressive alignment algorithm begins with an optimal

alignment of two of the N sequences. Then, each of the remaining

N sequences are aligned to the current MSA, either via a consensus

sequence or one of the sequences already in the MSA. Variations on

the progressive alignment method are presented throughout the

chapters in this text. In most cases, the algorithms attempt to

generate accurate alignments while minimizing computational

time or space. This chapter presents GRAMALIGN , a computationally

efficient progressive alignment method. In particular, the natural

grammar inherent in biological sequences is estimated to determine

the order in which sequences are progressively merged into the

ongoing MSA. The following sections focus on the somewhat

obscure idea of using a grammar-based relative complexity measure

to estimate the distance matrix which is used to guide the progres-

sive alignment process. This is followed by a section containing

several details regarding the usage of GRAMALIGN .

2 Distance Matrix Calculation

As shown in Fig. 1 , the first significant step in GRAMALIGN , and

progressive alignment in general, is to determine the order in

which sequences are added to the ongoing alignment. To do so it

is typical for an algorithm to generate a distance matrix whose

elements are inter-sequence distances which often reflect phylog-

eny. Unfortunately, many of these methods are known to be com-

putationally expensive and sometimes impose a paradoxical

situation in which an alignment is required prior to estimating the

Multiple Sequence Alignment Methods

Search WWH ::

Custom Search

Home