Chapter 11
GramAlign: Fast alignment driven by grammar-based phylogeny
David J. Russell
Abstract
Multiple sequence alignment involves identifying related subsequences among biological sequences. When matches are found, the associated pieces are shifted so that, when the sequences are presented as successive rows (one sequence per row), homologous residues line up in columns. Exact alignment of more than a few sequences is known to be computationally prohibitive. Thus many heuristic algorithms have been developed to produce good alignments efficiently by determining an order in which pairs of sequences are progressively aligned and merged. GRAMALIGN is such a progressive alignment algorithm; it uses a grammar-based relative complexity distance metric to determine the alignment order. This technique yields a computationally efficient and scalable program that can quickly align both large numbers of sequences and sets of long sequences. The GRAMALIGN software is available at http://bioinfo.unl.edu/gramalign.php both as downloadable source code and as a web-based alignment server.
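The idea of deriving an alignment order from a complexity-based distance can be sketched as follows. This is a minimal illustration under stated assumptions, not GramAlign's implementation: it uses the Lempel-Ziv (1976) phrase count as a stand-in for grammar complexity, an assumed normalization for the relative distance, and a UPGMA-style average-linkage rule (chosen here only for illustration) to decide which pair of sequences or groups to merge next. The function names `lz_complexity`, `relative_distance`, and `merge_order` are hypothetical.

```python
from itertools import combinations

# Hypothetical helper names; not GramAlign's API.


def lz_complexity(s: str) -> int:
    """Phrase count of the Lempel-Ziv (1976) exhaustive parsing of s."""
    n, i, c = len(s), 0, 0
    while i < n:
        l = 1
        # Grow the phrase while it can still be copied from the text
        # preceding its last symbol.
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        c += 1  # the phrase s[i:i+l] ends with its first novel symbol
        i += l
    return c


def relative_distance(s: str, q: str) -> float:
    """Assumed normalization: extra phrases needed to describe one sequence
    after the other, scaled to [0, 1]. Assumes s or q is non-empty."""
    c_s, c_q = lz_complexity(s), lz_complexity(q)
    c_sq, c_qs = lz_complexity(s + q), lz_complexity(q + s)
    return max(c_sq - c_s, c_qs - c_q) / max(c_sq, c_qs)


def merge_order(seqs: list[str]) -> list[tuple[list[int], list[int]]]:
    """Greedy average-linkage (UPGMA-style) merge order over sequence indices."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = relative_distance(seqs[i], seqs[j])

    def linkage(a: list[int], b: list[int]) -> float:
        # Mean pairwise distance between the members of two groups.
        return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current groups next.
        ia, ib = min(combinations(range(len(clusters)), 2),
                     key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        a, b = clusters[ia], clusters[ib]
        merges.append((a, b))
        del clusters[ib]  # ib > ia, so index ia stays valid
        clusters[ia] = a + b
    return merges
```

For example, `merge_order(["ACGTACGT", "TTTTGGGG", "ACGTACGT"])` merges the two identical sequences first (their distance is 0) and returns `[([0], [2]), ([0, 2], [1])]`. In a real progressive aligner each merge would trigger a pairwise or profile alignment; the sketch only produces the order.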
Key words: Multiple sequence alignment, Progressive alignment, Relative complexity measure, Abstract grammar, GramAlign
1 Introduction
Generation of meaningful multiple sequence alignments (MSAs) of biological sequences is a well-studied NP-complete problem with significant implications for a wide spectrum of applications [1, 2]. In general, the challenge is to insert gaps into N sequences of varying lengths so that all sequences end up with the same length. Of particular interest to computational biology are DNA/RNA sequences and amino acid sequences, which are composed of nucleotide and amino acid residues, respectively.
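To make the gap-insertion formulation concrete for the simplest case of two sequences, the sketch below uses the textbook Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (match = 1, mismatch = -1, gap = -2) and the function name `global_align` are illustrative assumptions; this is not presented as GramAlign's own pairwise aligner.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty
    (hypothetical helper with illustrative scores)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a
    # Trace back from the bottom-right cell, writing "-" wherever a gap was chosen.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

For instance, `global_align("ACGT", "AGT")` returns `("ACGT", "A-GT")`: a single gap is inserted so that both rows have length 4 and the shared residues line up in columns. Exact extension of this dynamic program to N sequences grows exponentially with N, which is why progressive heuristics are used.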
Advances in sequencing technology continue to provide vast amounts of data in need of multiple alignment. In large sequencing projects, large numbers of fragments, which must be combined into longer contigs, are now generated in much less time and at much lower cost [3]. In addition, as more organisms' genomes are