Objective Functions - Multiple Sequence Alignment Methods


- Consistency-Based Methods: Construct a database of local and global alignments and use it to find a final alignment (e.g. T-Coffee, DiAlign, ProbCons)

- Structure-Based Methods: Utilize external knowledge such as protein structure (e.g. 3D-Coffee)

In heuristic implementations of MSA algorithms, the most widely used objective function is the sum-of-pairs (SP) scoring scheme. The main idea is to progressively combine induced pairwise alignments to obtain the final MSA. Here, we briefly review a few such algorithms and indicate the specific improvements each program offers.
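As a rough illustration of the SP objective mentioned above, the sketch below sums a pairwise score over every pair of rows in every column of an alignment. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0) and the names `pair_score` and `sp_score` are assumptions chosen for illustration; real programs typically use substitution matrices and more elaborate gap costs.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols (illustrative values, not from any program)."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other contribute nothing
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score(alignment):
    """Sum-of-pairs score: add pair_score over all row pairs in every column."""
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b)
    return total


msa = ["AC-GT", "ACAGT", "A--GT"]
print(sp_score(msa))  # 2
```

Because the score decomposes into pairwise terms, each pair of rows in the MSA can be read as an induced pairwise alignment, which is what progressive methods combine.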

3.1 ClustalW

ClustalW is one of the earliest multiple sequence alignment programs and is still widely used. It has three main steps. First, it aligns all pairs of sequences via global dynamic programming with a plausible scoring function. Second, it uses the pairwise alignment scores to build a phylogenetic tree with the Neighbor-Joining algorithm. Finally, the sequences are aligned progressively from the leaves to the root, which yields the MSA of all sequences [10].
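The first of the three steps can be sketched as follows. This is a minimal illustration, assuming a simple match/mismatch score and a linear gap cost; ClustalW itself uses substitution matrices and separate gap opening and extension penalties. The names `nw_score` and `pairwise_scores` are hypothetical, and the tree-building and progressive alignment steps are not implemented here.

```python
from itertools import combinations


def nw_score(s, t, match=1, mismatch=-1, gap=-2):
    """Global alignment score of s and t by dynamic programming (linear gap cost)."""
    # prev[j] holds the best score for aligning the processed prefix of s with t[:j]
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_scores(seqs):
    """Score every pair of sequences; keys are index pairs (i, j) with i < j."""
    return {
        (i, j): nw_score(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }


seqs = ["GATTACA", "GATACA", "GCTTACA"]
scores = pairwise_scores(seqs)
closest = max(scores, key=scores.get)  # highest scoring pair of sequences
print(scores, closest)
```

A table of this kind is the input to the second step, where the scores are converted to distances and passed to a tree-building method such as Neighbor-Joining.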

In the tree traversal process, the highest scoring pairs are progressively combined. However, as new sequences are introduced into the MSA, the initial alignment structures propagate. This introduces a twofold problem: the greediness of the progressive alignment approach means the overall MSA score may remain stuck at a local optimum, and errors in the early alignments cannot be rectified. CLUSTALW therefore assigns weights to sequences to mitigate these problems. If there exists an edge of length $l$ and a sequence $n_i \in n$, where $n$ is the set of sequences that can be reached by traversing that edge, then the weight assigned to $n_i$ from this edge is $l / |n|$. This way, tree traversal does not depend solely on the tree structure but also benefits from the distance between sequences based on pairwise alignment scores.

One of the main improvements offered by CLUSTALW is the selection of appropriate parameter values for scores involving gaps. As the protein core has fewer insertions and deletions, CLUSTALW treats short stretches of hydrophilic residues (e.g. 5 or more) as an indication of loop or random coil regions and reduces the gap opening penalty for these stretches. In addition, it increases the gap opening penalty for gaps that are fewer than eight residues apart, based on the observation that in alignments between sequences of known structure it is rare to find gaps within 8-residue segments [11]. The initial gap opening penalty and the gap extension penalty are defined as follows:

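\[
\mathrm{GOP} \rightarrow \bigl(\mathrm{GOP} + \log(\min(N, M))\bigr) \cdot S(a, b) \cdot \mathrm{ISF}
\]

\[
\mathrm{GEP} \rightarrow \mathrm{GEP} \cdot \left(1.0 + \left|\log\frac{N}{M}\right|\right)
\]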
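The two initial penalties can be computed with a few lines of code. The sketch below is one reading of the formulas above, not ClustalW's actual source: it assumes that N and M are the lengths of the two sequences, that S(a, b) is the average residue mismatch score, that ISF is an identity scaling factor, and that the logarithm is natural. The function and parameter names are hypothetical.

```python
import math


def initial_gap_penalties(gop, gep, n, m, mismatch_score, identity_scale):
    """Return (gap opening penalty, gap extension penalty) after the initial adjustment.

    gop, gep        user-supplied base penalties
    n, m            lengths of the two sequences (assumed meaning of N and M)
    mismatch_score  assumed meaning of S(a, b)
    identity_scale  assumed meaning of ISF
    """
    if n <= 0 or m <= 0:
        raise ValueError("sequence lengths must be positive")
    # Opening penalty grows with the length of the shorter sequence
    new_gop = (gop + math.log(min(n, m))) * mismatch_score * identity_scale
    # Extension penalty grows as the two lengths diverge; unchanged when n == m
    new_gep = gep * (1.0 + abs(math.log(n / m)))
    return new_gop, new_gep


print(initial_gap_penalties(10.0, 0.2, 120, 80, 1.0, 1.0))
```

Under this reading, two sequences of equal length leave the extension penalty at its base value, while a large length difference raises it and so discourages long gaps in the shorter sequence.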
