that achieves low entropy scores for each column as the total score
of the MSA is the sum of column entropy scores. The entropy-
based scoring schemes are not affected by the number of sequences
in the MSA as the entropy calculation involves relative frequencies
of each symbol in the column.
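The column-wise entropy sum described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes Shannon entropy in bits, treats the gap character `-` as an ordinary symbol, and uses hypothetical function names (`column_entropy`, `msa_entropy_score`) and a made-up toy alignment.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one column, from relative symbol frequencies."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def msa_entropy_score(alignment):
    """Sum of column entropies; a lower total means more conserved columns."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    # zip(*alignment) walks the alignment column by column.
    return sum(column_entropy(col) for col in zip(*alignment))


# Toy alignment (assumption): gaps are counted as a symbol of their own.
msa = ["ACGT", "ACGA", "ACCT", "AC-T"]
score = msa_entropy_score(msa)

# Duplicating every row leaves the relative frequencies unchanged,
# so the score does not depend on the number of sequences.
same = math.isclose(score, msa_entropy_score(msa * 2))
```

Because only relative frequencies enter the calculation, doubling the number of rows in this sketch gives the same score, which matches the independence property stated above.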
2.5 NorMD
One of the drawbacks of SP scoring is the assumption that
substitution probabilities are uniformly distributed and time-invariant.
However, substitution probabilities may also depend on structural
and functional properties of the proteins [8]. The normalized mean
distance score (norMD), a column-based MSA scoring method, was
proposed to overcome this deficiency of SP scoring [9].
NorMD is formally defined as follows:
$$\mathrm{NorMD} = \frac{\mathrm{MD} - \mathrm{GAPCOST}}{\mathrm{MaxMD} \times \mathrm{LQRID}},$$
where
• MD: mean distance.
• GAPCOST: affine gap cost.
• MaxMD: maximum obtainable MD score.
• LQRID (lower quartile range of the pairwise hash score):
similarity measure of sequences based on a hash score which is
obtained from dot plots of pairs of sequences.
The MD score is the negative exponential of the weighted
pairwise distances between the sequences. The weights are inversely
proportional to the percentage identities between pairs of
sequences. The MaxMD value is included in the score as a
normalization factor to remove the effect of the high MD values that
long sequences produce. Finally, norMD is normalized to a value between
0 and 1. The advantage of the norMD scoring scheme is its
independence from the number and length of the sequences. However,
its major drawback is the costly hash computation required during scoring.
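To make the roles of the four components concrete, the sketch below combines precomputed values into a norMD score. It assumes the definition reads (MD − GAPCOST) / (MaxMD × LQRID); the function names, the affine convention (open + extend × (length − 1) per gap run), the default penalties, and all numeric inputs are illustrative assumptions, not values from the text. Computing MD, MaxMD, and LQRID themselves is not shown.

```python
def affine_gap_cost(gap_lengths, gap_open=10.0, gap_extend=0.5):
    """Affine cost over gap runs (assumed convention): open + extend * (length - 1)."""
    return sum(gap_open + gap_extend * (n - 1) for n in gap_lengths if n > 0)


def normd(md, gapcost, max_md, lqrid):
    """Combine precomputed components as (MD - GAPCOST) / (MaxMD * LQRID)."""
    denom = max_md * lqrid
    if denom <= 0:
        raise ValueError("MaxMD * LQRID must be positive")
    return (md - gapcost) / denom


# Hypothetical inputs, chosen only to show the arithmetic.
gapcost = affine_gap_cost([3, 1])  # two gap runs of lengths 3 and 1
score = normd(md=120.0, gapcost=gapcost, max_md=300.0, lqrid=0.6)
```

In this sketch, dividing by MaxMD rescales the raw score by the best value attainable for the given sequences, and dividing by LQRID adjusts for how similar the sequences are to begin with.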
3 Applications of Objective Functions
Implementation of MSA algorithms can be divided into five groups:
• Exact Methods: Dynamic Programming (DP) using an n-dimensional
matrix
• Progressive Methods: Use a guide tree to combine pairwise
alignments into the final multiple alignment (e.g., ClustalW,
MUSCLE, GramAlign)
• Iterative Methods: First compute a suboptimal solution and then
improve it via DP until the solution converges (e.g., MAFFT)