Biomedical Engineering Reference
G) ATTCGGCATTCAGAGCGAGA
H) ATTCGACATTGCTAGTGGTA
Unlike the previous cases, there are no relatively long runs of character pairings, and the matching
pairs are separated by unaligned characters. The alignment score is 1 point per aligned pair, or 13.
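The score of 13 can be checked by counting the positions at which the two ungapped sequences carry the same character. The following is a minimal sketch; the function name `count_matches` is hypothetical and not part of the text, and the two sequences are reconstructed from the listing for (G) and (H).

```python
def count_matches(seq_a: str, seq_b: str) -> int:
    """Count positions where two ungapped, equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


# Sequences (G) and (H), written without gaps.
seq_g = "ATTCGGCATTCAGAGCGAGA"
seq_h = "ATTCGACATTGCTAGTGGTA"

# At 1 point per aligned pair, the score equals the number of matches.
score = count_matches(seq_g, seq_h)
print(score)  # 13
```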
One attempt at visual alignment by adding four gaps into sequence (H) results in:
G) ATTCGGCATTCAGAGCTAGA
I) ATTCGACATT----GCTAGTGGTA
This alignment scores 12: 14 aligned pairs minus 2 points (4 x 0.5) for the 4 gaps introduced
into sequence (H), which transform it into sequence (I). In addition, a penalty of -0.5 per character
pair can be scored for each inexact match. Sequences (G) and (I) have 6 inexact matches, for a
penalty of (6 x -0.5 = -3). Using this new alignment-scoring algorithm, and ignoring the length
difference between the two sequences, the alignment score for the (G)-(I) alignment becomes:
Alignment Score = 14 alignments + 4 gaps + 6 inexact matches
= 14 + (4 x -0.5) + (6 x -0.5)
= 14 - 2 - 3
= 9
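The same arithmetic can be written as a small weighted sum. This is only a sketch: the function `alignment_score` and its parameter names are hypothetical, and the counts (14 aligned pairs, 4 gaps, 6 inexact matches) are taken from the text rather than computed from the sequences.

```python
def alignment_score(
    matches: int,
    gaps: int,
    inexact: int,
    match_score: float = 1.0,
    gap_weight: float = -0.5,
    inexact_weight: float = -0.5,
) -> float:
    """Weighted sum of exact matches, gap characters, and inexact matches."""
    return matches * match_score + gaps * gap_weight + inexact * inexact_weight


# Counts quoted in the text for the (G)-(I) alignment.
print(alignment_score(matches=14, gaps=4, inexact=6))  # 9.0

# Assumed alternative weighting: a heavier gap penalty lowers the score of the
# same gapped alignment even further.
print(alignment_score(matches=14, gaps=4, inexact=6, gap_weight=-2.0))  # 3.0
```

Keeping the weights as parameters makes it easy to see how a different balance between matches, gaps, and inexact matches changes the score of one and the same alignment.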
In this example, adding gaps lowers the alignment score, which illustrates how the relative weights of
exact matches, inexact matches, and gaps determine the eventual alignment of two sequences. For
example, if gaps are penalized heavily and inexact matches only lightly, then the best-scoring
alignments will contain few gaps.
Although a simple gap penalty of -0.5 point per gap has been used to illustrate how alignment
scores influence sequence alignment, the gap penalty is typically calculated as:
Penalty_gap = Cost_opening + (Cost_extension x Length_gap)
In this formula, Penalty_gap is the total gap penalty, Cost_opening is the cost of opening a gap in a
sequence, Cost_extension is the cost of extending an existing gap by one character, and Length_gap is
the length of the gap in characters. The minimum value of Length_gap is one. Returning to sequence
pair (E)-(F), assuming that Cost_opening is (-0.5) and Cost_extension is (-0.5), the gap penalty becomes:
Penalty_gap = Cost_opening + (Cost_extension x Length_gap)
= -0.5 + (-0.5 x 4)
= -2.5
With the expanded method of computing gap penalty, the score becomes 10 + 6 - 2.5 = 13.5 points.
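The gap penalty formula translates directly into a short function. The sketch below assumes the costs used in the worked example; the name `gap_penalty` and its keyword arguments are hypothetical, and the values 10 and 6 in the final line are copied from the text's score for pair (E)-(F).

```python
def gap_penalty(
    length: int,
    cost_opening: float = -0.5,
    cost_extension: float = -0.5,
) -> float:
    """Penalty for one gap: opening cost plus extension cost per gap character."""
    if length < 1:
        raise ValueError("a gap must be at least one character long")
    return cost_opening + cost_extension * length


penalty = gap_penalty(4)
print(penalty)           # -2.5
print(10 + 6 + penalty)  # 13.5
```

With these costs, a single gap of four characters costs 2.5 points, compared with 2 points under the simpler rule of -0.5 per gap character.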
The gap penalty formula can be extended to penalize the gaps added at the end of a sequence to make
the two sequences equal in length. However, if the sequences are of very different lengths, then it
probably doesn't make sense to penalize these end gaps.