Biomedical Engineering Reference
In-Depth Information
Fundamentals
Sequence alignment is fundamental to inferring homology (common ancestry) and function. For
example, it is generally accepted that if two sequences align, meaning that part or all of their
pattern of nucleotides or amino acids matches, then they are similar and may be homologous. Another
heuristic is that if the sequence of a protein or other molecule significantly matches the sequence
of a protein with a known structure and function, then the molecules may share that structure and
function. The issues related to pairwise sequence alignment, global versus local alignment, and
multiple sequence alignment are introduced here.
Pairwise Sequence Alignment
Pairwise sequence alignment involves matching two sequences, one pair of elements at a time.
The challenge is to find the optimum alignment of two sequences with some degree of similarity.
The optimum is typically defined by a score that reflects the number of matched characters in the
two sequences and the number and length of gaps required to shift the sequences so that the
maximum number of characters is in alignment. For example, consider the ideal case of two
identical nucleotide sequences, (A) and (B):
A) ATTCGGCATTCAGTGCTAGA
B) ATTCGGCATTCAGTGCTAGA
Assuming that the alignment scoring algorithm counts one point per pair of aligned characters,
the score is one point for each of the 20 pairs, or 20 points. Now, consider the case when several
of the character pairs aren't aligned:
C) ATTCGGCATTCAGTGCTAGA
D) ATTCGGCATTGCTAGA
In this case, the score would be 11, because only 11 pairs of characters in sequences (C) and (D)
match position by position. However, examining the ends of the sequences shows that the last six
characters are identical. Shifting these last six characters of sequence (D) to the right by
inserting four spacers, or gaps, gives:
E) ATTCGGCATTCAGTGCTAGA
F) ATTCGGCATT----GCTAGA
Now the score, based on the original algorithm of character pairings, is 16. However, because the
score would have been 11 without the inserted gaps, a penalty should be exacted for each gap
inserted into the sequence, to favor alignments that can be made with as few gaps as possible.
Assuming a gap penalty of -0.5 per gap, the alignment score becomes 10 + 6 + (4 × -0.5), or 14.
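The scoring scheme worked through above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a standard library routine: the function name alignment_score and its parameters are hypothetical, mismatched pairs and unpaired trailing characters are assumed to score zero, and each gap character is assumed to cost -0.5, as in the text. The sketch scores an alignment that has already been laid out; it does not search for the best one.

```python
def alignment_score(seq_a, seq_b, match=1.0, gap=-0.5, gap_char="-"):
    """Score two sequences compared position by position.

    Hypothetical helper for illustration: each identical, non-gap pair
    earns `match`, each pair containing a gap character costs `gap`,
    and mismatches score zero (an assumption; the text assigns no
    mismatch penalty).
    """
    score = 0.0
    # zip stops at the shorter sequence, so unpaired trailing
    # characters contribute nothing to the score.
    for a, b in zip(seq_a, seq_b):
        if a == gap_char or b == gap_char:
            score += gap
        elif a == b:
            score += match
    return score


# Sequences (A)/(B), (C)/(D), and (E)/(F) from the text.
identical = alignment_score("ATTCGGCATTCAGTGCTAGA", "ATTCGGCATTCAGTGCTAGA")
ungapped = alignment_score("ATTCGGCATTCAGTGCTAGA", "ATTCGGCATTGCTAGA")
gapped = alignment_score("ATTCGGCATTCAGTGCTAGA", "ATTCGGCATT----GCTAGA")
print(identical, ungapped, gapped)  # 20.0 11.0 14.0
```

Passing gap=0 for the gapped pair returns 16.0, the unpenalized score, which shows how the penalty alone accounts for the drop from 16 to 14.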
A more likely scenario is one in which the areas of similarity and difference are not obvious. Consider
the sequences (G) and (H):
 
 