Biology Reference
In-Depth Information
Fig. 2. A character matrix of peptide sequences, as observed for 15 plant species.
The displayed sequence is for the first amino acids of a polypeptide, coded for by the
5′ end of the chloroplastic gene rps4. For nine of these sequences, the beginning of
the gene has not been sequenced; accordingly, the unknown states are symbolised
by “?”. Some characters, such as Char. 18 (amino acid leucine, “L”) and Char. 0 (the
initiating amino acid methionine, “M”), display the same state in all 15 sequences.
But most characters are multi-state in this matrix, e.g. Char. 10, which presents three
states: isoleucine (“I”, 5 times), leucine (“L”, once), and lysine (“K”, 9 times).
peptide sequences are displayed. In the case of molecular sequences, the
alignment matrix is often called a multiple alignment in reference to the
way it is constructed.
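The distinction drawn in the caption of Fig. 2 between characters that show a
single state and multi-state characters can be checked mechanically by counting
the states in each column of the matrix. The following is a minimal Python
sketch under stated assumptions: the toy matrix, the taxon names and the
function names are hypothetical and are not the data of Fig. 2, and both “?”
(unknown) and “-” (gap) are assumed to be ignored when states are counted.

```python
from collections import Counter

# Hypothetical toy matrix (NOT the data of Fig. 2).
# Rows are taxa, columns are characters, numbered from 0 as in the caption.
matrix = {
    "taxon_A": "M?KIL",
    "taxon_B": "MSKIL",
    "taxon_C": "MSRLL",
    "taxon_D": "?-RKL",
}

# Assumption: unknown states and gaps do not count as observed states.
MISSING = {"?", "-"}


def state_counts(matrix, index):
    """Count the observed states of one character (one column)."""
    column = (sequence[index] for sequence in matrix.values())
    return Counter(state for state in column if state not in MISSING)


def classify(matrix):
    """Label every character as 'constant' or 'multi-state'."""
    length = len(next(iter(matrix.values())))
    result = []
    for index in range(length):
        counts = state_counts(matrix, index)
        kind = "constant" if len(counts) <= 1 else "multi-state"
        result.append((index, kind, dict(counts)))
    return result


for index, kind, counts in classify(matrix):
    print(f"Char. {index}: {kind} {counts}")
```

In this toy matrix, Char. 3 plays the role of Char. 10 in Fig. 2: it presents
three states with different frequencies.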
Technically, the total length of this alignment could be reduced to
37 characters. However, the shortest alignment is not necessarily the
best because, in the course of evolution, characters can be deleted and
new ones can be inserted. For example, we have good reason to believe
that the two characters “VG” in the Polypodium sequence are both specific
inserts: they have no equivalent in the other sequences, where
a double gap (“--”) is shown. Also, the gap for Char. 17 is kept because
this is an extract from a larger data matrix with many more sequences,
some of them displaying an amino acid insert on site 17. We shall
further discuss the alignment procedure which produced this matrix
in Sec. 3.2.
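A column such as Char. 17 can consist of gaps only when a few sequences are
extracted from a larger matrix. The Python sketch below illustrates one
mechanical way of locating and dropping such columns. It is not the alignment
procedure of Sec. 3.2; the toy rows and the function names are hypothetical,
and “-” is assumed to be the gap symbol.

```python
def gap_only_columns(rows, gap="-"):
    """Return the indices of columns in which every sequence shows a gap."""
    length = len(rows[0])
    return [i for i in range(length) if all(row[i] == gap for row in rows)]


def drop_columns(rows, columns):
    """Return the rows with the given column indices removed."""
    skip = set(columns)
    return ["".join(c for i, c in enumerate(row) if i not in skip) for row in rows]


# Hypothetical extract: column 2 holds gaps only, whereas columns 4 and 5
# carry an insert ("VG") that is present in the first sequence alone.
rows = [
    "MK-LVGA",
    "MK-L--A",
    "MR-L--A",
]

empty = gap_only_columns(rows)
shortened = drop_columns(rows, empty)
print(empty)
print(shortened)
```

Dropping the gap-only column shortens the extract, but the column numbering
then no longer matches the larger matrix, which is one practical reason to
keep it. The columns carrying the insert are not gap-only and are left
untouched.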