tein structure analysis, see [32]. A protein sequence is very short as a signal,
averaging about 300 residues. Some proteins are as short as 50 residues and
others exceed 1000 residues, but most are a few hundred residues long.
Secondary structure elements are shorter still. Hence the Fourier transform is
not well suited to the analysis of protein signals. Moreover, while the Fourier
transform can capture periodicities at any scale in the overall signal, it
cannot identify where along the signal a periodicity occurs. To capture local
periodicities, the wavelet transform appears to be a more suitable mathematical
tool [33] and has been applied earlier to speech recognition [34, 35].
Previously, the wavelet transform has been applied to transmembrane helix
prediction primarily to de-noise the hydropathy signal by removing
high-frequency variations [36-39]. In the work presented here, the wavelet
transform is used to derive features from amino acid sequences.
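The localization argument can be illustrated with a small sketch. It is not taken from the work described here: the helper names `dft_magnitudes` and `place_burst`, the signal length of 64, and the burst period of 4 are all assumptions chosen for illustration. The same short periodic burst is placed at two different positions, and the Fourier magnitude spectra are compared.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitude of each discrete Fourier coefficient (direct O(n^2) sum)."""
    n = len(signal)
    mags = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        mags.append(abs(coeff))
    return mags

def place_burst(length, start, period=4, cycles=3):
    """Zero signal containing one short periodic burst beginning at `start`."""
    signal = [0.0] * length
    for i in range(period * cycles):
        signal[start + i] = math.cos(2 * math.pi * i / period)
    return signal

# Same burst, two locations (hypothetical positions 5 and 40).
early = place_burst(64, start=5)
late = place_burst(64, start=40)

spectrum_early = dft_magnitudes(early)
spectrum_late = dft_magnitudes(late)

# Largest difference between the two magnitude spectra.
max_difference = max(abs(a - b) for a, b in zip(spectrum_early, spectrum_late))
print(max_difference)
```

Because one signal is a shifted copy of the other, the two magnitude spectra agree up to floating-point rounding. The position of the burst is carried only by the phase, which is why the magnitude spectrum alone cannot say where a periodicity occurs.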
To facilitate the use of signal analysis for the transmembrane helix prediction
problem, each residue is mapped to a numerical value according to its polarity:
polar = 1, non-polar = 0. Other mappings have also been studied, such as one
based on electronic properties, in which residues ranging from strong electron
donors to strong electron acceptors are assigned numerical values from +2 to
-2. Empirically, however, the polar/non-polar representation gave the best
results. Application of the wavelet transform to the polar/non-polar
representation of one particular membrane protein, bovine rhodopsin (Swissprot
ID: OPSD_BOVIN), is shown in Fig. 9. The numerical mapping of the sequence by
polar/non-polar property is the same as that shown in Fig. 6. A standard
analyzing function, the Mexican hat wavelet, has been applied to this protein
signal at scales from 1 to 32, resulting in a continuous wavelet transform of
the protein sequence.
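A minimal sketch of this procedure follows, under stated assumptions. The text does not list which residues count as non-polar, so the `NON_POLAR` set is an assumed, commonly used grouping. The function names, the 1/sqrt(scale) normalization, the truncation of the wavelet at five scale widths, and the toy sequence (which is not rhodopsin) are likewise hypothetical. The transform is computed by direct summation rather than with any particular wavelet library.

```python
import math

# Assumed non-polar residues; the source does not give its classification.
NON_POLAR = set("AVLIMFWPGC")

def polarity_signal(sequence):
    """Map each residue to 1 (polar) or 0 (non-polar)."""
    return [0 if aa in NON_POLAR else 1 for aa in sequence.upper()]

def mexican_hat(t):
    """Mexican hat (Ricker) wavelet, normalized to unit energy."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-0.5 * t * t)

def cwt_mexican_hat(signal, scales):
    """Return one row of coefficients per scale, one column per residue.

    Positions outside the sequence are treated as zero, so coefficients
    near the two ends are affected by the boundary.
    """
    n = len(signal)
    rows = []
    for a in scales:
        half_width = int(math.ceil(5 * a))  # wavelet is negligible beyond this
        row = []
        for b in range(n):
            lo = max(0, b - half_width)
            hi = min(n, b + half_width + 1)
            total = sum(signal[k] * mexican_hat((k - b) / a)
                        for k in range(lo, hi))
            row.append(total / math.sqrt(a))
        rows.append(row)
    return rows

# Hypothetical toy sequence: polar ends flanking a non-polar stretch.
toy_sequence = "KDESRKLLAVIAGLLVIALSNDEKR"
signal = polarity_signal(toy_sequence)
coefficients = cwt_mexican_hat(signal, range(1, 33))  # scales 1 to 32
print(len(coefficients), len(coefficients[0]))
```

Each row of `coefficients` corresponds to one scale and each column to one residue position, so the nested list can be displayed directly as an image of the kind described as a scalogram.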
The wavelet transform gives rise to patterns that distinguish transmembrane
regions from non-transmembrane regions. An image representation of the wavelet
transform, called the scalogram, is shown in Fig. 9A. Superimposed on the
scalogram are the locations of the transmembrane and non-transmembrane regions.
Further, the wavelet-transformed signal at different scales is mapped onto the
3-dimensional structure of the protein, to visually analyze the distribution of
feature values in different segments of the protein, shown here for scale 9 in
rhodopsin (Fig. 9B).
3.5 Formal Analysis of the Features Derived Using Wavelet Methodology
Comparing the scalogram of a transmembrane protein in Fig. 9A to the
spectrograms of speech in Fig. 7, it can be seen that the durational
characteristics of transmembrane segments are very similar to those of phones
in speech. The observations are very similar from one sample (or frame) to the
next, and each transmembrane segment has an onset period and an offset period.
In the absence of such a durational feature, a simple classifier would have
sufficed to label the protein residues as transmembrane or non-transmembrane.
However, to capture the time (or position) specific characteristics of the
wavelets with respect to transmembrane domains, a hidden Markov model
(HMM)-like architecture is