Computational Biology and Language - Ambient Intelligence for Scientific Discovery


tein structure analysis, see [32]. A protein sequence is short, about 300 residues long on average. There are proteins as short as 50 residues and others longer than 1000 residues, but most proteins are a few hundred residues long, and secondary structure elements are shorter still. Because the frequency resolution of a Fourier analysis is limited by the length of the signal, the Fourier transform is not well suited to the analysis of such short protein signals. Moreover, while the Fourier transform can capture periodicities at any scale in the overall signal, it cannot identify where in the signal a periodicity occurs. To capture local periodicities, the wavelet transform appears to be a more suitable mathematical tool [33]; it has previously been applied to speech recognition [34, 35]. In the context of transmembrane helix prediction, the wavelet transform has so far been used primarily to de-noise the hydropathy signal by removing high-frequency variations [36-39]. In the work presented here, the wavelet transform is instead used to derive features from amino acid sequences.
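The localization argument can be made concrete with a toy example. The sketch below (Python, standard library only) builds a hypothetical binary signal, not a protein, that contains a short period-4 burst, plus a circularly shifted copy of it. The two Fourier magnitude spectra are identical, whereas the Mexican-hat wavelet coefficients at scale 1 show where the burst lies. The helper names are made up for this illustration, and a direct O(N^2) DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def dft_magnitude(x):
    """|X[k]| of a real sequence via a direct O(N^2) DFT (an FFT gives the same values)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def mexican_hat(u):
    """Unnormalized Mexican-hat (Ricker) mother wavelet (1 - u^2) exp(-u^2 / 2)."""
    return (1.0 - u * u) * math.exp(-u * u / 2.0)

def cwt_row(x, scale):
    """Mexican-hat wavelet coefficients of x at one scale; x is zero beyond its ends."""
    n = len(x)
    half = int(math.ceil(5 * scale))  # the wavelet is negligible beyond ~5 scale units
    return [sum(x[t] * mexican_hat((t - pos) / scale)
                for t in range(max(0, pos - half), min(n, pos + half + 1))) / math.sqrt(scale)
            for pos in range(n)]

def half_energies(x, scale=1.0):
    """Sum of |wavelet coefficients| over the first and second half of x."""
    w = [abs(c) for c in cwt_row(x, scale)]
    mid = len(x) // 2
    return sum(w[:mid]), sum(w[mid:])

# Hypothetical toy signal: a period-4 pattern (1, 1, 0, 0, ...) at positions 4..27
# of a length-64 sequence, and the same signal circularly shifted by 32 positions.
n = 64
sig_a = [1.0 if 4 <= t < 28 and (t - 4) % 4 < 2 else 0.0 for t in range(n)]
sig_b = sig_a[32:] + sig_a[:32]

# A circular shift only changes the phase of each Fourier coefficient, so the
# magnitude spectra agree: the periodicity is detected but not located.
same_spectrum = all(abs(p - q) < 1e-9
                    for p, q in zip(dft_magnitude(sig_a), dft_magnitude(sig_b)))
print(same_spectrum)          # True
print(half_energies(sig_a))   # wavelet energy sits in the first half
print(half_energies(sig_b))   # and moves to the second half after the shift
```

The phase of the Fourier coefficients does encode the shift implicitly, but the magnitude spectrum, which is what periodicity analysis usually inspects, does not; the wavelet coefficients carry position directly.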

To apply signal analysis to the transmembrane helix prediction problem, the polar/non-polar character of each residue is mapped to a number: polar = 1, non-polar = 0. Other mappings have also been studied, for example one based on electronic properties, in which residues ranging from strong electron donors to strong electron acceptors are assigned values from +2 to -2. Empirically, however, the polar/non-polar representation gave the best results. Fig. 9 shows the wavelet transform applied to the polar/non-polar representation of one particular membrane protein, bovine rhodopsin (Swiss-Prot ID: OPSD_BOVIN). The numerical mapping of the sequence by polar/non-polar property is the same as that shown in Fig. 6. A standard analyzing wavelet, the Mexican hat, was applied to this protein signal at scales 1 to 32, yielding a continuous wavelet transform of the protein sequence.
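The encoding and transform described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the polar residue set `POLAR` is an assumption (one common grouping; the chapter's own mapping is the one in Fig. 6), the toy sequence is hypothetical rather than the rhodopsin sequence, and the helper names `encode_polarity` and `cwt` are made up for the example. The Mexican hat is written out directly in its unit-norm form, psi(u) = 2 / (sqrt(3) pi^(1/4)) (1 - u^2) exp(-u^2 / 2), so no wavelet library is needed.

```python
import math

# ASSUMPTION: one common grouping of polar residues; the mapping actually used
# in the chapter (Fig. 6) may assign borderline residues such as C, G or H differently.
POLAR = set("DEHKNQRSTY")

def encode_polarity(seq):
    """Map a protein sequence to a binary signal: polar = 1, non-polar = 0."""
    return [1.0 if aa in POLAR else 0.0 for aa in seq.upper()]

def mexican_hat(u):
    """Mexican-hat (Ricker) wavelet, normalized to unit L2 norm."""
    return 2.0 / (math.sqrt(3.0) * math.pi ** 0.25) * (1.0 - u * u) * math.exp(-u * u / 2.0)

def cwt(signal, scales=range(1, 33)):
    """Continuous wavelet transform with one row of coefficients per scale.

    The coefficient at scale a and position b is a**-0.5 * sum_t x[t] * psi((t - b) / a);
    the sequence is treated as zero beyond its ends, and psi is truncated at 5 scale units.
    """
    n = len(signal)
    rows = []
    for a in scales:
        half = int(math.ceil(5 * a))
        row = []
        for b in range(n):
            lo, hi = max(0, b - half), min(n, b + half + 1)
            acc = sum(signal[t] * mexican_hat((t - b) / a) for t in range(lo, hi))
            row.append(acc / math.sqrt(a))
        rows.append(row)
    return rows

# Hypothetical toy sequence (not rhodopsin): polar ends around a 16-residue
# non-polar stretch, loosely mimicking a hydrophobic segment.
seq = "KTQRSEDNKR" + "LIVAFLLAVGLIMFAW" + "KRSDETNQ"
x = encode_polarity(seq)
W = cwt(x)                # W[s - 1] holds the coefficients at scale s
scale9 = W[8]             # the scale visualized for rhodopsin in Fig. 9B
print(len(W), len(W[0]))  # 32 34
```

Stacking the 32 rows as an image gives a scalogram of the kind shown in Fig. 9A. In this toy sequence the scale-4 coefficient at the center of the non-polar stretch is negative, because the wavelet's positive central lobe covers non-polar (0) residues while its negative side lobes cover polar (1) residues.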

The wavelet transform gives rise to patterns that distinguish transmembrane regions from non-transmembrane regions. An image representation of the wavelet transform, called the scalogram, is shown in Fig. 9A, with the locations of the transmembrane and non-transmembrane regions superimposed on it. In addition, the wavelet-transformed signal at different scales is mapped onto the three-dimensional structure of the protein so that the distribution of feature values across its segments can be analyzed visually; Fig. 9B shows this for scale 9 in rhodopsin.
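The chapter does not say how the scale-9 values were mapped onto the structure in Fig. 9B. One common way to do this, shown here purely as an assumption, is to write each residue's value into the B-factor (tempFactor) field of the ATOM records of a PDB file and then color the structure by B-factor in a molecular viewer. The helper names `per_residue_values`, `set_bfactors` and `atom` are hypothetical, the coordinates are placeholders, and the sketch assumes that PDB residue numbers equal sequence position + 1.

```python
def per_residue_values(row, first_resseq=1):
    """Key one CWT row (e.g. scale 9) by PDB residue number.

    ASSUMPTION: residue numbering is contiguous and starts at first_resseq;
    real structures may have gaps or offsets that need an explicit mapping.
    """
    return {first_resseq + i: v for i, v in enumerate(row)}

def set_bfactors(pdb_lines, values_by_resseq, chain=None, default=0.0):
    """Return PDB lines with the B-factor field replaced by per-residue values.

    Only ATOM records are changed (optionally restricted to one chain ID). The
    residue number is read from columns 23-26 and the value is written into
    columns 61-66 in fixed-width %6.2f format; all other lines pass through.
    """
    out = []
    for line in pdb_lines:
        if line.startswith("ATOM") and (chain is None or line[21:22] == chain):
            padded = line.ljust(66)  # guard against trimmed trailing columns
            resseq = int(padded[22:26])
            value = values_by_resseq.get(resseq, default)
            line = padded[:60] + "%6.2f" % value + padded[66:]
        out.append(line)
    return out

def atom(serial, name, resname, chain, resseq, bfactor):
    """Build a hypothetical fixed-width ATOM record with placeholder coordinates."""
    return "ATOM  %5d  %-3s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, chain, resseq, 0.0, 0.0, 0.0, 1.00, bfactor)

pdb = [atom(1, "N", "MET", "A", 1, 52.47), atom(2, "CA", "ASN", "A", 2, 48.10), "TER"]
scale9_row = [-0.42, 1.5]  # hypothetical wavelet coefficients for residues 1 and 2
for line in set_bfactors(pdb, per_residue_values(scale9_row)):
    print(line)
```

Reusing the B-factor field is a convenience: any PDB-aware viewer can color by it without extra files, at the cost of overwriting the crystallographic values in the copy being visualized.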

3.5 Formal Analysis of the Features Derived Using Wavelet Methodology

Comparing the scalogram of a transmembrane protein in Fig. 9A with the spectrograms of speech in Fig. 7 shows that the durational characteristics of transmembrane segments are very similar to those of phones in speech. The observations change little from one sample (or frame) to the next, and each transmembrane segment has an onset period and an offset period. Without such durational structure, a classifier would have sufficed to label each protein residue as transmembrane or non-transmembrane. However, to capture the time- (or position-) specific characteristics of the wavelet features with respect to transmembrane domains, a hidden Markov model (HMM)-like architecture is
