10⁻⁸ is used as an indicator of allergenicity. If none of the 52 motifs matches
the protein sequence, a BLAST search is carried out against the 135 allergens
that lack motifs.
On a synthetic dataset, the method achieved very high recall (100%) and precision
(95.5%), compared with 98.6% recall and 36.5% precision for the FAO/WHO guidelines.
In a more realistic scenario, using the entire Swiss-Prot dataset, the method had a
recall of 100% and a precision of 8.6%, whereas the FAO/WHO guidelines had 100% recall
and only 0.5% precision. Thus, although both methods had comparable recall, the method
described here achieved markedly higher precision.
Retaining high recall while raising precision more than 17-fold (8.6% versus 0.5%) is
particularly valuable in a screening procedure. High recall is important because it
keeps potential allergens from slipping through, while higher precision reduces the
number of false positives and therefore the number of time-consuming laboratory
screenings.
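The arithmetic behind the ">17 times" figure, and why it matters for laboratory workload, can be sketched in a few lines of Python. Only the two whole-Swiss-Prot precision values (8.6% and 0.5%) come from the text; the count of 100 true allergens is a hypothetical number chosen for illustration, and the helper names are likewise illustrative.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of true allergens that are flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of flagged proteins that are true allergens: TP / (TP + FP)."""
    return tp / (tp + fp)


def false_positives_for(tp: int, prec: float) -> float:
    """False positives implied by `tp` true hits at precision `prec`.

    Rearranging prec = TP / (TP + FP) gives FP = TP / prec - TP.
    """
    return tp / prec - tp


# Whole-Swiss-Prot precision figures quoted in the text (as fractions).
MOTIF_PRECISION = 0.086
FAO_WHO_PRECISION = 0.005

fold = MOTIF_PRECISION / FAO_WHO_PRECISION
print(f"precision ratio: {fold:.1f}x")  # precision ratio: 17.2x

# Hypothetical: 100 true allergens, all caught (recall = 1.0 for both methods).
for name, prec in [("motif method", MOTIF_PRECISION), ("FAO/WHO", FAO_WHO_PRECISION)]:
    print(f"{name}: ~{false_positives_for(100, prec):.0f} false positives")
# motif method: ~1063 false positives
# FAO/WHO: ~19900 false positives
```

Because both methods have 100% recall on Swiss-Prot, the difference lies entirely in the false positives: under these assumptions, roughly 1,063 versus 19,900 proteins would be sent for unnecessary laboratory testing.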
5.3.5 Wavelet Transform
The wavelet transform (Krishnan, Li, and Issac 2004) has also been used in place of
MEME to extract motifs from allergens for allergenicity prediction (Li et al. 2004).
The wavelet transform converts the aligned amino acid sequences into signals in which
conserved motifs can be detected at different scales.
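To make the idea of "detecting conserved motifs at different scales" concrete, here is a minimal Python sketch. This excerpt does not say how residues were encoded as a signal or which wavelet family was used, so both choices below are assumptions: each alignment column is scored by the fraction of sequences sharing its most common residue, and the orthonormal Haar wavelet stands in for whatever wavelet the study used. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from math import sqrt


def conservation_signal(alignment: list[str]) -> list[float]:
    """Encode an alignment as a 1-D signal: for each column, the fraction of
    sequences carrying the most common residue (gaps count as mismatches).
    This encoding is an illustrative assumption, not the study's."""
    n_rows = len(alignment)
    signal = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        signal.append(top / n_rows)
    return signal


def haar_step(x: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(x) % 2:
        x = x + [x[-1]]  # pad odd lengths by repeating the last sample
    approx = [(x[i] + x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    return approx, detail


def conserved_windows(signal: list[float], level: int, threshold: float = 0.9):
    """Return (start, end) column ranges whose mean conservation at the given
    scale (window width 2**level) reaches `threshold`. Windows touching a
    padded tail are approximate."""
    approx = list(signal)
    for _ in range(level):
        approx, _detail = haar_step(approx)
    width = 2 ** level
    windows = []
    for k, coeff in enumerate(approx):
        mean = coeff / sqrt(2) ** level  # undo orthonormal scaling -> window mean
        if mean >= threshold:
            windows.append((k * width, min((k + 1) * width, len(signal))))
    return windows


# Toy alignment (hypothetical sequences, already aligned; '-' is a gap).
alignment = ["MKCLLAVV", "MKCLIAVA", "MKCLLSAV", "MRCLLG-V"]
signal = conservation_signal(alignment)
print(conserved_windows(signal, level=1))  # [(2, 4)]
print(conserved_windows(signal, level=2))  # [(0, 4)]
```

The fine scale (width 2) isolates the perfectly conserved `CL` pair, while the coarser scale (width 4) picks out the broader conserved block `MKCL`. The detail coefficients, unused here, capture sharp changes in conservation such as motif boundaries.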
The study used a set of 664 allergens collected from the IUIS allergen list, the
Swiss-Prot allergen list, BIFS, and FARRP. Because the wavelet transform method requires
a set of aligned sequences sharing a common motif, the method first clusters the
allergen sequences into groups. The distance between every pair of allergens was
computed using ClustalW (Thompson, Higgins, and Gibson 1994), and the allergens were
then clustered using the “partitioning around medoids” method (Kaufman and Rousseeuw
1990). Within each group, ClustalW or T-Coffee (Notredame, Higgins, and Heringa 2000)
was used to generate a multiple sequence alignment, and the wavelet transform was
applied to each alignment to extract conserved motifs. Profile hidden Markov models
(HMMs) were then built from these motifs using the HMMER package.
The resulting profiles are used to search novel proteins and predict their
allergenicity. About 20% of the allergens in the dataset did not contain any of the
motifs. These allergens were subjected to another round of clustering, wavelet
transform, and motif extraction. Any allergens that still lacked motifs were stored
separately for sequence similarity searches using BLAST.
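The "partitioning around medoids" (PAM) clustering step can be sketched as follows. This is a minimal Python version working on a precomputed distance matrix: a greedy BUILD phase followed by a first-improvement variant of the SWAP phase (classical PAM evaluates all swaps and applies the best one each round). The distance values and the choice of k = 2 are hypothetical; in the study, pairwise distances came from ClustalW, and the number of groups is not given in this excerpt.

```python
from itertools import product


def total_cost(dist: list[list[float]], medoids: list[int]) -> float:
    """Sum over all points of the distance to their nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist: list[list[float]], k: int) -> tuple[list[int], list[int]]:
    """Partitioning Around Medoids on a precomputed distance matrix.

    BUILD: greedily add the point that lowers the total cost the most.
    SWAP:  replace a medoid with a non-medoid whenever that lowers the cost,
           until no swap helps (first-improvement variant).
    """
    n = len(dist)
    medoids: list[int] = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)

    cost = total_cost(dist, medoids)
    improved = True
    while improved:  # terminates: cost strictly decreases on every accepted swap
        improved = False
        for mi, o in product(range(k), range(n)):
            if o in medoids:
                continue
            candidate = medoids[:mi] + [o] + medoids[mi + 1:]
            c = total_cost(dist, candidate)
            if c < cost:
                medoids, cost, improved = candidate, c, True

    # Assign each point to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical symmetric distance matrix for five proteins.
DIST = [
    [0, 1, 2, 9, 9],
    [1, 0, 1, 9, 9],
    [2, 1, 0, 8, 9],
    [9, 9, 8, 0, 1],
    [9, 9, 9, 1, 0],
]
medoids, labels = pam(DIST, k=2)
print(medoids, labels)  # [1, 3] [0, 0, 0, 1, 1]
```

Because PAM needs only pairwise distances, not coordinates, it fits naturally with alignment-derived distances such as those produced by ClustalW.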
To predict allergenicity, a novel protein sequence is first searched with hmmpfam
against all discovered motifs. If the sequence contains any of these motifs, it is
predicted to be an allergen. If not, a BLAST search is carried out against the
allergens in the dataset that do not contain any discovered motifs. If the BLAST search
yields a significant match, the protein is predicted to be an allergen; otherwise, it is
predicted to be a nonallergen. For both the hmmpfam motif search and the BLAST search,
the threshold was set at an E-value of 0.001.
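The two-stage decision rule can be expressed as a small Python function. The two search callables are hypothetical stand-ins: in practice they would run hmmpfam against the motif HMMs and BLAST against the motif-less allergens and return the best E-value. Treating an E-value exactly equal to 0.001 as a hit is also an assumption, since the text gives only the threshold value.

```python
from typing import Callable, Optional

E_VALUE_THRESHOLD = 0.001  # threshold reported for both hmmpfam and BLAST


def predict_allergen(
    sequence: str,
    best_motif_evalue: Callable[[str], Optional[float]],
    best_blast_evalue: Callable[[str], Optional[float]],
    threshold: float = E_VALUE_THRESHOLD,
) -> bool:
    """Two-stage rule: motif (profile HMM) hit first, BLAST fallback second.

    Each callable returns the best E-value found, or None if there is no hit.
    An E-value equal to the threshold counts as a hit (assumption).
    """
    motif_e = best_motif_evalue(sequence)
    if motif_e is not None and motif_e <= threshold:
        return True  # contains a discovered motif -> predicted allergen
    # No motif: fall back to BLAST against allergens without motifs.
    blast_e = best_blast_evalue(sequence)
    return blast_e is not None and blast_e <= threshold


# Hypothetical stand-ins for the real searches (not wrappers around hmmpfam/BLAST).
def motif_stub(seq: str) -> Optional[float]:
    return 1e-5 if "CLL" in seq else None


def blast_stub(seq: str) -> Optional[float]:
    return 2e-4 if seq.startswith("MAG") else 0.5


print(predict_allergen("MKCLLAV", motif_stub, blast_stub))  # True  (motif hit)
print(predict_allergen("MAGSTQV", motif_stub, blast_stub))  # True  (BLAST fallback)
print(predict_allergen("MTTPQRS", motif_stub, blast_stub))  # False
```

Passing the searches in as functions keeps the decision logic separate from how the external tools are invoked, so the same rule can be tested with stubs like these.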