with an α-helical domain in the middle of the sequence. To test this hypothesis, the
C-terminal SAND sequences were split into three protein fragments. The first contains
amino-acid residues 1-100, the second spans residues 101-255 and the third comprises residues
256-525 (for numbering see Figure 3). It is generally accepted that many protein fold
recognition programs predict more accurately if the domain boundaries are known [21-22,
28]. Each of the three regions was analyzed using the protein structure MetaServer (see
section 2.4). The results generated support the predictions obtained from SeqFold and
Profiles-3D analyses.
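The fragment boundaries quoted above can be applied to a sequence with a few lines of code. The following is a minimal sketch, not the procedure used in this study: the function name, the fragment labels and the toy sequence are hypothetical, and the only values taken from the text are the three residue ranges (1-based, inclusive).

```python
from typing import Dict, Tuple

# Residue ranges from the text (1-based, inclusive). The labels are hypothetical.
FRAGMENT_RANGES: Dict[str, Tuple[int, int]] = {
    "fragment_1": (1, 100),
    "fragment_2": (101, 255),
    "fragment_3": (256, 525),
}


def split_sequence(seq: str,
                   ranges: Dict[str, Tuple[int, int]] = FRAGMENT_RANGES) -> Dict[str, str]:
    """Cut a protein sequence into named fragments by 1-based inclusive ranges."""
    fragments = {}
    for name, (start, end) in ranges.items():
        if start < 1 or end < start or end > len(seq):
            raise ValueError(f"range {start}-{end} does not fit a sequence of length {len(seq)}")
        # Python slices are 0-based and end-exclusive, hence start - 1.
        fragments[name] = seq[start - 1:end]
    return fragments


# Toy 525-residue sequence used only to illustrate the slicing.
toy_sequence = "A" * 100 + "C" * 155 + "D" * 270
fragments = split_sequence(toy_sequence)
```

Each fragment could then be submitted separately to a fold recognition server; the submission step is not shown because it depends on the service used.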
4. Discussion and Conclusions
We searched the available genomes, transcriptomes and protein sequence databases and
determined that SAND is a eukaryotic gene. We categorised three SAND protein
subfamilies. The first subfamily comprises members from protoctista, fungi, plants and
invertebrate metazoans. The second and third subfamilies comprise the vertebrate SAND1 and
SAND2 proteins, respectively. We postulate that the duplication event that gave rise to the
SAND1 and SAND2 paralogues is likely to have coincided with the evolution of vacuoles
into lysosomes in early vertebrates, which provides valuable clues as to the
function of SAND1 and SAND2.
We predicted a robust secondary structure for the SAND proteins and have
determined amino-acid sequences and motifs that are either invariant or highly conserved
across certain subgroups and across the family. The secondary structure prediction is
expected to be 74% accurate on a per-residue basis [4, 43]. We have made some
suggestions as to the type, number and location of structural domains likely to be present in
the C-termini of SAND proteins; however, we did not build these to atomic resolution (Table
3, Figure 1) as these predictions require validation. Bioinformatics techniques are becoming
increasingly effective, accessible, quick and simple to use, whilst the databanks
are growing in size and diversity. These approaches, if used appropriately, should therefore help to
close the gap between sequence and structure and complement in vitro approaches to
investigating molecular structure and function.
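To make the per-residue accuracy figure concrete, the sketch below computes the fraction of positions at which a predicted secondary structure state matches an observed one. It is an illustration only: the text does not specify how the 74% value was calculated, the three-state alphabet (H for helix, E for strand, C for coil) is an assumption, and the two example strings are invented.

```python
def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not predicted:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Invented ten-residue example using the assumed H/E/C alphabet.
predicted_states = "HHHHCCEEEC"
observed_states = "HHHCCCEEEE"
accuracy = per_residue_accuracy(predicted_states, observed_states)  # 8 of 10 match
```

On this scale, an accuracy of 74% means that roughly one residue in four is expected to be assigned the wrong state.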
References
[1] Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped
BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res
25:3389-3402.
[2] Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon
S, Sonnhammer ELL, Studholme DJ, Yeats C, Eddy SR (2004) The Pfam protein families database.
Nucleic Acids Res 32:D138-D141.
[3] Zdobnov EM, Lopez R, Apweiler R, Etzold T (2002) The EBI SRS server - recent developments.
Bioinformatics 18:368-373.
[4] Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ (1998) JPred: a consensus secondary structure
prediction server. Bioinformatics 14:892-893.
[5] Rost B, Yachdav G, Liu JF (2004) The PredictProtein server. Nucleic Acids Res 32:W321-W326
(Suppl).
[6] Jones DT (1999) GenTHREADER: an efficient and reliable protein fold recognition method for
genomic sequences. J Mol Biol 287:797-815.
[7] Shi JY, Blundell TL, Mizuguchi K (2001) FUGUE: sequence-structure homology recognition using
environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol
310:243-257.
[8] Kelley LA, MacCallum RM, Sternberg MJE (2000) Enhanced genome annotation using structural
profiles in the program 3D-PSSM. J Mol Biol 299:499-520.