3. Results
3.1. Identification of SAND Homologues and Phylogenetic Analysis
Our sequence database searches identified 40 SAND sequences in 32 species of eukaryote
(Tables 4a and 4b). A single copy of the SAND gene exists in plants, invertebrates,
protoctista (single-celled eukaryotes) and fungi (Table 4a). In vertebrates for which the full
genome sequence was available, two SAND sequences were always identified. We
designated these SAND1 and SAND2 (Table 4b). Two full-length SAND sequences were
found in the following mammals: human, mouse, rat (Table 4b) and chimpanzee (data not
shown). Partial SAND sequences were found in pig, cow, sheep and dog from EST
searches (data not shown). Two full-length sequences were identified in each of the teleost
fishes Fugu rubripes, Danio rerio and Tetraodon nigroviridis (Table 4b). Two partial
SAND sequences were found in frog and chicken from EST searches (Table 4b). Subfamily
divisions of the SAND family can be seen from the phylogenetic tree, with SAND1,
SAND2 and the plant SANDs forming distinct clades (Figure 2). This may indicate
functional divergence and specialisation within these SAND groups relative to the other
SAND groups. As noted above, plants, invertebrates, protoctista and fungi have a
single copy of SAND, and the yeast SAND protein is known to function in mediating
vesicle/vacuole fusion [26,27]. Vacuoles are organelles characteristic of eukaryotes such as
plants, invertebrates, protoctista and fungi, whilst lysosomes are specialised “vacuole-like”
organelles found in vertebrates. The duplication event leading to SAND1 and SAND2 in
vertebrates occurred somewhere between Chordata (chordates) and Gnathostomata (jawed
vertebrates), and is likely to be associated with the evolution of fusion events
involving the more specialised lysosome. Given that yeast functional studies show that
SAND mediates vacuole fusion events, we hypothesise that the duplication event
occurred concurrently with the evolution of lysosomes from vacuoles in early vertebrates.
3.2. BLASTP versus NRL3D and Protein Sequence Characterisation
No homologues with experimentally determined structures were identified by BLASTP
searches of NRL3D with the eleven full-length SAND sequences (Table 4). The iterative
BLAST algorithm PSI-BLAST can be used to identify homologous protein sequences with
known 3D structures even if the subject and query sequences have less than 20% sequence
identity. However, in this case, using both full-length and partial SAND sequences, four
successive PSI-BLAST 2 iterations failed to return any similar sequence of
known structure.
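The 20% sequence identity figure mentioned above can be made concrete with a minimal sketch. It assumes two sequences that have already been aligned to equal length with "-" as the gap character, and it counts identity only over columns where neither sequence has a gap; other definitions of percent identity exist, and this is not necessarily the one used by BLAST or PSI-BLAST. The function names and the toy sequences are illustrative assumptions, not part of the original analysis.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where the two residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def below_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 20.0) -> bool:
    """True if the pair falls below the identity threshold (20% by default)."""
    return percent_identity(aligned_a, aligned_b) < threshold


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not SAND sequences.
    print(percent_identity("ACDE", "ACDF"))
    print(below_identity_threshold("ACDEFGHIKL", "AWWWWWWWWW"))
```

Pairs that fall below the threshold are the cases in which an iterative profile search may still recover a relationship that a single pairwise comparison would miss.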
Profile Hidden Markov Models (HMMs) built from Pfam alignments can be used to
determine if a query protein sequence contains an existing characterised protein domain.
Pfam HMMs [2] were searched with all SAND sequences, and each sequence matched the
domain DUF254 at its C-terminus. The DUF254 seed alignment contains 26 SAND
sequences from 13 species; these are the sequences with an SPTR accession number (Table 4).
Our analysis reveals 40 members from 32 species. The SANDs from the additional 19 species
were uncovered through our analysis of the available databases; these are the entries with an
EMBL or REFSEQ accession number (Table 4).
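The species counts above can be reconciled with a short check. The sketch below assumes that the 13 species in the DUF254 seed alignment are a subset of the 32 species recovered in this analysis, as the text implies; the function name is a hypothetical helper introduced for illustration.

```python
def additional_species(total_species: int, seed_species: int) -> int:
    """Number of species found beyond those already in the seed alignment.

    Assumption: every seed-alignment species is also among the total.
    """
    if seed_species > total_species:
        raise ValueError("seed species cannot exceed total species")
    return total_species - seed_species


if __name__ == "__main__":
    # Counts as reported in the text: 32 species in total, 13 in the seed.
    print(additional_species(total_species=32, seed_species=13))
```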
From the PIX analysis, various features were predicted in individual SAND
sequences, for example coiled coils, signal peptides and peptide cleavage sites.
However, the thresholds at which these features were predicted were not significant. A
putative transmembrane domain was reported by TMPred and DAS comprising residues