Predicting Protein Function and Structure Using Bioinformatics Protocols: A Case Study of the SAND Protein Family - Essays in Bioinformatics


Predicting Protein Function and Structure

Using Bioinformatics Protocols:

A Case Study of the SAND Protein Family

Amanda COTTAGE 1, Lisa J. MULLAN 2, Miriam B.D. PORTELA 1, Elizabeth HELLEN 1,
Tim J. CARVER 3, Sunil PATEL 4, Tanya VAVOURI 1, Greg ELGAR 1, Yvonne J.K. EDWARDS 5

1 MRC Rosalind Franklin Centre for Genomic Research, Genome Campus, Hinxton, Cambridge, CB10 1SB, UK.
2 EMBL - European Bioinformatics Institute, Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
3 Wellcome Trust Sanger Institute, Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
4 Accelrys Inc., 334 Cambridge Science Park, Milton Road, Cambridge, CB4 0WN, UK.
5 Comparative Genomics & Bioinformatics, School of Biological and Chemical Sciences, Queen Mary, University of London, Mile End Road, London E1 4NS, UK.

Abstract. In this chapter, bioinformatics techniques are used to gain insight
into the structure and function of a largely uncharacterised protein family called
SAND. From a phylogenomic analysis, we establish that SAND is a eukaryotic gene
and show that a duplication event gave rise to two SAND genes in vertebrates.
SAND was found to be absent from archaea and bacteria. From a phylogenetic
analysis, we characterise a number of subfamilies. Using multiple sequence
alignments, we highlight amino acids and sequence motifs conserved across SAND
proteins, as well as those invariant within subfamilies or taxonomic groups. In
addition, we predict secondary structure and solvent accessibility profiles and
carry out protein fold predictions for the SAND proteins.

Introduction

Predicting protein structure from sequence often involves tailored sequence similarity
searches against specialised databases: for example, a BLASTP search against NRL3D
(a databank of protein sequences of known structure), a PSI-BLAST search against a
non-redundant protein databank, or a HMMER search against PFAM (Tables 1-3). Protein
structure prediction can also include multiple sequence alignment, secondary structure
prediction, solvent accessibility prediction, protein fold recognition, construction of
models at atomic resolution, and model validation. Not every protein structure
prediction project uses all of these techniques. The central step in a typical protein
structure prediction is to identify a structural target from which to extrapolate
three-dimensional information for a query sequence. This is the most crucial part of
the project: if the target is wrong, the whole prediction will be incorrect.
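Because the choice of structural target determines everything downstream, it helps to make the selection rule explicit. The following is a minimal Python sketch, not the chapter's own protocol. It assumes the similarity search results are available in the 12-column tabular format produced by BLAST+ (`-outfmt 6`). The E-value cutoff of 1e-3, the function names and the sample identifiers (`sand_query`, `tmplA`, `tmplB`) are illustrative assumptions, not values taken from the chapter.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    """One row of BLAST+ tabular output (-outfmt 6), reduced to the fields used here."""
    query: str
    subject: str
    percent_identity: float
    evalue: float
    bit_score: float


def parse_tabular(text: str) -> List[Hit]:
    """Parse tab-separated BLAST hits.

    Standard column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append(
            Hit(
                query=fields[0],
                subject=fields[1],
                percent_identity=float(fields[2]),
                evalue=float(fields[10]),
                bit_score=float(fields[11]),
            )
        )
    return hits


def pick_structural_target(hits: List[Hit], max_evalue: float = 1e-3) -> Optional[Hit]:
    """Return the highest-scoring hit at or below the E-value cutoff, or None.

    The default cutoff is an assumption for illustration only.
    """
    significant = [h for h in hits if h.evalue <= max_evalue]
    if not significant:
        return None  # no defensible target: do not build a model from a weak hit
    return max(significant, key=lambda h: h.bit_score)


if __name__ == "__main__":
    # Invented example rows; the identifiers are hypothetical.
    sample = (
        "sand_query\ttmplA\t35.0\t200\t120\t5\t1\t200\t1\t198\t1e-20\t95.0\n"
        "sand_query\ttmplB\t28.0\t150\t100\t4\t10\t160\t5\t150\t0.5\t30.0\n"
    )
    target = pick_structural_target(parse_tabular(sample))
    print(target.subject if target else "no significant structural target")
```

Returning `None` when no hit passes the cutoff is deliberate: it is safer to fall back on fold recognition or other evidence than to extrapolate three-dimensional information from an unreliable target.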
