Table 2.4 The major primary sequence (protein and nucleic acid) databases and the web
addresses from which they may be accessed

Database        Web address

Protein
PIR             http://www-nbrf.georgetown.edu/
Swiss-Prot      http://www.ebi.ac.uk/swissprot/
MIPS            http://www.mips.biochem.mpg.de/
NRL-3D          http://www-nbrf.georgetown.edu/pirwww/dbinfo/nrl3d.html
TrEMBL          http://www.ebi.ac.uk/index.html
Owl             http://www.bis.med.jhmi.edu/Dan/proteins/owl.html

Nucleic acid
EMBL            http://www.ebi.ac.uk/embl/index.html/
GenBank         http://www.ncbi.nlm.nih.gov
DDBJ            http://www.ddbj.nig.ac.jp/

An alternative approach to amino acid sequence determination is to sequence the gene
encoding the protein (Chapter 3). The amino acid sequence can then be inferred from the
nucleotide sequence obtained. This approach has gained favour in recent years. Refinements
to DNA sequencing methodologies and equipment have made such sequence analysis both rapid
and relatively inexpensive. The ongoing genome projects continue to generate enormous
amounts of sequence data. By the early 2000s, substantial or complete sequence data for
some 300 organisms were available (Table 2.3). As
a result, the putative amino acid sequences of an enormous number of proteins (most of unknown
function/structure) had been determined.
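
The inference step itself is mechanical, and a minimal sketch may help make it concrete. The example below assumes the standard genetic code, a coding-strand sequence that is already in frame from its first base, and no introns; the names CODON_TABLE and translate are illustrative only and do not come from any particular software package.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with bases in the order T, C, A, G
# (third position varying fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Infer a one-letter amino acid sequence from an in-frame coding sequence.

    Accepts DNA or mRNA (U is treated as T). Translation stops at the first
    stop codon; codons containing unrecognised characters are reported as 'X'.
    """
    dna = nucleotides.upper().replace("U", "T")
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


if __name__ == "__main__":
    # Short illustrative coding sequence ending in a stop codon.
    print(translate("ATGGCCCTGTGGATGCGCTAA"))  # MALWMR
```

A sequence obtained in this way is only putative: the sketch says nothing about where the true reading frame begins, nor about any processing the polypeptide undergoes after translation.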
Upon its generation, sequence information is normally submitted to various databases. The
major databases in which protein primary sequence data are available are listed in Table 2.4.
Also included in this table are the major nucleic acid sequence databases, as amino acid sequence
information can potentially be derived from these.
The Swiss-Prot database is probably the most widely used protein database. It is maintained
collaboratively by the European Bioinformatics Institute (EBI) and the Swiss Institute for
Bioinformatics. It is relatively easy to access and search via the World Wide Web (Table 2.4). A sample
entry for human insulin is provided in Figure 2.4. Additional information detailing such databases
is available via the web addresses provided in Table 2.4 and in the bioinformatics publications
listed at the end of this chapter.
A polypeptide's amino acid sequence can thus be determined by direct chemical (Edman) or
physical (mass spectrometry) means, or indirectly via gene sequencing. In practice, these methods
are complementary to one another and can be used to cross-check sequence accuracy. If the target
gene/messenger RNA (mRNA) has been previously isolated, then DNA sequencing is usually
most convenient. However, this approach reveals little information regarding any PTMs present
in the mature polypeptide, many of which are of critical significance in the context of therapeutic
proteins (discussed in Section 2.5).
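
As a simple illustration of such cross-checking, the sketch below compares a sequence inferred from a gene with one determined directly and lists the positions at which they disagree. It assumes that both sequences are in one-letter code and have already been trimmed to the same region (for example, the mature polypeptide); the function name find_mismatches is hypothetical.

```python
from typing import List, Tuple


def find_mismatches(inferred: str, observed: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, inferred residue, observed residue) for each
    position at which the two sequences disagree.

    Assumes the sequences cover the same region; sequences of unequal length
    must be aligned or trimmed before they can be compared position by position.
    """
    if len(inferred) != len(observed):
        raise ValueError("sequences differ in length; align them before comparing")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(inferred, observed), start=1)
        if a != b
    ]


if __name__ == "__main__":
    # Illustrative sequences differing at position 4.
    print(find_mismatches("MALWMR", "MALYMR"))  # [(4, 'W', 'Y')]
```

A disagreement flagged in this way does not show which method is at fault; it indicates only a position that merits re-examination.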