implementations of allergen databases. In addition, we will also touch on one of
the main bioinformatics applications of these data, namely, allergenicity
prediction.
5.2 Allergen Databases
Primary databases such as GenBank/EMBL/DDBJ (O'Donovan et al. 2002; Kulikova,
Aldebert, Althorpe, Baker, Bates, Browne, van den Broek, Cochrane, Duggan,
Eberhardt, Faruque, Garcia-Pastor, Harte, Kanz, Leinonen, Lin, Lombard, Lopez,
Mancuso, McHale, Nardone, Silventoinen, Stoehr, Stoesser, Tuli, Tzouvara, Vaughan,
Wu, Zhu, and Apweiler 2004; Miyazaki, Sugawara, Ikeo, Gojobori, and Tateno 2004),
Swiss-Prot, Protein Data Bank (PDB) (Bourne, Addess, Bluhm, Chen, Deshpande,
Feng, Fleri, Green, Merino-Ott, Townsend-Merino, Weissig, Westbrook, and Berman
2004), and PubMed now provide large amounts of publicly available data of various
types. Primary databases are the first-stop depositories for biological data and as such
are more comprehensive and well maintained. GenBank/EMBL/DDBJ are the major
providers of nucleotide sequences. Most nucleotide sequences described in research
articles must be deposited in one of these databases. Because the three databases
synchronize their data, they contain virtually the same records. Besides this journal
requirement, rapid advances in sequencing technology have greatly increased the
amount of information these databases hold. From 1982 to 2004, the number of bases
in GenBank doubled every 14 months. This growth is also reflected in the translated
protein sequences derived from the nucleotide sequences, available as GenPept,
TrEMBL, and DAD. Swiss-Prot, a primary protein sequence database, has
grown more slowly because it is manually curated, although it is still growing
rapidly. Release 52.3 (Apr 2007) contains 264,492 protein sequence entries. Manual
curation gives Swiss-Prot the high-quality, rich annotations that have made it
popular as a source for specialized allergen databases. Because 3D protein structure
determination is complex, PDB holds far less data than the sequence databases.
However, PDB, like the other primary databases, is growing, which has placed more 3D
structure information on allergens in the hands of researchers. PubMed is a large store of
literature information. Sequence and 3D structure data are usually deposited in the
primary databases described above, whereas the literature holds other types of data
that are not found there. For allergen research, data of interest include
cross-reactivity, clinical relevance, and antigen epitopes. Together, these four types
of primary databases serve as the main data source for most, if not all, specialized
allergen databases.
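The 14-month doubling time quoted above for GenBank can be turned into a rough growth estimate. The following is a minimal sketch, assuming idealized exponential growth with a constant doubling time; the function names `growth_factor` and `months_between` are hypothetical and chosen only for illustration.

```python
def growth_factor(elapsed_months: float, doubling_months: float = 14.0) -> float:
    """Return the multiplicative growth after elapsed_months.

    Assumes a constant doubling time, which is an idealization of
    how a real database grows.
    """
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (elapsed_months / doubling_months)


def months_between(start_year: int, end_year: int) -> int:
    """Whole months between the starts of two calendar years."""
    return (end_year - start_year) * 12


if __name__ == "__main__":
    months = months_between(1982, 2004)  # 22 years = 264 months
    doublings = months / 14.0            # number of 14-month doubling periods
    factor = growth_factor(months)
    print(f"{doublings:.1f} doublings, growth factor about {factor:.3g}")
```

Under this assumption, the 22 years from 1982 to 2004 correspond to about 19 doublings, that is, growth by a factor of several hundred thousand. Real growth is not perfectly exponential, so the result is best read as an order-of-magnitude illustration.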
5.2.1 Need for Specialized Databases
Most specialized allergen databases derive their information from the primary
databases and provide additional features dedicated to the allergen research
community. Since primary databases are meant to be central depositories for