Allergen Bioinformatics - Immunoinformatics

5.2.2 Desired Features of Allergen Databases

One of the main desired features of an allergen database is the aggregation of all publicly

available allergen-specific information into a comprehensive resource. This aggregation

activity should take note of the following points:

1. The database should aim to be as comprehensive as possible. In practice, the

creation of a one-stop resource for all allergen information is a nontrivial task.

There are already allergen databases that cater to specific needs. Besides, creating

and maintaining a comprehensive database would require effort and resources that

only a few groups could afford.

2. The records contained in the database should be nonredundant and steps should

be taken to ensure this. Redundancy leads to over- and underrepresentation

of data, which can cause errors in allergen analyses. This is particularly

important if the records are used as training sets for allergenicity prediction.

Moreover, redundancy distorts estimates of the number of known allergens. Sequence

similarity methods like BLAST (Altschul, Gish, Miller, Myers, and Lipman

1990) can be effectively used to reduce sequence redundancy by searching for

similar sequence records.

3. Each source database contains different types of biological data necessitating the

design of a common data format that can encompass all available information.

4. The fields contained in the records should be useful for allergen researchers.

Therefore, the design of the record format should take into account the expected

usage. Some of the common fields required include nucleotide sequence, protein

sequence, literature references, and 3D protein structure.

5. As far as possible, the allergen names should comply with the nomenclature

(King, Hoffman, Lowenstein, Marsh, Platts-Mills, and Thomas 1994) set out by

the Allergen Nomenclature subcommittee of the IUIS (International Union of

Immunological Societies). Allergens contained in the IUIS allergen list should be

listed under their official names to prevent naming conflicts.

6. The use of multiple source databases may lead to conflicting data. Manual

curation would then be required to resolve these conflicts.

7. There is a need to update the allergen database whenever there are changes or

updates in the source databases. The propagation of information from the source

databases to the specialized allergen database ensures that the latter remains

current.

8. Some allergen information is only present in the literature and the lack of a

structured form of literature data necessitates the manual extraction of this

information. This requires large amounts of time and effort.

9. The source databases may contain errors, so their data must be validated. In most cases,

the validation has to be done manually. Again, like information extraction from

the literature, this requires both time and effort.
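
Points 2, 3, 4, and 6 above can be illustrated with a minimal Python sketch. It is

only an illustration under stated assumptions, not a method described in the text:

the AllergenRecord class, its field names, and the aggregate function are all

hypothetical. Exact matching of normalized protein sequences stands in for a

sequence similarity search such as BLAST, which would also catch near-identical

sequences. Naming disagreements between sources are collected for manual curation

rather than resolved automatically.

```python
from dataclasses import dataclass, field


@dataclass
class AllergenRecord:
    """Hypothetical common record format; all field names are illustrative."""
    name: str                      # ideally the official IUIS allergen name
    protein_sequence: str
    source_db: str                 # source database that supplied the record
    nucleotide_sequence: str = ""
    references: list = field(default_factory=list)
    structure_id: str = ""         # identifier of a 3D structure, if any


def normalize(seq: str) -> str:
    """Remove whitespace and uppercase so trivially different copies match."""
    return "".join(seq.split()).upper()


def aggregate(records):
    """Merge records that share an identical protein sequence.

    Returns (merged_records, conflicts). Exact matching is an assumption made
    for brevity; a similarity search would be needed for near-duplicates.
    """
    merged = {}
    conflicts = []
    for rec in records:
        key = normalize(rec.protein_sequence)
        if key not in merged:
            # First time this sequence is seen: keep a copy of the record.
            merged[key] = AllergenRecord(
                rec.name, key, rec.source_db,
                rec.nucleotide_sequence, list(rec.references), rec.structure_id,
            )
            continue
        kept = merged[key]
        if kept.name != rec.name:
            # Conflicting names are flagged for a human curator.
            conflicts.append((kept.name, kept.source_db, rec.name, rec.source_db))
        for ref in rec.references:
            if ref not in kept.references:
                kept.references.append(ref)
        # Fill gaps in the kept record from the duplicate.
        kept.nucleotide_sequence = kept.nucleotide_sequence or rec.nucleotide_sequence
        kept.structure_id = kept.structure_id or rec.structure_id
    return list(merged.values()), conflicts
```

Calling aggregate on records drawn from several source databases returns one record

per distinct sequence, with literature references pooled, plus a list of naming

conflicts to review by hand.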

In view of these factors, the aggregation should be performed in two steps.

The first step would be to aggregate the information present in the source

databases to a format that encompasses all the required and useful fields. As far as
