biological data, they do not provide specific subsets of the data. Although the search
and retrieval tools of general databases allow allergen data to be extracted to some
extent, doing so takes multiple steps and the results may contain records irrelevant to
allergen research. For example, GenBank keyword searches are not sufficiently
specific and result in large numbers of false positives (Malandain 2004).
Specialized databases collect allergen-specific data from primary databases and
validate the records to ensure they refer to genuine allergens. Some of the primary
databases do not perform quality assurance of their data. GenBank, for example, requires only that
submitters check their records prior to submission. This means that the data may be
of low quality, requiring additional validation by the specialized databases. This is
one of the reasons why manually curated databases with high-quality data, like
Swiss-Prot, are popular with developers of specialized databases.
Most primary databases cover only a certain type of biological data. For example,
GenBank focuses on nucleotide sequences. This presents a problem, as
contemporary research is multifaceted and requires different types of biological data.
This need is fulfilled by allergen databases that collect allergen-specific data from
multiple sources and aggregate them for the benefit of the researcher. Thus,
specialized allergen databases serve as a one-stop shop for researchers.
A relatively large amount of the allergen data required by researchers, such
as information on epitopes, cross-reactivity, and clinical phenotypes, is present
only in the literature. Although PubMed provides text search and retrieval
functions, the information contained in the literature is unstructured, making
automated extraction difficult. In addition, PubMed is limited to abstracts.
While PubMed may serve as an initial resource for locating allergen-related
literature, expert annotation using full-text literature is often required to extract
allergen-specific information. This is a time-consuming process but provides
invaluable information that would otherwise be unavailable.
The allergen information in the primary databases also lacks any form of
classification. Classifications are useful to researchers because they partition the
data into meaningful subsets that can be independently analysed and used for
deriving generalizations or improving database search functions. The most
common form of allergen classification is based on the allergen source, for
example, food allergens.
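
The partitioning described above can be sketched in a few lines of Python. This is only an illustration: the record layout, the field names (`name`, `source`) and the category labels are assumptions made for the example, not the schema of any particular allergen database.

```python
from collections import defaultdict

# Illustrative records only; the field names and category labels are
# assumptions for this sketch, not a real database schema.
RECORDS = [
    {"name": "Ara h 1", "source": "food"},
    {"name": "Mal d 1", "source": "food"},
    {"name": "Bet v 1", "source": "pollen"},
    {"name": "Der p 1", "source": "mite"},
]


def partition_by_source(records):
    """Group allergen names into subsets keyed by allergen source."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record["source"]].append(record["name"])
    return dict(subsets)


subsets = partition_by_source(RECORDS)
print(subsets["food"])
```

Each subset can then be analysed on its own or used to narrow a database search to a single source category.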
The search tools of allergen databases should be more specific than the standard tools
of primary databases. Often, allergen databases have search tools that use fields
relevant to allergies. The adoption of meaningful search fields and terms allows
researchers to quickly and accurately extract the desired information. In addition,
allergen databases integrate allergen-specific bioinformatics applications to aid
researchers in the analysis of allergens. Often, primary databases do not provide
ready-to-use tools but require researchers to use their own computational tools. This
can be a lengthy process involving the creation of specific allergen datasets followed
by the computation itself. In contrast, allergen databases already contain the allergen
datasets and can easily integrate existing bioinformatics applications to provide
user-friendly analysis tools to the research community.
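
As a rough illustration of searching by allergy-relevant fields, the sketch below filters a small in-memory dataset on exact field values. The `search` helper, the field names (`name`, `species`, `source`) and the dataset itself are hypothetical; a real allergen database would define its own fields and query interface.

```python
# Hypothetical in-memory dataset; the field names are assumptions
# made for this sketch.
ALLERGENS = [
    {"name": "Ara h 1", "species": "Arachis hypogaea", "source": "food"},
    {"name": "Mal d 1", "species": "Malus domestica", "source": "food"},
    {"name": "Bet v 1", "species": "Betula verrucosa", "source": "pollen"},
]


def search(records, **criteria):
    """Return records whose fields equal every criterion, ignoring case."""
    def matches(record):
        return all(
            str(record.get(field, "")).lower() == str(value).lower()
            for field, value in criteria.items()
        )
    return [record for record in records if matches(record)]


hits = search(ALLERGENS, source="food")
print([record["name"] for record in hits])
```

Because the query is expressed in terms of fields such as allergen source or species, it avoids the false positives that a free-text keyword search over a general database tends to return.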