that is not yet present in PROSITE, the PROSITE staff creates a discriminator (pattern or
profile) for that domain. Many other family/domain databases have been created over the last ten
years; most of them are cross-referenced in Swiss-Prot and are also incorporated into the
InterPro [14] resource, which unites these databases “under one roof”. Today a Swiss-Prot
entry contains an average of 5.2 links to family/domain databases. These cross-references
can also be seen as pointers to the presence of a specific domain in a given protein
sequence.
As mentioned in Section 2.5.2, in 2003 we added cross-references to the three GO
ontologies. These cross-references have a dual purpose: they allow navigation to an
external resource (here GO), and they also serve as information items in their own right.
This is best illustrated by an example:
DR GO; GO:0012501; P:programmed cell death; TAS.
In the above line, the GO accession number “GO:0012501” provides a handle for
accessing the GO database (navigation), “P:programmed cell death” indicates that the
protein is involved in the biological process (“P”) of programmed cell death, and “TAS”
is the GO evidence code for “Traceable Author Statement”.
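To make the dual role of such a line concrete, the sketch below splits a GO cross-reference into its navigation handle (the GO identifier) and its information items (ontology aspect, term name, evidence code). It assumes the semicolon-separated layout of the example line above; the function name `parse_go_dr` and the `GoXref` type are hypothetical, not part of any Swiss-Prot tool. The letters F and C are the standard prefixes for the molecular function and cellular component ontologies.

```python
from typing import NamedTuple

# Aspect prefixes for the three GO ontologies (P is the one shown in the example).
ASPECTS = {"P": "biological process", "F": "molecular function", "C": "cellular component"}


class GoXref(NamedTuple):
    go_id: str     # navigation handle into GO, e.g. "GO:0012501"
    aspect: str    # which of the three ontologies the term belongs to
    term: str      # human-readable term name
    evidence: str  # evidence code, e.g. "TAS"


def parse_go_dr(line: str) -> GoXref:
    """Parse a line of the assumed form 'DR   GO; <id>; <aspect>:<term>; <evidence>.'"""
    if not line.startswith("DR"):
        raise ValueError(f"not a DR line: {line!r}")
    body = line[2:].strip().rstrip(".")           # drop the line code and final period
    fields = [f.strip() for f in body.split(";")]
    if len(fields) != 4 or fields[0] != "GO":
        raise ValueError(f"unexpected GO cross-reference: {line!r}")
    _, go_id, aspect_term, evidence = fields
    letter, term = aspect_term.split(":", 1)      # "P:programmed cell death"
    return GoXref(go_id, ASPECTS[letter], term, evidence)


xref = parse_go_dr("DR   GO; GO:0012501; P:programmed cell death; TAS.")
print(xref.go_id, "|", xref.aspect, "|", xref.term, "|", xref.evidence)
```

Only `go_id` is needed to build a link to GO; the remaining fields can be used directly, for instance to select all entries annotated with a given biological process.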
2.7.2 Cross-referencing versus integrating
Over the years, it became clear that our strategy of “delegating” specialist tasks to the
specialists (and establishing reciprocal links), while concentrating on the more “generalist”
annotation ourselves, was satisfactory. This was facilitated and influenced by the appearance of
more and more databases: the world-wide web made it much easier to publish expert knowledge.
Existing and well-established databases (e.g. FlyBase) took advantage of the increased
visibility offered by the world-wide web, and many new information resources emerged.
A number of these databases were built around the primary sequence databases or
organism-specific gene nomenclature databases, and used the accession numbers of the
sequence databases (or the primary gene names) as their unique identifiers. An
example is GeneCards, a database of “information cards” on every human protein in Swiss-Prot
and TrEMBL. Such databases are usually cross-referenced to Swiss-Prot via “implicit”
links, created on the fly by the NiceProt tool (see Section 3), which displays Swiss-Prot
entries on ExPASy. In addition to the explicit cross-references “hard-coded” in the Swiss-Prot
DR lines, implicit links reinforce the role of Swiss-Prot as a central hub for
molecular biology information [15].
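Because such resources reuse Swiss-Prot accession numbers or primary gene names as their identifiers, a viewer can derive a link at display time from a URL template instead of storing it in the entry. The sketch below shows that idea under stated assumptions: the resource names, URL templates, the `implicit_links` function and the accession/gene values in the usage line are all hypothetical and do not describe how NiceProt is actually implemented.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Hypothetical URL templates for two imaginary resources. Each resource is
# assumed to key its pages on the Swiss-Prot accession number or on the
# primary gene name, so no link needs to be stored in the entry itself.
IMPLICIT_LINK_TEMPLATES: Dict[str, str] = {
    "ExampleCards": "https://cards.example.org/entry?ac={ac}",
    "ExampleGeneDB": "https://genes.example.org/search?gene={gene}",
}


def implicit_links(accession: str, gene_name: Optional[str] = None) -> Dict[str, str]:
    """Build display-time links for one entry from its own identifiers."""
    links: Dict[str, str] = {}
    for resource, template in IMPLICIT_LINK_TEMPLATES.items():
        if "{gene}" in template and not gene_name:
            continue  # resource is keyed on gene names, but the entry has none
        # str.format ignores keyword arguments that the template does not use.
        links[resource] = template.format(ac=quote(accession), gene=quote(gene_name or ""))
    return links


# Illustrative identifiers only.
for name, url in implicit_links("P12345", "ABC1").items():
    print(name, url)
```

The design trade-off is that nothing is written into the entry: if a partner changes its URL scheme, only one template has to be updated, but the link is only as reliable as the partner's continued use of the shared identifiers.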
There may seem to be certain drawbacks to the strategy of establishing extensive
cross-links rather than integrating all data locally: 1) “loss of control”; 2) cross-references
create a certain dependency (when free public access to the Yeast Proteome Database (YPD)
was discontinued, expectations grew again for Swiss-Prot to provide more extensive
annotation for Saccharomyces cerevisiae); 3) it is necessary to rely on the willingness of the
providers of the specialised cross-referenced databases to collaborate (e.g. by using standard
nomenclature and common identifiers, and by providing, or at least helping with, mappings
between Swiss-Prot accession numbers and the entries in their database); 4) some foresight and
knowledge of the related field are necessary to avoid adding links to a resource that will not
be updated or is likely to lose funding, and then being forced to remove those links shortly
afterwards. However, these disadvantages are easily outweighed by the time saved and the
relief of not having to “be an expert in every field”, as well as by the reward of fruitful
collaborations and exchanges.
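Point 3 above mentions mappings between Swiss-Prot accession numbers and a partner database. As a rough illustration, the sketch below turns a partner-supplied list of (accession, external identifier) pairs into cross-reference lines, drops duplicates, and sets aside pairs whose accession is unknown so they can be reported back to the partner. The function name, the database name `ExampleDB`, the identifiers, and the simplified `DR   <db>; <id>; -.` layout are assumptions for illustration; real DR lines differ in their fields from one database to another.

```python
from typing import Dict, Iterable, List, Set, Tuple


def dr_lines_from_mapping(
    database: str,
    mapping: Iterable[Tuple[str, str]],
    known_accessions: Set[str],
) -> Tuple[Dict[str, List[str]], List[Tuple[str, str]]]:
    """Turn a partner's (accession, external ID) pairs into DR-style lines.

    Returns the lines grouped per Swiss-Prot accession, plus the pairs whose
    accession is unknown, so they can be reported back to the partner.
    """
    per_entry: Dict[str, Set[str]] = {}
    unmatched: List[Tuple[str, str]] = []
    for accession, external_id in mapping:
        if accession in known_accessions:
            per_entry.setdefault(accession, set()).add(external_id)  # set drops duplicates
        else:
            unmatched.append((accession, external_id))
    # Simplified, hypothetical line layout; sorted for a stable output order.
    lines = {
        acc: [f"DR   {database}; {ext}; -." for ext in sorted(ids)]
        for acc, ids in sorted(per_entry.items())
    }
    return lines, unmatched


# Hypothetical partner mapping with a duplicate and one unknown accession.
partner_mapping = [("P11111", "EX001"), ("P11111", "EX001"), ("P22222", "EX002"), ("Q99999", "EX003")]
lines, unmatched = dr_lines_from_mapping("ExampleDB", partner_mapping, {"P11111", "P22222"})
print(lines)
print(unmatched)
```

Returning the unmatched pairs instead of silently discarding them reflects the collaborative nature of the process: the partner can correct obsolete or mistyped accession numbers at the source.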
Procedures have been established to obtain mappings between Swiss-Prot sequences on the one
side and relatively heterogeneous information on the other: nucleotide sequences, gene
names, modification sites, domain descriptors, ontologies, etc. Many cross-references, in