entry. Such a need was not apparent in the early days of Swiss-Prot as most information
was derived from a single paper that reported both the sequence and its characterisation.
This is no longer true and some entries contain information originating from up to 110
references as well as the results of many sequence analysis tools. It is therefore necessary to
provide 'evidence tags'. These are links between an information item and its source,
whether a reference, the judgement of an annotator or the result of a program. Such evidence
tags already exist in TrEMBL. We have been very slow to provide them in
Swiss-Prot, partly because they are difficult to implement in the current annotation platform
and partly because they are very cumbersome in the current flat file format. Evidence tags are
therefore likely to be implemented in the XML and relational versions of Swiss-Prot
and will probably not be available in the flat file distribution.
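As an illustration only, the idea of an evidence tag as a link between an information item and its source could be modelled as below. This is a minimal sketch and not the Swiss-Prot or TrEMBL schema: the class names (SourceKind, EvidenceTag, AnnotationItem), the helper sources_by_kind and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SourceKind(Enum):
    """The three kinds of source named in the text (labels are hypothetical)."""
    REFERENCE = "reference"   # a literature reference
    ANNOTATOR = "annotator"   # the judgement of an annotator
    PROGRAM = "program"       # the result of a sequence analysis program


@dataclass(frozen=True)
class EvidenceTag:
    """Links one information item to one source."""
    kind: SourceKind
    source: str  # e.g. a reference number or a program name (illustrative)


@dataclass
class AnnotationItem:
    """One piece of annotation together with the evidence supporting it."""
    text: str
    evidence: List[EvidenceTag] = field(default_factory=list)


def sources_by_kind(item: AnnotationItem, kind: SourceKind) -> List[str]:
    """Return the sources of the given kind attached to an item."""
    return [tag.source for tag in item.evidence if tag.kind is kind]


# Made-up example: one item supported by a reference and by a program.
item = AnnotationItem(
    text="Hypothetical functional annotation",
    evidence=[
        EvidenceTag(SourceKind.REFERENCE, "Ref.3"),
        EvidenceTag(SourceKind.PROGRAM, "example-predictor"),
    ],
)
```

With such a structure, every item can carry any number of tags, which suits entries whose information comes from many references and programs.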
2.7 Cross-references
2.7.1 Cross-references in Swiss-Prot
Cross-references as a way to access related information in other databases have been an
integral part of Swiss-Prot almost since the beginning (they were introduced in release 4 of
April 1987). Whilst navigating between databases is much less of a challenge now, thanks
to the web, than it was back in the late eighties, the early presence of DR (Database
cross-Reference) lines in Swiss-Prot shows how anticipatory we were in conceiving the database
in a way that facilitates data integration. One of the first important software applications
that made use of Swiss-Prot cross-references was the Sequence Retrieval System (SRS)
[13], developed by Thure Etzold at EMBL, from 1990 on. In addition to providing a search
interface for multiple databases with a single query, an important feature of SRS is its
ability to combine all indexed databanks into a network, where new ways of linking
information from different sources can be explored. One of the main reasons why this
became possible was the fact that Swiss-Prot, one of the first databases indexed under SRS,
was so highly cross-referenced. SRS documentation contained in 1990, and still contains in
2003, an image showing biological databases linked to each other in the form of a network, the
centre of which is Swiss-Prot, connected with practically all the other databases indexed
under SRS.
The first databases cross-referenced in Swiss-Prot were the primary DNA and
protein sequence databases EMBL and PIR, and the PDB protein structure database. New
links were regularly added at each of the major Swiss-Prot releases. Currently Swiss-Prot is
linked to 55 different databases and each entry contains an average of 9.1 links. One would
naively assume that an entry does not contain more than a single cross-reference to a given
external database. This is not always true for a variety of reasons that generally depend on
the structure of the external database. For example, there is an average of 1.92 cross-
references to the EMBL DNA sequence database per Swiss-Prot entry. This reflects the
redundant archival nature of the nucleotide databases. However, this overall average does
not convey the true nature of the situation: 58% of all Swiss-Prot entries contain only a
single cross-reference to EMBL, while 6.2% contain more than 5 such cross-references.
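The difference between an overall average and the underlying distribution can be illustrated with a short script that reads DR lines from flat-file entries and counts the links to one database per entry. This is a sketch under stated assumptions: the sample entries and all identifiers in them are made up, the function names are hypothetical, and the parser handles only the basic "DR   database; identifier; ..." layout.

```python
# Made-up flat-file fragments for illustration; the identifiers are not real entries.
SAMPLE_ENTRIES = [
    """ID   EXAMPLE1_HUMAN
DR   EMBL; X00001; CAA00001.1; -.
DR   EMBL; X00002; CAA00002.1; -.
DR   PIR; A00001; A00001.
DR   PROSITE; PS00001; EXAMPLE_SITE; 1.
""",
    """ID   EXAMPLE2_YEAST
DR   EMBL; X00003; CAA00003.1; -.
DR   PROSITE; PS00002; EXAMPLE_DOMAIN; 1.
""",
]


def cross_references(entry_text):
    """Yield (database, primary identifier) for each DR line of one entry."""
    for line in entry_text.splitlines():
        if not line.startswith("DR "):
            continue
        # Drop the line code and the final full stop, then split the fields.
        body = line[2:].strip().rstrip(".")
        fields = [part.strip() for part in body.split(";")]
        if len(fields) >= 2:
            yield fields[0], fields[1]


def links_per_entry(entries, database):
    """Return, for each entry, the number of cross-references to `database`."""
    return [
        sum(1 for db, _ in cross_references(entry) if db == database)
        for entry in entries
    ]


def summarise(counts):
    """Report the mean alongside two distribution figures of the kind quoted in the text."""
    total = len(counts)
    return {
        "mean": sum(counts) / total,
        "share_exactly_one": sum(1 for c in counts if c == 1) / total,
        "share_more_than_five": sum(1 for c in counts if c > 5) / total,
    }


embl_counts = links_per_entry(SAMPLE_ENTRIES, "EMBL")
embl_summary = summarise(embl_counts)
```

Run over a full release rather than two toy entries, the same three figures would correspond to the average, the share of entries with a single EMBL link and the share with more than 5.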
Special emphasis should be given to the cross-references to family/domain
databases. PROSITE was the first of these databases to be created and accordingly the first
to be cross-referenced in Swiss-Prot. When cross-references to PROSITE were introduced
in 1990, there was an average of 0.42 per Swiss-Prot entry. In 2003, this number is more
than twice as high, an increase that can be explained by improved methods to detect
domains, but also by the fact that PROSITE increasingly reacts to the demands from Swiss-
Prot annotators: Whenever a newly annotated protein family carries a particular domain