particular those that are based on sequence searches, i.e. domain and family classification,
are now already applied to TrEMBL. This means that an entry comes with a certain number
of DR lines before manual annotation even starts. Some other DR lines however require
careful checking by an annotator, and yet others have to be added completely “manually” as
they can only be established after perusal of literature and other sources (e.g. MIM). While
the list of cross-referenced databases keeps growing, we are occasionally obliged to
remove links to certain databases. This can happen for several reasons, the most frequent
being a lack of funding and the subsequent discontinuation of a database, or the decision
of a database maintainer to commercialise a resource and discontinue free web access even
for academic users.
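As an illustration of how the DR lines mentioned above can be consumed, the following minimal sketch groups the cross-references of one entry by target database. It assumes the usual flat-file layout in which a DR line carries a database name followed by semicolon-separated identifiers. The sample entry, all identifiers in it and the function names `parse_dr_lines` and `drop_database` are invented placeholders for this example, not real records or part of any Swiss-Prot tool.

```python
from collections import defaultdict

# Hypothetical entry fragment: every identifier below is a placeholder,
# not a real database record.
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
AC   P00000;
DR   EMBL; X00000; CAA00000.1; -; mRNA.
DR   InterPro; IPR000000; Example_domain.
DR   Pfam; PF00000; Example; 1.
DR   MIM; 000000; gene.
"""


def parse_dr_lines(entry_text):
    """Return {database name: [primary identifiers]} for all DR lines."""
    xrefs = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.startswith("DR   "):
            continue
        # Drop the line code and the trailing period, then split on ';'.
        body = line[5:].rstrip().rstrip(".")
        fields = [part.strip() for part in body.split(";")]
        database, primary_id = fields[0], fields[1]
        xrefs[database].append(primary_id)
    return dict(xrefs)


def drop_database(xrefs, database):
    """Return a copy of the cross-references without one database,
    e.g. after that database has been discontinued."""
    return {name: ids for name, ids in xrefs.items() if name != database}


xrefs = parse_dr_lines(SAMPLE_ENTRY)
remaining = drop_database(xrefs, "MIM")
```

Keeping the database name as the key makes it straightforward to remove every link to a resource that is no longer freely available, which is the situation described above.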
2.7.3 Some thoughts on unique and stable identifiers
There are some important observations to make about cross-referencing in general. To
implement cross-referencing to a database, that database needs to provide unique and stable
identifiers (USI) for each of its entries. These USI are often known as accession numbers.
Such a requirement may seem obvious, but it is still often the case that databases do not see
the need for stable identifiers. For example, a species-specific database may use gene
names as its unique identifiers. The problem is that such identifiers may be unique but are
certainly not stable, as some of the gene names will most probably change over
time. Far more important for future developments is our belief that major objects in a
database require their own independent sets of USI. We became aware of this when we saw
the need to add USI to a number of objects in Swiss-Prot thus allowing external databases
to seamlessly implement cross-references to a specific object in Swiss-Prot rather than at
the level of the entire entry. A good example of such developments is the creation of feature
identifiers (FTId) for all human protein sequence variants in Swiss-Prot. These identifiers
allow specialized databases that report mutations concerning a specific set of genes to make
a cross-reference to the representation of that mutation in Swiss-Prot.
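The difference between a unique identifier and a stable one can be made concrete with a small sketch. It models a toy database in which an entry has an accession number (stable), a gene name (free to change) and sequence variants that carry their own feature identifiers. The classes `Entry`, `Variant` and `ToyDatabase`, and all identifiers used, are hypothetical and exist only for this illustration. They do not describe how Swiss-Prot is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    ft_id: str          # feature-level identifier, assumed stable
    description: str


@dataclass
class Entry:
    accession: str      # entry-level identifier, assumed stable
    gene_name: str      # unique at a given time, but may be renamed
    variants: dict = field(default_factory=dict)


class ToyDatabase:
    """Hypothetical in-memory store used only to illustrate the argument."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.accession] = entry

    def find_by_accession(self, accession):
        return self._entries.get(accession)

    def find_by_gene_name(self, gene_name):
        for entry in self._entries.values():
            if entry.gene_name == gene_name:
                return entry
        return None

    def find_variant(self, ft_id):
        for entry in self._entries.values():
            if ft_id in entry.variants:
                return entry.variants[ft_id]
        return None


# Placeholder identifiers, not real records.
db = ToyDatabase()
entry = Entry(accession="P00000", gene_name="OLDNAME")
entry.variants["VAR_000000"] = Variant("VAR_000000", "placeholder variant")
db.add(entry)

# An external database stores three kinds of links to the same object.
link_by_name = "OLDNAME"
link_by_accession = "P00000"
link_by_ft_id = "VAR_000000"

# The gene is renamed, as happens over time.
entry.gene_name = "NEWNAME"

name_link_resolves = db.find_by_gene_name(link_by_name) is not None
accession_link_resolves = db.find_by_accession(link_by_accession) is not None
variant_link_resolves = db.find_variant(link_by_ft_id) is not None
```

After the rename, only the link based on the gene name fails to resolve. The link based on the feature identifier also shows why object-level identifiers matter: it points at one specific variant rather than at the entry as a whole.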
3. Making Swiss-Prot available to the users
In prehistoric times (i.e. before the Web!), Swiss-Prot reached its users by a variety of
means. It was sent on computer tapes by the EMBL, it was distributed on floppy disks by
companies selling sequence analysis software and, in 1989, it became the first major
biomolecular database to be distributed on CD-ROM. In parallel to the physical distribution
of Swiss-Prot, the database was made available by anonymous FTP and was searchable
from a number of on-line resources such as BIONET and the NCBI IRX database retrieval
software.
When the World-Wide Web began to take hold in 1993, Swiss-Prot became available on the
ExPASy [16] server (www.expasy.org), which was born on August 1, 1993. At that date
there were fewer than 150 web servers worldwide. To the best of our knowledge it was the
first web server for the life science community. We were very pleased to see that it was
accessed 7'295 times during its first month of activity. We never imagined that a few years
later it would be accessed at a rate of 8-10 million hits per month. It has now been accessed
more than 300 million times by a total of more than three million computer hosts from 200
countries. Seven mirror sites, i.e. exact copies of the main site in Switzerland, have been
established in Australia, Bolivia, Canada, China, Korea, Taiwan and the USA. It is also
worth noting that ExPASy and the EBI server (www.ebi.ac.uk) are far from being