the only web servers that redistribute Swiss-Prot and TrEMBL, we estimate that there are
about 50 such sites world-wide.
ExPASy has evolved constantly in its ten years of existence. It is beyond the scope of
this article to describe everything that is available on the server, but we want to point out two
significant developments that reflect our response to the needs of users.
In autumn 1998, we initiated “NiceProt”, with the intention of providing scientists
with a more user-friendly way of looking at Swiss-Prot and TrEMBL entries. Instead of
showing the raw Swiss-Prot data format (with its two-letter line types), we decided to use
HTML tables to group certain fields under common headings and to replace each line type
with a more explicit key (e.g. “Cross-references” instead of “DR”). This view was initially targeted
at users who are not familiar with the Swiss-Prot data format, but it rapidly caught on in the
scientific community. Gradually, more and more functionality was added, including
many implicit cross-references and links to context-specific documentation. During the
first eight months of 2003, ExPASy handled on average about 1 million requests per month
for individual Swiss-Prot or TrEMBL entries. An overwhelming majority of these hits (85%)
were for NiceProt, whereas the remaining 15% were accesses to the raw text
version or to the “htmlised” view that was prevalent prior to September 1998.
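The idea of replacing two-letter line types with explicit keys can be illustrated with a minimal Python sketch. It is not the NiceProt implementation: the line codes come from the Swiss-Prot flat-file format, but every label except “Cross-references” is an assumption made for illustration, and the sample entry is a made-up placeholder rather than a real record.

```python
# Line codes are from the Swiss-Prot flat-file format. Apart from
# "Cross-references" (mentioned in the text), the labels are illustrative
# assumptions, not the headings NiceProt actually displays.
LINE_LABELS = {
    "ID": "Entry name",
    "AC": "Accession number",
    "DE": "Description",
    "OS": "Organism",
    "DR": "Cross-references",
    "KW": "Keywords",
}

# Made-up placeholder entry (not a real Swiss-Prot record).
SAMPLE_ENTRY = """
ID   EXAMPLE_HUMAN   STANDARD;   PRT;   10 AA.
AC   X00000;
DE   Hypothetical example protein.
OS   Homo sapiens (Human).
DR   EMBL; X00000; -.
DR   PROSITE; PS00000; -.
//
"""


def group_by_label(flat_text):
    """Group the data part of each line under an explicit heading."""
    grouped = {}
    for line in flat_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        code = line[:2]
        if code == "//":
            break  # end-of-entry marker
        data = line[5:].strip()  # data starts after the code and its padding
        label = LINE_LABELS.get(code, code)  # unknown codes keep their raw code
        grouped.setdefault(label, []).append(data)
    return grouped


if __name__ == "__main__":
    for label, values in group_by_label(SAMPLE_ENTRY).items():
        print(label)
        for value in values:
            print("   ", value)
```

A table-based view would then render one row per heading instead of one row per raw line, which is what lets several "DR" lines appear together under a single heading.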
The NEWT [17] taxonomy browser (http://www.ebi.ac.uk/newt/) is a service
introduced in 2002 that serves as an entry point into Swiss-Prot and TrEMBL using
taxonomic search criteria. The core of NEWT consists in the integration of Swiss-Prot
specific taxonomy information with the NCBI taxonomy data in a relational database.
Taxonomic nodes are stored in a hierarchical tree; this allows easy navigation through the
taxonomy lineage from every taxon. The web interface to NEWT allows users to search and
browse the daily updated taxonomy data. Users can navigate through the taxonomy tree and
access corresponding Swiss-Prot and TrEMBL protein entries. Additionally, a manually
curated selection of over 24,000 external links (including more than 13,000 photographs)
provides specific information on selected species.
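Storing taxonomic nodes as a tree in a relational database, and walking the lineage from any taxon, might look like the following minimal sketch. It uses an in-memory SQLite database; the table name, column names and identifiers are hypothetical (they are neither the NEWT schema nor real NCBI taxonomy identifiers), and the toy tree omits intermediate ranks.

```python
import sqlite3


def build_demo_db():
    """Create a toy taxonomy table in which each node points to its parent."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon ("
        "tax_id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    # Hypothetical identifiers; intermediate ranks are left out for brevity.
    conn.executemany(
        "INSERT INTO taxon VALUES (?, ?, ?)",
        [
            (1, None, "root"),
            (2, 1, "Eukaryota"),
            (3, 2, "Mammalia"),
            (4, 3, "Homo sapiens"),
            (5, 3, "Mus musculus"),
        ],
    )
    return conn


def lineage(conn, tax_id):
    """Return the names from the root down to the given taxon."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(tax_id, parent_id, name, depth) AS (
            SELECT tax_id, parent_id, name, 0 FROM taxon WHERE tax_id = ?
            UNION ALL
            SELECT t.tax_id, t.parent_id, t.name, up.depth + 1
            FROM taxon AS t JOIN up ON t.tax_id = up.parent_id
        )
        SELECT name FROM up ORDER BY depth DESC
        """,
        (tax_id,),
    ).fetchall()
    return [name for (name,) in rows]


def children(conn, tax_id):
    """Return the names of the direct children of a taxon."""
    rows = conn.execute(
        "SELECT name FROM taxon WHERE parent_id = ? ORDER BY name", (tax_id,)
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(" > ".join(lineage(db, 4)))
    print(children(db, 3))
```

With a parent pointer on every node, moving up the lineage is a recursive query and moving down is a lookup on `parent_id`; linking a node to its protein entries would need an additional table, which is not shown here.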
Both NiceProt and NEWT are representatives of the trend toward a 'customisation'
of the representation of knowledge. We believe that this trend will not abate; there are
many specific communities of life scientists that require information on proteins, yet want
it to be represented in a style or perspective specific to their field of research. We are in
the process of developing new types of views.
We also believe that the ExPASy server access log files are a valuable source of
information on the most frequently consulted TrEMBL entries (i.e. unannotated entries
that will greatly benefit from manual annotation), scientists' use of search engines, the
context in which certain entries are consulted, etc. We therefore plan to mine the ExPASy
log files and expect to be able to draw enlightening conclusions!
4. Conclusions
Now that Swiss-Prot is a well-established database, we can say that the tireless effort of
juggling between evolution and stability has been an exhausting but suitable strategy for the
development of the knowledgebase. Early design features of the database, such as the
detailed structuring of the entry format, the standardisation of nomenclature and the regular
review of the annotation of protein families, have proved indispensable. The
explosive growth in uncharacterised sequence data has led us to implement
automatic and semi-automatic processes, which are designed to ensure the same high-quality
standards that have always been the hallmark of Swiss-Prot. Automation has to go
hand in hand with the introduction of evidence tags that make it possible to distinguish
between data sources and inferences. We strongly believe that the future of Swiss-Prot and of any similar curated