dimensionally into separate files, this system allows curators to immediately find orthologs,
which can all be updated when new findings become available for at least one protein, or
when a review article summarises relevant knowledge on a protein family or subfamily and
comes to new conclusions. The quick availability of all related entries (all in the same file)
also ensures consistent annotation of all relevant entries. The ~140,000 entries in the
current release are thus split into ~3,000 files.
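As an illustration only, the file organisation described above can be sketched as a grouping of entries by family, so that all related entries land in the same file. The entry names and family labels below are placeholders chosen for the example, and the function name is hypothetical; the actual Swiss-Prot tooling is not shown in the text.

```python
from collections import defaultdict


def group_entries_by_family(entries):
    """Map each family label to the sorted list of entry names in that family."""
    files = defaultdict(list)
    for entry_name, family in entries:
        files[family].append(entry_name)
    # One "file" per family: related entries (e.g. orthologs) stay together.
    return {family: sorted(names) for family, names in files.items()}


# Placeholder data: (entry name, family label). Both are assumptions.
entries = [
    ("CYC_MOUSE", "cytochrome_c"),
    ("INS_HUMAN", "insulin"),
    ("CYC_HUMAN", "cytochrome_c"),
]

files = group_entries_by_family(entries)
for family, names in sorted(files.items()):
    print(family, names)
```

With this layout, updating one protein family means opening a single file that already contains every related entry.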
Most of the annotation is done manually with the help of a continuously growing
number of tools. We currently use a text editor, Crisp (from Vital, Inc.), that is easy to use
and comes with a powerful C-like macro language that we extensively use both for
literature-driven textual annotation and as a platform to launch sequence analysis programs
(see 2.4.2). An extensive series of macro-commands has been developed to reformat
references, comment lines, feature lines or sequences, to check controlled vocabulary or
syntax, and to retrieve entries from other databases. Analysis tools are also run directly
from the editor with the help of macro-commands that send the sequence and other relevant
information to the analysis program, and then retrieve the result and format it in the
annotation platform. All commands are available both from keyboard shortcuts (which are
preferred by experienced annotators) and from menus and dialog boxes that are fully
integrated in the editor's GUI environment.
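The macro workflow described above (send the sequence to an analysis program, retrieve the result, format it for the entry) can be sketched roughly as follows. This is a Python sketch under stated assumptions, not the Crisp macro code itself: the function names are hypothetical, and the "analysis program" is a stand-in that merely reports the sequence length.

```python
import subprocess
import sys


def run_analysis(command, sequence):
    """Send a sequence to an external program on stdin and return its stdout."""
    result = subprocess.run(
        command, input=sequence, capture_output=True, text=True, check=True
    )
    return result.stdout


def format_as_lines(output, line_code):
    """Prefix each non-empty output line with a two-letter line code and three spaces."""
    return [
        f"{line_code}   {line.strip()}"
        for line in output.splitlines()
        if line.strip()
    ]


# Stand-in for a real analysis tool (an assumption for this example):
# it reads the sequence from stdin and prints its length.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; print('LENGTH', len(sys.stdin.read().strip()))",
]

lines = format_as_lines(run_analysis(STAND_IN, "MKTAYIAKQR"), "CC")
print(lines)
```

Separating the call from the formatting step means a different analysis tool needs only a different command and, if necessary, a different formatter.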
Swiss-Prot annotation has always been subject to very strict rules and guidelines.
All entries are reviewed before they enter the database, which guarantees the homogeneity
of the annotation. We developed a “syntax checker” so as to make sure that our annotation
and format rules are enforced. This syntax checker, implemented in Perl, is much more than
a program that verifies the basic syntax of a Swiss-Prot entry. It also enforces the use of
controlled vocabularies (see 2.5) and checks for dependencies and consistencies between
different portions of an entry. In December 2003, the syntax checker contained almost
1,100 different rules, each of which can lead to the detection of errors or inconsistencies.
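To make the idea of a rule-based checker concrete, here is a minimal sketch in Python (the actual checker is written in Perl and its rules are not shown here). It applies three kinds of rule named in the text: basic syntax, controlled vocabulary, and a dependency between two parts of an entry. The keyword list, the specific rules, and the sample entry are all assumptions made for illustration.

```python
def parse_entry(text):
    """Split a flat-file entry into (line_code, content) pairs."""
    pairs = []
    for line in text.splitlines():
        if line.strip() and not line.startswith("//"):
            pairs.append((line[:2], line[5:].strip()))
    return pairs


# Hypothetical controlled vocabulary, far smaller than any real one.
ALLOWED_KEYWORDS = {"Transmembrane", "Hydrolase", "Zinc"}


def keywords(pairs):
    """Collect the keywords listed on all KW lines."""
    words = []
    for code, content in pairs:
        if code == "KW":
            words += [w.strip() for w in content.rstrip(".").split(";") if w.strip()]
    return words


def rule_single_id(pairs):
    """Syntax rule: an entry has exactly one ID line."""
    count = sum(1 for code, _ in pairs if code == "ID")
    return [] if count == 1 else [f"expected exactly one ID line, found {count}"]


def rule_controlled_keywords(pairs):
    """Vocabulary rule: every keyword must be in the allowed list."""
    return [f"unknown keyword: {w}" for w in keywords(pairs) if w not in ALLOWED_KEYWORDS]


def rule_transmembrane_dependency(pairs):
    """Illustrative dependency rule: keyword and feature table must agree."""
    has_kw = "Transmembrane" in keywords(pairs)
    has_ft = any(code == "FT" and content.startswith("TRANSMEM") for code, content in pairs)
    return [] if has_kw == has_ft else ["Transmembrane keyword and TRANSMEM feature disagree"]


RULES = [rule_single_id, rule_controlled_keywords, rule_transmembrane_dependency]


def check_entry(text):
    """Run every rule and return all error messages."""
    pairs = parse_entry(text)
    return [msg for rule in RULES for msg in rule(pairs)]


# Made-up entry used only to exercise the rules.
ENTRY = """ID   TEST_HUMAN
KW   Transmembrane; Kinase.
FT   DOMAIN       10     50
//"""

errors = check_entry(ENTRY)
print(errors)
```

Keeping each rule as a small independent function is one way a rule set could grow to many hundreds of checks without the rules interfering with one another.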
Many people are surprised to hear that Swiss-Prot annotation is done from within a
text editor. However, those same people are usually even more surprised once they see how
powerful the annotation platform developed around that text editor is, and that almost every
command can be launched, and its results processed, from within the editor, with remarkable
speed. One major disadvantage of this environment is that it relies heavily on the flat file
format. We are now developing a Swiss-Prot-specific editor, which will work with the
XML-formatted version of the databases, and will include many consistency checks and
context-specific menus. The new annotation platform will also include many graphical
features, e.g. visualization of domain and site predictions along the sequence. We believe
that such a development is highly desirable, as it will allow the implementation of
consistency checks directly at the level of the annotation platform, whereas we currently have to
rely on a regular post-processing check of the data, using the syntax checker to enforce
consistency.
2.4.2 Sequence analysis tools
The task of annotating Swiss-Prot entries has always relied on the use of the most
appropriate sequence analysis programs to predict important sequence features. Over
the years we have implemented many different methods and programs in our annotation
platform. We have also spent a considerable amount of time testing new methods and
selecting the most appropriate ones. In some cases, when no existing program could satisfy
our needs, we have developed our own set of predictive methods [6, 7]. All these activities
are carried out by a small research component within the Swiss-Prot group whose missions