complain when they look for specific information across one or many databases and fail to
obtain a comprehensive answer because that information is heterogeneously described.
Therefore we always felt that Swiss-Prot had a mission to fulfil in enforcing existing rules
and, increasingly as time passed, in actively participating in the development of new
nomenclature and controlled vocabularies. Anecdotally, such an active role can have
unexpected consequences: we were once threatened with a lawsuit because we refused to
accept as a valid gene symbol the one proposed by an author.
All of this leads us to give the following advice to would-be developers of
databases:
- Follow existing controlled vocabularies and nomenclatures as closely as possible;
- Do not hesitate to contact the groups maintaining these resources and to point out
inconsistencies and/or errors;
- Do not be afraid to take a firm stand toward your users when they request the
representation in your database of terms that do not follow a specific guideline. You can
always (and you should!) store this information as a synonym.
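
The last piece of advice can be illustrated with a minimal sketch. It does not describe how Swiss-Prot itself is implemented: the class `GeneNameIndex`, its method names and the example symbols are hypothetical and chosen only for illustration. The idea is that a user-proposed name never replaces the official symbol, yet a search on that name still finds the entry.

```python
class GeneNameIndex:
    """Hypothetical index that keeps one official symbol per gene
    and stores every other name as a synonym pointing to it."""

    def __init__(self):
        self._official_by_name = {}   # any known name -> official symbol
        self._synonyms = {}           # official symbol -> set of synonyms

    def add_gene(self, official_symbol):
        # Register a symbol that follows the accepted nomenclature.
        self._synonyms.setdefault(official_symbol, set())
        self._official_by_name[official_symbol] = official_symbol

    def add_synonym(self, official_symbol, synonym):
        # A non-standard name is kept, but only as a synonym.
        if official_symbol not in self._synonyms:
            raise KeyError(f"unknown official symbol: {official_symbol}")
        owner = self._official_by_name.get(synonym)
        if owner is not None and owner != official_symbol:
            raise ValueError(f"{synonym} already refers to {owner}")
        self._synonyms[official_symbol].add(synonym)
        self._official_by_name[synonym] = official_symbol

    def resolve(self, name):
        # Return the official symbol for any known name, else None.
        return self._official_by_name.get(name)

    def synonyms_of(self, official_symbol):
        return sorted(self._synonyms[official_symbol])


# Illustrative usage: the author-preferred name remains searchable.
index = GeneNameIndex()
index.add_gene("TP53")
index.add_synonym("TP53", "p53")
print(index.resolve("p53"))
```

Keeping a single mapping from every known name to the official symbol means that users searching with an unofficial name still reach the right entry, while the database displays one consistent symbol.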
2.5.2 Going ahead with GO in Swiss-Prot
If we assume, as mentioned above, that “users and database should agree on the meaning of
the term being used”, given the large number of biomolecular databases available, this
indirectly implies that all databases should agree on the meaning of a term! In an attempt to
achieve this ambitious goal, maintainers of FlyBase, MGD and SGD joined forces and
formed the Gene Ontology (GO) Consortium [12]. They established three ontologies,
gathering key terms for cellular component, biological process and molecular function,
thus meeting a widespread need for standardisation that could be observed all across the
scientific community.
From the beginning of the GO activities, we were repeatedly approached by users
wondering when we would introduce GO terms to Swiss-Prot and TrEMBL. However,
while clearly welcoming the effort made by the GO consortium, we were reluctant to add
links to GO at that time: given the initially small scope (GO specialised in three major
organism groups, whereas Swiss-Prot has to deal with thousands of different species), and
the fact that many mappings had been created automatically and were thus likely to assign
GO terms to unrelated proteins, we considered it dangerous to mislead users into incorrect
assumptions. We did not want to risk the situation where someone would happily accept a
GO assignment indicating a function for an otherwise uncharacterised protein, without
further questioning the assignment because they trust the judgement of Swiss-Prot
annotators and the high quality of the manual annotations.
It was only in 2003 that we felt it had become “safe” to start introducing GO terms
in Swiss-Prot. We felt that GO had indeed considerably matured and had increased its
coverage. What's more, several species-specific databases had established manually
curated mappings between GO terms and their gene catalogues. The EBI GO team had
mapped Swiss-Prot keywords to GO terms. Evidence tags were available in GO to indicate
whether an assignment had been made automatically or by manual curation. The time had
come to meet this demand and to introduce cross-references (see 2.7.1) from Swiss-Prot
to GO. We added them in all cases where they originated from manual annotation efforts.
We are also in the process of introducing GO terms for all members of microbial protein
families that fall under the scope of the HAMAP annotation project.
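
The policy of adding only those cross-references that originate from manual annotation could be sketched as a filter on GO evidence codes. This is an illustration under stated assumptions, not the actual Swiss-Prot pipeline: the record type `GoXref`, the function `manually_curated` and the sample rows are hypothetical. IEA is the GO evidence code for electronic annotation; treating it as the only automatic code is a simplification.

```python
from typing import NamedTuple


class GoXref(NamedTuple):
    """Hypothetical record for one GO assignment to a protein."""
    go_id: str      # GO identifier, e.g. "GO:0005524"
    term: str       # human-readable GO term name
    evidence: str   # GO evidence code, e.g. "IDA", "TAS", "IEA"


# Assumption: only IEA (Inferred from Electronic Annotation)
# is treated as an automatic assignment in this sketch.
AUTOMATIC_CODES = {"IEA"}


def manually_curated(xrefs):
    """Keep only assignments whose evidence code is not automatic."""
    return [x for x in xrefs if x.evidence not in AUTOMATIC_CODES]


# Made-up sample assignments for a single protein entry.
candidates = [
    GoXref("GO:0005524", "ATP binding", "IDA"),
    GoXref("GO:0005634", "nucleus", "IEA"),
    GoXref("GO:0005634", "nucleus", "TAS"),
]

for xref in manually_curated(candidates):
    print(xref.go_id, xref.term, xref.evidence)
```

Filtering on the evidence tag, instead of discarding a whole data source, lets manually curated and automatic assignments of the same term be told apart entry by entry.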