2.2.1. Sequence annotation
Since UniProtKB is a protein-centric resource, it is critical to get as much
information as possible in order to obtain the correct, biologically
significant protein sequence(s). This step requires a number of distinct yet
related annotation activities:
- capturing sequencing conflicts so that the entry displays the protein
  sequence most likely to be correct;
- correcting incorrect gene models (using multiple alignments to
  detect potential problems);
- determining the most probable initiation codon (using multiple
  alignments);
- validating models using full-length cDNAs or, in some cases,
  expressed sequence tag (EST) sequences;
- validating protein sequences using high-quality mass spectrometry
  (MS) or, more rarely, Edman sequencing data;
- detecting/predicting rare sequence-modifying biological events such
  as the presence of nonstandard amino acids like selenocysteine; and
- annotating splice variants using published reports, full-length
  cDNAs, and bioinformatics predictions from trusted sources.
For D. discoideum, unfortunately, only a few full-length cDNAs are
available to confirm gene models, and in most cases sequencing at the
protein level is not available. However, we often correct exon or
initiation codon predictions using comparative sequence analysis, as
illustrated in Figs. 1 and 2.
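The idea of using a multiple alignment to spot a doubtful initiation codon can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the procedure used by UniProtKB curators: the alignment is given as equal-length gapped strings, the function names and the toy sequences are hypothetical, and the "expected" start is simply the alignment column at which most sequences begin.

```python
from collections import Counter


def first_residue_column(aligned_seq, gap="-"):
    """Return the alignment column of the first non-gap residue, or None."""
    for col, ch in enumerate(aligned_seq):
        if ch != gap:
            return col
    return None


def flag_start_discrepancies(alignment, gap="-"):
    """Compare each sequence's start column with the most common one.

    alignment: dict mapping a sequence name to its gapped sequence
               (all strings must have the same length).
    Returns (consensus_column, flags), where flags maps a name to
    'extended' (starts upstream of the consensus column) or
    'truncated' (starts downstream of it).
    """
    starts = {name: first_residue_column(seq, gap)
              for name, seq in alignment.items()}
    counts = Counter(col for col in starts.values() if col is not None)
    consensus, _ = counts.most_common(1)[0]
    flags = {}
    for name, col in starts.items():
        if col is None or col == consensus:
            continue
        flags[name] = "extended" if col < consensus else "truncated"
    return consensus, flags


# Toy alignment (made-up sequences): the predicted model 'query' starts
# five residues upstream of its three homologs.
alignment = {
    "query":  "MKLSTMAVLKDE",
    "orthoA": "-----MAVLKDE",
    "orthoB": "-----MAVIKDE",
    "orthoC": "-----MSVLKDE",
}
consensus, flags = flag_start_discrepancies(alignment)

# An in-frame Met at the consensus column is a candidate alternative start.
has_alternative_start = alignment["query"][consensus] == "M"
```

A flag produced this way is only a hint that a model deserves a closer look; deciding which initiation codon is the most probable still requires examining the alignment and any available experimental evidence.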
In addition to the sequence itself, UniProtKB provides the protein
existence (PE) line, which gives an “evidence level” for the in vivo existence
of a protein, regardless of the accuracy or correctness of the sequence
displayed. This is particularly important for model organisms such as
D. discoideum, for which a large proportion of the available sequences are
pure predictions from the genome. To be assigned “level 1” (“existence at
protein level”), a protein has to be clearly observed experimentally. The
corresponding entry should contain a characterization paper, Edman
sequencing information, clear identification by MS, or an X-ray or nuclear
magnetic resonance (NMR) structure, such as the entry displayed in Fig. 3.
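For readers who process entries programmatically, the PE level can be read directly from the flat-file (text) form of an entry. The sketch below assumes the PE line has the shape `PE   1: Evidence at protein level;`, which is the wording used for level 1 in current UniProtKB text files; the helper name `parse_pe` and the abbreviated entry (including its identifier and accession) are placeholders invented for illustration.

```python
import re

# Matches a flat-file PE line such as "PE   1: Evidence at protein level;"
PE_LINE = re.compile(r"^PE\s+(\d):\s*(.+?);\s*$", re.MULTILINE)


def parse_pe(entry_text):
    """Return (level, description) from a flat-file entry, or None if absent."""
    match = PE_LINE.search(entry_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Heavily abbreviated, hypothetical entry: the ID and AC values are made up.
entry = """ID   EXAMPLE_DICDI           Reviewed;         100 AA.
AC   X00000;
PE   1: Evidence at protein level;
SQ   SEQUENCE   100 AA;
"""

level, description = parse_pe(entry)
observed_at_protein_level = (level == 1)
```

Filtering a proteome on this value is a quick way to separate experimentally observed proteins from sequences that are, so far, only predicted from the genome.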