descriptions include only the Cα atoms of a protein, while surface descriptions include only
those atoms in contact with the environment (solvent). We may also choose to use higher-level
categories, such as domain units instead of amino acid residues. TOPS cartoons are
simplified descriptions in which the entities are secondary structural elements; the
relationships are topological links describing sequential or spatial vicinities.
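As a minimal sketch of such a simplification, a full-atom description can be reduced to its Cα atoms by filtering on the atom name. The `Atom` record, the function name `c_alpha_trace`, and the coordinates below are hypothetical and serve only as illustration; they are not taken from any real structure file or library.

```python
from typing import NamedTuple, Tuple, List


class Atom(NamedTuple):
    """Hypothetical atom record: one entity of a full-atom description."""
    residue_index: int
    residue_name: str
    atom_name: str
    xyz: Tuple[float, float, float]


# Toy data (assumed for illustration only, not a real structure).
atoms = [
    Atom(1, "GLY", "N", (0.0, 0.0, 0.0)),
    Atom(1, "GLY", "CA", (1.5, 0.0, 0.0)),
    Atom(1, "GLY", "C", (2.0, 1.4, 0.0)),
    Atom(2, "ALA", "N", (3.3, 1.5, 0.0)),
    Atom(2, "ALA", "CA", (4.0, 2.8, 0.0)),
    Atom(2, "ALA", "CB", (5.5, 2.6, 0.0)),
]


def c_alpha_trace(all_atoms: List[Atom]) -> List[Atom]:
    """Keep only the C-alpha atoms, i.e. one entity per residue."""
    return [a for a in all_atoms if a.atom_name == "CA"]


trace = c_alpha_trace(atoms)
```

The simplified description keeps one entity per residue, so six atom records are reduced to two.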
Table 2. An example of simplified descriptions

Model                     Descriptor
                          Position    Hydrophobicity
Hydrophobicity plot       +           Real number
Hydrophobic segments      +           Discrete (0 or 1)
Average hydrophobicity    -           Real number
Hydrophobic character     -           “Hydrophobic”/“Hydrophilic”

Another avenue of fine-tuning consists in decreasing the detail (the resolution) of
the descriptors (Table 2). For example, residue hydrophobicity can be described in
quantitative terms, using a hydrophobicity scale (a continuous variable represented as a
real number), or qualitatively (a discrete variable, represented as 0 or 1 or by the categories
“hydrophobic” and “hydrophilic”).
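The four models of Table 2 can be sketched as four small functions. The text does not name a particular scale; the Kyte-Doolittle hydropathy values are used here as an assumption, and the window length and thresholds are arbitrary illustrative choices, as are all function names.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the text names none).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_plot(seq, window=5):
    """Position-dependent, real-valued: mean hydropathy of each window."""
    values = [KD[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_segments(seq, window=5, threshold=1.6):
    """Position-dependent, discrete: 1 where the window mean exceeds the threshold."""
    return [1 if v > threshold else 0 for v in hydrophobicity_plot(seq, window)]


def average_hydrophobicity(seq):
    """Position-independent, real-valued: one number for the whole sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydrophobic_character(seq, threshold=0.0):
    """Position-independent, categorical label for the whole sequence."""
    return "hydrophobic" if average_hydrophobicity(seq) > threshold else "hydrophilic"


toy = "LLLLLKKKKK"  # hypothetical toy sequence
plot = hydrophobicity_plot(toy)
segments = hydrophobic_segments(toy)
average = average_hydrophobicity(toy)
character = hydrophobic_character(toy)
```

Each step down the list discards information: the plot keeps position and magnitude, the segments keep position only, the average keeps magnitude only, and the character keeps neither.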
The intuitive concept of resolution also refers to the number of categories used in a
given description. An amino acid composition is a vector in a 20-dimensional space, and
since most proteins contain all of the amino acids, all the components of the vector are
non-zero. By contrast, there are 400 possible dipeptides and 8000 possible tripeptides, so in a
tripeptide-based composition many (or most) of the components would be zero or 1. Very
high-resolution descriptions are highly characteristic “fingerprints” that can be used to identify
individual structures. For example, mass spectra are efficiently identified by the
presence/absence of their constituent peaks, and similarly, small molecular structures can
be retrieved from databases using queries constructed from their constituent fragments. On
the other hand, high-resolution fingerprints cannot be easily generalized to similar
molecules, so the resolution of a description has to be optimized so that it covers the
right scope of similar descriptions.
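The growth in dimensionality, and the resulting sparsity, can be illustrated with a short sketch that builds mono-, di- and tripeptide composition vectors. The function names and the toy sequence are assumptions made for illustration; real proteins are usually longer, but the trend is the same.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition(seq, k):
    """Count every overlapping k-peptide and return a vector of length 20**k."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(word)] for word in product(AMINO_ACIDS, repeat=k)]


def nonzero_fraction(vector):
    """Fraction of components that are non-zero."""
    return sum(1 for c in vector if c) / len(vector)


toy = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # hypothetical toy sequence

for k in (1, 2, 3):
    vec = composition(toy, k)
    print(k, len(vec), round(nonzero_fraction(vec), 4))
```

A sequence of length n contains only n - k + 1 overlapping k-peptides, so for k = 3 at most n - 2 of the 8000 components can be non-zero, which is why a tripeptide composition behaves more like a presence/absence fingerprint than like a dense vector.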
1.3.2 Kinds of descriptors
Descriptors can be categorized according to their contents. On the one hand we have
various levels, such as atoms, residues, secondary structure elements, domains, etc. Whether
we talk about DNA or about proteins, there is an apparent lowest level that is not divided
into further categories. For example, structural biology is rarely concerned with particles
below the atomic level, while molecular biologists use nucleotides and amino acids as the
lowest level. Higher-order units can be built up from the lower levels. In most cases the
higher units are non-overlapping, i.e. one atom can be part of only one residue. On the
other hand, we use overlapping fragment descriptions as well: for example, nucleotide
sequences can be described in terms of overlapping di- or trinucleotide words, and protein 3D
structures can be described as peptide fragments.
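The difference between non-overlapping units and overlapping words can be made concrete with a minimal sketch. The two helper functions and the toy nucleotide sequence are hypothetical and chosen only for illustration.

```python
def overlapping_words(seq, k):
    """Slide a window one position at a time; a symbol may belong to up to k words."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def non_overlapping_units(seq, k):
    """Step by k; every symbol belongs to at most one unit."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


toy = "ATGCGA"  # hypothetical toy nucleotide sequence

dinucleotides = overlapping_words(toy, 2)   # AT, TG, GC, CG, GA
trinucleotides = overlapping_words(toy, 3)  # ATG, TGC, GCG, CGA
units = non_overlapping_units(toy, 3)       # ATG, CGA
```

In the non-overlapping case the units partition the sequence, just as atoms are partitioned among residues; in the overlapping case the same nucleotide contributes to several neighbouring words.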
We use the term “structured descriptions” for those descriptions that contain both
entities and relationships. Protein 3D structures and sequences are such descriptions even
though the relationships are not explicitly included in the actual descriptions found in