biological sequences to biological functions on the other, has proven particularly
useful. Recent reviews of applications of linguistic approaches to computational
biology in general can be found in references [1, 2]. Thus, we will first only briefly
explain the analogy between biological sequence and natural language processing
in general and then focus the remainder of the review on the use of language
technologies to identify the functional building blocks in protein sequences, i.e.
the “words” of “protein sequence language”. Since it is not known what would
be the best word equivalent, we will first describe what types of word equivalents
and vocabularies have been explored using the example of one specific area of
application, secondary structure prediction and analysis. In some applications of
language technologies to natural language itself, for example speech recognition,
the words are likewise not known in advance. There, identifying functional
building blocks is a signal processing task, and we will describe the analogy to
protein sequences from this perspective. This includes first introducing proteins and
protein structure in comparison to the terms used in speech processing, followed
by a presentation of one specific application of signal processing techniques in
computational biology, namely transmembrane helix structure prediction. This
will be brought into the broader context of other applications of language tech-
nologies to the same task. Finally, we will present a sampling of a few other
examples of applying language technologies to the computational biology of pro-
teins. Additional examples can be found referenced on the website of the Center
for Biological Language Modeling (BLM) in Pittsburgh, USA [3].
2 Use of Language Technologies in Computational Biology
Most functions in biological systems are carried out by proteins. Typical
functions include transmission of information (for example, in signaling pathways),
enzymatic catalysis, and transport of molecules. Proteins also play structural
roles such as the formation of muscle fibers. Proteins are synthesized from small
building blocks, amino acids, of which there are 20 different types (see below).
The amino acids are connected to form a linear chain that is arranged into a
defined three-dimensional structure. The precise interactions between amino acids
in the three-dimensional structure of a protein underlie the functions
that it is able to carry out. For example, these interactions allow proteins
to make contacts with small molecule ligands such as drugs. Figure 1 shows an
example protein, lysozyme, to which an inhibitor ligand is bound (shown in ma-
genta). Thus, knowing the three-dimensional shape of proteins has implications
not only for the fundamental understanding of protein function, but also for
applications such as drug design and discovery.
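The sequence view described above can be made concrete with a minimal sketch. It assumes a protein is written as a string over the 20 standard one-letter amino acid codes and treats overlapping n-grams as one simple candidate for "words". The choice of n-grams, the function name ngram_counts, and the toy sequence are illustrative assumptions, not a word definition endorsed by this review.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_counts(sequence, n):
    """Count overlapping n-grams in a protein sequence.

    n-grams are only one possible "word" equivalent (an assumption
    made for this illustration).
    """
    seq = sequence.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    # Slide a window of width n along the chain, one residue at a time.
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real protein.
    toy = "MKVLAAGIVKVL"
    for word, count in ngram_counts(toy, 2).most_common(3):
        print(word, count)
```

Counting such units over many sequences gives a crude "vocabulary" whose statistics can then be compared across proteins or structural classes, in the same way word frequencies are compared across texts.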
Obtaining three-dimensional structures of proteins experimentally is not
straightforward. X-ray crystallography and Nuclear Magnetic Resonance (NMR)
spectroscopy can accurately determine protein structures, but these methods are
labor-intensive and time-consuming, and for many proteins they are not applicable at all.
Therefore, predicting structural features of proteins from a sequence is an im-