Biomedical Engineering Reference
involving regulatory proteins and are the means by which intelligent properties are exhib-
ited by living cells and whole organisms. With this genomic regulatory information in
hand, the modern techniques of Molecular Biology make possible the rational modification
and augmentation of the intelligent properties of DNA and, as a result, its host organism.
In the cell, DNA is acted upon by numerous sequence-specific binding proteins as well
as enzymes that have evolved to carry out specific functions. These functions form part of
the array of intelligent properties that DNA possesses, as Figure 1.40 depicts in a schematic
fashion. In vivo, DNA is capable of copy number multiplication by multiprotein enzymatic
processes. In the laboratory setting, geometric amplification of investigator-selected
DNA sequences is routinely carried out by the polymerase chain reaction (PCR). Specific
redundancy features can be designed into DNA regions as repeating sequence motifs by
taking advantage of modern cloning and PCR amplification techniques. Through the action
of specific repair protein systems, DNA is capable of self-diagnosis and repair functions
that restore its native structure and sequence following mutational errors. These repair
systems have evolved to discriminate correct from incorrect (mutated) base complementarity
in the double helical structure. In contrast, there exist specific degradative enzymes that are capable of
completely degrading DNA, or other enzymes that can be used to make desired covalent
modifications to native DNA. One example of a naturally occurring modification is enzy-
matic methylation at the 5-position of the cytosine base located in the DNA major groove.
This DNA modification occurs naturally in bacteria and helps to protect those genomes
from infection by bacteriophages (viruses that infect bacteria). In higher organisms,
methylation is used to regulate mRNA transcription of specific genes (112).
DNA forms the molecular basis of stored evolutionary control information at the cellular
and organism levels. Therefore, in the broadest sense, DNA is part of a macromolecular
system capable of learning or adaptation as a consequence of evolutionary molecular mech-
anisms. That DNA is the organism's repository of stored evolutionary information has been
known and accepted for decades. However, only within the past decade have large
numbers of DNA sequences in any given organism been characterized directly and at the
highest resolution by high-throughput sequencing. This detailed sequence knowledge has
allowed DNA to be characterized from the perspective of Information Theory. As measured
by entropy-like metrics such as the Mutual Information Function, DNA in the gene-containing
regions of many genomes has been shown to have a significantly higher information content
than DNA in non-gene-containing regions (113). This general property is exactly the
behavior one would expect, since genes are the specific information repositories for
protein sequences. The Mutual Information Function is a logarithm-based measure of how
strongly the bases at two positions are correlated (share information), evaluated as a
function of the distance separating the positions along a DNA sequence. It can be
calculated in different DNA base reading frames and averaged. As shown in Figure 1.41, this information correlation is high
and persists over long sequence lengths in primate exon- or gene-containing regions, but is
low in magnitude at all sequence lengths in primate nongene or intron regions. There is
nearly a threefold higher average mutual information value for exons compared with
introns. Another striking feature is the marked three-base periodicity of the exon mutual
information. This results from the nonuniform frequency of base usage at the three
frame positions of the nonoverlapping codons that make up the exon's protein-coding
information content. This periodicity is completely absent from intron sequences, which
lack any codon structure. Given this nearly threefold difference in mutual information,
metrics like the Mutual Information Function can serve as the basis for classifying raw
DNA sequences by functional category: coding vs. noncoding. In fact, we have carried out
such classifications of human coding and noncoding sequences using Information
Theory-related approaches with a relatively high accuracy of
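
The mutual information calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of reference (113) or the classification procedure mentioned in the text. It assumes the commonly used definition I(k) = sum over base pairs (x, y) of p_xy(k) * log2[p_xy(k) / (p_x * p_y)], where p_xy(k) is the observed frequency of base x followed k positions later by base y. The function names, the per-frame base weights, and the "period-3 contrast" score are hypothetical choices made for this example; the sequences are synthetic toy data, not real exons or introns.

```python
import math
import random
from collections import Counter


def mutual_information(seq, k):
    """Mutual information, in bits, between bases k positions apart in seq."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                  # counts of (x, y) base pairs
    first = Counter(x for x, _ in pairs)    # marginal counts of the first base
    second = Counter(y for _, y in pairs)   # marginal counts of the second base
    total = 0.0
    for (x, y), c in joint.items():
        # p_xy / (p_x * p_y) simplifies to c * n / (first[x] * second[y])
        total += (c / n) * math.log2(c * n / (first[x] * second[y]))
    return total


def period3_contrast(seq, max_k=30):
    """Mean I(k) at multiples of 3 minus mean I(k) at all other separations."""
    in_frame = [mutual_information(seq, k) for k in range(3, max_k + 1, 3)]
    off_frame = [mutual_information(seq, k)
                 for k in range(1, max_k + 1) if k % 3]
    return sum(in_frame) / len(in_frame) - sum(off_frame) / len(off_frame)


def synthetic_sequence(frame_weights, n_codons, seed):
    """Toy sequence: bases drawn independently with per-frame A, C, G, T weights."""
    rng = random.Random(seed)
    return "".join(
        rng.choices("ACGT", weights=w)[0]
        for _ in range(n_codons)
        for w in frame_weights
    )


# Hypothetical weights: "coding-like" base usage differs by codon position,
# while the "noncoding-like" sequence uses all four bases uniformly.
CODING_LIKE = [(4, 1, 4, 1), (1, 1, 1, 1), (1, 4, 1, 4)]
UNIFORM = [(1, 1, 1, 1)] * 3

if __name__ == "__main__":
    coding = synthetic_sequence(CODING_LIKE, 2000, seed=1)
    noncoding = synthetic_sequence(UNIFORM, 2000, seed=2)
    print("coding-like contrast:   ", period3_contrast(coding))
    print("noncoding-like contrast:", period3_contrast(noncoding))
```

In this toy setting the frame-dependent base usage alone produces a higher I(k) at separations that are multiples of three, which is the periodicity the text attributes to codon structure. A threshold on such a contrast score is one simple way a coding vs. noncoding classifier could be built, under the assumptions stated above.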