FIGURE 1.41 Mutual information calculated for primate exons (upper line) and for primate
introns (lower line) longer than 500 base pairs (bp), as a function of separation length
k bp. Using a finite-length correction, each DNA sequence was cut into 500-bp pieces, and
the mutual information function (DNA codon frame) was calculated and position-averaged to
obtain the arithmetic mean of the functions. [Plot axes: distance k (bp), 0 to 100;
vertical scale 0.000 to 0.010.] Reprinted from Grosse, I., Herzel, H., Buldyrev, S.V.,
Stanley, H.E. (2000). Species Independence of Mutual Information in Coding and Noncoding
DNA. Phys. Rev. E 61:5624-5629. With permission of the American Physical Society.
correct classification (114). However, as yet, no algorithmic method can predict, a
priori, all of the gene locations within a string of raw, nonannotated DNA from any
organism. Therefore, the identification of all genes in an organism must still ultimately
be made by experimental approaches. Once genes are defined, a considerably more difficult
problem remains: determining how they are regulated within cells to carry out their
specific functions. Then, once all of the important regulatory control elements have been
identified for a given gene, the intelligent properties of prediction and notification can
be designed into specific DNA systems containing that gene. This is usually done by
combining the DNA with other small ligands or with sequence-specific recognition and
control proteins that include reporter functions, allowing the gene system to function as
part of a biosensor.
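Figure 1.41 plots a mutual information function against the separation k between bases. As a rough illustration only, the sketch below computes the plain pair-frequency form I(k) = sum over base pairs (a, b) of p_ab(k) log2[p_ab(k) / (p_a p_b)], where p_ab(k) is the frequency of base a followed k positions later by base b. It assumes consecutive, non-overlapping 500-bp pieces for the averaging step and does not reproduce the finite-length correction or the codon-frame resolution used by Grosse et al.; the function names `mutual_information` and `piecewise_mean_mi` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(seq: str, k: int) -> float:
    """Mutual information I(k), in bits, between bases k positions apart.

    Plain estimator from pair frequencies: no finite-size correction and
    no codon-frame (position-dependent) resolution.
    """
    seq = seq.upper()
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                # counts of (base at i, base at i + k)
    left = Counter(a for a, _ in pairs)   # marginal counts of the first base
    right = Counter(b for _, b in pairs)  # marginal counts of the second base
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


def piecewise_mean_mi(seq: str, k: int, piece: int = 500) -> float:
    """Arithmetic mean of I(k) over consecutive, non-overlapping pieces (assumed scheme)."""
    pieces = [seq[i:i + piece] for i in range(0, len(seq) - piece + 1, piece)]
    if not pieces:
        raise ValueError("sequence is shorter than one piece")
    return sum(mutual_information(p, k) for p in pieces) / len(pieces)
```

A homopolymer gives I(k) = 0 at every k, because knowing one base tells you nothing new about the next, whereas a strictly period-3 toy sequence such as "ACG" * 400 gives a value near log2(3) at k = 3. Real exon and intron curves sit far below either extreme, on the 0.000 to 0.010 scale shown in the figure.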
1.3.2 Redundancy of Single-Base Repeating Tracts: The Simplest Repeating Sequences
For decades before the Human Genome Project, the DNA of humans and many other eukaryotes
(organisms whose cells possess a distinct nucleus) had been known to contain repetitive
sequences of varying unit lengths and repetition frequencies. Therefore, to a first
approximation, naturally occurring DNA possesses the property of redundancy.
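The text uses "redundancy" in the sense of repeated sequence. One common information-theoretic way to put a number on it, offered here only as an illustrative assumption and not as this book's definition, is R = 1 - H_k / (2k), where H_k is the Shannon entropy of the overlapping k-mers and 2k bits is the maximum possible for a four-letter alphabet. The helper names `kmer_entropy` and `redundancy` are hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy, in bits, of the overlapping k-mers of seq."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    n = len(kmers)
    return -sum((c / n) * math.log2(c / n) for c in Counter(kmers).values())


def redundancy(seq: str, k: int = 1) -> float:
    """R = 1 - H_k / (2k): 0 for equiprobable k-mers, 1 for a single repeated k-mer."""
    return 1.0 - kmer_entropy(seq, k) / (2 * k)
```

Under this measure a homopolymer tract such as "A" * 50 has R = 1 (one symbol, zero entropy), the extreme case of the single-base repeats discussed in this section, while "ACGT" * 25 has R = 0 at k = 1 because all four bases are equally frequent.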
Historically, repetitive sequences in eukaryotic organisms were first discovered by
analyzing the reassociation kinetics of their DNA after denaturation (115). These studies
showed that the repetitive sequences in human DNA were unremarkable: their repetition
frequency distributions resembled those of many other organisms (116,117). More recently,
higher-resolution views of repetitive sequences have been obtained through restriction
enzyme analysis and direct DNA sequencing. By both methods, repetitive sequences with unit
repeat sizes from as few as one to hundreds of thousands of base pairs, and with repetition
frequencies from a few to millions of copies per genome, have been characterized in a wide
range of eukaryotic organisms, including humans.
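Reassociation kinetics reveal repeats because a sequence present in many copies finds its complement sooner. A minimal sketch, assuming the standard idealized second-order model rather than the specific analyses of refs. 115 to 117: a kinetic class making up a fraction f of the genome and repeated n times has complexity proportional to f/n and concentration proportional to f, so its single-stranded fraction falls as f / (1 + n k C0t), where k is a hypothetical rate constant for single-copy DNA. The class name `Component` and all parameter values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Component:
    genome_fraction: float  # fraction of total genomic DNA in this kinetic class
    copies: int             # repetition frequency (copies per genome)


def single_stranded_fraction(c0t: float, components: list[Component], k_unique: float) -> float:
    """Fraction of DNA still single-stranded at a given C0t.

    C0t is the initial DNA concentration multiplied by time. Ideal second-order
    kinetics: each class contributes f / (1 + copies * k_unique * C0t), so a class
    repeated n times reassociates n times faster than single-copy DNA.
    """
    return sum(c.genome_fraction / (1.0 + c.copies * k_unique * c0t) for c in components)


def c0t_half(copies: int, k_unique: float) -> float:
    """C0t at which a class is half reassociated: 1 / (copies * k_unique)."""
    return 1.0 / (copies * k_unique)
```

For a hypothetical genome that is half single-copy DNA and half a 1000-copy repeat, `c0t_half` places the repeat's midpoint 1000-fold earlier on the C0t axis than the single-copy midpoint, which is why such a genome shows a distinct early, fast-reassociating step in its C0t curve.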
The simplest possible repeating sequences in any DNA genome are the homopolymer repeats
(tracts), in which a single base is repeated along one strand and its complementary base
is repeated along the opposite strand. Beyond their obviously minimal information content,
homopolymer tracts have been found to possess unusual secondary structures and physical
properties (118). These and other short repeats, 2-6 bp in unit length, are thought to
arise within organisms to their equilibrium frequency