biological approach that can be traced back to Carl von Linné's Systema
Naturae. Since classification has contributed so much to our understanding
of the living world, the discovery of a new species or the definition
of a new medical syndrome is rightly considered a scientific
achievement in its own right. A relatively recent article in Nature on the
discovery of a new mammalian species 4 documents this view.
1.2. DNA Motifs from a Physical Perspective
To a physicist, the definition of a taxonomic entity hardly represents the
endpoint of a research project. The physical approach aims to establish
causal relationships between observable events and to build quantitative
models that can predict the outcome of experiments. Surprisingly, DNA motif
discovery has found important applications in such a research setting too.
A classical example is the characterization of transcription factor binding
sites, where the DNA motif becomes a quantitative model for predicting
the binding energy of a protein-DNA complex (Fig. 2). In fact, the
Pribnow box mentioned before is also part of a DNA-protein binding
site, the one recognized by bacterial RNA polymerase. The Berg and
von Hippel 5 statistical mechanical theory of protein-DNA interactions
provides a connection between motif complexity (conservation) and
binding energy. Interestingly, the standard descriptor used to represent
a protein binding site, the energy matrix, is mathematically
equivalent to the weight matrix used in de novo discovery of evolutionarily
conserved motifs. However, the logic of the scientific inference
process that leads to the definition of the matrix is reversed: here, the
starting point is a known molecular function and the endpoint is an
initially unknown motif, which can be considered the “genetic code” for
the function.
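
The equivalence between an energy matrix and a weight matrix can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the aligned sites are invented toy sequences loosely resembling a Pribnow box, the pseudocount of 1 and the uniform background of 0.25 are arbitrary choices, and the energies are in arbitrary units (the physical scale factor is set to 1). The function names are hypothetical and do not come from the text.

```python
import math

BASES = "ACGT"

# Hypothetical aligned binding sites (invented for illustration only).
SITES = ["TATAAT", "TATGAT", "TACAAT", "TATACT", "GATAAT"]


def count_matrix(sites):
    """Count how often each base occurs at each position of the alignment."""
    counts = [{b: 0 for b in BASES} for _ in range(len(sites[0]))]
    for site in sites:
        for i, base in enumerate(site):
            counts[i][base] += 1
    return counts


def weight_matrix(counts, pseudocount=1.0, background=0.25):
    """Log-odds weights: log of smoothed base frequency over background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            b: math.log(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def energy_matrix(counts, pseudocount=1.0):
    """Mismatch energies relative to the most frequent base at each position.

    Assumes base frequencies fall off exponentially with binding energy,
    so the energy is a log ratio of smoothed counts (arbitrary units).
    """
    matrix = []
    for column in counts:
        best = max(column.values())
        matrix.append({
            b: math.log((best + pseudocount) / (column[b] + pseudocount))
            for b in BASES
        })
    return matrix


def score(matrix, seq):
    """Additive score: sum of the per-position matrix entries for seq."""
    return sum(column[base] for column, base in zip(matrix, seq))


counts = count_matrix(SITES)
weights = weight_matrix(counts)
energies = energy_matrix(counts)

# For any sequence, weight score + energy score is the same constant,
# so the two matrices rank all sequences identically (in opposite order).
for seq in ("TATAAT", "TATGAT", "GCGCGC"):
    print(seq, score(weights, seq), score(energies, seq))
```

Under these assumptions the two scores differ only by a sign and a sequence-independent constant, which is one way to read the statement that the two descriptors are mathematically equivalent: a high weight-matrix score corresponds to a low (favorable) binding energy.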
From a computational chemistry viewpoint, an energy matrix for
a transcription factor binding site is a special case of a quantitative
structure-activity relationship (QSAR) model. 6 There is a wealth of
literature about QSAR models that is only sparsely cited in DNA motif
discovery papers. Machine learning methods exploiting quantitative
activity data have been widely used in the QSAR field. Interestingly, the
first weight matrix-like structure used for representation of a nucleic acid