2 Strings and Genomes
Abstract. Strings are the mathematical structures underlying informational biopolymers.
Genomes are long strings (hundreds of thousands, millions, or billions of characters)
built over the four nucleotides, and many typical operations on genomes are naturally
expressed as string operations. In this chapter we present basic concepts about DNA
molecules and genomes in algorithmic terms, emphasizing the roles of strings, formal
languages, and multisets of strings in the analysis of typical biological and
biotechnological DNA manipulations. We conclude by outlining some lines of research in
genome analysis that are based on genomic dictionaries. The chapter is mostly based on
the author's published papers (see References for Chapter 2).
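The abstract's notions of a genome as a string over four nucleotides, of multisets of strings, and of genomic dictionaries can be made concrete with a small sketch. It assumes, for illustration only, that a genomic dictionary is the set of all substrings of a fixed length k (k-words) occurring in a genome, and that the associated multiset records how many times each k-word occurs. The helper names `kmer_multiset` and `kmer_dictionary` are hypothetical and are not taken from the chapter.

```python
from collections import Counter

# The four-letter DNA alphabet assumed for the genome string.
NUCLEOTIDES = set("ACGT")


def kmer_multiset(genome: str, k: int) -> Counter:
    """Hypothetical helper: multiset of the k-words occurring in `genome`.

    Each key is a substring of length k; its value is the number of
    positions where that substring starts (overlapping occurrences count).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not set(genome) <= NUCLEOTIDES:
        raise ValueError("genome must be a string over A, C, G, T")
    # Slide a window of width k over every start position.
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def kmer_dictionary(genome: str, k: int) -> set:
    """Hypothetical helper: the set of distinct k-words of `genome`."""
    return set(kmer_multiset(genome, k))


if __name__ == "__main__":
    g = "ACGTACGTA"
    counts = kmer_multiset(g, 3)
    print(counts["ACG"], counts["TAC"])  # 2 1
    print(sorted(kmer_dictionary(g, 3)))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

A `Counter` is used because it is Python's standard multiset type: its distinct keys give the dictionary, while its values give the multiplicity of each k-word. Since every start position is counted, overlapping occurrences are included, and a genome shorter than k yields an empty multiset.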
2.1 Biological Monomers and Polymers
Words of written alphabetic languages are the most familiar intuition of symbolic
linear forms. However, if we tried to build structures similar to words at a
biomolecular level, we would face an essential difficulty. Letters are arranged on an
external rigid support (paper) that keeps their linear arrangement stable and robust
despite movements of the support, whereas molecules float in a liquid environment.
Therefore, we need a different way to arrange them: linearity has to be implemented by
means of a feature internal to the molecules. This is the reason for the following
structure, which is common to the most important biological monomers: body, head,
tail, flag, and bridge. The body is the component of the monomer to which head, tail,
and flag are connected. The link
 