DNA Structure (Molecular Biology)

The central roles of deoxyribonucleic acid (DNA) in life, especially those associated with the storage and transfer of genetic information, are now well established. The discovery of the structure of the DNA double helix by Watson and Crick nearly half a century ago ushered modern biology into a new era and permanently changed its landscape. It became clear that the three-dimensional structure of DNA is intimately associated with its function. Recently, rapid advances in the ability to determine the structures of biological macromolecules have produced a great wealth of information. In the field of DNA structure, we are beginning not only to understand the subtle, yet important, sequence-dependent conformations associated with the canonical Watson-Crick DNA double helix, but also to discover new structural forms of DNA and to visualize how DNA interacts with important molecules, such as proteins and drugs. A better understanding of the structural basis for the function of DNA will facilitate the rational design of compounds that help improve the quality of life. The outpouring of DNA sequence information from the various genome projects has made obtaining structural information about DNA an urgent task.

1. Chemical Structure

DNA is a biological polymer made of deoxynucleotide building blocks (Fig. 1). Nucleotides, nucleosides, and nucleobases are discussed in their own entries. The polynucleotide chain has a directionality because of the specific internucleotide phosphodiester linkage between the O3′ and O5′ atoms of two neighboring deoxyriboses. By convention, a polynucleotide chain is described as running in the 5′ to 3′ direction. DNA oligonucleotides can now be synthesized routinely for a wide range of applications, including structural studies (see DNA Synthesis). For convenience, DNA oligonucleotides are denoted by their sequence in a form such as d(CGCGAATTCGCG).

Figure 1. The torsion angles (α, β, γ, δ, ε, ζ) of a polynucleotide sugar-phosphate backbone and the glycosyl torsion angle χ. The diagram shows a fragment of B-DNA; therefore the deoxyribose conformation is C2′-endo, the glycosyl angle χ is anti, and the α/ζ combination is gauche⁻/gauche⁻.

DNA is a polyelectrolyte because of the negative charges associated with the phosphate groups. The negatively charged DNA is neutralized by the positive charges of metal ions, polyamines, or proteins. Metal ions, such as sodium, potassium, or magnesium ions, participate in screening the DNA negative charges through interactions with the phosphate oxygen atoms, in enzyme reactions (eg, Mg²⁺ in the function of DNA polymerase), and in the folding of more complex structures, such as the guanine quartet structure.

The sugar-phosphate polynucleotide chain consists of many single bonds around which the attached atoms may rotate. The definition of the torsion angles of the DNA backbone is shown in Figure 1. The deoxyribose ring is nonplanar and adopts two preferred "puckers," namely, the C2′-endo and C3′-endo conformations. Two parameters, the sugar pseudorotation phase angle and the maximum torsion angle (ie, the pseudorotation amplitude), are normally used to describe the available puckering modes in terms of the sugar torsion angles. The base adopts one of two preferred orientations (anti or syn) with respect to the sugar, as defined by the torsion angle χ about the C1′-N glycosyl bond.
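The two pucker parameters mentioned above can be computed from the five endocyclic sugar torsion angles. The following is a minimal sketch, assuming the commonly used Altona-Sundaralingam formulation, in which ν0 to ν4 are the ring torsions about the O4′-C1′, C1′-C2′, C2′-C3′, C3′-C4′, and C4′-O4′ bonds. The function names and the phase ranges used to label the pucker (0° to 36° for C3′-endo, 144° to 180° for C2′-endo) are illustrative assumptions, not definitions taken from this entry.

```python
import math

# Sum of sines that appears in the pseudorotation phase formula.
_SIN_SUM = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (phase P, amplitude nu_max), both in degrees.

    Assumes tan P = ((nu4 + nu1) - (nu3 + nu0)) / (2 * nu2 * (sin 36 + sin 72)).
    atan2 places P in the correct quadrant, including the case nu2 < 0.
    """
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * _SIN_SUM
    phase = math.degrees(math.atan2(numerator, denominator)) % 360.0
    # Equivalent to nu2 / cos(P), but well behaved when cos(P) is near zero.
    amplitude = math.hypot(numerator, denominator) / (2.0 * _SIN_SUM)
    return phase, amplitude


def classify_pucker(phase):
    """Label the pucker using assumed conventional phase ranges (degrees)."""
    if 0.0 <= phase <= 36.0:
        return "C3'-endo"
    if 144.0 <= phase <= 180.0:
        return "C2'-endo"
    return "other"


def ideal_torsions(phase, amplitude):
    """Idealized ring torsions nu_j = nu_max * cos(P + 144 * (j - 2)), j = 0..4."""
    return [
        amplitude * math.cos(math.radians(phase + 144.0 * (j - 2)))
        for j in range(5)
    ]


# Round trip on an idealized, hypothetical sugar with P = 162 and nu_max = 38.
phase, amplitude = pseudorotation(*ideal_torsions(162.0, 38.0))
print(round(phase, 1), round(amplitude, 1), classify_pucker(phase))
```

The round trip uses idealized torsions, not measured ones; torsions from a real structure deviate slightly from the cosine relation, so the recovered values are approximate.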

DNA predominantly exists as an antiparallel double-stranded helix. The two strands are helically coiled, which maximizes the exposure of the negatively charged sugar-phosphate backbone to water and shields the hydrophobic aromatic bases in the middle from water. In the Watson-Crick base pairs of guanine with cytosine (G:C) and adenine with thymine (A:T), the specificity of base pairing is provided by hydrogen bonds. It should be noted that other types of base pairs are also known to exist, and they often are involved in unusual DNA structures.
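The antiparallel arrangement and the Watson-Crick pairing rules can be illustrated with a short sketch. The function names below are hypothetical; the code only derives the partner strand, written 5′ to 3′, and checks whether an oligonucleotide such as d(CGCGAATTCGCG) is self-complementary, so that two copies of it can pair to form a duplex.

```python
# Watson-Crick partners: A pairs with T, G pairs with C.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the antiparallel partner strand, written 5'->3'."""
    return "".join(WATSON_CRICK[base] for base in reversed(seq.upper()))


def is_self_complementary(seq):
    """True if the strand can pair with a second copy of itself."""
    return reverse_complement(seq) == seq.upper()


dodecamer = "CGCGAATTCGCG"
print(reverse_complement(dodecamer))
print(is_self_complementary(dodecamer))
```

Reading the complement in reverse is what encodes the antiparallel strand polarity; a parallel partner would be obtained without the reversal.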

2. DNA Double Helices

The polymorphism of DNA, in which different helical forms are adopted under different environmental conditions, has begun to emerge in recent years. Much of the detailed structural information on DNA has been obtained through high-resolution X-ray crystallographic analysis and nuclear magnetic resonance (NMR) spectroscopic analysis of DNA oligonucleotides with defined sequences. A survey of the Nucleic Acids Database for DNA structures and their complexes with proteins and drugs indicates that hundreds of crystal structures are available. Additional structures solved by NMR can be found in the Brookhaven Protein Database (http://www.pdb.bnl.gov/). The structural features of the three common types of DNA helices (B-DNA, A-DNA, and Z-DNA) are illustrated in Figure 2, summarized in Table 1, and described in more detail in the individual entries.

Figure 2. End view and side view of the B-DNA, A-DNA, and Z-DNA helices. Note the decrease in diameter, as well as the relative positions of the base pairs and backbone to the helix axis.

Table 1. Structural parameters of B-DNA, A-DNA, and Z-DNA

Structural parameter	B-DNA	A-DNA	Z-DNA
Helical sense	Right-handed	Right-handed	Left-handed
Repeat unit (bp)	1	1	2
Base pairs per turn	10.4	11	12
Tilt (degrees)	~0	19	-9
Rise per base pair (Å)	3.3	2.3	3.7
Helical pitch (Å)	34	25.4	45
Glycosyl angle	anti	anti	anti at C / syn at G
Sugar pucker	C2′-endo	C3′-endo	C2′-endo at C / C3′-endo at G
Phosphate conformation (α/ζ)	-40°/-98°	-88°/-44°	-146°/80° at C; 60°/-58° at G
Helical diameter (Å)^a	~20	~25	~18
Major groove	Wide and deep	Narrow and deep	Flattened
Minor groove	Narrow and deep	Wide and shallow	Narrow and deep

^a Conversion factor for angstroms to meters is 1.0 × 10^-10.
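Several entries in Table 1 are related by simple geometry: the helical pitch is approximately the rise per base pair multiplied by the number of base pairs per turn, and the mean helical twist per base pair is 360° divided by the number of base pairs per turn. The sketch below checks this using the Table 1 values; the dictionary layout and function names are illustrative assumptions, and the twist is reported as a magnitude only (in left-handed Z-DNA the sense is reversed, and the value is an average over the two steps of the dinucleotide repeat).

```python
ANGSTROM_IN_METERS = 1.0e-10  # conversion factor from the table footnote

# Rise per base pair (angstroms), base pairs per turn, and tabulated pitch
# (angstroms), copied from Table 1.
TABLE_1 = {
    "B-DNA": {"rise": 3.3, "bp_per_turn": 10.4, "pitch": 34.0},
    "A-DNA": {"rise": 2.3, "bp_per_turn": 11.0, "pitch": 25.4},
    "Z-DNA": {"rise": 3.7, "bp_per_turn": 12.0, "pitch": 45.0},
}


def helical_pitch(rise, bp_per_turn):
    """Axial length of one full helical turn, in the units of `rise`."""
    return rise * bp_per_turn


def mean_twist(bp_per_turn):
    """Magnitude of the average rotation per base pair, in degrees."""
    return 360.0 / bp_per_turn


for name, row in TABLE_1.items():
    pitch = helical_pitch(row["rise"], row["bp_per_turn"])
    twist = mean_twist(row["bp_per_turn"])
    print(
        f"{name}: computed pitch {pitch:.1f} A (table {row['pitch']} A, "
        f"{pitch * ANGSTROM_IN_METERS:.2e} m), mean twist {twist:.1f} deg/bp"
    )
```

The computed pitches agree with the tabulated ones to within a fraction of an angstrom, which is the level of rounding expected for averaged fiber and crystal parameters.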

3. Hydration

The hydration environment around a DNA double helix plays an important role in determining which conformation is adopted, as well as other properties. Recent structural work suggested that some proteins, such as the Trp repressor (see TRP Operon) and the EcoRI restriction enzyme, recognize DNA sequences through direct hydrogen bonds, nonpolar contacts, indirect structural effects, and, surprisingly, water-mediated interactions. Thus, specific water molecules play critical roles in sequence-specific recognition by proteins. More recently, water molecules were found to play a different role: they modulate the binding of sequence-nonspecific DNA-binding proteins (eg, Sac7d) to DNA of random sequence (1). It is now generally accepted that the hydration shell surrounding the DNA molecule plays an important role in DNA recognition by proteins and other ligands, such as DNA-binding anticancer drugs.

4. Novel DNA Structures

Certain sequences of DNA form three-dimensional structures that are not the common A-, B-, or Z-DNA double helices. Those sequences form higher order structures such as hairpin loops, triple-stranded structures, tetrastranded structures, and cruciforms. They often involve nonstandard base pairs. For example, the self-pairing of guanine bases is found in the tetrastranded guanine-quartet structure. Specific functions have already been identified for some of these higher order structures. The known multistranded structures include the Guanine Quartet, the I-Motif, the triple helix, and the Cruciform (Holliday Junction), which are described in the individual entries. Some of the novel double-stranded structures are discussed here.

4.1. Bent DNA

Certain DNA sequences have abnormal mobilities in gel electrophoresis. For example, DNA fragments having repeats of (A)n nucleotides (with n > 4), separated by another four to five nucleotides and phased with the helical repeat, migrate in the gel significantly more slowly than do those having mixed sequences. It was discovered that the 5′-AAAAA sequence has an intrinsic bending property, which can be demonstrated by the increased efficiency of cyclization of those bent DNA fragments. The molecular basis of the intrinsic bendability of the (A)n sequence has been investigated. The high propeller twist associated with the A-T base pair in the (A)n:(T)n sequence may play an important role in this property. The bend in those sequences is relatively smooth, resulting in a curved DNA structure. The opening (roll) of the bend is toward the minor groove. Many proteins induce such smooth bending through many small single-step bends. This has been observed in the crystal structures of a number of protein-DNA complexes, exemplified by the structure of the 434 repressor-DNA complex (Protein Database accession number PDR015) shown in Figure 3a.

Figure 3. Two types of DNA bending modes. (a) Smooth bending found in the 434 repressor-DNA complex (Protein Database PDR015). (b) Sharp kink found in the Sac7d-DNA complex (Protein Database 1AZP).

Another type of bent DNA has a sharp kink at a localized site, usually caused and stabilized by DNA-binding proteins bound to the DNA. A relevant example is found in a recent structure of the complex between Sac7d, a chromosomal 7-kDa protein from the hyperthermophilic archaebacterium Sulfolobus acidocaldarius, and DNA oligonucleotides (1). The DNA is kinked sharply at the C2pG3 step in the Sac7d-GCGATCGC complex (Figure 3b). The sharp kink is caused by the intercalation of the side chains of amino acid residues Val26 and Met29 of the Sac7d protein between DNA base pairs from the minor groove direction, which widens the minor groove at this step.

This type of sharp DNA kink has been observed in the complexes of the TATA-box binding protein (TBP) and of two HMG-box containing proteins, LEF-1 and SRY, with their cognate DNA sequences. Remarkably, both Sac7d and TBP use amino acids on a beta-sheet for the intercalation, whereas LEF-1 and SRY use amino acids located at the corner of the alpha-helical L-shaped HMG box. A more thorough discussion of the structural basis of the various types of DNA bending has appeared recently (2).

4.2. Triplet-Repeat Sequences

Recently, a number of human genetic diseases have been correlated with expansions of triplet repeats of the DNA sequence (CNG)n. The (CGG)n repeat in the X chromosome is responsible for fragile-X syndrome, the (CAG)n repeat is associated with Huntington’s disease and spinobulbar muscular atrophy, and the (CTG)n repeat is associated with myotonic dystrophy. How these unusual repetitive sequences correlate with the etiology of these diseases, and the mechanism by which the repeats are expanded during DNA replication, are under intense scrutiny. Some have proposed that a "slippage" process occurs because of the ease with which these repeating sequences form hairpin structures. Certain triplet repeats, such as (CAG)n and (CTG)n, but not (CGA)n, have a strong propensity to form hairpin structures. Therefore, a DNA duplex encoding the (CAG)n:(CTG)n repeats may easily exchange between duplex and cruciform, especially under negative supercoiling strain. If there are proteins or other ligands (eg, drugs) that can stabilize the stem of the cruciform, this process would be inhibited. The three-dimensional structures of several DNA oligonucleotides associated with those triplet repeats have been studied, primarily by NMR. The (CAG)n repeat can form a stable duplex structure incorporating "sheared" G-A mismatched base pairs. The duplex structure associated with the (CCG)n repeat appears to have the C nucleotides extruded from the helix. Structural and functional studies of those unusual repeats remain very active (3). (See also Trinucleotide Repeats.)
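As a simple illustration of the (CNG)n notation, the sketch below locates the longest uninterrupted run of a given triplet in a sequence and reports its repeat count n. The function name and the example sequences are hypothetical, and no disease-related thresholds for n are encoded because none are given in this entry.

```python
import re


def longest_triplet_run(seq, unit):
    """Return (start, n) for the longest uninterrupted (unit)n run in seq.

    `start` is the 0-based index of the run; (-1, 0) means the unit is absent.
    Ties are resolved in favor of the first run found.
    """
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    best = (-1, 0)
    for match in pattern.finditer(seq.upper()):
        n = len(match.group()) // len(unit)
        if n > best[1]:
            best = (match.start(), n)
    return best


# Hypothetical sequence containing a (CAG)3 run flanked by thymines.
print(longest_triplet_run("TTCAGCAGCAGTT", "CAG"))
```

Because the two strands of a duplex are complementary and antiparallel, a (CAG)n run on one strand corresponds to a (CTG)n run on the other, so scanning a single strand for both units covers the (CAG)n:(CTG)n case described above.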

4.3. Parallel-Stranded DNA Duplex

A new addition to the non-canonical DNA structures is parallel-stranded (PS) DNA. The question of whether a stable DNA duplex can be parallel has been addressed previously (4). A series of A,T-containing DNA sequences was designed to form parallel duplexes using reversed Watson-Crick base pairs. The stability of those PS duplexes is modest; for example, the Tm (melting temperature) of a 21-mer PS duplex is 15°C lower than that of the corresponding antiparallel duplex. A different motif is the non-Watson-Crick, homo base-paired, parallel-stranded DNA called P-DNA. It was demonstrated that the d(CGA) sequence has a strong propensity to form the P-DNA structure (5).

An important requirement for a hetero base-paired parallel duplex is that the two glycosyl bonds within a base pair have to come from opposite directions, because of the identical chain polarity. For the normal nucleic acid bases, this can be accomplished using the reverse Watson-Crick base-pair conformation. However, A-T and G-C base pairs in a reverse Watson-Crick conformation are not isostructural, due to their hydrogen-bonding restrictions. Therefore, it has not been easy to design a stable PS duplex in which all four bases can be incorporated in random order.

This difficulty has been overcome by using alternative nucleosides, 2′-deoxyisoguanosine (iG) and 2′-deoxy-5-methyl-isocytidine (iC), which can form stable reverse Watson-Crick base pairs with the normal 2′-deoxycytidine (C) and 2′-deoxyguanosine (G), respectively. Indeed, oligodeoxynucleotides containing iG and iC can form remarkably stable parallel-stranded duplexes with complementary (G,C)-containing DNA or RNA strands (6). The ability of (iG,iC)-containing DNA oligomers to form specific, stable parallel-stranded duplexes may offer new opportunities for designing useful probes for applications such as antisense or aptamer molecules.

5. Summary

In conclusion, the importance of DNA structure and dynamics in biology is very clear. The rapid advances in genomics, molecular biology, and structural biology (including the use of synchrotron radiation) offer an exciting future for the investigation of the protein and nucleic acid structures associated with new and significant biological functions. One can now attack problems that were unthinkable just a few years ago. The structure of the nucleosome has been determined at 2.8 Å resolution, and the detailed DNA conformation has been presented (7). The structure of the ribosome particle is on its way to being elucidated at a resolution high enough to visualize individual proteins and RNA. It is certain that many structures of novel DNA sequences and important protein-DNA complexes will be forthcoming at a rapid pace in the next few years.