Chain-Termination (Dideoxy) DNA Sequencing (Molecular Biology)

The relative length of a DNA chain is determined easily and accurately by gel electrophoresis, and the most common types of DNA sequencing use methods that map sequence information to DNA chain lengths. For chain-termination DNA sequencing, this is done by synthesizing the complement of the DNA using a DNA polymerase under conditions that terminate synthesis at sites where only one of the four bases occurs. Thus, all sequencing experiments are done in three steps. First, a single pure DNA segment is isolated for sequencing. Second, this DNA is used as a template for synthesis catalyzed by a DNA polymerase with mixtures of normal and chain-terminating nucleotides. Finally, the products of this synthesis are separated according to size by gel electrophoresis. Numerous variations on each of these steps are commonly used.
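The mapping from sequence to chain length can be illustrated with a short Python sketch. It is an idealized toy model, not a laboratory protocol: the template string is hypothetical, every chain is assumed to terminate cleanly, and lengths are counted from the first base added after the primer.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template: str) -> str:
    """Return the newly synthesized strand, written 5'->3', for a template written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def termination_lengths(product: str, base: str) -> list[int]:
    """Chain lengths produced by one reaction containing the dideoxy form of `base`.

    A chain can end wherever `base` occurs in the synthesized strand.
    """
    return [i + 1 for i, b in enumerate(product) if b == base]


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Rebuild the sequence by ordering all terminated chains from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[n] for n in sorted(base_at_length))


template = "GATTACAGGC"  # hypothetical template, 5'->3'
product = synthesized_strand(template)
lanes = {base: termination_lengths(product, base) for base in "ACGT"}

print(product)             # GCCTGTAATC
print(lanes)               # chain lengths per terminator
print(read_ladder(lanes))  # GCCTGTAATC
```

Each chain length appears in exactly one of the four reactions, so ordering the chains by size, as electrophoresis does, recovers the synthesized sequence.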

1. Isolation of Specific Segments of DNA

Most of the techniques of molecular biology rely on isolating specific segments of DNA, and sequencing is no exception. In fact, the first practical application of chain-termination sequencing relied on cloning specific DNA segments using vectors derived from M13 bacteriophage. These bacteriophages contain a single-stranded DNA chromosome that accommodates inserts of more than 5000 bases. Isolating the single-stranded DNA in pure form from these phages is simple and inexpensive, so these vectors are still commonly used. Virtually all plasmid vectors can also be used for DNA sequencing, and essentially any clone of up to about 200 kb can be sequenced by cycle sequencing techniques, provided sufficient DNA can be purified. Another popular way of isolating DNA for sequencing is the polymerase chain reaction (PCR). With nested PCR primers, it is now relatively simple to amplify segments of genomic DNA for direct sequencing in a matter of hours.

2. Chain Termination Reactions

Chain termination reactions require the isolated template DNA, a suitable primer, 2′-deoxynucleoside triphosphates (dNTPs), 2′,3′-dideoxynucleoside triphosphate (ddNTP) chain terminators, and a DNA polymerase. The most critical component is the DNA polymerase. Many DNA polymerases have been used for sequencing, including those from eubacteria, such as the DNA polymerase I (Klenow fragment) of Escherichia coli, reverse transcriptases from retroviruses, such as avian myeloblastosis virus, polymerases from bacteriophage, such as T7 phage DNA polymerase, and polymerases from archaea, such as Thermococcus litoralis. Virtually all of these have been genetically or chemically modified to eliminate exonuclease activities or to improve reaction rates with the ddNTPs.

The most recent examples of polymerases specifically engineered for DNA sequencing are derived from Taq DNA polymerase, the enzyme from Thermus aquaticus. Two regions of this polymerase are modified to produce particularly effective polymerases for DNA sequencing. First, Taq DNA polymerase has 5′-3′ exonuclease activity, which degrades sequencing primers. The first 300 amino acid residues at the N-terminus of this enzyme are required for this exonuclease activity. Portions of this domain can be deleted, or the activity can be eliminated by point mutation. Second, native Taq DNA polymerase is relatively inefficient at using ddNTPs. As discovered by Tabor and Richardson, this can be improved more than 10^4-fold by changing residue Phe667 to Tyr. This modification improves ddNTP usage, and it also greatly improves the quality of the sequence data obtained. Native Taq DNA polymerase produces sequencing bands that vary in intensity more than 15-fold, depending on the nearby sequence. In contrast, the Tyr667 polymerase produces bands that vary in intensity by less than threefold. This makes interpreting the results of the electrophoretic separation much more accurate. A number of polymerases that have this modification are now commercially available, as is T7 DNA polymerase, which naturally has a tyrosine at the corresponding position.
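Why efficient ddNTP usage matters can be sketched with a simple competition model in Python. The model is an assumption made here for illustration and is not taken from the text: at each matching site the polymerase is assumed to choose between the dNTP and the ddNTP in proportion to their concentrations, weighted by a "discrimination" factor (how many times more readily the enzyme uses the dNTP). The function names and the numerical values are hypothetical.

```python
def termination_probability(dd_to_d_ratio: float, discrimination: float) -> float:
    """Chance that a chain terminates at a site where the terminator base matches.

    dd_to_d_ratio  -- [ddNTP] / [dNTP] in the reaction
    discrimination -- preference of the enzyme for dNTP over ddNTP
                      (1.0 means both are used equally well)
    """
    return dd_to_d_ratio / (dd_to_d_ratio + discrimination)


def ratio_for_probability(p: float, discrimination: float) -> float:
    """ddNTP:dNTP ratio needed to reach termination probability p (0 < p < 1)."""
    return p * discrimination / (1.0 - p)


# Illustrative values only: the same 1% termination chance per matching site
# for an enzyme that does not discriminate and one that discriminates 1000-fold.
for discrimination in (1.0, 1000.0):
    ratio = ratio_for_probability(0.01, discrimination)
    print(f"discrimination {discrimination:g}: ddNTP/dNTP ratio {ratio:.4g}")
```

Under these assumptions, an enzyme that discriminates strongly against ddNTPs needs a proportionally larger excess of terminator to produce the same ladder of chain lengths.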

3. Cycle Sequencing

Cycle sequencing is the process of using repeated cycles of thermal denaturation and polymerization to produce greater amounts of product in a DNA sequencing reaction. The amount of product DNA increases linearly with the number of cycles. (This distinguishes it from PCR, which uses two primers so that the amount of product increases exponentially with the number of cycles.) During each cycle, the thermostable DNA polymerase extends the annealed primer molecules, typically at 60 to 70°C. The mixture is then heated above the melting temperature of DNA (95°C), dissociating the extended primer from the template. Finally, the mixture is cooled, allowing another molecule of primer (which is present in excess) to anneal to the limited supply of template. Further cycles of extension and denaturation produce much more extended primer than the amount of template used. This improves the sensitivity of the sequencing experiment, and it also allows ready use of double-stranded templates for sequencing. Generally, cycle sequencing works much more reliably over a wider range of template concentrations than noncycled protocols. This accounts for its nearly universal application in large-scale DNA sequencing projects.
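The difference between linear and exponential accumulation can be made concrete with a small Python calculation. It is an idealized sketch: the `efficiency` parameter (the fraction of template molecules copied in each cycle) and the function names are assumptions introduced here, and real reactions fall short of these limits.

```python
def cycle_sequencing_product(template: float, cycles: int, efficiency: float = 1.0) -> float:
    """Extended primer made with one primer: each cycle copies only the original template."""
    return template * efficiency * cycles


def pcr_product(template: float, cycles: int, efficiency: float = 1.0) -> float:
    """Template copies after PCR with two primers: products serve as templates in later cycles."""
    return template * (1.0 + efficiency) ** cycles


for n in (1, 10, 30):
    linear = cycle_sequencing_product(1.0, n)
    exponential = pcr_product(1.0, n)
    print(f"{n:2d} cycles: cycle sequencing x{linear:g}, PCR x{exponential:g}")
```

With one unit of template and perfect efficiency, 30 cycles of cycle sequencing give 30 units of extended primer, whereas 30 cycles of PCR would give about 10^9 units.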

4. Methods for Labeling DNA Sequences

The products of the chain termination reactions must be labeled for all practical DNA sequencing methods. The original label was [α-32P]dATP, which was simply added to the chain-termination reaction. Newly synthesized DNA was labeled with radioactive phosphorus and detected by simple autoradiography. More recently, the lower-energy isotopes 33P and 35S (in the form of α-thio-dATP) have been used because they generate autoradiograms with higher resolution. Radiolabeled ddNTP terminators have also been used. These offer the advantage of using less total radioactivity than other methods. In addition, only specifically terminated, elongated DNA chains are labeled and therefore visualized by autoradiography, which eliminates the background bands and stops normally observed on DNA sequencing autoradiograms and results in extremely clean sequence data.

Automated, fluorescent DNA sequencing methods were introduced in 1987 and have become essential tools for large-scale sequencing efforts. The sequence products used by these automated systems are labeled by fluorescent primers (dye primers) or fluorescent dideoxynucleotides (dye terminators). These have been used in single-color detection instruments and in four-color multiplex instruments in which the four bases are distinguished by color. Recent innovations include fluorescent dye-labeled DNA primers that exploit fluorescence energy transfer to optimize the absorption and emission properties of the label. These primers carry a fluorescein derivative at the 5′-end as a common donor and rhodamine derivatives attached to a modified thymidine within the primer sequence as acceptors. Adjusting the donor-acceptor spacing through the position of the modified thymidine in the primer sequence yields a set of four primers. All have strong absorption at a common excitation wavelength (488 nm) and efficient fluorescent emission at 525, 555, 580, and 605 nm, respectively. These primers improve the sensitivity and accuracy of the automated sequencing system.

Fluorescent dye-labeled ddNTP terminators have also been used extensively for DNA sequencing, and those that use the energy-transfer principle are also commercially available. Like the radio-labeled terminators, they have the advantage of labeling only specifically terminated, elongated DNA chains, so that background bands are eliminated.

5. Electrophoresis and Automated Sequencing

The high-resolution separation of DNA fragments by size is essential for all sequencing methods. For radioactively labeled DNA sequencing experiments, this is done by using gels cast between glass plates that are 0.2 to 0.4 mm thick, 40 to 80 cm long, and wide enough to accommodate 32 to 96 samples. Typically, the gels are 4 to 8% polyacrylamide cross-linked with N,N′-methylene bisacrylamide (see Polyacrylamide) and contain tris borate buffer (0.089 M, pH 8.3) and 7 to 8 M urea. After electrophoresis for 2 to 18 hours, the gels must be removed from the glass plates for autoradiography and reading of the sequence of 200 to 400 nucleotides. Because these gels are cumbersome to make and use, considerable effort has been made to improve separation methods. The most commonly used methods involve a sensitive fluorescent detection instrument that continuously monitors the migration of fluorescent-labeled DNA past a fixed position on the gel. The results are collected and evaluated directly by computer, producing finished sequence information. This saves considerable labor in "reading" the sequence from the gels and improves the resolution sufficiently to read 500 or more bases routinely from a single sequencing experiment. Non-cross-linked "gels" have also been introduced that run in capillaries 50 to 100 µm in diameter and 40 to 70 cm long, with fluorescent detection. The efficient heat transfer of these electrophoresis media allows faster, high-resolution separations.
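How a computer might turn four-color detector readings into sequence can be sketched in Python. This is a deliberately minimal illustration, not the algorithm used by any particular instrument: it assumes that peaks have already been detected, that each peak is summarized by one intensity per color channel, and that the channel order maps to bases as given in `CHANNEL_BASES`, which is a hypothetical assignment. Real base-calling software must also correct for spectral overlap between dyes and for dye-dependent mobility shifts.

```python
# Hypothetical assignment of the four color channels to bases.
CHANNEL_BASES = "ACGT"

Peak = tuple[float, tuple[float, float, float, float]]  # (detection time, channel intensities)


def call_bases(peaks: list[Peak]) -> str:
    """Call one base per peak, ordered by the time each peak passes the detector.

    Shorter chains migrate faster and reach the fixed detection point first,
    so time order corresponds to position in the sequence.
    """
    calls = []
    for _, intensities in sorted(peaks, key=lambda peak: peak[0]):
        strongest = max(range(len(CHANNEL_BASES)), key=lambda i: intensities[i])
        calls.append(CHANNEL_BASES[strongest])
    return "".join(calls)


# Made-up readings, deliberately listed out of time order.
example_peaks = [
    (12.4, (5.0, 90.0, 3.0, 2.0)),
    (11.9, (80.0, 4.0, 6.0, 1.0)),
    (13.0, (2.0, 3.0, 4.0, 75.0)),
]
print(call_bases(example_peaks))  # ACT
```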