Real-time DNA sequencing (Genomics)

1. Introduction

DNA sequence information provides insights into a wide range of biological processes. The order of bases in DNA implies the order of bases in RNA and, consequently, the amino acid sequence of protein. DNA sequence specifies a molecular program that can lead to normal development or the manifestation of a genetic disease such as cancer. DNA sequence information also has the potential to instantly and conclusively identify a pathogen (or variation thereof), or uniquely identify and genetically characterize an individual.

The core elements required for DNA replication in a test tube include DNA polymerase, deoxynucleotide triphosphates, template, and primer in a buffer that promotes the activity. DNA synthesis occurs when the primer’s 3′-end attacks the α-phosphate of the incoming nucleotide, which is complementary to the template strand. Of the three phosphates within the nucleotide, only the α-phosphate becomes part of the DNA strand. The β- and γ-phosphates (pyrophosphate, PPi) are released into the solution.

Approximately 30 years ago, Sanger and colleagues developed a sequencing method that exploited the basic biochemistry of DNA replication (Sanger et al., 1977). Of particular importance for their method are the facts that DNA polymerase can incorporate a dideoxynucleotide triphosphate (ddNTP; a nucleotide analog lacking a 3′ OH), and that, once incorporated, additional nucleotide incorporation is not possible. Importantly, ddNTPs are incorporated by the polymerase using the same base incorporation rules that dictate incorporation of natural nucleotides. The reaction products are size-separated and examined to deduce the DNA sequence information.
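The logic of chain termination can be illustrated with a toy model. The sketch below is not a laboratory protocol. It assumes an idealized reaction in which a ddNTP terminates synthesis exactly once at every template position, and the function names and the example template are hypothetical.

```python
# Toy model of chain-termination (Sanger) sequencing. Idealized: one
# terminated product per template position, no noise, no missing fragments.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template_3to5):
    """Return (length, terminal_base) for each ddNTP-terminated product.

    `template_3to5` is the template region downstream of the primer, written
    3'->5', so the new strand grows 5'->3' as we walk left to right.
    """
    fragments = []
    for length, template_base in enumerate(template_3to5, start=1):
        # A ddNTP obeys the same base-pairing rule as a natural dNTP.
        fragments.append((length, COMPLEMENT[template_base]))
    return fragments


def read_sequence(fragments):
    """Size-separate the fragments, then read the terminal base of each."""
    return "".join(base for _, base in sorted(fragments))


template = "TACGGATC"  # hypothetical template, written 3'->5'
fragments = terminated_fragments(template)
# The order in which fragments are produced does not matter once they are sorted by size.
print(read_sequence(reversed(fragments)))  # ATGCCTAG
```

Sorting by fragment length stands in for the size separation step. Reading the terminal base of each successively longer fragment recovers the newly synthesized strand, which is the complement of the template.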


The first human genome was sequenced using variations of Dr. Sanger’s chemistry and important breakthroughs in instrumentation and process automation (Lander et al., 2001; Venter et al., 2001). This first human genome sequence has sparked a new era in genome analysis. Identifying differences in the genetic code that make each of us unique is the next challenge. Given current cost estimates of $10-$25 million to sequence a single human genome, it is unlikely that the large numbers of human genomes needed to identify these important differences will be completed using Sanger-based sequencing methods, and even less likely that this chemistry will be used to enable the promise of whole genome analysis for medical purposes (personalized medicine).

The desire to examine differences between genomes is so great that an industry directing analysis to regions previously associated with genetic variation has emerged. Single nucleotide polymorphism (SNP) analyses essentially involve skimming genomic information from predetermined regions owing to cost limitations and time constraints of current DNA sequencing methods. Ultrahigh throughput sequencing will enable a more comprehensive form of genetic variation detection that does not begin with assumptions. The fundamental importance of DNA sequence information drives researchers to continually strive to improve the efficiency and accuracy of sequencing methods.

2. A massively parallel, real-time sequencing strategy

We are developing a sequencing platform that will enable a more comprehensive form of genetic variation detection. Cutting-edge technologies, including single-molecule detection, fluorescent molecule chemistry, computational biochemistry, and biomolecule engineering and purification, are being combined to create this new platform. Our approach may make it easier to classify an organism or identify variations within an organism by sequencing the genome in question.

The basic biochemistry of DNA replication is being exploited in a new way to develop a radically different method to sequence DNA. DNA polymerase and nucleotide triphosphates are being engineered to act together as direct molecular sensors of DNA base identity at the single-molecule level. The general strategy involves monitoring real-time, single-pair fluorescence resonance energy transfer (spFRET) between a donor fluorophore attached to a polymerase and a color-coded acceptor fluorophore attached to the γ-phosphate of a dNTP (5′ fluorescently modified γ-phosphate) during nucleotide incorporation and pyrophosphate release (Figure 1). The purpose of the donor is to stimulate an acceptor to produce a characteristic fluorescent signal that indicates base identity (emission wavelength and intensity provide a unique signature of base identity). Equally important to our technology are the massively parallel arrays of nanomachines created to produce the unprecedented throughput of the sequencing system. Projected sequencing rates approach 1 million bases per second – rather than per day – per instrument; almost a 100 000-fold increase over current throughput.
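The "almost 100 000-fold" figure follows from the number of seconds in a day. The short calculation below checks that arithmetic and, as an added illustration, estimates a single-pass read time for a human genome. The genome size of about 3 billion bases and the 1x coverage are assumptions introduced here, not figures from the text.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

projected_rate = 1_000_000                    # bases per second per instrument (from the text)
baseline_rate = 1_000_000 / SECONDS_PER_DAY   # the same base count per day, expressed per second

fold_increase = projected_rate / baseline_rate
print(f"fold increase: {fold_increase:,.0f}")  # 86,400

# Assumption: haploid human genome of roughly 3 billion bases, read once (1x coverage).
GENOME_BASES = 3_000_000_000
minutes = GENOME_BASES / projected_rate / 60
print(f"single-pass time: {minutes:.0f} minutes")  # 50
```

Real projects read each base many times over, so actual run times would scale with the coverage required.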

The sequencing platform incorporates a laser that is tuned to excite the donor fluorophore. An spFRET-based strategy increases signal-to-noise by minimizing acceptor emission until the acceptor fluorophore is sufficiently close to a donor fluorophore to accept energy. Incorporating total internal reflection fluorescence (TIRF) into the platform further increases signal-to-noise, since most of the labeled dNTPs in solution are not within the TIRF excitation volume and are, therefore, not directly excited by the incident light.
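The reason distant acceptors stay dark is the steep distance dependence of energy transfer. A minimal sketch using the standard Förster relation, E = 1 / (1 + (r / R0)^6), is shown below. The Förster radius of 5 nm is an illustrative assumption rather than a property of the fluorophores used in this platform, and the model ignores quantum yields, spectral crosstalk, and detector efficiency.

```python
def fret_efficiency(distance_nm, forster_radius_nm=5.0):
    """Förster relation: E = 1 / (1 + (r / R0)**6).

    forster_radius_nm (R0) is an assumed, illustrative value.
    """
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def relative_intensities(distance_nm, forster_radius_nm=5.0):
    """Idealized (donor, acceptor) emission, normalized so the pair sums to 1."""
    efficiency = fret_efficiency(distance_nm, forster_radius_nm)
    return 1.0 - efficiency, efficiency


for r in (2.0, 5.0, 10.0, 20.0):
    donor, acceptor = relative_intensities(r)
    print(f"r = {r:4.1f} nm  donor = {donor:.3f}  acceptor = {acceptor:.3f}")
```

At r equal to R0 the transfer efficiency is exactly one half, and it falls off with the sixth power of distance beyond that, so a labeled dNTP a few Förster radii away from the donor contributes almost no acceptor signal.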

Figure 1 Real-time detection of dNTP incorporation. Components of the VisiGen Sequencing System include modified polymerase, color-coded nucleotides, primer, and template. Energy transfers from a donor fluorophore within polymerase to an acceptor fluorophore on the γ-phosphate of the incoming dNTP, stimulating acceptor emission, fluorescence detection, and incorporated nucleotide identification. Fluorescently tagged pyrophosphate leaves the complex, producing natural DNA. This nonserial approach enables rapid detection of subsequent incorporation events. Time-dependent fluorescent signals emitted from each complex are monitored in massively parallel arrays and analyzed to determine DNA sequence information.

As an acceptor-labeled dNTP approaches the donor-labeled polymerase, it begins to emit its signature wavelength of light owing to energy transfer from the donor (they participate in spFRET), and the intensity of this fluorescence increases throughout the nucleotide’s approach. The molecules are engineered to maximally FRET after the acceptor-dNTP docks at the active site of the polymerase (within the nucleotide binding pocket). During nucleotide insertion, the 3′ end of the primer attacks the α-phosphate within the dNTP, cleaving the bond between the α- and β-phosphates, and changing the spectral properties of the fluorophore (which remains attached to the PPi). Donor fluorescence is also informative, as it undergoes anticorrelated intensity changes throughout the incorporation reaction. As the acceptor-tagged PPi is released from the polymerase, the distance between it and the donor fluorophore increases, causing the intensity of the acceptor’s fluorescence to decrease and that of the donor’s to simultaneously increase. After an spFRET event, the donor’s emission returns to its original state and is ready to undergo a similar intensity oscillation cycle with the next acceptor-tagged nucleotide. In this way, the donor fluorophore acts as a punctuation mark between incorporation events. The increase in donor fluorescence between incorporations is especially important during analysis of homopolymeric sequences.
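The role of donor recovery as a punctuation mark can be sketched with a simple threshold-based base caller. This is a hypothetical illustration, not the analysis used by the sequencing system: the frame format, the function names, and both thresholds are assumptions, and real traces would need noise handling.

```python
def call_bases(frames, acceptor_threshold=0.5, donor_threshold=0.8):
    """Call one base per incorporation event.

    frames: iterable of (donor_intensity, {base: acceptor_intensity}),
    with intensities normalized to the range 0..1 (assumed format).
    """
    calls = []
    in_event = False
    for donor, acceptors in frames:
        base, signal = max(acceptors.items(), key=lambda item: item[1])
        if not in_event and signal >= acceptor_threshold and donor < donor_threshold:
            calls.append(base)  # acceptor lit while donor is quenched: new event
            in_event = True
        elif in_event and donor >= donor_threshold:
            in_event = False    # donor recovered: the "punctuation mark"
    return "".join(calls)


def frame(donor, **acceptors):
    """Build one frame with all four acceptor channels, defaulting to dark."""
    channels = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0}
    channels.update(acceptors)
    return donor, channels


trace = [
    frame(1.0),
    frame(0.3, A=0.9), frame(0.2, A=1.0),
    frame(0.95),        # donor recovery separates the two A events
    frame(0.3, A=0.9),
    frame(1.0),
    frame(0.4, G=0.8),
    frame(1.0),
]
print(call_bases(trace))  # AAG
```

If the donor recovery frame between the two A signals is removed, the same caller reports a single A. This is why the donor channel matters for homopolymeric runs, where the acceptor color alone does not change between consecutive incorporations.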
