Biomedical Engineering Reference
In-Depth Information
In automated gene sequencing, purified genomic or complementary DNA is first fragmented by
restriction enzymes, and the fragments are separated by size on a gel. Single fragments are then
isolated, and each is sequenced individually by the Sanger chain-termination method, using
chain-terminating ddNTPs (dideoxy nucleoside triphosphates) labeled with
fluorochromes according to the base present. For example, a green fluorochrome is typically used for
Adenine (A), red for Thymine (T), blue for Cytosine (C), and yellow for Guanine (G).
The fragments arising from the Sanger method are then separated by size through polyacrylamide
gel electrophoresis. A scanning argon laser excites the fluorochrome on the ddNTP that terminates
each fragment, identifying that fragment's terminal base. Read in order of fragment size, these
terminal bases give the sequence of the original fragment. The
sequence data are then stored in a database for later analysis.
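The read-out step described above can be sketched in a few lines of Python. This is a minimal illustration, not the software of any particular sequencer: the function name `read_sequence` and the `(length, dye_color)` input format are assumptions made for the example, and the color assignments are simply the ones given in the text.

```python
# Dye colors as given in the text; real instruments may use other assignments.
DYE_TO_BASE = {"green": "A", "red": "T", "blue": "C", "yellow": "G"}


def read_sequence(fragments):
    """Return the base sequence implied by (length, dye_color) pairs.

    Each chain-terminated fragment ends in one labeled ddNTP, so sorting
    the fragments from shortest to longest lists the terminal bases in order.
    """
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(DYE_TO_BASE[color] for _, color in ordered)


# Hypothetical detector output, deliberately out of order.
fragments = [(23, "blue"), (21, "green"), (24, "yellow"), (22, "red")]
print(read_sequence(fragments))  # ATCG
```

Sorting by length plays the role of the electrophoresis step, and the dictionary lookup plays the role of the laser and detector identifying each dye.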
Before examining the gene sequencing process in terms of computer-mediated and enabled control,
information archiving, asynchronous communications, and numerical processing, consider that the
data can be characterized as:
• Valuable: The sequencing data are worth archiving for future use and for sharing with
  others, whether internal to the R&D laboratory or through worldwide publication. In this
  example, the equipment for sequencing is typically in the $300K range, with additional
  funds required for trained personnel and supplies. As such, the replacement cost for data
  inadvertently lost can be significant.
• Plentiful: A single gene sequencing run can produce thousands of data points, and
  sequencing a gene can result in millions of data points.
• Incomplete: Although data are plentiful, they are often considered incomplete. Even when
  the nucleotide sequence of a genome is nearly complete, there are typically major gaps in
  data on the proteins encoded by the DNA or RNA sequences.
• Of questionable quality: Even though the sequencing process may be under computer
  control, there are limits on data accuracy, repeatability, precision, and reliability. A
  variety of potential error sources can affect the quality of data, from failure of the
  detector to register fluorescent dyes correctly to inconsistencies in pattern matching.
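To make the data-quality concern concrete, the following Python sketch shows one simple way a base caller might refuse to commit when the detector signal is ambiguous. It is an illustration under stated assumptions, not a description of how any real instrument works: the function `call_base`, the four-channel intensity dictionaries, and the `min_ratio` threshold of 2.0 are all hypothetical.

```python
CHANNELS = ("A", "T", "C", "G")


def call_base(intensities, min_ratio=2.0):
    """Call one base from four channel intensities, or 'N' if ambiguous.

    intensities: dict mapping each base in CHANNELS to a non-negative signal.
    min_ratio: hypothetical threshold; the strongest channel must be at
    least this many times the runner-up for the call to be accepted.
    """
    ranked = sorted(CHANNELS, key=lambda base: intensities[base], reverse=True)
    best, second = intensities[ranked[0]], intensities[ranked[1]]
    if best <= 0 or best < min_ratio * second:
        return "N"
    return ranked[0]


# Hypothetical peaks, one dict per position along the gel.
peaks = [
    {"A": 900, "T": 40, "C": 35, "G": 20},   # clear A
    {"A": 300, "T": 280, "C": 10, "G": 15},  # A and T nearly tied
    {"A": 0, "T": 0, "C": 0, "G": 0},        # detector registered nothing
    {"A": 25, "T": 30, "C": 20, "G": 850},   # clear G
]
calls = "".join(call_base(peak) for peak in peaks)
print(calls)  # ANNG
```

Recording an explicit `N` rather than guessing keeps the uncertainty visible in the archived data, so later analysis can account for it.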
Now, consider the four basic application areas of computers in bioinformatics, summarized in
Table 1-1 and described in more detail below.
Control
As noted in Table 1-1, control encompasses technologies including equipment control, robotics, and
automatic data collection. For example, the typical gene sequencing machine, like most automated
laboratory equipment, is under the control of an embedded computer. Everything from timing the
overall process to recording the fluorescing colors as the dyes on the DNA fragments are excited by
the laser is controlled by a computer that is an integral part of the underlying electronics. It
would be practically impossible to manually track the tens of thousands of base sequences as they
are read by the optical scanner, and it is the computer-enabled pattern-matching function that
makes the system tenable. Although a desktop computer can be used in control applications, most
often the computer controller is integrated or embedded into the device and supports a standard
interface for communications with an external PC.
Table 1-1. Application Areas of Computers in Bioinformatics. There is
considerable overlap in the technologies associated with each application
area.