Table 7.1 Quality control of sequence data from 35 laboratories engaged in sequencing yeast chromosome XI. (Data from Dujon et al. 1994 with permission from Nature, © Macmillan Magazines Ltd.)

Method of verification | Total number of fragments | Total bp verified | Error % detected
Original overlap between cosmids | 28 | 63 424 | 0.02
Resequencing of selected segments (3–5 kb long) | 21 | 72 270 | 0.03
Resequencing of random segments (≈300 bp long) | 71 | 18 778 | 0.05
Resequencing of suspected segments (≈300 bp long) from designed oligonucleotide pairs | 60 | 17 035 | 0.03
Total | 180 | 171 507 |
Average error rate | | | 0.03
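As a quick consistency check on Table 7.1, the totals and the average error rate can be recomputed from the four rows. The table does not state how its average was derived; the sketch below assumes a mean weighted by the number of base pairs each method verified, which comes to about 0.028% and rounds to the reported 0.03%. The names ROWS and summarize are illustrative only.

```python
# Rows of Table 7.1: (method of verification, fragments, bp verified, error % detected)
ROWS = [
    ("Original overlap between cosmids", 28, 63_424, 0.02),
    ("Resequencing of selected segments", 21, 72_270, 0.03),
    ("Resequencing of random segments", 71, 18_778, 0.05),
    ("Resequencing of suspected segments", 60, 17_035, 0.03),
]


def summarize(rows):
    """Return total fragments, total bp and the bp-weighted mean error %.

    The bp weighting is an assumption; the source table does not say
    how its average error rate was calculated.
    """
    total_fragments = sum(row[1] for row in rows)
    total_bp = sum(row[2] for row in rows)
    # Each method's error % counts in proportion to the bp it verified
    weighted_error = sum(row[2] * row[3] for row in rows) / total_bp
    return total_fragments, total_bp, weighted_error


fragments, bp, mean_error = summarize(ROWS)
print(fragments, bp, round(mean_error, 2))  # 180 171507 0.03
```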
accuracy have been summarized by Yager et al. (1997). In general, a sequence read of >350 nucleotides at 99% accuracy can be expected using current ultrathin-slab gel technology. Reading lengths in excess of 1000 nucleotides have been reported (Noolandi et al. 1993; Voss et al. 1995).
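To put the figure of 99% accuracy in perspective, the expected number of errors in a read follows from simple arithmetic. The sketch below assumes that each base is miscalled independently with the same probability, which is a simplification of real sequencing error patterns; the function name read_error_stats is hypothetical.

```python
def read_error_stats(read_length: int, per_base_accuracy: float):
    """Return (expected errors per read, probability of an error-free read),
    assuming each base is miscalled independently with the same probability."""
    per_base_error = 1.0 - per_base_accuracy
    expected_errors = read_length * per_base_error
    # Every one of the read_length bases must be called correctly
    p_error_free = per_base_accuracy ** read_length
    return expected_errors, p_error_free


errors, p_clean = read_error_stats(350, 0.99)
print(round(errors, 2), round(p_clean, 3))  # 3.5 0.03
```

Under these assumptions a 350-nucleotide read at 99% accuracy carries about 3.5 errors on average, and only about 3% of such reads would be entirely error-free, which illustrates why independent verification of the kind summarized in Table 7.1 matters.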
DNA sequence databases
Since the current DNA sequencing technology was developed, a large amount of DNA sequence data has accumulated. These data are maintained by three centres: the National Center for Biotechnology Information in the USA, the DNA Data Bank of Japan and the European Bioinformatics Institute in the UK (Benson et al. 1996, 1997; Stoesser et al. 1997; Tateno & Gojobori 1997). Each of these three groups collects a portion of the total sequence data reported worldwide, and all new and updated database entries are exchanged between the groups daily. In addition, several specialized genome databases exist, including seven for bacterial genomes: four for E. coli, two for B. subtilis and one at The Institute for Genomic Research, an organization responsible for the complete sequencing of a number of genomes. Users worldwide can access these databases directly via the World Wide Web or receive the information on CD-ROMs. The former option is better because it ensures that an up-to-date database is being used. There are a number of different sequence-retrieval systems, and the best of these are Network Entrez and DNA Workbench (Brenner 1995).
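The daily exchange of new and updated entries can be pictured as a merge after which every centre holds the same, complete set of entries. The sketch below is purely illustrative: the record layout (an accession mapped to a version number and a sequence), the rule of keeping the highest version, the function names and the accessions in the example are all hypothetical and do not describe the centres' actual exchange format.

```python
from typing import Dict, Tuple

# Hypothetical record layout: accession -> (version number, sequence)
Store = Dict[str, Tuple[int, str]]


def apply_exchange(local: Store, incoming: Store) -> Store:
    """Merge another centre's entries into a copy of the local store,
    keeping the highest version seen for each accession."""
    merged = dict(local)
    for accession, (version, sequence) in incoming.items():
        if accession not in merged or version > merged[accession][0]:
            merged[accession] = (version, sequence)
    return merged


def daily_exchange(centres: Dict[str, Store]) -> Dict[str, Store]:
    """One exchange round: every centre receives every other centre's
    entries, so all centres end up with the same merged store."""
    union: Store = {}
    for store in centres.values():
        union = apply_exchange(union, store)
    return {name: dict(union) for name in centres}


# Example with made-up accessions: EBI holds a newer version of X0001.
centres = {
    "NCBI": {"X0001": (1, "ATGC")},
    "DDBJ": {"Y0002": (1, "GGCA")},
    "EBI": {"X0001": (2, "ATGCA")},
}
synced = daily_exchange(centres)
print(synced["DDBJ"])  # {'X0001': (2, 'ATGCA'), 'Y0002': (1, 'GGCA')}
```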
Whole-genome sequencing
As noted in Chapter 1 (p. 2), many different genomes have been completely sequenced, and the list includes viruses, bacteria, yeast, Caenorhabditis, Drosophila, Arabidopsis and humans. A detailed description of the methodology used for sequencing these genomes is outside the scope of this text, and the interested reader is referred to the sister publication Principles of Genome Analysis (Primrose & Twyman 2002). Suffice it to say that the underlying principle is to subdivide the genome into small fragments of a size suitable for sequencing by the methods just described. Provided that the fragments overlap, the individual sequences can then be assembled into the complete genome sequence. However, the scale of the task of complete sequence assembly can be gauged by comparing the length of fragment that can be sequenced (600–1000 nucleotides) with the number of such fragments in the genome (>3 million for the human genome)!
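The assembly principle described above, joining fragments through their overlaps, can be illustrated with a simple greedy strategy that repeatedly merges the pair of fragments sharing the longest suffix/prefix overlap. This is a textbook simplification swapped in for illustration, not the method used by genome projects: the sketch assumes error-free reads from a single strand, no repeated sequences and no fragment contained within another, and uses a hypothetical minimum overlap of 3 bases. The example sequence is made up.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b,
    or 0 if no such overlap is at least min_len bases long."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of contigs with the longest overlap.
    Returns the remaining contigs (one, if every fragment could be joined)."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: leave the contigs unjoined
        # Append the non-overlapping tail of the second contig to the first
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three overlapping reads from the made-up sequence GATTACACCGGTTAAC
reads = ["CCGGTTAAC", "GATTACAC", "TACACCGG"]
print(greedy_assemble(reads))  # ['GATTACACCGGTTAAC']
```

Each merging round compares every ordered pair of contigs, so the work grows roughly with the square of the number of fragments; with more than 3 million fragments for the human genome, this naive approach shows why practical assembly relies on far more efficient strategies.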
 