Biomedical Engineering Reference
In-Depth Information
small parts of the genome. However, in the last couple of years huge
advances have been made in the science of DNA sequencing, collectively
called next-generation sequencing (NGS) [5].
There are several types of NGS technology, and heavy innovation and
competition in the area are driving down sequencing costs rapidly. Most
NGS technology relies on a process known as 'shotgun' sequencing. In
simple terms, the DNA to be sequenced is broken up into millions of
small pieces. Through a number of innovative processes, the sequences of
these short pieces of DNA are derived. However, these overlapping short
'reads' must be re-assembled into the correct order to produce a final
full-length sequence of DNA. If each of the short sequences is 100 bp
long, an entire human genome (roughly 3 billion bp) could in principle be
constructed from 30 million of these. However, to successfully perform the assembly,
overlapping sequences are required, allowing gradual extension of the
master DNA sequence. Thus, at least one more genome's worth of short
sequences is required, and in practice this is usually somewhere between
five and ten genomes' worth, known as the coverage level. At ten-fold
coverage, there would be a total of 300 000 000 short sequences or
30 000 000 000 base pairs to align. Even that is not the end of the story:
each of those 30 billion base pairs has confidence data attached to it, a
statistical assessment of whether the sequencing machine was able to
identify the base pair correctly. This information is critical in shielding
the assembly process from false variations that are due to the sequencing
process rather than real human genetic variation. For an informatician,
the result of all of this is that each sample run through the NGS procedure
generates a vast amount of data (∼100 GB), ultimately creating
experimental data sets that run to terabytes (TB). Efficient storage and
processing of these data is of great importance to academic and industrial
researchers alike (see also Chapter 10 by Holdstock and Chapter 11 by
Burrell and MacLean for other perspectives).
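
The coverage arithmetic above can be checked with a few lines of Python.
This is a minimal sketch rather than part of any NGS toolkit: the genome
size of 3 billion bp is an assumption implied by the figures in the text
(30 million reads of 100 bp), the function names are hypothetical, and the
storage estimate assumes one byte per base call plus one byte per
confidence value, ignoring read identifiers and compression.

```python
# Assumed values: genome size is implied by the text (30 million x 100 bp).
GENOME_SIZE_BP = 3_000_000_000
READ_LENGTH_BP = 100


def reads_required(genome_size_bp, read_length_bp, coverage):
    """Number of reads needed to sequence the genome `coverage` times over."""
    total_bases = genome_size_bp * coverage
    # Ceiling division, so a partial final read still counts as one read.
    return -(-total_bases // read_length_bp)


def bases_sequenced(genome_size_bp, coverage):
    """Total number of base calls produced at the given coverage level."""
    return genome_size_bp * coverage


def approx_raw_bytes(n_bases, bytes_per_base=2):
    """Rough uncompressed size: one byte per base plus one byte per
    confidence value (an illustrative assumption, not a file-format spec)."""
    return n_bases * bytes_per_base


if __name__ == "__main__":
    for coverage in (1, 5, 10):
        n_reads = reads_required(GENOME_SIZE_BP, READ_LENGTH_BP, coverage)
        n_bases = bases_sequenced(GENOME_SIZE_BP, coverage)
        size_gb = approx_raw_bytes(n_bases) / 1e9
        print(f"{coverage:>2}x coverage: {n_reads:,} reads, "
              f"{n_bases:,} bases, about {size_gb:.0f} GB raw")
```

Under these assumptions, ten-fold coverage gives the 300 000 000 reads and
30 000 000 000 base pairs quoted above, and a raw size of tens of gigabytes
per sample before any identifiers or metadata are added.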
22.3 Open source innovation
Many of the large projects that are starting to tackle the challenge of
unravelling human genetic variation are publicly funded. A great example
of such an effort is the 1000 Genomes Project [6], which is looking to
sequence the genomes of well over a thousand people from many ethnic
backgrounds in order to give humanity its best insight yet into the genetic
variation of our species. Collecting the samples, preparing them and
 