that the corresponding gene is activated in the sample in question. Brighter fluores-
cence indicates a higher degree of activation. Spots that do not light up, because no
labeled RNA bound to them, indicate that the corresponding gene was not active in the
given sample. In this way, researchers can compare the cellular activity of, for example,
skin vs. muscle, or healthy lung vs. cancerous lung tissue. Genes that are expressed at
the same time, under the same conditions, can also be hypothesized to be functionally
related.
A similar technique can be used to determine specific genotypes. Both versions
of the variable portion of a gene are printed on an array, and the relative levels of
bound, labeled DNA are then compared. Comparable binding to both versions suggests
heterozygosity (two different nucleotides, one from each parent, e.g., AG), whereas
binding to only one suggests homozygosity (two copies of the same nucleotide) and
reveals which base is present (i.e., AA vs. GG). This information can then be used for
genome-wide association studies (GWAS), in which a population with a given phenotype
is compared to a control population. Each observed SNP (single nucleotide polymorphism)
is evaluated for statistical enrichment in cases versus controls. Enrichment for a
given genotype in one group or the other suggests that a causal mutation lies in that
region of the genome. This approach has a number of limitations. One is that only a finite
number of SNPs are printed on a given array (on the order of one to two million),
generally reflecting relatively common variants that have already been identified as
polymorphic. Only the SNPs printed on the array can be directly observed, and these
may not be the actual causal mutation, or even lie in the same gene as the true
mutation. In this sense, GWAS provides only a “guilt-by-association” approach to
deciphering the underlying mechanism of disease. Due in part to this drawback, as
well as the rapidly dropping cost of sequencing, genotyping is increasingly being
performed by sequencing rather than with chips. In addition, GWAS suffers from what
is known in the quantitative sciences as the “curse of dimensionality.” That is, by its
very nature it entails multiple hypothesis testing: as many as one to two million
hypotheses, in fact. Correcting for this many simultaneous tests makes the per-test
significance threshold so stringent that it can be very difficult to detect actual signal.
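The genotype-calling step described above can be sketched as follows, assuming each SNP has one probe per allele (here A and G, following the example in the text) and that the fraction of total signal on the G probe separates the three genotypes. The cutoffs `low`, `high`, and `min_total` are hypothetical; real genotype callers typically cluster intensities across many samples rather than applying fixed thresholds to one sample.

```python
def call_genotype(intensity_a, intensity_g, min_total=100.0, low=0.2, high=0.8):
    """Call a diploid genotype from the signals bound to the A and G probes.

    All thresholds are illustrative assumptions, not platform defaults.
    Returns "AA", "AG", "GG", or None when the spot is too dim to call.
    """
    total = intensity_a + intensity_g
    if total < min_total:
        return None  # too little bound sample for a reliable call
    frac_g = intensity_g / total  # share of signal coming from the G probe
    if frac_g < low:
        return "AA"  # nearly all signal on the A probe: homozygous A
    if frac_g > high:
        return "GG"  # nearly all signal on the G probe: homozygous G
    return "AG"  # substantial signal on both probes: heterozygous


if __name__ == "__main__":
    for a, g in [(950, 50), (500, 480), (30, 970), (10, 20)]:
        print(a, g, call_genotype(a, g))
```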
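To make the case/control comparison and the multiple-testing problem concrete, here is a minimal sketch of an allelic chi-square test for a single SNP, checked against a Bonferroni-corrected threshold. Bonferroni is only one common correction, used here for illustration; the text above does not specify a method. The genotype lists are made up, and the helper names are hypothetical.

```python
import math


def count_alleles(genotypes, allele):
    """Count copies of `allele` across diploid genotype strings such as "AG"."""
    return sum(g.count(allele) for g in genotypes)


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table
    of allele counts: rows are cases/controls, columns are alt/ref allele.
    Assumes every row and column total is nonzero."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_totals = [sum(r) for r in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test level that keeps the family-wise error rate at `alpha`."""
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical genotypes at one SNP for 50 cases and 50 controls.
    cases = ["GG"] * 20 + ["AG"] * 20 + ["AA"] * 10
    controls = ["GG"] * 10 + ["AG"] * 20 + ["AA"] * 20
    chi2, p = allelic_chi2(
        count_alleles(cases, "G"), count_alleles(cases, "A"),
        count_alleles(controls, "G"), count_alleles(controls, "A"),
    )
    threshold = bonferroni_threshold(0.05, 1_000_000)
    print(f"chi2={chi2:.2f}, p={p:.2g}, genome-wide significant: {p < threshold}")
```

With one million SNPs, the Bonferroni threshold is 0.05 / 10^6 = 5 × 10^-8, which matches the conventional genome-wide significance level. The example SNP (chi-square 8.0, p ≈ 0.0047) would pass a single-test cutoff of 0.05 but falls far short of the corrected threshold, which is exactly why real signal can be hard to detect.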
Another key technological advance, widely adopted beginning in 2004, was next-
generation (“NextGen”) sequencing. NextGen sequencing offers significant advantages
over Sanger sequencing, the method used for both early human genome sequencing
projects (the public Human Genome Project and Celera Genomics’ private effort). In
both cases, DNA is amplified and then cut into millions of short, overlapping
fragments. These individual fragments are sequenced, and informatics techniques are
then used to “stitch” the “reads,” or short sequences, together into long contiguous
sequences.
into long contiguous sequences. Sanger sequencing is based on “DNA chain termi-
nation,” which relies on selective incorporation of chain-terminating bases, and then
separation of different sized molecules using gel electrophoresis. The process is
relatively slow and expensive, but it is still used for small-scale projects and in cases
where long contiguous reads are desired since the Sanger approach can produce
reads up to 1,000 bases in length. NextGen sequencing enables sequencing to be
done in a massively parallel manner, speeding up the process to where an entire
human genome may be sequenced in a matter of days. Instead of identifying the
sequence of bases through chain-termination followed by separation in a gel, the
various fl avors of NextGen sequencing identify nucleotide sequences by synthesiz-