The Human Genome Project and Reference Genomes
In 1953, Watson and Crick discovered the structure of DNA, and in 1965 Nirenberg, with
help from his NIH colleagues, cracked the genetic code, which expresses the rules for
translating DNA or mRNA into proteins. Scientists knew that there were millions of human
proteins but didn't have a complete survey of the human genome, which made it impossible
to fully understand the genes responsible for protein synthesis. For example, if each protein
were created by a single gene, that would imply millions of protein-coding genes in the
human genome.
In 1990, the Human Genome Project set out to determine all the chemical base pairs that
make up human DNA. This collaborative, international research program published the
first human genome in April of 2003, [159] at an estimated cost of $3.8 billion (about
$5.6 billion in 2010 dollars). The Human Genome Project generated an estimated $796 billion
in economic impact, equating to a return on investment (ROI) of 141:1 against the
inflation-adjusted cost. [160] The Human Genome Project found about 20,500 genes, far
fewer than the millions you would expect under a simple 1:1 model of gene to protein. The
gap exists because proteins can be assembled from a combination of genes, modified by
post-translational processes during folding, and shaped by other mechanisms.
While this first human genome took over a decade to build, once created, it made
“bootstrapping” the subsequent sequencing of other genomes much easier. For the first
genome, scientists were operating in the dark. They had no reference to use as a roadmap
for constructing the full genome. There is no technology to date that can read a whole
genome from start to finish; instead, there are many techniques that vary in the speed,
accuracy, and length of DNA fragments they can read. Scientists in the Human Genome
Project had to sequence the genome in pieces, with different pieces being more easily
sequenced by different technologies. Once you have a complete human genome, subsequent
human genomes become much easier to construct; you can use the first genome as a
reference for the second. The fragments from the second genome can be pattern matched to
the first, similar to having the picture on a jigsaw puzzle's box to help inform the
placement of the puzzle pieces. It helps that most coding sequences are highly conserved,
and variants occur at only about 1 in 1,000 loci.
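
The jigsaw analogy can be made concrete with a toy sketch in Python. It is an illustration only, not how production aligners work: real tools index the reference rather than scanning it, and they handle insertions and deletions. The function names, the sequences, and the `max_mismatches` threshold are all made up for this example. Each fragment is slid along the reference, and the placement with the fewest mismatches is kept, which tolerates the occasional variant.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def place_fragment(reference, fragment, max_mismatches=1):
    """Return (offset, mismatches) for the best placement of fragment
    on reference, or None if no placement is within max_mismatches.

    Brute-force scan: fine for a toy example, far too slow for a genome.
    """
    best = None
    for offset in range(len(reference) - len(fragment) + 1):
        window = reference[offset:offset + len(fragment)]
        distance = hamming(window, fragment)
        if distance <= max_mismatches and (best is None or distance < best[1]):
            best = (offset, distance)
    return best


# Hypothetical reference and fragments, invented for illustration.
reference = "ACGTTGCAAGGCTTACGATCGGA"
fragments = [
    "TTGCAAGG",  # exact copy of part of the reference
    "CTTACGTT",  # same region as the reference, with one variant base
    "GGGGGGGG",  # does not resemble any part of the reference
]

placements = {f: place_fragment(reference, f) for f in fragments}
for fragment, placement in placements.items():
    print(fragment, placement)
```

The second fragment still finds its place despite the single differing base, which is why a low variant rate makes reference-guided assembly practical.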
Shortly after the Human Genome Project was completed, the Genome Reference Consortium
(GRC), an international collection of academic and research institutes, was formed to
improve the representation of reference genomes. The GRC periodically publishes new human
reference assemblies, each of which serves as something like a common coordinate system or
map to help analyze new genomes. The latest human reference genome, released in December
2013, was named GRCh38; it replaced GRCh37, which was released nearly five years prior.
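
One practical consequence of treating a reference as a coordinate system is that a position only means something together with the assembly it was measured against. The sketch below illustrates that idea in Python. The `Locus` type, the `same_locus` helper, and the positions are hypothetical, and the 1-based position convention is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """A genomic position, qualified by the reference it refers to."""
    assembly: str    # reference name, e.g. "GRCh38" or "GRCh37"
    chromosome: str  # e.g. "chr1"
    position: int    # 1-based offset (convention assumed for this sketch)


def same_locus(a, b):
    """Compare two loci, refusing to compare across assemblies.

    Coordinates can shift between reference releases, so equal numbers
    on different assemblies need not name the same base.
    """
    if a.assembly != b.assembly:
        raise ValueError(
            f"cannot compare {a.assembly} and {b.assembly} coordinates directly"
        )
    return (a.chromosome, a.position) == (b.chromosome, b.position)


# Made-up positions, for illustration only.
first = Locus("GRCh38", "chr1", 1_000_000)
second = Locus("GRCh38", "chr1", 1_000_000)
older = Locus("GRCh37", "chr1", 1_000_000)

print(same_locus(first, second))
```

Comparing `first` with `older` raises a `ValueError` here; in practice, coordinates are converted between assemblies before any comparison.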