Join In
While we're off to a great start, the ADAM project is still an experimental platform and
needs further development. If you're interested in learning more about programming on
ADAM or want to contribute code, take a look at Advanced Analytics with Spark: Patterns
for Learning from Data at Scale by Sandy Ryza et al. (O'Reilly, 2014), which includes a
chapter on analyzing genomics data with ADAM and Spark. You can find us at
http://bdgenomics.org, on IRC at #adamdev, or on Twitter at @bigdatagenomics.
[ 152 ] See Antonio Regalado, “EmTech: Illumina Says 228,000 Human Genomes Will Be Sequenced This
Year,” September 24, 2014.
[ 153 ] This process is called protein folding. The Folding@home project allows volunteers to donate CPU
cycles to help researchers determine the mechanisms of protein folding.
[ 154 ] There are also a few nonstandard amino acids not shown in the table that are encoded differently.
[ 155 ] See Eva Bianconi et al., “An estimation of the number of cells in the human body,” Annals of Human
Biology, November/December 2013.
[ 156 ] You might expect this to be 6.4 billion letters, but the reference genome is, for better or worse, a haploid
representation of the average of dozens of individuals.
[ 157 ] Only about 28% of your DNA is transcribed into nascent RNA, and after RNA splicing, only about
1.5% of the RNA is left to code for proteins. Evolutionary selection occurs at the DNA level, with most of
your DNA providing support to the other 1.5% or being deselected altogether (as more fitting DNA evolves).
There are some cancers that appear to be caused by dormant regions of DNA being resurrected, so to speak.
[ 158 ] There is actually, on average, about 1 error for each billion DNA “letters” copied. So, each cell isn't
exactly the same.
[ 159 ] Intentionally 50 years after Watson and Crick's discovery of the 3D structure of DNA.
[ 160 ] Jonathan Max Gitlin, “Calculating the economic impact of the Human Genome Project,” June 2013.
[ 161 ] There is also a second approach, de novo assembly, where reads are put into a graph data structure to
create long sequences without mapping to a reference genome.
[ 162 ] Each base is about 3.4 angstroms, so the DNA from a single human cell stretches over 2 meters end to
end!
[ 163 ] Unfortunately, some of the more popular software in genomics has an ill-defined or custom, restrictive
license. Clean open source licensing and available source code are necessary for science, because they make
results easier to reproduce and understand.
[ 164 ] BAM is the compressed binary version of the SAM format.