mulated. This topic has focused on genomes themselves, and especially
on human genomes. But human genomes are increasingly understood
by comparing them with the genomes of the hundreds of other fully
sequenced organisms, and not just organisms that are perceived to be
similar to humans (such as apes and monkeys). Rapid sequencing and
data analysis have made it possible to sequence DNA from the thousands
of organisms that inhabit the human gut, the oceans, the soil, and the rest
of the world around us. In 2012, the Human Microbiome Project reported
that each individual is home to about 100 trillion microscopic cells,
including up to ten thousand different types of bacteria.² The Earth
Microbiome Project estimates that we have sequenced only about
one-hundredth of the DNA found in a liter of seawater or a gram of soil.³
Many such studies are based on “metagenomic” approaches, in
which sequencing is performed on samples taken from the environment
without worrying about the details of which sequence fragments belong
to which organisms. By showing our dependence on a massive variety of
microorganisms living on us and around us, metagenomics is providing
a new perspective on biology. These techniques would be not only
impractical but also incomprehensible without the databases, software,
and hardware described here. It is the possibility of using software to
pick out genes in fragments of sequence, and the possibility of storing
and sharing this information in worldwide databases, that allows this
new field to make sense.
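The following sketch makes the idea of using software to "pick out genes in fragments of sequence" more concrete. It shows only the crudest version of that step. Each unassigned fragment is scanned on both strands and in all three reading frames for open reading frames (ORFs), meaning an ATG start codon followed in frame by a stop codon. This is an illustrative assumption, not the method of any particular project. Real metagenomic gene finders rely on statistical models of coding sequence. The function names, the toy reads, and the 30-codon default minimum are hypothetical choices.

```python
from typing import List, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; non-ACGT symbols (e.g. N) pass through unchanged."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, int]]:
    """Naive ORF scan of one strand (hypothetical helper).

    Returns (frame, start, end) tuples with 0-based, end-exclusive
    coordinates. An ORF runs from an ATG up to and including the first
    in-frame stop codon; its length in codons (stop included) must be
    at least min_codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step codon by codon; i + 3 <= len(seq) guarantees a full codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close it and look for the next ATG
    return orfs


def scan_fragment(fragment: str, min_codons: int = 30) -> List[Tuple[str, int, int, int]]:
    """Scan both strands of one unassigned fragment (hypothetical helper).

    Coordinates on the "-" strand refer to the reverse-complemented string.
    """
    hits = []
    strands = (("+", fragment.upper()), ("-", reverse_complement(fragment.upper())))
    for strand, s in strands:
        for frame, start, end in find_orfs(s, min_codons):
            hits.append((strand, frame, start, end))
    return hits


if __name__ == "__main__":
    # Toy, made-up reads standing in for unassigned metagenomic fragments.
    reads = {"read_1": "ATGAAACCCTAA", "read_2": "TTAGGGTTTCAT"}
    for name, seq in reads.items():
        print(name, scan_fragment(seq, min_codons=3))
```

With `min_codons=3`, the first toy read yields one ORF on the forward strand, `("+", 0, 0, 12)`. The second read is its reverse complement, so the same ORF is found on the "-" strand. The sketch also discards ORFs that run off the end of a fragment. That is a real limitation for short environmental reads, where many genes are cut off.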
Further, vast amounts of non-sequence data are being gathered
about human biology as well. These include data on proteins and their
interactions (proteomics, interactomics), metabolites (metabolomics), and
epigenetic modifications to genomes (epigenomics). High-throughput
techniques such as yeast two-hybrid screening, ChIP-chip (chromatin
immunoprecipitation on a microarray chip), ChIP-seq (chromatin
immunoprecipitation followed by sequencing), and flow cytometry produce
large amounts of nongenomic data. Understanding our genomes depends on
understanding these other data as well, and on understanding how they
relate to, modify, and interact with sequence data. But part of
the role that bioinformatics tools play is to structure these other data
around genomes. We have seen how databases like Ensembl and
visualization tools like the UCSC Genome Browser use the genome sequence
as a scaffold on which to hang other forms of data. Sequence remains at
the center, while proteins and epigenetic modifications are linked to it;
the genome provides the principle on which other data are organized.
This hierarchy, with sequence at the top, is built into the databases,
software, and hardware of biology.
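The idea of the genome as a scaffold can be sketched as a data structure. Every non-sequence datum (a gene model, a ChIP-seq peak, a methylated site) is stored only as an interval on a chromosome. A query by genomic region then gathers everything hung on that stretch of sequence. This is a hypothetical, minimal sketch, not how Ensembl or the UCSC Genome Browser is actually implemented. The class names, the 0-based half-open coordinate convention, and the toy features are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # e.g. "gene", "ChIP-seq peak", "methylated CpG"
    name: str


class GenomeScaffold:
    """Hypothetical store that organizes every feature by genomic coordinates."""

    def __init__(self) -> None:
        self._by_chrom: Dict[str, List[Feature]] = defaultdict(list)

    def add(self, feature: Feature) -> None:
        if feature.start >= feature.end:
            raise ValueError("feature must span at least one base")
        self._by_chrom[feature.chrom].append(feature)

    def region(self, chrom: str, start: int, end: int,
               kind: Optional[str] = None) -> List[Feature]:
        """Return features overlapping [start, end), optionally filtered by kind, sorted by position."""
        hits = [
            f for f in self._by_chrom.get(chrom, [])
            # Half-open intervals overlap iff each starts before the other ends.
            if f.start < end and start < f.end and (kind is None or f.kind == kind)
        ]
        return sorted(hits, key=lambda f: (f.start, f.end))


if __name__ == "__main__":
    scaffold = GenomeScaffold()
    # Toy, made-up coordinates: not real annotations.
    scaffold.add(Feature("chr1", 1000, 5000, "gene", "geneA"))
    scaffold.add(Feature("chr1", 1200, 1400, "ChIP-seq peak", "peak1"))
    scaffold.add(Feature("chr1", 8000, 8002, "methylated CpG", "cpg1"))
    scaffold.add(Feature("chr2", 1200, 1300, "gene", "geneB"))
    print([f.name for f in scaffold.region("chr1", 1100, 1300)])
```

Here the query for `chr1:1100-1300` returns the gene and the ChIP-seq peak that overlap it. The linear scan keeps the sketch short. Production systems index intervals, for example with binning schemes or interval trees, so that region queries stay fast across whole genomes.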