gies, and databases. These structures impose constraints on what
biological objects are and how they can be talked about.
Pipelines
At the center of bioinformatics is a process that transforms the real into
the virtual—that renders biological samples into data. The “pipeline” is
a spatial metaphor for a process that recurs often in bioinformatics. As
such, it is also an appropriate metaphor to use for describing the
transition from old to new forms of biological work. The pipeline moves us
from the material to the virtual and from the lab bench to the computer
network.
In biology, “pipeline” is a word used to describe the series of
processes applied to an object in order to render it into some
appropriate final form. Pipelines can be either physical (involving transmission
and transformation of actual objects) or virtual (involving transmission
and transformation of data), but they are often both, and they most
often describe the processes through which actual objects (DNA from
an organism) are rendered into virtual form (DNA in a database). The
pipeline is the method through which the actual becomes the virtual,
through a carefully ordered set of movements through physical and
virtual space.
When we think of a pipeline, we might immediately think of an oil
pipeline or a water pipeline, in which a liquid is transported directly
along its length. Readers familiar with computers might also think of
the ubiquitous “pipe” operator in Unix (represented by “|”) by which
one program or command takes as its input the output from the previous
program; by “piping” several programs together, the user creates a
space through which data can flow. In both these cases, there are two
key concepts. First, pipelines are directional: the water or the data must
flow down the pipe, following the single path laid out for them and
never moving side to side or going backward. Second, pipelines require
liquidity: pipes are generally inappropriate for moving solid objects, and
piping programs together requires that inputs and outputs of adjacent
programs in the pipe be designed to match one another.
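The two properties just described, directionality and matching inputs and outputs, can be illustrated with a minimal Python sketch. It is only an analogy to a Unix pipe, not a description of any real sequencing software: the stage names (strip_blank, uppercase, only_bases), the pipe helper, and the sample data are all hypothetical and chosen purely for illustration.

```python
from functools import reduce

# Each stage takes an iterable of strings and yields strings, so the
# output of one stage always matches the input of the next ("liquidity").

def strip_blank(lines):
    """Trim whitespace and drop empty lines."""
    for line in lines:
        trimmed = line.strip()
        if trimmed:
            yield trimmed

def uppercase(lines):
    """Convert every line to upper case."""
    for line in lines:
        yield line.upper()

def only_bases(lines):
    """Keep only lines made up entirely of the letters A, C, G, T."""
    for line in lines:
        if set(line) <= set("ACGT"):
            yield line

def pipe(source, *stages):
    """Feed source through the stages in order, left to right.

    Data moves in one direction only: a later stage never hands
    anything back to an earlier one.
    """
    return reduce(lambda data, stage: stage(data), stages, source)

# Hypothetical raw input standing in for messy sample records.
raw = ["acgt", "", "  ggta ", "n/a", "ttag"]
result = list(pipe(raw, strip_blank, uppercase, only_bases))
print(result)  # ['ACGT', 'GGTA', 'TTAG']
```

Reordering the stages changes the result: placing only_bases before uppercase would discard every lower-case line, which is one small way of seeing why the order of movements through a pipeline matters.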
In 2008, I spent several months observing the “sequencing pipeline”
in action at the Broad Institute. At the Broad Sequencing Center, almost
all the work is based around the pipeline; it is the metaphorical
object around which work is ordered. The pipeline is the set of methods
and processes that are used to transform molecular DNA into sequence
data in appropriate forms for submission to an online database such as