purpose software projects such as Linux, GCC, Apache, and MySQL are the direct result of this new freedom to cooperate. Development is “crowdsourced”: there are often one or more core teams of developers with financial backing from commercial users of the software, and a large community of end users also contributes improvements that advance the projects much faster than would be possible with a single traditional developer team at work. For example, about 75% of new Linux kernel code is generated by teams inside normally competitive companies like IBM and Intel, and individual contributors account for at least 18% of ongoing efforts [1].
A grand tradition of academic thriftiness has led to the widespread adoption of these no-cost tools in the research community, and the “crowdsourcing” ethic has rubbed off on scientists' thinking about their own work. In recent years there has been a flowering of open-source bioinformatics software and a move toward more open sharing of data. Indeed, many granting agencies now require an explicit plan for data sharing, although there is little agreement about what constitutes a reasonable plan [2].
14.4 OPEN DATA STANDARDS: ONTOLOGIES AND INTERCHANGE FORMATS
Sharing data requires mutual understanding of the content and format of the data, but achieving this understanding can be nontrivial. This is especially so when dealing with unprocessed, or “raw,” data, which is typically written in some mysterious binary format closely held by each instrument manufacturer. The use of such closed formats is technically defensible: they are often the most efficient way to store data rapidly as it streams off an instrument, and the manufacturer can alter them as needed without worrying about disrupting other software systems that read the data, since none exist. Of course, the fact that an ever-shifting and undocumented data format also binds the user to the data processing software sold by the instrument maker has long been seen as a happy side effect by the instrument makers, though not by instrument users. Increasingly, users are demanding, and helping to define, open standards that allow the data they collect to be read and written by software agents other than those provided by the equipment manufacturer, and in many cases the manufacturers now support these efforts lest a lack of openness become a competitive disadvantage. Developing open standards for describing processed data and results presents an even greater challenge, as the very idea of “processing” and “results” is a rapidly moving target in the research world, and there is often little agreement on the vocabulary used to describe the domains themselves.
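To make the contrast with closed binary formats concrete, the sketch below shows how data written in an open, documented, XML-based interchange format can be read by any software agent with a general-purpose parser, here using only the Python standard library. The element and attribute names (run, spectrum, retention_time, and so on) are illustrative assumptions for this sketch, not those of any particular published standard.

import xml.etree.ElementTree as ET

# A hypothetical open interchange record; in a closed binary format this
# content would be unreadable without the vendor's own software.
record = """<run instrument="AnyVendor Model X">
  <spectrum id="1" retention_time="12.7">
    <peak mz="445.12" intensity="8231.0"/>
    <peak mz="446.13" intensity="1987.0"/>
  </spectrum>
</run>"""

root = ET.fromstring(record)
for spectrum in root.findall("spectrum"):
    rt = float(spectrum.get("retention_time"))
    peaks = [(float(p.get("mz")), float(p.get("intensity")))
             for p in spectrum.findall("peak")]
    print(f"spectrum {spectrum.get('id')}: rt = {rt} min, {len(peaks)} peaks")

Because the structure is self-describing and publicly documented, the same few lines would work regardless of which vendor's instrument produced the file; that independence from the manufacturer's own software is the practical payoff of an open standard.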
The first step in creating a data standard is to disambiguate the terminology used in the area of endeavor. This is most properly done by developing a structured, rigorous, and thorough description of the knowledge domain, or “ontology,” while avoiding duplication of or conflicts with ontologies in related areas. This is a nontrivial and open-ended task requiring cooperation within