Cloud facilities tailored to the users' needs. 18 Cloud computing means that
data-intensive investigations such as genome-scale comparative and metagenomic
analyses can be performed in a timely and cost-effective manner (Wilkening, 2009).
An innovative, recent approach to biomedical data mining, made possible by the
Internet and the large amounts of Cloud storage space currently available, is
crowdsourcing. Crowdsourcing is “the practice of obtaining needed services, ideas or
content by soliciting contributions from a large group of people and especially from the
online community rather than from traditional employees or suppliers.” 19 This type
of approach has been applied to scientific problems ever since the Internet became
ubiquitous; perhaps the best-known examples are the Search for Extraterrestrial
Intelligence (SETI) 20 and Folding@home. 21
The SETI Institute was founded in 1984 and uses crowdsourcing to examine
radio-frequency signals from the SETI Institute's Allen Telescope Array for
indications of possible alien civilizations. Users donate spare domestic CPU cycles to run
the analysis software. Folding@home is more biologically oriented, using
volunteers' computers to run protein-folding simulations. Both of these projects rely more
upon users providing computational power than intellectual power, but as computation
becomes increasingly cost-effective, the focus has turned to true crowdsourcing, in
which volunteers contribute their own judgement rather than processor time.
In the field of astronomy, Galaxy Zoo 22 uses volunteers to classify galaxies in images
from a number of telescopes, including the Hubble Space Telescope, in order to
investigate how galaxies form and evolve. Over 60 million classifications have been
made to date, and overall the classifications are as good as those made by
professional astronomers (Lintott et al., 2008).
Crowdsourcing does not appear to have been widely used in microbiology until
recently. In 2011, Germany experienced an outbreak of haemolytic uraemic
syndrome with bloody diarrhoea. It was caused by the virulent E. coli strain
O104:H4. By the time the organism was identified and sequenced, 845 cases and 54 deaths
had occurred (Bielaszewska et al., 2011). Nature Biotechnology described it as “the
most deadly E. coli outbreak on record” (Outbreak Genomics [Editorial], 2011).
Researchers from the Beijing Genomics Institute sequenced the organism on an
Ion Torrent Personal Genome Machine, 23 and made the data freely available on
the GitHub website. 24 Bioinformaticists all around the world tackled the data; a
de novo assembly was completed the following day, and the full annotation of the
draft genome was completed within a week (Rohde et al., 2011).
During this time additional data became available as a number of different centres
re-sequenced using a range of “Next Generation Sequencing” platforms. Although
18 http://cloud.google.com/.
19 http://www.merriam-webster.com/dictionary/crowdsourcing.
20 www.seti.org.
21 http://folding.stanford.edu/English/HomePage.
22 http://www.galaxyzoo.org/.
23 Life Technologies, www.lifetechnologies.com.
24 https://github.com.