Biomedical Engineering Reference
and thanks to the software and hardware infrastructure the
bioinformatician can have much more time to pursue their own research,
such as developing new methods. Because the approach works as well for one
user as it does for one thousand, a sensible infrastructure scales
to the coming demands of our data-flooded field.
It's not all easy when relying on free and open source software. As
much of it is developed for other people's purposes, there can be significant
shortcomings if your immediate purpose is slightly different from the
creators'. The first concern when sourcing software is 'does it do what I
want?', and all too often the best answer after surveying all the options is
'nearly'. As plant scientists and microbiologists, our model organisms
often do not fit some of the assumptions made by analysis software. For
example, SNP-finding software that assumes a diploid population cannot
work well on reads generated from an allotetraploid plant, and the formats
in which we receive data from genome databases are somewhat different
from those the software expects. One feature we would love to have but
never do is BioMart-style [30] automatic grabbing of data over the web;
such software never supports our favourite databases. This reflects the
fact that the main source of investment in bioinformatics is from those
working in larger communities than ours but, in general, lack of exactly
the right feature is an issue everyone will come across at some point.
Typically we find ourselves looking for a piece of software that can handle
our main task and end up bridging the gaps with bits of scripts and
middleware of our own. One of the major advantages of Galaxy is that it
makes this easy.
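As an illustration of the kind of gap-bridging script meant here, the following is a minimal sketch and not code from our own pipeline. It assumes a hypothetical case in which a genome database supplies GFF3 annotations while a downstream tool expects BED; the function name gff3_to_bed and the choice to keep only 'gene' features are ours, for illustration only. The one real subtlety is the coordinate systems: GFF3 positions are 1-based and inclusive, BED positions are 0-based and half-open, so only the start coordinate shifts.

```python
def gff3_to_bed(lines, feature_type="gene"):
    """Yield BED6 lines for GFF3 records of one feature type.

    Hypothetical glue code: the function name and the default feature
    type are illustrative assumptions, not part of any existing tool.
    """
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives (##...) and comments (#...).
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A GFF3 record has exactly nine tab-separated columns.
        if len(fields) != 9:
            continue
        seqid, _source, ftype, start, end, score, strand, _phase, attrs = fields
        if ftype != feature_type:
            continue
        # Use the ID attribute as the BED name, or '.' if it is absent.
        name = "."
        for pair in attrs.split(";"):
            key, _, value = pair.partition("=")
            if key.strip() == "ID":
                name = value
                break
        # GFF3 marks a missing score with '.'; use 0 as a placeholder.
        bed_score = "0" if score == "." else score
        # GFF3 is 1-based inclusive, BED is 0-based half-open:
        # subtract one from the start and leave the end unchanged.
        yield "\t".join([seqid, str(int(start) - 1), end, name, bed_score, strand])
```

Wrapped as a command-line script, a function like this could be registered as a Galaxy tool so that the conversion becomes one more step in a workflow rather than a manual chore.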
It is surprising to us that there is a lack of usable database software with
simple pre-existing schemas for genomics data. There are of course the
database schemas provided by the large bioinformatics institutes like
EMBL or the SeqFeature/GFF databases in the Open Bioinformatics
Foundation [31] projects, but these are either large and difficult to work
with because they are tied into considerable other software projects like
the GBrowse [16] or ENSEMBL [32] browsers, or just complicated. Often,
the schema seems obfuscated, making it difficult to work with on a
day-to-day basis. Others, like CHADO [16], have been such a nightmare even to
start to understand that we gave up before we began. In this case we
felt we really needed to go back to the start and create our own solution,
the Gee Fu tool [21] we described earlier. We cannot always take this
approach; when we are stuck, we are stuck. It is not in the scope of our
expertise to re-code or extend open source software. Our team has
experience in Java and most scripting languages, but the time required to
become familiar with the internals of a package is prohibitive, with busy