Biomedical Engineering Reference
needing the Galaxy processes to be involved again. The apache_xsendfile
module does not currently have an upload feature. Because we use a
closed system, we do not provide for upload of large files into Galaxy, as
this often results in data duplication; instead, users who need large files
put into the system must ask the bioinformatician to place the files
in a data library. The Galaxy team recommend the use of nginx
[28] as a server for the transfer of large files up to Galaxy. An FTP server
solution is also provided.
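The download hand-off that X-Sendfile-style modules provide can be sketched as follows. This is a minimal illustration, not Galaxy's own code: the application replies with headers only, and a web server with an X-Sendfile module enabled streams the named file itself. The directory `DATA_ROOT` and the helper `build_sendfile_headers` are hypothetical names introduced for the example.

```python
import os

# Hypothetical dataset directory; an assumption, not taken from the text.
DATA_ROOT = "/srv/galaxy/datasets"


def build_sendfile_headers(requested_name, data_root=DATA_ROOT):
    """Return headers asking the web server to stream a file itself."""
    # Drop any directory components so a request cannot escape data_root.
    safe_name = os.path.basename(requested_name)
    full_path = os.path.join(data_root, safe_name)
    return [
        ("X-Sendfile", full_path),
        ("Content-Type", "application/octet-stream"),
        ("Content-Disposition", 'attachment; filename="%s"' % safe_name),
    ]


def application(environ, start_response):
    """Minimal WSGI app: it sends headers only; the server sends the bytes."""
    name = environ.get("PATH_INFO", "").lstrip("/")
    if not name:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no file requested"]
    start_response("200 OK", build_sendfile_headers(name))
    # Empty body: the front-end server replaces it with the file contents.
    return [b""]
```

Because the application process returns immediately, it is free to handle other requests while the web server carries out the long transfer.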
During analyses, Galaxy creates large data files that can only be disposed
of when the user decides that they are finished with them. Thus, in a
production environment, Galaxy can use a lot of disk space. Our instance runs
comfortably in 1.5 TB of allotted disk space provided that the cleanup
scripts are run nightly. The timing of cleanup runs will depend on use, but
sooner is better than later, as running out of disk space causes Galaxy to
stop dead and lose all running jobs. When running the scripts weekly, we
found that 3 TB of disk space was not enough to prevent a weekly halt.
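Since a full disk halts Galaxy, it may help to check free space between cleanup runs. The sketch below is one possible check, under stated assumptions: the 20% threshold and the function names are illustrative and do not come from Galaxy or from our configuration. It only reports whether space is low; it does not run the cleanup scripts.

```python
import shutil


def free_fraction(total_bytes, free_bytes):
    """Return the fraction of the volume that is still free (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def needs_cleanup(path, min_free_fraction=0.20):
    """Return True when the volume holding `path` is below the threshold.

    min_free_fraction is a hypothetical default; tune it to local usage.
    """
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) < min_free_fraction


if __name__ == "__main__":
    # "." stands in for the directory where the datasets are stored.
    if needs_cleanup("."):
        print("Low disk space: run the cleanup scripts now.")
    else:
        print("Disk space is above the threshold.")
```

A check like this could be scheduled more often than the cleanup itself, so that a heavy week of use triggers a warning before jobs are lost.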
11.9 Helping the user to understand the details
With all these powerful new tools at their disposal, it would be remiss of
us not to teach the biologists how to understand the settings, how to
interpret the output and, most importantly, what the technical caveats of
each data type are. It is quite possible to train biologists to do their
own analyses, and they can quickly get the hang of command-line
computer operation and simple scripting tasks. Surprisingly, though, a
common faltering point is that biologists often come to see the methods
as a 'black box' that produces results, but do not see how to criticise
them. It is often counterproductive at early stages to drag a discussion
down to highly technical aspects; instead, introducing simple control
experiments can work well to convince biologists to take a more
experimental approach and encourage them to perform their own
controls and optimisations. A great example comes from our
next-generation sequence-based SNP-finding pipelines. By adding known
changes to the reference genomes that we use and running our pipelines
again, we can demonstrate to the biologist how these methods can
generate errors. This insight can be quite freeing and convinces the
biologist to take the result they are getting and challenge it, employing
controls wherever possible.
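The control experiment described above can be sketched in a few lines. This is an illustrative sketch, not our production pipeline: the function names `spike_in_snps` and `recall` are hypothetical, the sequence is handled as a plain string rather than a FASTA file, and positions are 0-based (VCF output from a real SNP caller is 1-based).

```python
import random

BASES = "ACGT"


def spike_in_snps(sequence, n_snps, seed=0):
    """Introduce n_snps known substitutions into a reference sequence.

    Returns (mutated_sequence, truth), where truth is a sorted list of
    (position, original_base, introduced_base) tuples.
    """
    rng = random.Random(seed)  # fixed seed keeps the control reproducible
    # Only unambiguous bases are changed; N and other codes are left alone.
    candidates = [i for i, b in enumerate(sequence) if b.upper() in BASES]
    if n_snps > len(candidates):
        raise ValueError("not enough unambiguous bases to mutate")
    positions = sorted(rng.sample(candidates, n_snps))
    seq = list(sequence)
    truth = []
    for pos in positions:
        original = seq[pos].upper()
        introduced = rng.choice([b for b in BASES if b != original])
        seq[pos] = introduced
        truth.append((pos, original, introduced))
    return "".join(seq), truth


def recall(truth, called_positions):
    """Fraction of the spiked-in positions that the pipeline reported."""
    if not truth:
        return 0.0
    expected = {pos for pos, _, _ in truth}
    return len(expected & set(called_positions)) / len(truth)
```

When the unchanged reads are aligned against the modified reference, each spiked-in position should be reported as a difference, with the introduced base as the reference allele and the original base as the variant. Positions that are missed, or extra calls elsewhere, give the biologist a concrete picture of the pipeline's error rate.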