under test with a limited number of users, but reached a point where it
started to cause unexpected behaviours and errors in Galaxy (behaviours
one would not expect from a merely slow database); switching to PostgreSQL
fixed all of the odd behaviour. Be warned: there is no migration script
from SQLite to PostgreSQL in Galaxy, and upgrading in this way is not
supported at all by the Galaxy team. We spent an arduous week testing
and re-testing a custom migration script to move our precious two
man-months' worth of work to the new database management system.
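For Galaxy installations of this generation, the database backend is chosen by a SQLAlchemy connection string in universe_wsgi.ini. A minimal sketch of pointing a fresh installation at PostgreSQL from the start, rather than the default SQLite (the database name, user, and password below are illustrative):

```ini
# universe_wsgi.ini -- database settings (credentials are examples only)
# The default, suitable only for evaluation:
# database_connection = sqlite:///./database/universe.sqlite

# PostgreSQL, which copes far better with concurrent users:
database_connection = postgres://galaxy_user:secret@localhost:5432/galaxy_db

# Connection-pool tuning options (values are illustrative)
database_engine_option_pool_size = 5
database_engine_option_max_overflow = 10
```

Starting on PostgreSQL avoids the unsupported SQLite-to-PostgreSQL migration described above entirely.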
According to its development team, Galaxy is always at version 1. This
reflects a commitment to backward compatibility that results in daily
incremental updates to the codebase on their server, rather than significant
new releases. Often these updates amount to as little as a few lines;
sometimes many megabytes of code will change. Most importantly, the
database schema can change. Galaxy provides database migration scripts
between consecutive updates, but does not provide scripts for arbitrary
jumps, say from schema 27 to schema 77. The practical implication is that
it is wise to update often. The community at large seems to update on
average once every 12 weeks, which provides a good balance between
workload and ease of upgrading. Leaving upgrades too long can make
Galaxy painful to upgrade, as many merges, schema changes, and update
scripts must be run and tested sequentially to ensure a smooth upgrade path.
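At the time, Galaxy was distributed via Mercurial, so the update-plus-migration cycle described above looked roughly as follows. This is a sketch only: the install path and backup step are illustrative, and a test instance should always be upgraded first.

```shell
# Stop Galaxy, then back up the database before touching anything
# (PostgreSQL shown; database name is illustrative)
pg_dump galaxy_db > galaxy_db_backup_$(date +%F).sql

# Pull and apply the latest incremental changes from the Galaxy repository
cd /opt/galaxy-dist          # illustrative install path
hg pull
hg update

# Run the bundled migration scripts, which walk the database schema
# forward one version at a time to match the new code
sh manage_db.sh upgrade
```

Because the migration scripts only step between consecutive schema versions, a long-neglected instance must replay many such steps in order, which is exactly why frequent small updates are less painful than rare large ones.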
Running Galaxy, essentially a job-server-based system, on a compute
cluster requires a touch of planning, but is made easier by the fact that
most cluster systems are supported by reliable libraries. In our cluster,
Galaxy itself runs on a head node, which is visible both to the outside
world and to the machines that accept jobs from the queue. We used the
free DRMAA libraries from FedStage [27], compiled against LSF (a Sun
Grid Engine version exists too), and merely had to configure job runners
to ensure that Galaxy jobs went into the cluster rather than being executed
on the head-node machine.
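In the universe_wsgi.ini of that generation, routing jobs through DRMAA rather than running them locally came down to the job-runner settings, roughly as below. The library path and per-tool mapping are illustrative; the DRMAA library location depends on how the FedStage libraries were built for the local scheduler.

```ini
# universe_wsgi.ini -- job runner settings (paths illustrative)
# Galaxy finds the scheduler via the DRMAA shared library, e.g.
# export DRMAA_LIBRARY_PATH=/usr/lib/libdrmaa.so  (set before starting Galaxy)

start_job_runners = drmaa

# Send jobs to the cluster queue by default, not the head node
default_cluster_job_runner = drmaa:///

[galaxy:tool_runners]
# Keep tools that must talk to the user's browser on the head node
upload1 = local:///
```

The per-tool override is what allows the exception discussed next: upload and download must stay on the machine the client can actually reach.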
When dealing with big files, as is inevitable in NGS analyses, it is best
to ensure that uploads and downloads do not run on the cluster, as the
client (the web browser in this case) must be able to connect to the
machine doing the job. Galaxy generates large output files, which
end-users take away as their results. Galaxy's built-in facility for letting
users download data occupies a lot of processing time within the main
Python process, which can cause Galaxy to slow down and fail when
sending data to the web browser. The solution is to use the
apache_xsendfile module, which provides a mechanism for serving large
static files from web applications. When a user requests a large file for
download, Galaxy authenticates the request and hands the work of
sending the file to the user over to Apache, without
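As we understand it, this arrangement has two halves: the apache_xsendfile option on the Galaxy side and the mod_xsendfile module on the Apache side. A hedged sketch of the Apache fragment (the data directory path is illustrative):

```apache
# Apache virtual host fragment -- requires mod_xsendfile to be installed.
# Galaxy answers the request, then emits an X-Sendfile header naming the
# file; Apache streams the file itself, freeing the Python process.
XSendFile on

# Directory Apache is permitted to serve on Galaxy's behalf
# (illustrative path to Galaxy's dataset files)
XSendFilePath /opt/galaxy-dist/database/files
```

This pairs with setting apache_xsendfile = True in universe_wsgi.ini so that Galaxy emits the header instead of streaming the file through its own process.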