Biomedical Engineering Reference
We have constructed a GA to determine an efficient set of SnB input parameters in
an effort to reduce the time-to-solution for determining a molecular crystal structure
from X-ray diffraction data. The GA operates on a population of candidate sets of
SnB input parameters. Each member of the population is represented as a string,
and a fitness function is used to assign a fitness (quality) value to each member.
The members of the population obtain their fitness values by executing the SnB
program with the input parameter values represented by their strings. Using
“survival-of-the-fittest” selection, strings from the old population are used to create a new
population based on their fitness values. The member strings selected can recombine
using crossover and/or mutation operators. A crossover operator creates a new member
by exchanging substrings between two candidate members, whereas a mutation
operator randomly modifies a piece of an existing candidate. This procedure of com-
bining and randomly perturbing member strings has, in many cases, been shown to
produce stronger (i.e., more fit) populations as a function of time (i.e., number of
generations).
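The selection, crossover, and mutation steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Sugal or PGAPack implementation: the bit-string encoding, the fitness-proportional selection scheme, the elitism step, the mutation rate, and the names `crossover`, `mutate`, `evolve`, and `toy_fitness` are all assumptions made for the example. In the actual framework, the fitness of a member comes from executing SnB with the parameters that the string encodes; here a toy function stands in for that run.

```python
import random


def crossover(a, b, rng):
    """One-point crossover: the child takes a prefix of a and a suffix of b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(member, rng, rate=0.05):
    """Flip each bit of the string independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in member
    )


def evolve(population, fitness, generations, rng):
    """Run a simple generational GA and return the final population."""
    for _ in range(generations):
        scores = [fitness(m) for m in population]
        # Fitness-proportional selection; the small offset keeps the
        # weights valid even if every member scores zero.
        weights = [s + 1e-9 for s in scores]
        # Elitism (an assumption of this sketch): carry the best member over
        # unchanged so the best fitness never decreases between generations.
        children = [population[scores.index(max(scores))]]
        while len(children) < len(population):
            a, b = rng.choices(population, weights=weights, k=2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return population


def toy_fitness(member):
    """Hypothetical stand-in for an SnB run: count the 1 bits."""
    return member.count("1")
```

Because each fitness evaluation in the real setting is a full SnB execution, the sketch evaluates every member only once per generation and reuses the scores for both selection and elitism.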
We used the Sugal [99] (sequential execution) and PGAPack [100, 101] (parallel
and sequential execution) GA libraries. The Sugal library provided a sequential
GA with additional capabilities, including a restart function, which proved to
be very important when determining fitness values for large molecular structures.
The PGAPack library provided a parallel master/slave MPICH/MPI implementation,
which proved very efficient on distributed- and shared-memory ACDC-Grid
compute platforms. Other key features include C and Fortran interfaces, binary-,
integer-, real-, and character-valued native data types, object-oriented design,
and multiple choices for GA operators and parameters. In addition, PGAPack
is quite extensible. The PGAPack library was extended to include restart func-
tionality and is currently the only library used for the ACDC-Grid production
work.
The SnB computer program has approximately 100 input parameters, although
not all parameters can be optimized. For the purpose of this study, 17 critical param-
eters were identified for participation in the optimization procedure. Eight known
molecular structures were initially used to evaluate the GA evolutionary molecular
structure determination framework performance. These structures are 96016c [102],
96064c [103], crambin [104], gramicidin A [105], isoleucinomycin [106], pr435
[107], triclinic lysozyme [108], and triclinic vancomycin [109].
To efficiently utilize the computational resources of the ACDC-Grid, an accurate
estimate must be made of the resource requirements for the SnB jobs that the GA
optimization requires. This includes runs with varying parameter sets over the
complete set of eight known structures from our initial database.
This is accomplished as follows. First, a small number of jobs are run to determine
the required running time for each of the necessary jobs. Typically, this consists of
running a single trial of each job to predict the time needed for the full
number of trials that the job requires.
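The extrapolation from a single timed trial to a full job can be sketched as follows. The text states only that one trial is run to predict the time for the required number of trials; the linear scaling, the optional fixed overhead term, and the names `estimate_job_seconds` and `estimate_batch_seconds` are assumptions of this sketch rather than the method used on the ACDC-Grid.

```python
def estimate_job_seconds(single_trial_seconds, n_trials, overhead_seconds=0.0):
    """Linearly extrapolate one timed trial to a job of n_trials trials.

    Assumes every trial costs about the same as the timed one, plus an
    optional fixed per-job overhead (both are assumptions of this sketch).
    """
    if single_trial_seconds < 0 or n_trials < 0 or overhead_seconds < 0:
        raise ValueError("times and trial counts must be non-negative")
    return overhead_seconds + single_trial_seconds * n_trials


def estimate_batch_seconds(jobs):
    """Map each job name to its estimated running time in seconds.

    `jobs` maps a job name to a (single_trial_seconds, n_trials) pair.
    """
    return {
        name: estimate_job_seconds(trial_seconds, n_trials)
        for name, (trial_seconds, n_trials) in jobs.items()
    }
```

An estimate of this kind is only as good as the assumption that trial cost is roughly constant for a given structure and parameter set; if it varies widely, timing a few trials and averaging them would be a natural refinement.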
Approximately 25,000 population members were evaluated for the eight known
molecular structures and stored in a MySQL database table, as shown in
Figure 24.7.
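As a rough illustration of storing evaluated population members in a relational table, the sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL. The table name, the column layout, and the helper functions `open_store`, `record_member`, and `best_member` are hypothetical; they are not taken from the actual table shown in Figure 24.7.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database and create the (hypothetical) results table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS population_member (
               id INTEGER PRIMARY KEY,
               structure TEXT NOT NULL,      -- name of the molecular structure
               generation INTEGER NOT NULL,  -- GA generation of the member
               parameters TEXT NOT NULL,     -- encoded SnB input parameters
               fitness REAL NOT NULL         -- fitness assigned to the member
           )"""
    )
    return conn


def record_member(conn, structure, generation, parameters, fitness):
    """Insert one evaluated population member."""
    conn.execute(
        "INSERT INTO population_member "
        "(structure, generation, parameters, fitness) VALUES (?, ?, ?, ?)",
        (structure, generation, parameters, fitness),
    )
    conn.commit()


def best_member(conn, structure):
    """Return (parameters, fitness) of the fittest member, or None."""
    return conn.execute(
        "SELECT parameters, fitness FROM population_member "
        "WHERE structure = ? ORDER BY fitness DESC LIMIT 1",
        (structure,),
    ).fetchone()
```

Keeping every evaluated member, rather than only the current generation, means that expensive SnB runs are not lost and that the fittest parameter sets per structure can be queried later.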