micro-simulation and one thread for each of the default event handlers. However,
only a selected number of configurations and thread combinations are simulated
and discussed here.
The first two configurations look at JQueueSim without PEH and with and
without EWT. The comparison of these two configurations highlights the overhead
due to EWT (43 %). Configurations three and four look at JQueueSim for the case
where PEH is turned on, again with and without EWT. The experiments show that
the newly implemented parallel event handling reduces the time of an iteration by
around 26 % for the case where writing out events is turned on (configuration one
vs. three) and by around 13 % for the case where writing out events is turned off
(configuration two vs. four). The higher gain for the case where events are written
out to the hard drive is expected, as in this case parallel event handling successfully
decouples the micro-simulation from I/O operations.
Configurations one to four correspond to configurations five to eight, except that
the latter use JDEQSim instead of JQueueSim. To measure the performance gains
due to the implementations made in this chapter, JQueueSim and JDEQSim need to be
compared for the case where writing out events is turned off and the latter uses
PEH (configuration four vs. eight). This case is the relevant one because most
iterations are run with event writing turned off. Here, the runtime is reduced by
a factor of around four for the given scenario. The major part of this speedup
(ca. 76 %) is due to the differences between the models of JDEQSim and JQueueSim
(event-based vs. fixed time steps). This can be seen by comparing configurations
two and six, where only a single thread is used for both JDEQSim and JQueueSim.
The remaining performance gain is due to the parallelization of event handling
(configuration six vs. eight).
Configuration nine shows DEQSim runs for various numbers of CPUs. While the
runtimes of JDEQSim and DEQSim are similar when PEH is not used and EWT is turned
on (configuration five vs. nine), simply turning off EWT already gives JDEQSim a
major performance gain over DEQSim (50 %, compare configuration five vs. six).
This gap between DEQSim and JDEQSim widens further when PEH is turned on, such
that JDEQSim always performs better than DEQSim for up to eight CPUs (configuration
seven/eight vs. nine). Furthermore, as the flattening of the runtime curves
suggests, it might be quite difficult for DEQSim to reach a runtime lower than
that of JDEQSim even with a higher number of CPUs, as is explained using Amdahl's
law in the next section (Amdahl 1967).
9.4.1.3 Amdahl's Law and Its Implications
Amdahl's law describes the maximum achievable speedup of a parallel program. It
says that if a certain portion of a program cannot be parallelized, then the maximum
achievable speedup is limited - even with unbounded computation power. The
maximum achievable speedup with n threads for a program where fraction b of
the program cannot be parallelized can be calculated using Eq. (9.1).
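
To illustrate how the serial fraction caps the speedup, the following is a minimal Python sketch. It assumes the standard textbook form of Amdahl's law, speedup(n) = 1 / (b + (1 - b) / n). The function names and the value b = 0.25 are hypothetical and are not taken from the experiments described here.

```python
def amdahl_speedup(n: int, b: float) -> float:
    """Maximum speedup with n threads when fraction b cannot be parallelized.

    Assumes the standard form of Amdahl's law: 1 / (b + (1 - b) / n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= b <= 1.0:
        raise ValueError("b must lie in [0, 1]")
    return 1.0 / (b + (1.0 - b) / n)


def amdahl_limit(b: float) -> float:
    """Upper bound on the speedup as n grows without bound (needs b > 0)."""
    if not 0.0 < b <= 1.0:
        raise ValueError("b must lie in (0, 1]")
    return 1.0 / b


if __name__ == "__main__":
    b = 0.25  # hypothetical serial fraction, chosen only for illustration
    for n in (1, 2, 4, 8, 64):
        print(f"n={n:3d}  speedup={amdahl_speedup(n, b):.2f}")
    print(f"limit as n grows: {amdahl_limit(b):.2f}")
```

With a non-zero serial fraction the speedup approaches 1 / b and never exceeds it, which matches the flattening of runtime curves as more CPUs are added.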