3. How much performance can a distributed memory/shared memory model
achieve?
These versions, plus the serial version, were executed on a cluster composed
of seven heterogeneous Symmetric Multi-Processing (SMP) nodes: three nodes
with Intel Xeon SL8SV dual-core processors with Hyper-Threading technology
(footnote 16) and four nodes with two AMD Opteron 250 single-core processors
each. Every node runs 64-bit Scientific Linux 5.6 at runlevel 3 (text mode) to
avoid the system workload caused by the GUI. Of these nodes, one is used as
the Master and another as a Storage Node. The nodes are connected via a
dedicated 8-port Gigabit switch. The elapsed time is measured with the Linux
time command. Unfortunately, no profilers were executed at runtime due to the
high volume of operations. To reduce measurement error, the elapsed time was
measured repeatedly (10 trials). Valgrind was used only during the debugging
phase, to compute the percentage of the serial and parallel sections. To
minimize unexpected variations in performance, CPU throttling was disabled
during the measurement process.
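The repeated-measurement procedure described above can be sketched as follows. This is a minimal illustration, not the authors' tooling: the text only states that the Linux time command was used over 10 trials, so the Python timer, the helper names time_command and summarize, and the placeholder workload are all assumptions made for the example.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, trials=10):
    """Run cmd `trials` times and return each run's wall-clock time in seconds.

    Hypothetical helper: stands in for invoking the Linux `time` command by hand.
    """
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the program exits with an error
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation) of the elapsed times."""
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Placeholder workload (assumption); replace with the real program invocation.
    samples = time_command([sys.executable, "-c", "pass"], trials=10)
    mean, spread = summarize(samples)
    print(f"mean = {mean:.4f} s, stdev = {spread:.4f} s over {len(samples)} trials")
```

Reporting the spread alongside the mean makes it easier to judge whether 10 trials were enough to average out system noise.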
To validate the numerical results of the proposed algorithm, the data were
compared with analytical results from [9], which presents sound research on
bipartite systems of quantum dots. From there, eight experiments were
performed to validate QDsim using different configurations. Each simulation
changes a specific parameter of the quantum system, such as the purity level,
the QD decay rate, the probability amplitude, the radiative decay rate, and
the quotient between the photon emission decay rate and the radiative quantum
dot decay rate. The physical parameters fed to the model are based on an
InAs/GaAs semiconductor.
7 Results
From a partitioning viewpoint there are four implementations:
- Loop Partitioning
  - on a shared memory model using OpenMP (LPMC)
  - on a distributed memory model using OpenMPI (LPMP)
  - on a hybrid (distributed/shared memory) version using OpenMPI &
    OpenMP (LPMP + LPMC)
- Loop/Task Partitioning
  - on a hybrid (distributed/shared memory) version using OpenMPI &
    OpenMP (LPMP + TPMC)
Performance is measured using speedup as the metric. The speedup S is the
quotient between the time of the serial algorithm Ts and the time of the
parallel version Tp (S = Ts/Tp). The time of the parallel version is a
function of the number of computer nodes np and the number of cores nt used
in the computation.
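As a small worked illustration of the definition S = Ts/Tp(np, nt), the sketch below computes the speedup for a few (np, nt) configurations. The helper name speedup and every timing value are made-up placeholders chosen for the example; they are not measurements from the experiments.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = Ts / Tp for one configuration; both times in seconds."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("elapsed times must be positive")
    return t_serial / t_parallel


# Hypothetical elapsed times in seconds, NOT measured results.
t_serial = 120.0
# Parallel times keyed by (np, nt): number of nodes, number of cores used.
timings = {(1, 2): 60.0, (2, 2): 32.0, (4, 2): 20.0}

table = {cfg: speedup(t_serial, t_par) for cfg, t_par in timings.items()}

for (n_nodes, n_cores), s in table.items():
    print(f"np = {n_nodes}, nt = {n_cores}: S = {s:.2f}")
```

Keying the timings by the pair (np, nt) mirrors the statement that Tp depends on both the number of nodes and the number of cores.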
16 Hyper-Threading increases the performance of physical cores by using
abstract cores called logical cores.
 