A Numerical Solution for Wootters Correlation - High Performance Computing


3. How much performance can a distributed memory/shared memory model achieve?

These versions, plus the serial version, were executed on a cluster composed of
seven heterogeneous Symmetric Multi-Processing (SMP) nodes: three nodes with
Intel Xeon SL8SV dual-core processors with Hyper-Threading technology (see
footnote 16) and four nodes with two AMD Opteron 250 single-core processors
each. Every node runs 64-bit Scientific Linux 5.6 at runlevel 3 (text mode) to
avoid the system workload caused by the GUI. Of these nodes, one is used as the
Master and another as a Storage Node. The nodes are connected via a dedicated
8-port Gigabit switch. Elapsed time was measured with the Linux time command.
No profilers were run during the measurements because of the high volume of
operations. To reduce measurement error, the elapsed time was measured
repeatedly (10 trials). Valgrind was used only during the debugging phase, to
compute the percentage of the serial and parallel sections. To minimize
unexpected variations in performance, CPU throttling was disabled during the
measurement process.
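The repeated-trial measurement can be sketched as follows. This is a minimal illustration, not the authors' measurement script: it swaps the Linux time command for Python's `time.perf_counter`, and the names `time_trials` and `summarize` as well as the placeholder workload are assumptions made for the example.

```python
import statistics
import subprocess
import sys
import time


def time_trials(cmd, trials=10):
    """Run cmd `trials` times and return the wall-clock time of each run in seconds."""
    elapsed = []
    for _ in range(trials):
        start = time.perf_counter()
        # check=True raises if the command fails, so failed runs are never timed silently.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        elapsed.append(time.perf_counter() - start)
    return elapsed


def summarize(elapsed):
    """Return (mean, sample standard deviation) of a list of timings."""
    mean = statistics.mean(elapsed)
    stdev = statistics.stdev(elapsed) if len(elapsed) > 1 else 0.0
    return mean, stdev


if __name__ == "__main__":
    # Placeholder workload (assumption); a real run would pass the simulation's command line.
    samples = time_trials([sys.executable, "-c", "sum(range(100000))"])
    mean, stdev = summarize(samples)
    print(f"mean = {mean:.4f} s, stdev = {stdev:.4f} s over {len(samples)} trials")
```

Reporting the spread alongside the mean makes it easier to judge whether 10 trials were enough to smooth out noise from the operating system.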

To validate the numerical results of the proposed algorithm, the data were
compared with the analytical results from [9], which presents sound research on
bipartite systems of quantum dots. From there, eight experiments were performed
to validate QDsim using different configurations. Each simulation changes a
specific parameter of the quantum system, such as the purity level, the QD
decay rate, the probability amplitude, the radiative decay rate, and the
quotient between the photon emission decay rate and the radiative quantum dot
decay rate. The physical parameters fed to the model are based on an InAs/GaAs
semiconductor.

7 Results

From a partitioning viewpoint there are four implementations:

- Loop Partitioning
  • on a shared memory model using OpenMP (LPMC)
  • on a distributed memory model using OpenMPI (LPMP)
  • on a hybrid (distributed/shared memory) version using OpenMPI & OpenMP (LPMP + LPMC)
- Loop/Task Partitioning
  • on a hybrid (distributed/shared memory) version using OpenMPI & OpenMP (LPMP + TPMC)
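As a rough illustration of what loop partitioning means in the hybrid case, the sketch below splits an iteration range first across nodes and then across the threads of each node. It only computes index ranges and does not use OpenMP or MPI; the contiguous block schedule and the names `block_range` and `hybrid_partition` are assumptions for the example, since the text does not state how the loops are scheduled.

```python
def block_range(n, parts, index):
    """Return the half-open range [start, stop) of block `index` when n
    iterations are split into `parts` contiguous, near-equal blocks."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks receive one additional iteration each.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def hybrid_partition(n, n_nodes, n_threads):
    """Split n iterations across nodes (distributed memory level), then split
    each node's block across its threads (shared memory level)."""
    plan = {}
    for node in range(n_nodes):
        node_start, node_stop = block_range(n, n_nodes, node)
        for thread in range(n_threads):
            lo, hi = block_range(node_stop - node_start, n_threads, thread)
            plan[(node, thread)] = (node_start + lo, node_start + hi)
    return plan


if __name__ == "__main__":
    for (node, thread), (lo, hi) in sorted(hybrid_partition(10, 2, 2).items()):
        print(f"node {node}, thread {thread}: iterations [{lo}, {hi})")
```

For 10 iterations on 2 nodes with 2 threads each, the plan assigns [0, 3) and [3, 5) to node 0 and [5, 8) and [8, 10) to node 1, so every iteration is handled exactly once.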

Performance is measured using speedup as the metric. The speedup S is the
quotient between the time of the serial algorithm Ts and the time of the
parallel version Tp (S = Ts/Tp). The time of the parallel version is a function
of the number of compute nodes np and the number of cores nt used in the
computation.
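The speedup definition translates directly into a few lines of code. In the sketch below, the function names and the timing values are placeholders chosen for illustration only; they are not measurements from this work.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = Ts / Tp for one parallel configuration."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_serial / t_parallel


def speedup_table(t_serial, parallel_times):
    """Map each (np, nt) configuration to its speedup.

    parallel_times: dict keyed by (number of nodes, number of cores)
    with the measured parallel time Tp as the value.
    """
    return {cfg: speedup(t_serial, tp) for cfg, tp in parallel_times.items()}


if __name__ == "__main__":
    # Placeholder timings in seconds (assumption, not measured data).
    ts = 120.0
    tp = {(1, 2): 60.0, (2, 2): 40.0}
    for (n_nodes, n_cores), s in sorted(speedup_table(ts, tp).items()):
        print(f"np={n_nodes}, nt={n_cores}: S = {s:.2f}")
```

Keying the table by (np, nt) mirrors the statement that Tp depends on both the number of nodes and the number of cores.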

16 Hyper-Threading increases the performance of physical cores by exposing
abstract cores called logical cores.
