Information Technology Reference
In-Depth Information
Computer 4
Computer 3
VUL
Computer 2
Computer 1
VX
TLL
Time
VLL
TUL
Fig. 5. Partitioning using domain decomposition. Every processing element computes
a non-dependent chunk.
The latter method was chosen for all of the experiments due to its good nu-
merical convergence in the analysis of quantum concurrence. Despite the Euler
is strictly serial, the large number of equations allowed the parallelization using
only a shared memory scheme. Besides, the TP partitioning leads to a cumber-
some implementation in a distributed memory scheme since the high quantity
of data dependencies will increase the execution time due to a higher latency in
the interconnection network. For these reasons, a shared memory scheme is used
for this partitioning.
Figure 6 shows the Euler method for a linear system where ˁ ( i,j )isthe
dependent variable for the i -thequationinthe j -th iteration, ʔt is the time
step, F is an ODE function that depends on the previous values of ˁ ( i,j )and
the current time t . The length of the vector ˁ depends on the degree of freedom
of the light modes (ideally, it is infinity) and is limited to n modes by statistical
experiences. The number of equations increases proportionally to the number of
modes and it is limited to n equations (in Fig. 6, n = 40.)
Since the number of equations is large, the domain of n is split into blocks.
Each block has a number of pieces that is equal to the number of cores; a block
is then assigned to each core. The problem arises during the evaluation of the
function F where the system equations are produced. For example, the k -th
derivate -see (5)- is compound of j factors, whose number varies significantly
from 0 to many factors. Worst yet, a factor may be a function.
Since the complexity of evaluating each equation is not even from a computa-
tional viewpoint, the performance of the system is reduced significantly because
of this imbalance. For this reason, to keep the load balancing, it is not enough
to distribute an equal number of equations among the cores. Ideally, to get a
perfect balance one should distribute a number of operations per core such that
the computational cost in every core is the same. This of course, is unfeasible in
this particular case. A pragmatic solution was to attempt to distribute an equal
 
Search WWH ::




Custom Search