A Numerical Solution for Wootters Correlation - High Performance Computing

Information Technology Reference

In-Depth Information

Computer 4

Computer 3

VUL

Computer 2

Computer 1

VX

TLL

Time

VLL

TUL

Fig. 5. Partitioning using domain decomposition. Every processing element computes

a non-dependent chunk.

The latter method was chosen for all of the experiments due to its good nu-

merical convergence in the analysis of quantum concurrence. Despite the Euler

is strictly serial, the large number of equations allowed the parallelization using

only a shared memory scheme. Besides, the TP partitioning leads to a cumber-

some implementation in a distributed memory scheme since the high quantity

of data dependencies will increase the execution time due to a higher latency in

the interconnection network. For these reasons, a shared memory scheme is used

for this partitioning.

Figure 6 shows the Euler method for a linear system where ˁ ( i,j )isthe

dependent variable for the i -thequationinthe j -th iteration, ʔt is the time

step, F is an ODE function that depends on the previous values of ˁ ( i,j )and

the current time t . The length of the vector ˁ depends on the degree of freedom

of the light modes (ideally, it is infinity) and is limited to n modes by statistical

experiences. The number of equations increases proportionally to the number of

modes and it is limited to n equations (in Fig. 6, n = 40.)

Since the number of equations is large, the domain of n is split into blocks.

Each block has a number of pieces that is equal to the number of cores; a block

is then assigned to each core. The problem arises during the evaluation of the

function F where the system equations are produced. For example, the k -th

derivate -see (5)- is compound of j factors, whose number varies significantly

from 0 to many factors. Worst yet, a factor may be a function.

Since the complexity of evaluating each equation is not even from a computa-

tional viewpoint, the performance of the system is reduced significantly because

of this imbalance. For this reason, to keep the load balancing, it is not enough

to distribute an equal number of equations among the cores. Ideally, to get a

perfect balance one should distribute a number of operations per core such that

the computational cost in every core is the same. This of course, is unfeasible in

this particular case. A pragmatic solution was to attempt to distribute an equal

Search WWH ::

Custom Search

Home