between two machines depends on their relative locations in the network topology. For instance, machines on the same rack have higher bandwidth between them than machines that are off-rack. As such, it pays to minimize network
traffic across racks. If P1, P2, and P3 are mapped to M1, M2, and M3, respectively, less network latency is incurred when P1, P2, and P3 communicate than if they are mapped across two racks. More precisely, for P1 to communicate with P2 on the same rack, only one hop is needed to route a message from P1 to P2. In contrast, for P1 to communicate with P2 on different racks, two hops are incurred per message. Clearly, fewer hops result in lower network latency and improved overall performance. Unfortunately, this is not as easy to achieve on clouds as it might appear, especially on public clouds, for one main reason: clouds such as Amazon EC2 do not expose their network topologies. Nevertheless, the network
topology can still be learned (though not very effectively) using a benchmark like
Netperf [54] to measure point-to-point TCP stream bandwidths between all pairs of
cluster nodes [32]. This enables estimating the relative locality of nodes and arriving
at a reasonable inference regarding the rack topology of the cluster.
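The idea can be sketched as follows: measure point-to-point bandwidth between all node pairs, then group nodes whose mutual bandwidth is high into the same inferred rack. The bandwidth figures and node names below are hypothetical stand-ins for what a tool such as Netperf would report; the threshold and grouping strategy are assumptions for illustration.

```python
# Sketch: inferring rack locality from pairwise bandwidth measurements.
# In practice the numbers would come from a benchmark such as Netperf;
# here they are hard-coded as a hypothetical measurement matrix (MB/s).
bandwidth = {
    ("M1", "M2"): 950, ("M1", "M3"): 940, ("M2", "M3"): 955,  # fast: likely same rack
    ("M1", "M4"): 310, ("M2", "M4"): 300, ("M3", "M4"): 305,  # slower: likely off-rack
}

def infer_racks(bw, threshold):
    """Group nodes whose pairwise bandwidth exceeds `threshold`
    into the same inferred rack, using a simple union-find."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for (a, b), mbps in bw.items():
        find(a); find(b)          # register both nodes
        if mbps > threshold:
            union(a, b)           # high bandwidth => same rack (assumed)
    racks = {}
    for node in parent:
        racks.setdefault(find(node), set()).add(node)
    return list(racks.values())

print(infer_racks(bandwidth, threshold=500))
```

With these sample measurements, M1, M2, and M3 are grouped into one inferred rack and M4 into another. Real measurements are noisier, which is why the text notes this approach is not very effective.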
1.6.4 Synchronization
Distributed tasks should be allowed to simultaneously operate on shared data with-
out corrupting data or causing any inconsistency. For instance, GraphLab allows
multiple tasks to operate on different vertices of the same graph simultaneously.
This might lead to race conditions whereby two tasks try to modify data on a shared edge at the same time, resulting in a corrupted value. Consequently, synchronization solutions that provide distributed mutually exclusive access for tasks will
be required. Synchronization acts as a mechanism through which programmers can
control the sequence of operations (reads and writes) that are performed by tasks. As
discussed in Section 1.4.2.1, three types of synchronization methods are in wide use: semaphores, locks, and barriers. The efficiency of such methods
is a critical goal in developing distributed programs. For instance, as pointed out
in Section 1.4.2.1 and exemplified by the BSP model (see Section 1.4.3), a barrier
defines a point at which no task is allowed to continue unless all other tasks reach
that point. While this is easy to implement, the whole execution time of a distributed
program becomes dependent on the slowest task. In distributed systems such as the
cloud, where heterogeneity is the norm, this can cause serious performance degrada-
tion. The challenge becomes how to apply synchronization methods and at the same
time avert performance degradation.
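The straggler effect of a barrier can be seen in a small sketch. The task durations below are hypothetical; the point is that every task crosses the barrier only once the slowest one arrives, so all of them finish at roughly the straggler's time.

```python
# Sketch: a BSP-style superstep ending in a barrier. Total time is
# dominated by the slowest task (the "straggler").
import threading
import time

durations = [0.05, 0.05, 0.2]        # hypothetical; third task is slowest
barrier = threading.Barrier(len(durations))
finish_times = [0.0] * len(durations)

def task(i, work):
    time.sleep(work)                 # simulate local computation
    barrier.wait()                   # no task proceeds until all arrive
    finish_times[i] = time.monotonic()

start = time.monotonic()
threads = [threading.Thread(target=task, args=(i, d))
           for i, d in enumerate(durations)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Even the fast tasks are released only after ~0.2 s, the straggler's time.
elapsed = [f - start for f in finish_times]
print(elapsed)
```

On a heterogeneous cluster the gap between the fastest and slowest task can be far larger than in this toy run, which is exactly the performance hazard described above.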
In addition to ensuring mutual exclusion, there are other properties that need to
be guaranteed for distributed programs when it comes to synchronization. To start
with, if one task requests access to a critical section, it should eventually succeed. If two tasks request access to a critical section simultaneously, only one of them should succeed. This is denoted as the deadlock-free property and must be delivered by any mutual exclusion mechanism. Things, however, might not always go as expected. For instance, if task A succeeds in acquiring lock1 and, at about the same time, task B succeeds in acquiring lock2, then if task A attempts to acquire lock2 and task B attempts to acquire lock1, we end up in a deadlock, with each task waiting indefinitely for the lock held by the other.
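One standard way to avoid this scenario is to impose a fixed global order on lock acquisition, so that no two tasks ever hold locks in opposite orders. The sketch below is a minimal illustration in a shared-memory setting; the ordering key (object identity) is an assumption chosen for simplicity.

```python
# Sketch: avoiding the A/B deadlock above via a fixed lock-acquisition order.
# lock1/lock2 mirror the two hypothetical locks in the text.
import threading

lock1, lock2 = threading.Lock(), threading.Lock()
result = []

def acquire_in_order(*locks):
    # Impose a total order (here: by object id) so that every task
    # acquires locks in the same sequence, eliminating circular wait.
    return sorted(locks, key=id)

def task(name):
    first, second = acquire_in_order(lock1, lock2)
    with first:
        with second:
            result.append(name)   # critical section

a = threading.Thread(target=task, args=("A",))
b = threading.Thread(target=task, args=("B",))
a.start(); b.start()
a.join(); b.join()
print(result)                     # both tasks complete; no deadlock
```

Because both tasks request `first` before `second`, a circular wait can never form: whichever task wins the first lock is guaranteed to make progress.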