between two machines depends on their relative locations in the network topology. For instance, machines on the same rack have higher bandwidth between them than machines that are off-rack. As such, it pays to minimize network
traffic across racks. If P1, P2, and P3 are mapped to M1, M2, and M3, respectively, less network latency is incurred when P1, P2, and P3 communicate than if they are mapped across two racks. More precisely, for P1 to communicate with P2 on the same rack, only one hop is needed to route a message from P1 to P2. In contrast, for P1 to communicate with P2 on different racks, two hops are incurred per message. Clearly, fewer hops result in lower network latency and improved overall performance. Unfortunately, this is not as easy to achieve on clouds as it might appear, especially on public clouds, for one main reason: clouds such as Amazon EC2 do not expose their network topologies. Nevertheless, the network
topology can still be learned (though not very effectively) using a benchmark like
Netperf [54] to measure point-to-point TCP stream bandwidths between all pairs of
cluster nodes [32]. This enables estimating the relative locality of nodes and arriving
at a reasonable inference regarding the rack topology of the cluster.
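The idea can be sketched as follows: measure point-to-point bandwidth between all node pairs, then group nodes whose mutual bandwidth is high into the same inferred rack. The bandwidth figures and node names below are hypothetical stand-ins for what a tool such as Netperf would report; the threshold and grouping strategy are assumptions for illustration.

```python
# Sketch: inferring rack locality from pairwise bandwidth measurements.
# In practice the numbers would come from a benchmark such as Netperf;
# here they are hard-coded as a hypothetical measurement matrix (MB/s).
bandwidth = {
    ("M1", "M2"): 950, ("M1", "M3"): 940, ("M2", "M3"): 955,  # fast: likely same rack
    ("M1", "M4"): 310, ("M2", "M4"): 300, ("M3", "M4"): 305,  # slower: likely off-rack
}

def infer_racks(bw, threshold):
    """Group nodes whose pairwise bandwidth exceeds `threshold`
    into the same inferred rack, using a simple union-find."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        parent[find(a)] = find(b)
    for (a, b), mbps in bw.items():
        find(a); find(b)          # register both nodes
        if mbps > threshold:
            union(a, b)           # high bandwidth => same rack (assumed)
    racks = {}
    for node in parent:
        racks.setdefault(find(node), set()).add(node)
    return list(racks.values())

print(infer_racks(bandwidth, threshold=500))
```

With these sample measurements, M1, M2, and M3 are grouped into one inferred rack and M4 into another. Real measurements are noisier, which is why the text notes this approach is not very effective.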
1.6.4 Synchronization
Distributed tasks should be allowed to simultaneously operate on shared data with-
out corrupting data or causing any inconsistency. For instance, GraphLab allows
multiple tasks to operate on different vertices of the same graph simultaneously.
This might lead to race conditions whereby two tasks try to modify data on a shared edge at the same time, resulting in a corrupted value. Consequently, synchronization solutions that provide distributed mutually exclusive access for tasks will
be required. Synchronization acts as a mechanism through which programmers can
control the sequence of operations (reads and writes) that are performed by tasks. As
discussed in Section 1.4.2.1, three types of synchronization methods are in wide use: semaphores, locks, and barriers. The efficiency of such methods
is a critical goal in developing distributed programs. For instance, as pointed out
in Section 1.4.2.1 and exemplified by the BSP model (see Section 1.4.3), a barrier
defines a point at which no task is allowed to continue unless all other tasks reach
that point. While this is easy to implement, the whole execution time of a distributed
program becomes dependent on the slowest task. In distributed systems such as the
cloud, where heterogeneity is the norm, this can cause serious performance degrada-
tion. The challenge becomes how to apply synchronization methods and at the same
time avert performance degradation.
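The straggler effect of a barrier can be seen in a small sketch. The task durations below are hypothetical; the point is that every task crosses the barrier only once the slowest one arrives, so all of them finish at roughly the straggler's time.

```python
# Sketch: a BSP-style superstep ending in a barrier. Total time is
# dominated by the slowest task (the "straggler").
import threading
import time

durations = [0.05, 0.05, 0.2]        # hypothetical; third task is slowest
barrier = threading.Barrier(len(durations))
finish_times = [0.0] * len(durations)

def task(i, work):
    time.sleep(work)                 # simulate local computation
    barrier.wait()                   # no task proceeds until all arrive
    finish_times[i] = time.monotonic()

start = time.monotonic()
threads = [threading.Thread(target=task, args=(i, d))
           for i, d in enumerate(durations)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Even the fast tasks are released only after ~0.2 s, the straggler's time.
elapsed = [f - start for f in finish_times]
print(elapsed)
```

On a heterogeneous cluster the gap between the fastest and slowest task can be far larger than in this toy run, which is exactly the performance hazard described above.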
In addition to ensuring mutual exclusion, there are other properties that need to
be guaranteed for distributed programs when it comes to synchronization. To start
with, if one task requests access to a critical section, it should eventually succeed. If two tasks request access to a critical section simultaneously, only one of them should succeed. This is denoted as the deadlock-free property and must be delivered by any mutual exclusion mechanism. Things, however, might not always go as expected. For instance, if task A succeeds in acquiring lock1 and, at about the same time, task B succeeds in acquiring lock2, then if task A attempts to acquire lock2 and task B attempts to acquire lock1, we end up in a deadlock, with each task waiting indefinitely for the lock held by the other.
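One standard way to avoid this scenario is to impose a fixed global order on lock acquisition, so that no two tasks ever hold locks in opposite orders. The sketch below is a minimal illustration in a shared-memory setting; the ordering key (object identity) is an assumption chosen for simplicity.

```python
# Sketch: avoiding the A/B deadlock above via a fixed lock-acquisition order.
# lock1/lock2 mirror the two hypothetical locks in the text.
import threading

lock1, lock2 = threading.Lock(), threading.Lock()
result = []

def acquire_in_order(*locks):
    # Impose a total order (here: by object id) so that every task
    # acquires locks in the same sequence, eliminating circular wait.
    return sorted(locks, key=id)

def task(name):
    first, second = acquire_in_order(lock1, lock2)
    with first:
        with second:
            result.append(name)   # critical section

a = threading.Thread(target=task, args=("A",))
b = threading.Thread(target=task, args=("B",))
a.start(); b.start()
a.join(); b.join()
print(result)                     # both tasks complete; no deadlock
```

Because both tasks request `first` before `second`, a circular wait can never form: whichever task wins the first lock is guaranteed to make progress.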