While the formula appears simple, it carries a crucial implication. In particular, if we assume a cluster with an unlimited number of machines and a constant s, we can use the formula to express the maximum attainable speedup by computing the limit as n goes to infinity:

lim_{n→∞} Speedup_p = lim_{n→∞} 1/(s + (1 − s)/n) = 1/s.
To appreciate the significance of this implication, assume a serial fraction s of only 2%. Applying the formula with an unlimited number of machines yields a maximum speedup of only 50. Reducing s to 0.5% would raise the maximum speedup to 200. Consequently, we realize that attaining scalability in distributed systems is quite challenging, as it requires s to be almost 0, even before accounting for the effects of load imbalance, synchronization, and communication overheads.
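To make the arithmetic concrete, the following short Python sketch (the function name amdahl_speedup is ours, introduced only for illustration) evaluates the speedup formula for the two serial fractions just discussed:

def amdahl_speedup(s, n):
    """Speedup on n machines of a program with serial fraction s (Amdahl's law)."""
    return 1.0 / (s + (1.0 - s) / n)

# The limit as n grows without bound is 1/s.
for s in (0.02, 0.005):
    print(f"s = {s}: n = 1000 gives {amdahl_speedup(s, 1000):.1f}; limit 1/s = {1.0 / s:.0f}")

Running this prints a speedup of about 47.7 for s = 2% on 1,000 machines (against the limit of 50) and about 166.8 for s = 0.5% (against the limit of 200), showing how close the formula gets to its limit even at finite cluster sizes.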
In practice, synchronization overheads (e.g., performing barrier synchronization and acquiring locks) increase with the number of machines, often superlinearly [67]. Communication overheads also grow dramatically, since machines in large-scale distributed systems cannot all be interconnected over very short physical distances. Load imbalance becomes a major factor in heterogeneous environments, as explained shortly. While this is truly challenging, we point out that with Web-scale input data, the relative cost of synchronization and communication can shrink considerably, since they contribute far less to the overall execution time than computation does. Fortunately, this is the case for many Big Data applications.
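As an illustration of how such overheads can erode scalability, the sketch below extends the speedup formula with a hypothetical superlinear overhead term; the form c * n**alpha and the constants are assumptions chosen only to make the trend visible, not measurements:

def speedup_with_overhead(s, n, c=1e-6, alpha=1.5):
    """Amdahl speedup extended with an assumed superlinear overhead term
    c * n**alpha, modeling synchronization/communication cost as a
    fraction of the serial execution time."""
    return 1.0 / (s + (1.0 - s) / n + c * n ** alpha)

for n in (10, 100, 1000, 10000):
    ideal = 1.0 / (0.02 + 0.98 / n)
    print(f"n = {n:5d}: ideal {ideal:6.1f}, with overhead {speedup_with_overhead(0.02, n):6.1f}")

With these assumed constants, the speedup peaks at a moderate n and then collapses once the overhead term dominates, mirroring the superlinear growth of synchronization costs noted above.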
1.6.3 Communication
As defined in Section 1.4.1, distributed systems are composed of networked computers that can communicate by explicitly passing messages or by implicitly accessing shared memory. Even in distributed shared memory systems, messages are internally passed between machines, albeit in a manner that is totally transparent to users. Hence, communication essentially boils down to passing messages. Consequently, it can be argued that the only way for distributed systems to communicate is by passing messages. In fact, Coulouris et al. [16] adopt such a definition for distributed systems. Distributed systems such as the cloud rely heavily on the underlying network to deliver messages rapidly enough to destination entities, for three main reasons: performance, cost, and quality of service (QoS). Specifically, faster delivery of messages entails minimized execution times, reduced costs (as cloud applications can commit earlier), and higher QoS, especially for audio and video applications. This makes communication a principal theme in developing distributed programs for the cloud. Indeed, it would not be surprising if some argued that communication is at the heart of the cloud and is one of its major bottlenecks.
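As a minimal, single-machine analogue of this message-passing model, the Python sketch below runs two processes that share no memory and interact only by sending and receiving messages; in a real distributed system, the queues would be replaced by network channels:

from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # The worker shares no memory with its parent; it communicates
    # solely by receiving and sending messages.
    msg = inbox.get()
    outbox.put("processed " + msg)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put("task-1")   # explicit message send
    print(outbox.get())   # explicit message receive: "processed task-1"
    p.join()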
Distributed programs can mainly apply two techniques to address the communication bottleneck on the cloud. First, the strategy of distributing/partitioning the work across machines should attempt to co-locate highly communicating entities. This can mitigate pressure on the cloud network and subsequently improve performance. Such a goal is not as easy to achieve as it might appear, though. For instance, the standard edge cut strategy seeks to partition graph vertices into p roughly equal parts while minimizing the number of edges that cross the parts.
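As a small illustration of what the edge cut metric measures, the sketch below (the graph and function are hypothetical, not from the text) counts the edges that cross a given two-way partition:

def edge_cut(edges, assignment):
    """Count the edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# A toy 4-vertex ring graph partitioned into p = 2 parts.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
print(edge_cut(edges, assignment))  # 2 edges cross the cut

A partitioner pursuing co-location would search for an assignment that keeps this count, and hence the cross-machine message traffic it represents, as low as possible.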