While the formula appears simple, it carries a crucial implication. In particular, if we assume a cluster with an unlimited number of machines and a constant s, we can use the formula to express the maximum attainable speedup by computing the limit as n goes to infinity:

lim_{n→∞} Speedup_p = lim_{n→∞} 1/(s + (1 − s)/n) = 1/s.
To appreciate the significance of this implication, assume a serial fraction s of only 2%. Applying the formula with an unlimited number of machines yields a maximum speedup of only 50. Reducing s to 0.5% would raise the maximum speedup to 200. Consequently, we realize that attaining scalability in distributed systems is quite challenging, as it requires s to be almost 0, even before accounting for the effects of load imbalance, synchronization, and communication overheads.
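To make the arithmetic concrete, the following short Python sketch (the function name amdahl_speedup is ours, introduced only for illustration) evaluates the speedup formula for the two serial fractions just discussed:

def amdahl_speedup(s, n):
    """Speedup on n machines of a program with serial fraction s (Amdahl's law)."""
    return 1.0 / (s + (1.0 - s) / n)

# The limit as n grows without bound is 1/s.
for s in (0.02, 0.005):
    print(f"s = {s}: n = 1000 gives {amdahl_speedup(s, 1000):.1f}; limit 1/s = {1.0 / s:.0f}")

Running this prints a speedup of about 47.7 for s = 2% on 1,000 machines (against the limit of 50) and about 166.8 for s = 0.5% (against the limit of 200), showing how close the formula gets to its limit even at finite cluster sizes.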
In practice, synchronization overheads (e.g., performing barrier synchronization and acquiring locks) increase with the number of machines, often superlinearly [67]. Communication overheads also grow dramatically, since machines in large-scale distributed systems cannot all be interconnected over very short physical distances. Load imbalance becomes a major factor in heterogeneous environments, as explained shortly. While this is truly challenging, we point out that with Web-scale input data, the relative cost of synchronization and communication can shrink considerably, since they contribute far less to the overall execution time than computation does. Fortunately, this is the case for many Big Data applications.
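As an illustration of how such overheads can erode scalability, the sketch below extends the speedup formula with a hypothetical superlinear overhead term; the form c * n**alpha and the constants are assumptions chosen only to make the trend visible, not measurements:

def speedup_with_overhead(s, n, c=1e-6, alpha=1.5):
    """Amdahl speedup extended with an assumed superlinear overhead term
    c * n**alpha, modeling synchronization/communication cost as a
    fraction of the serial execution time."""
    return 1.0 / (s + (1.0 - s) / n + c * n ** alpha)

for n in (10, 100, 1000, 10000):
    ideal = 1.0 / (0.02 + 0.98 / n)
    print(f"n = {n:5d}: ideal {ideal:6.1f}, with overhead {speedup_with_overhead(0.02, n):6.1f}")

With these assumed constants, the speedup peaks at a moderate n and then collapses once the overhead term dominates, mirroring the superlinear growth of synchronization costs noted above.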
1.6.3 Communication
As defined in Section 1.4.1, distributed systems are composed of networked computers that can communicate by explicitly passing messages or by implicitly accessing shared memory. Even in distributed shared memory systems, messages are internally passed between machines, albeit in a manner that is totally transparent to users. Hence, communication essentially boils down to passing messages. Consequently, it can be argued that the only way for distributed systems to communicate is by passing messages. In fact, Coulouris et al. [16] adopt such a definition for distributed systems. Distributed systems such as the cloud rely heavily on the underlying network to deliver messages rapidly enough to destination entities, for three main reasons: performance, cost, and quality of service (QoS). Specifically, faster delivery of messages entails minimized execution times, reduced costs (as cloud applications can commit earlier), and higher QoS, especially for audio and video applications. This makes communication a principal theme in developing distributed programs for the cloud. Indeed, it would not be surprising if some argued that communication is at the heart of the cloud and is one of its major bottlenecks.
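As a minimal, single-machine analogue of this message-passing model, the Python sketch below runs two processes that share no memory and interact only by sending and receiving messages; in a real distributed system, the queues would be replaced by network channels:

from multiprocessing import Process, Queue

def worker(inbox, outbox):
    # The worker shares no memory with its parent; it communicates
    # solely by receiving and sending messages.
    msg = inbox.get()
    outbox.put("processed " + msg)

if __name__ == "__main__":
    inbox, outbox = Queue(), Queue()
    p = Process(target=worker, args=(inbox, outbox))
    p.start()
    inbox.put("task-1")   # explicit message send
    print(outbox.get())   # explicit message receive: "processed task-1"
    p.join()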
Distributed programs can mainly apply two techniques to address the communication bottleneck on the cloud. First, the strategy of distributing/partitioning the work across machines should attempt to co-locate highly communicating entities. This can mitigate pressure on the cloud network and subsequently improve performance. Such a goal is not as easy to achieve as it might appear, though. For instance, the standard edge cut strategy seeks to partition graph vertices into p roughly equal parts while minimizing the number of edges that cross the parts.
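As a small illustration of what the edge cut metric measures, the sketch below (the graph and function are hypothetical, not from the text) counts the edges that cross a given two-way partition:

def edge_cut(edges, assignment):
    """Count the edges whose endpoints land in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# A toy 4-vertex ring graph partitioned into p = 2 parts.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
print(edge_cut(edges, assignment))  # 2 edges cross the cut

A partitioner pursuing co-location would search for an assignment that keeps this count, and hence the cross-machine message traffic it represents, as low as possible.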