middleware. SOAP defines a scheme for using Extensible Markup Language (XML),
a textual self-describing format, to represent contents of messages and allow distrib-
uted tasks at diverse machines to interact.
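As a rough illustration of how an XML-based message can let tasks on diverse machines interact, the sketch below builds and parses a minimal SOAP 1.2 envelope using only Python's standard library. The application namespace and the element names RunTask, TaskId, and Payload are hypothetical, chosen only for this example; a real service would define its own message schema.

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace (defined by the W3C specification).
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical application namespace, assumed for this sketch only.
APP_NS = "http://example.org/tasks"


def build_request(task_id, payload):
    """Serialize a task request as a SOAP envelope (plain XML text)."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}RunTask")  # hypothetical operation
    ET.SubElement(call, f"{{{APP_NS}}}TaskId").text = str(task_id)
    ET.SubElement(call, f"{{{APP_NS}}}Payload").text = payload
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover (task_id, payload) from the envelope on the receiving side."""
    root = ET.fromstring(xml_text)
    call = root.find(f"{{{SOAP_NS}}}Body/{{{APP_NS}}}RunTask")
    task_id = int(call.findtext(f"{{{APP_NS}}}TaskId"))
    payload = call.findtext(f"{{{APP_NS}}}Payload")
    return task_id, payload


if __name__ == "__main__":
    message = build_request(7, "count words")
    print(message)
    print(parse_request(message))
```

Because the message is self-describing text, the receiver needs only an XML parser and knowledge of the element names, not the sender's ISA, OS, or programming language.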
In general, code suitable for one machine might not be suitable for another
machine on the cloud, especially when instruction set architectures (ISAs) vary
across machines. Ironically, virtualization technology, which induces heterogeneity,
can effectively help solve this problem. Identical VMs can be instantiated
for a user cluster and mapped to physical machines with different underlying ISAs.
Afterward, the virtualization hypervisor will take care of emulating any difference
between the ISAs of the provisioned VMs and the underlying physical machines
(if any). From a user's perspective, all emulations occur transparently. Lastly, users
can always install their own OSs and libraries on system VMs, like Amazon EC2
instances, thus ensuring homogeneity at the OS and library levels.
Another serious problem that requires a great deal of attention from distributed
programmers is performance variation [20,60] on the cloud. Performance variation
means that running the same distributed program on the same cluster twice
can result in widely different execution times. It has been observed that execution
times can vary by a factor of 5 for the same application on the same private cluster
[60]. Performance variation is mostly caused by the heterogeneity of clouds imposed
by virtualized environments and by the resource demand spikes and lulls typically
experienced over time. As a consequence, VMs on clouds rarely carry out work at the same
speed, thereby preventing tasks from making progress at (roughly) constant rates.
Clearly, this can create tricky load imbalance and subsequently degrade overall per-
formance. As pointed out earlier, load imbalance makes a program's performance
contingent on its slowest task. Distributed programs can attempt to tackle slow tasks
by detecting them and scheduling corresponding speculative tasks on fast VMs so that
they finish earlier. Specifically, two tasks with the same responsibility can compete
by running on two different VMs, with the one that finishes first being committed
and the other being killed. For instance, Hadoop MapReduce follows a similar
strategy for solving the same problem, known as speculative execution (see Section
1.5.5). Unfortunately, distinguishing between slow and fast tasks/VMs is very chal-
lenging on the cloud. It could happen that a certain VM running a task is temporar-
ily passing through a demand spike, or it could be the case that the VM is simply
faulty. In theory, not every detectably slow node is faulty, and differentiating between
faulty and slow nodes is hard [71]. Because of that, speculative execution in Hadoop
MapReduce does not perform very well in heterogeneous environments [11,26,73].
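The competing-tasks idea described above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Hadoop's actual implementation: threads stand in for VMs, the function names run_task and speculative_run are hypothetical, task durations are supplied by hand, and both attempts are launched at once rather than after a slow task has been detected.

```python
import concurrent.futures as cf
import threading
import time


def run_task(vm_name, duration, cancel_event):
    """Simulate one attempt of a task on a VM (a thread stands in for the VM).

    Returns vm_name if the attempt runs to completion, or None if it is killed.
    """
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        if cancel_event.is_set():
            return None  # the competing attempt finished first; this one is killed
        time.sleep(0.01)
    return vm_name


def speculative_run(attempts):
    """Run the same task on several VMs and commit the first one to finish.

    attempts maps a VM name to its simulated running time in seconds.
    """
    cancel = threading.Event()
    with cf.ThreadPoolExecutor(max_workers=len(attempts)) as pool:
        futures = [pool.submit(run_task, vm, d, cancel) for vm, d in attempts.items()]
        done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        winner = next(iter(done)).result()  # commit the earliest finisher
        cancel.set()                        # kill the remaining attempts
    return winner


if __name__ == "__main__":
    start = time.monotonic()
    committed = speculative_run({"slow-vm": 1.0, "fast-vm": 0.05})
    print(committed, round(time.monotonic() - start, 2))
```

The sketch sidesteps the hard part noted in the text: it is told in advance how long each attempt takes, whereas a real scheduler must infer from observed progress whether a task is slow, temporarily overloaded, or faulty.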
1.6.2 Scalability
The issue of scalability is a dominant subject in distributed computing. A distributed
program is said to be scalable if it remains effective when the quantities of users,
data, and resources are increased significantly. To get a sense of the scope of the
problem: with respect to users, most popular cloud applications and platforms
are currently offered as Internet-based services with millions of users. With respect to data,
in the time of Big Data, or the Era of Tera as denoted by Intel [13], distributed programs
typically cope with Web-scale data on the order of hundreds and thousands