over distributed systems (e.g., the cloud). A distributed programming model defines
how easily and efficiently algorithms can be specified as distributed programs.
For instance, a distributed programming model that highly abstracts architectural/
hardware details, automatically parallelizes and distributes computation, and trans-
parently supports fault tolerance is deemed an easy-to-use programming model. The
efficiency of the model, however, depends on the effectiveness of the techniques that
underlie the model. There are two classical distributed programming models that are
in wide use: shared memory and message passing. The two models fulfill different
needs and suit different circumstances. Nonetheless, they are elementary in the sense
that they only provide a basic interaction model for distributed tasks and lack any
facility to automatically parallelize and distribute tasks or tolerate faults. Recently,
there have been other advanced models that address the inefficiencies and challenges
posed by the shared-memory and the message-passing models, especially upon port-
ing them to the cloud. Among these models are MapReduce [17], Pregel [49], and
GraphLab [47]. These models are built upon the shared-memory and the message-
passing programming paradigms, yet are more involved and offer various properties
that are essential for the cloud. As these models differ considerably from the traditional
ones, we refer to them as distributed analytics engines.
1.5.2.1 The Shared-Memory Programming Model
In the shared-memory programming model, tasks can communicate by reading
and writing to shared memory (or disk) locations. Thus, the abstraction provided
by the shared-memory model is that tasks can access any location in the distributed
memories/disks. This is similar to threads of a single process in operating systems,
whereby all threads share the process address space and communicate by reading
and writing to that space (see Figure 1.4). Therefore, with shared memory, data is not
explicitly communicated but implicitly exchanged via sharing. Because of this sharing, the
shared-memory programming model entails the use of synchronization mechanisms
within distributed programs. Synchronization is needed to control the order
in which read/write operations are performed by various tasks. In particular, what
is required is that distributed tasks are prevented from simultaneously writing to a
shared data item, so as to avoid corrupting the data or making it inconsistent. This is
typically achieved using semaphores, locks, and/or barriers. A semaphore is
a point-to-point synchronization mechanism that involves two parallel/distributed
FIGURE 1.4 Tasks running in parallel and sharing an address space through which they
can communicate. (The figure shows tasks T1-T4 spawned at point S1 and joined back at
point S2 over a shared address space.)
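The spawn/join structure of Figure 1.4 and the need for synchronization can be sketched with operating-system threads, the single-process analogy the text itself draws. The following minimal Python sketch (all names are illustrative, not from the source) spawns four tasks that increment a shared counter; the lock ensures that no two tasks write the shared location simultaneously, which could otherwise lose updates:

```python
# Minimal sketch of the shared-memory model using threads (hypothetical
# example): four tasks are spawned, update a shared location, and join,
# mirroring the spawn/join structure of Figure 1.4.
import threading

counter = 0                   # shared "memory" location
lock = threading.Lock()       # synchronization mechanism (a lock)

def task(n_increments):
    global counter
    for _ in range(n_increments):
        with lock:            # only one task may write at a time
            counter += 1      # read-modify-write on shared data

# Spawn four parallel tasks (point S1 in Figure 1.4).
threads = [threading.Thread(target=task, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()

# Join: wait for all tasks to finish (point S2 in Figure 1.4).
for t in threads:
    t.join()

print(counter)                # 40000: no updates were lost
```

Note that the tasks never send data to one another explicitly; they communicate only by reading and writing the shared variable, which is exactly the implicit exchange via sharing described above.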