Distributed Programming for the Cloud - Large Scale and Big Data: Processing and Management - page 18

Database Reference

In-Depth Information

graph is partitioned into k partitions using a hash-based random algorithm [47], with

k being much larger than the number of cluster machines. A partition in GraphLab

is called an atom . GraphLab does not store the actual vertices and edges in atoms,

but commands to generate them. In addition to commands, GraphLab maintains in

each atom some information about the atom's neighboring vertices and edges. This

is denoted in GraphLab as ghosts . Ghosts are used as a caching capability for effi-

cient adjacent data accessibility. In the second phase of the two-phase partitioning

strategy, GraphLab stores the connectivity structure and the locations of atoms in

an atom index file referred to as metagraph . The atom index file encompasses k

vertices (with each vertex corresponding to an atom) and edges encoding connectiv-

ity among atoms. The atom index file is split uniformly across the cluster machines.

Afterward, atoms are loaded by cluster machines and each machine constructs its

partitions by executing the commands in each of its assigned atoms. By generating

partitions via executing commands in atoms (and not directly mapping partitions to

cluster machines), GraphLab allows future changes to graphs to be simply appended

as additional commands in atoms without needing to repartition the entire graphs.

Furthermore, the same graph atoms can be reused for different sizes of clusters by

simply re-dividing the corresponding atom index file and re-executing atom com-

mands (i.e., only the second phase of the two-phase partitioning strategy is repeated).

In fact, GraphLab has adopted such a graph partitioning strategy with the elasticity

of clouds being in mind. Clearly, this improves upon the direct and non-elastic hash-

based partitioning strategy adopted by Pregel. Specifically, in Pregel, if graphs or

cluster sizes are altered after partitioning, the entire graphs need to be repartitioned

prior to processing.

1.5.5 s ymmetriCal anD a symmetriCal a rChiteCtural m oDels

From architectural and management perspectives, a distributed program can be typi-

cally organized in two ways, master/slave (or asymmetrical ) and peer-to-peer (or

symmetrical ) (see Figure 1.12). There are other organizations, such as hybrids of

asymmetrical and symmetrical, which do exist in literature [71]. For the purpose of

our chapter, we are only concerned with the master/slave and peer-to-peer organiza-

tions. In a master/slave organization, a central process known as the master handles

Master

Master

(Not necessary)

Slave

Slave

Slave

Slave

Slave

Slave

(a)

(b)

FIGURE 1.12 (a) A master/slave organization. (b) A peer-to-peer organization. The master

in such an organization is optional (usually employed for monitoring the system and/or inject-

ing administrative commands).

Next Page

Large Scale and Big Data: Processing and Management

Search WWH ::

Custom Search

Home