Database Reference
In-Depth Information
Set A and Set B. The comparison result is an output matrix M . It is also called the
Cartesian product or cross join of Set A and Set B.
All-Pairs is implemented in four phases: system modeling, input data distribu-
tion, batch job management, and result collection. In Phase I, an approximation
model of system performance will be built to assist in deciding how much CPU
is needed and how to conduct job partition. In Phase II, a spanning tree is built
for data transmission, which is completed within logarithmic time. This way, the
workload of every partition may effectively get input data. In Phase III, after the data
flow is delivered to proper nodes, the All-Pairs engine will build a batch-processing
submission for jobs in partitions, sequence them in the batch processing system, and
formulate a node running command to acquire data in nodes. The user-defined items
will be executed in proper partition jobs to generate results in batch, with the results
displayed in the output matrix. In the last phase, as the batch processing system
completes its jobs, the extraction engine will collect results and combine them in a
proper structure, which is generally a single file list. In the list, all results are put in
order. If the system hides the execution details from the user, the user may define
the data and calculation requirements using the given interfaces.
4.3.3.4
Pregel
The Pregel [ 34 ] system of Google facilitates the processing of large-sized graphs,
e.g., analysis of network graphs and social networking services. A computational
task is expressed by a directed graph constituted by vertexes and directed edges,
in which every vertex is related to a modifiable and user-defined value. Directed
edges are related to their source vertexes and every edge is constituted by a
modifiable and user-defined value and an identifier of a target vertex. After the
graph is built, the program conducts iterative calculations, which is called supersteps
among which global synchronization points are set until algorithm completion and
output completion. In every superstep, vertex computations are parallel and every
vertex executes the same user-defined function to express a given algorithm logic.
Every vertex may modify its status and the status of its output edges, receive a
message sent from the previous superstep, send the message to other vertexes, and
even modify the topological structure of the entire graph. Edges are not provided
with corresponding computations. Functions of every vertex may be removed by
suspension. When all vertexes are in an inactive status without any message to
transmit, the entire program execution is completed. The Pregel program output
is a set consisting of the values output from all the vertexes. Generally speaking, the
Pregel program output and input are an isomorphic directed graph.
Inspired by the aforementioned programming models, other researches have also
focused on programming modes for more complex computational tasks, e.g., iter-
ative computations [ 35 , 36 ], fault-tolerant memory computations [ 37 ], incremental
computations [ 38 ], and flow control decision-making related to data [ 39 ].
Search WWH ::




Custom Search