Over the past decades, traditional relational databases have been widely used to
manage datasets. Consequently, programmers are familiar with SQL, the declarative
query language of relational databases, for task description and dataset analysis.
However, the succinct MapReduce framework provides only two opaque functions
(map and reduce) and no common operations (e.g., projection and filtering).
Therefore, programmers have to spend time hand-coding these basic functions,
which are generally hard to maintain and reuse. Incorporating an SQL-like style
into the MapReduce framework is a promising solution. To this end, several
high-level language systems have been proposed, e.g., Sawzall [28] from Google,
Pig Latin [29] from Yahoo!, Hive [30] from Facebook, and Scope [3] from
Microsoft, to improve programming efficiency and user friendliness.
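The burden described above can be illustrated with a minimal sketch. It is a toy,
single-process stand-in for MapReduce, not the API of any real framework; the
record layout, the names map_fn, reduce_fn, and run_mapreduce, and the sample
data are all assumptions made for illustration. The filter and the projection
have to be written by hand inside the map function, whereas an SQL-like layer
would express the same job in a single declarative statement.

```python
from collections import defaultdict

# Hypothetical input: (user, country, bytes) log records.
RECORDS = [
    ("alice", "US", 120),
    ("bob", "DE", 300),
    ("carol", "US", 80),
    ("dave", "US", 20),
]

# Roughly equivalent declarative query (for comparison only):
#   SELECT country, SUM(bytes) FROM logs WHERE bytes >= 50 GROUP BY country;

def map_fn(record):
    """Filter and projection are hand-coded inside the map function."""
    user, country, size = record
    if size >= 50:            # filter: WHERE bytes >= 50
        yield country, size   # projection: keep only country and bytes

def reduce_fn(key, values):
    """Aggregate all values that share a key."""
    return key, sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    """Toy stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

result = run_mapreduce(RECORDS, map_fn, reduce_fn)
print(result)
```

Changing the filter threshold or adding a column means editing map_fn itself,
which is the maintenance and reuse problem that the SQL-like systems aim to
remove.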
4.3.3.2 Dryad
Dryad [31] is a general-purpose distributed execution engine for coarse-grained
data-parallel applications. A Dryad job is structured as a directed acyclic
graph, in which vertices represent programs and edges represent data channels.
Dryad executes the vertices on a computer cluster and transmits data through the
channels, which may be files, TCP connections, or shared-memory FIFOs. At run
time, the resources in the logical job graph are automatically mapped to
physical resources.
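The graph model can be sketched in a few lines of Python. This is an
illustrative, single-process approximation and not Dryad's actual API: the
function run_dag, the vertex names, and the use of in-memory deques in place of
files, TCP connections, or shared-memory FIFOs are all assumptions. Each vertex
is a small program, each edge is a FIFO channel, and vertices run in dependency
order.

```python
from collections import deque
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def run_dag(vertices, edges, sources):
    """Run every vertex once, in dependency order, over FIFO channels.

    vertices: name -> function taking a list of input lists, returning a list
    edges:    list of (producer, consumer) pairs; each pair is one channel
    sources:  name -> initial input for vertices with no incoming edge
    """
    channels = {edge: deque() for edge in edges}
    deps = {name: set() for name in vertices}
    for producer, consumer in edges:
        deps[consumer].add(producer)

    results = {}
    for name in TopologicalSorter(deps).static_order():
        incoming = [e for e in edges if e[1] == name]
        if incoming:
            inputs = [list(channels[e]) for e in incoming]
        else:
            inputs = [sources[name]]
        output = vertices[name](inputs)
        results[name] = output
        # Push the output into every outgoing channel of this vertex.
        for e in edges:
            if e[0] == name:
                channels[e].extend(output)
    return results

# A hypothetical diamond-shaped job: read -> {square, negate} -> merge.
vertices = {
    "read": lambda ins: ins[0],
    "square": lambda ins: [x * x for x in ins[0]],
    "negate": lambda ins: [-x for x in ins[0]],
    "merge": lambda ins: [a + b for a, b in zip(ins[0], ins[1])],
}
edges = [("read", "square"), ("read", "negate"),
         ("square", "merge"), ("negate", "merge")]
results = run_dag(vertices, edges, {"read": [1, 2, 3]})
print(results["merge"])
```

In a real engine the vertices would run as separate processes on different
machines and the scheduler would place them on physical resources; here the
topological order alone stands in for that scheduling step.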
A Dryad job is coordinated by a central program called the job manager, which
can run either in the cluster or on a user's workstation that has network access
to the cluster. The job manager contains application-specific code, which builds
the job's communication graph, and library code, which schedules work across the
available resources. All data is transmitted directly between vertices.
Therefore, the job manager is responsible only for control decisions and does
not become a bottleneck for data transmission.
In Dryad, application developers can choose any directed acyclic graph to
describe the application's communication pattern and to specify the data
transmission mechanisms. In addition, Dryad allows a vertex to use any number of
inputs and outputs, whereas MapReduce restricts each computation to a single
input set and a single output set. DryadLINQ [32] is the high-level language of
Dryad and integrates an SQL-like language, as discussed above, with the Dryad
execution environment.
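The difference in vertex shape can be made concrete with a short sketch. The
function below is hypothetical and is not part of Dryad or DryadLINQ; it only
illustrates a single vertex that reads two inputs and writes two outputs, which
a single MapReduce job (one input set, one output set) cannot express directly.

```python
def split_orders_vertex(inputs):
    """A vertex with two inputs and two outputs (illustrative only).

    inputs[0]: list of (customer_id, amount) orders
    inputs[1]: list of known customer ids
    Returns [matched_orders, rejected_orders].
    """
    orders, customers = inputs
    known = set(customers)
    matched = [o for o in orders if o[0] in known]
    rejected = [o for o in orders if o[0] not in known]
    return [matched, rejected]

matched, rejected = split_orders_vertex([
    [("c1", 10), ("c9", 5), ("c2", 7)],
    ["c1", "c2"],
])
print(matched, rejected)
```

Each of the two outputs could then feed a different downstream vertex over its
own channel.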
4.3.3.3 All-Pairs
All-Pairs [33] is a system designed specifically for biometrics, bioinformatics,
and data mining applications. It focuses on comparing pairs of elements from two
datasets using a given function. The All-Pairs problem may be expressed as a
three-tuple (Set A, Set B, Function F), in which Function F is used to compare
all elements in