FIGURE 2.17
Sample Jaql script. (From K. S. Beyer et al., PVLDB, 4(12), 1272-1283, 2011.)
systems (e.g., Hadoop's HDFS), database systems (e.g., DB2, Netezza, HBase), or
from streamed sources like the Web. Unlike in federated databases, however, most of
the accessed data is stored within the same cluster, and the I/O API describes how the
data is partitioned, which enables parallel evaluation with data affinity. Jaql derives
much of this flexibility from Hadoop's I/O API. It reads and writes many common
file formats (e.g., delimited files, JSON text, Hadoop sequence files). Custom
adapters are easily written to map a data set to or from Jaql's data model. The input
can even be values constructed in the script itself. The Jaql interpreter evaluates
the script locally on the computer that compiled it, but spawns interpreters on
remote nodes using MapReduce. The Jaql compiler automatically detects
parallelization opportunities in a Jaql script and translates it into a set of MapReduce jobs.
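To make the adapter and parallelization ideas concrete, the following Python sketch mimics them outside Jaql. It is not Jaql's API: the input format (`name,dept,salary`) and the helpers `delimited_adapter`, `partition`, `map_task`, `reduce_phase`, and `run_pipeline` are hypothetical. The adapter maps each delimited line to a JSON-style record (Jaql's data model is JSON-based), the records are split into partitions, and a filter/group/count pipeline is evaluated as one map task per partition followed by a reduce phase, roughly the shape of job a compiler like Jaql's could emit.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical adapter: maps delimited text lines to JSON-style records
# (dicts), the kind of conversion a custom Jaql adapter performs.
def delimited_adapter(lines: Iterable[str], fields: List[str], sep: str = ",") -> Iterator[dict]:
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield dict(zip(fields, line.split(sep)))

# Split records round-robin into n partitions; each partition can be
# processed independently, which is what enables parallel evaluation.
def partition(records: Iterable[dict], n: int) -> List[List[dict]]:
    parts: List[List[dict]] = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts

# Map phase for the pipeline "filter salary > min_salary -> group by dept -> count":
# emits (dept, 1) for every record that passes the filter.
def map_task(part: List[dict], min_salary: int) -> List[Tuple[str, int]]:
    return [(rec["dept"], 1) for rec in part if int(rec["salary"]) > min_salary]

# Reduce phase: sums the counts emitted for each dept.
def reduce_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    counts: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def run_pipeline(lines: Iterable[str], n_partitions: int, min_salary: int) -> Dict[str, int]:
    records = delimited_adapter(lines, ["name", "dept", "salary"])
    emitted: List[Tuple[str, int]] = []
    for part in partition(records, n_partitions):
        emitted.extend(map_task(part, min_salary))  # one map task per partition
    return reduce_phase(emitted)

if __name__ == "__main__":
    data = ["ann,eng,120", "bob,eng,90", "cy,ops,110", "di,ops,130", "ed,hr,80"]
    result = run_pipeline(data, n_partitions=2, min_salary=100)
    # Write the result back out as JSON text, the "from data model" direction.
    print(json.dumps(result, sort_keys=True))  # {"eng": 1, "ops": 2}
```

Because every map task sees only its own partition, the map phase could run on different nodes; only the reduce phase needs all intermediate pairs for a given key.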
2.5 SAMPLE MapReduce-BASED APPLICATIONS
MapReduce-based systems are increasingly being used for large-scale data analysis.
There are several reasons for this [77]:
• The interface of MapReduce is simple yet expressive. Although MapReduce
involves only two functions, map and reduce, many data-analysis tasks,
including traditional SQL queries, data mining, machine learning, and
graph processing, can be expressed as a set of MapReduce jobs.
• MapReduce is flexible. It is designed to be independent of storage systems
and is able to analyze various kinds of data, both structured and unstructured.
• MapReduce is scalable. A MapReduce installation can run over thousands
of nodes in a shared-nothing cluster while still providing fine-grained
fault tolerance, whereby only the tasks on failed nodes need to be restarted.
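The first and third points can be illustrated with a small in-memory simulation. The Python sketch below is built around a hypothetical `run_job` helper, not Hadoop's API. It expresses the SQL query `SELECT dept, AVG(salary) FROM emp GROUP BY dept` as one map function and one reduce function, and it imitates fine-grained fault tolerance by re-executing only a map task that fails, while the outputs of completed tasks are kept. The failure is simulated through an assumed `failing_tasks` parameter.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Row = Tuple[str, str, float]  # (name, dept, salary)

# map: emit (dept, salary) for each row, i.e., the GROUP BY key and the value to aggregate.
def map_fn(row: Row) -> List[Tuple[str, float]]:
    _, dept, salary = row
    return [(dept, salary)]

# reduce: compute AVG(salary) over all values grouped under one dept.
def reduce_fn(dept: str, salaries: List[float]) -> Tuple[str, float]:
    return dept, sum(salaries) / len(salaries)

def run_job(splits: List[List[Row]],
            mapper: Callable[[Row], List[Tuple[str, float]]],
            reducer: Callable[[str, List[float]], Tuple[str, float]],
            failing_tasks: Set[int],
            max_attempts: int = 3) -> Tuple[Dict[str, float], List[int]]:
    """Run one map task per split, shuffle by key, then reduce.

    Tasks whose index is in `failing_tasks` fail on their first attempt
    (simulating a node failure); only those tasks are retried.
    Returns the reduce output and the indices of tasks that were re-run.
    """
    map_outputs: Dict[int, List[Tuple[str, float]]] = {}
    retried: List[int] = []
    for task_id, split in enumerate(splits):
        for attempt in range(1, max_attempts + 1):
            try:
                if task_id in failing_tasks and attempt == 1:
                    raise RuntimeError(f"node running map task {task_id} failed")
                map_outputs[task_id] = [kv for row in split for kv in mapper(row)]
                break  # task succeeded; its output is kept
            except RuntimeError:
                retried.append(task_id)  # re-execute this task only
        else:
            raise RuntimeError(f"map task {task_id} exhausted its attempts")

    # Shuffle: group intermediate values by key.
    groups: Dict[str, List[float]] = defaultdict(list)
    for output in map_outputs.values():
        for key, value in output:
            groups[key].append(value)

    result = dict(reducer(k, v) for k, v in sorted(groups.items()))
    return result, retried

if __name__ == "__main__":
    splits = [
        [("ann", "eng", 120.0), ("bob", "eng", 90.0)],
        [("cy", "ops", 110.0)],
        [("di", "ops", 130.0), ("ed", "hr", 80.0)],
    ]
    averages, retried = run_job(splits, map_fn, reduce_fn, failing_tasks={1})
    print(averages)  # {'eng': 105.0, 'hr': 80.0, 'ops': 120.0}
    print(retried)   # [1]
```

Only map task 1 is re-executed; tasks 0 and 2 are not rerun, which is the property the scalability point describes, here reduced to a single process.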
These main advantages have triggered several research efforts aimed at
applying the MapReduce framework to challenging data-processing problems
on large-scale data sets in different domains. For example, [53] proposed
an SQL-like query language for large-scale analysis of XML data on a MapReduce
platform, called MRQL (the Map-Reduce Query Language). The evaluation system
of MRQL leverages relational query optimization techniques and compiles