12.5 DEDUCE
In general, MapReduce offers the capability to analyze several terabytes of
stored data, whereas stream-processing solutions can handle up to a few million
updates per second. However, a growing number of data-processing applications
need a solution that effectively and efficiently combines the benefits of
MapReduce and stream processing. DEDUCE [16] is a middleware that has been
designed to offer a unified abstraction and runtime for addressing the needs of
such modern data-processing applications. It attempts to combine real-time
stream processing with the capabilities of a massive data-analysis framework
like MapReduce by providing the following features:
Language Constructs: DEDUCE extends SPADE's data-flow composition
language to enable the specification and use of MapReduce jobs as data-
flow elements.
Reusable Modules: DEDUCE provides the capability to describe reusable
modules for implementing offline MapReduce tasks aimed at calibrating
analytic models.
Runtime Support: DEDUCE augments the System S runtime infrastructure
to support the execution and optimized deployment of map and reduce tasks.
Control Parameters: DEDUCE provides configuration parameters (e.g.,
update frequency and resource utilization hints) associated with the
MapReduce jobs that can be tweaked to perform end-to-end system optimi-
zations and shared resource management; a parameterized job description
is sketched after this list.
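The following is a minimal Python sketch of this idea rather than DEDUCE's actual SPADE-based syntax; all names here (e.g., MapReduceJobSpec, update_frequency_s, resource_hints) are invented for illustration. It shows a MapReduce job described as a data-flow element whose control parameters a runtime could inspect for scheduling and shared resource management.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical illustration only: DEDUCE expresses this in SPADE's data-flow
# language, not Python, and none of these names come from DEDUCE itself.

@dataclass
class MapReduceJobSpec:
    """A MapReduce job described as a data-flow element with tunable controls."""
    mapper: Callable                # map function applied to each input record
    reducer: Callable               # reduce function applied per key
    input_path: str                 # data at rest on the distributed file system
    output_path: str                # prespecified output location
    update_frequency_s: int = 3600  # how often the offline job is re-run
    resource_hints: Dict[str, int] = field(default_factory=dict)  # e.g., {"map_slots": 8}

# A runtime could read these parameters to decide when to re-run the job and
# how to share cluster resources with the surrounding streaming application.
job = MapReduceJobSpec(
    mapper=lambda key, value: [(word, 1) for word in value.split()],
    reducer=lambda key, counts: (key, sum(counts)),
    input_path="/hdfs/clickstream/",
    output_path="/hdfs/models/current/",
    update_frequency_s=900,
    resource_hints={"map_slots": 8, "reduce_slots": 2},
)
```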
The DEDUCE-specific language extensions to the SPADE language have been
designed to achieve the following goals:
1. To make it easy to specify MapReduce jobs
2. To support MapReduce jobs as composable data-flow elements
3. To provide the capability to create domain-specific collections of map
and reduce modules, as illustrated in the sketch after this list
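As a rough illustration of the third goal, again in Python rather than SPADE and with all names invented for this sketch, a registry of named map/reduce pairs lets a data-flow specification refer to domain-specific analytic logic by name instead of restating it inline:

```python
from typing import Callable, Dict, Tuple

# Hypothetical sketch of reusable map/reduce modules; none of these names
# come from DEDUCE itself.
MODULES: Dict[str, Tuple[Callable, Callable]] = {}

def register_module(name: str, mapper: Callable, reducer: Callable) -> None:
    """Make a map/reduce pair available under a stable, reusable name."""
    MODULES[name] = (mapper, reducer)

# A domain-specific module: count events per user, e.g., as input for
# recalibrating an analytic model offline.
register_module(
    "events_per_user",
    mapper=lambda record: [(record["user"], 1)],
    reducer=lambda user, counts: (user, sum(counts)),
)

# A data-flow specification would then reference the module by name.
mapper, reducer = MODULES["events_per_user"]
```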
In particular, the DEDUCE language extensions consist of two important com-
ponents: the DEDUCE Operator Toolkit and the module specification framework.
The DEDUCE Operator Toolkit contains the following operators:
MapReduce Operator: DEDUCE models a MapReduce job as a SPADE
operator. This approach simplifies the design of applications that combine
data at rest with data in motion. The input data set for a MapReduce job can
be specified either as a parameter to the operator or as a punctuated input
stream containing the locations of the directories or files to be processed.
The output of the MapReduce job is written to a prespecified location on the
distributed file system, and the location of this output data is optionally
available as a punctuated output stream from the MapReduce operator; the
sketch below illustrates this contract.
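The Python sketch below mimics that contract under stated assumptions; the real operator is expressed in SPADE and executed by the System S runtime, and names such as Punctuation, run_mapreduce, and mapreduce_operator are hypothetical. It buffers file locations until a punctuation arrives, runs a stubbed job, and forwards the output location downstream as a punctuated stream.

```python
from typing import Iterable, Iterator, List


class Punctuation:
    """Marks the end of a batch of input paths on the stream."""


def run_mapreduce(paths: List[str], output_dir: str) -> str:
    """Stand-in for submitting a MapReduce job over `paths`; returns the output location."""
    # A real implementation would launch map and reduce tasks on the cluster
    # and write results to the prespecified output directory.
    return output_dir


def mapreduce_operator(in_stream: Iterable, output_dir: str) -> Iterator:
    """Buffer file/directory locations until a punctuation arrives, then run
    the job and emit the output location as a punctuated output stream."""
    batch: List[str] = []
    for item in in_stream:
        if isinstance(item, Punctuation):
            location = run_mapreduce(batch, output_dir)
            yield location        # downstream operators can read the results here
            yield Punctuation()   # signal that this batch's output is complete
            batch = []
        else:
            batch.append(item)    # a directory or file to be processed


# Example: two input files followed by a punctuation trigger one MapReduce run.
stream = ["/hdfs/logs/part-0", "/hdfs/logs/part-1", Punctuation()]
for out in mapreduce_operator(stream, "/hdfs/mr-output/run-001"):
    if not isinstance(out, Punctuation):
        print("output available at:", out)
```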