In the following sections, we describe the current version of the framework
in detail.
9.4.1 The R Script Engine Extension
To integrate R with Pig and take advantage of both, R must be supported as a
language for defining Pig UDFs, which specify custom processing in Pig data
flows. Pig already supports several languages for UDFs, such as Python and
JavaScript, each implemented as a separate script engine extension in Pig.
Our case therefore requires an R script engine extension (RScriptEngine).
It wraps the back-end R engine, which interprets R scripts at runtime
(Figure 9.1). The user defines R functions as UDFs in an R script and makes
Pig aware of the script with the Pig REGISTER statement in a data flow
(step 1 of Figure 9.1). An RScriptEngine is then initialized, and it
registers the defined R functions. The RScriptEngine is shipped within the
Pig-generated MapReduce programs to all Hadoop task nodes during execution
(step 2 of Figure 9.1). On each node, the RScriptEngine executes the
registered R functions in the back-end R engine by providing a bridge
function for interactions between Pig and R (step 3 of Figure 9.1).
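The register-then-invoke pattern described above can be sketched in Java,
the language Pig itself is written in. This is only an illustration under
stated assumptions: the class ScriptEngineSketch and its methods register
and invoke are hypothetical names and are not Pig's or the framework's
actual API. Plain Java lambdas stand in for the R functions, and the
shipping of the engine to task nodes (step 2) is not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a script engine extension; not Pig's real API.
final class ScriptEngineSketch {
    // Registry mapping a UDF name to the function that implements it.
    private final Map<String, Function<Object[], Object>> registered = new HashMap<>();

    // Analogue of step 1: record a user-defined function under a name.
    void register(String name, Function<Object[], Object> fn) {
        registered.put(name, fn);
    }

    // Analogue of step 3: the bridge looks up a registered function by name
    // and calls it with the arguments supplied by the data flow.
    Object invoke(String name, Object... args) {
        Function<Object[], Object> fn = registered.get(name);
        if (fn == null) {
            throw new IllegalArgumentException("No UDF registered under: " + name);
        }
        return fn.apply(args);
    }

    public static void main(String[] args) {
        ScriptEngineSketch engine = new ScriptEngineSketch();
        // A Java lambda plays the role of a function defined in a script.
        engine.register("square", a -> {
            int x = (Integer) a[0];
            return x * x;
        });
        System.out.println(engine.invoke("square", 7)); // prints 49
    }
}
```

In this sketch the registry is the only state the engine carries, which is
what would make it straightforward to ship alongside a generated program;
how the actual RScriptEngine is serialized and shipped is not specified
here.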
The back-end script engine is usually chosen from the Java implementations
of the script language. For example, Jython and Rhino serve as the back-end
engines for Python and JavaScript, respectively. This lets the script
languages run on the JVM, where Hadoop and Pig are already running. Hence,
no additional back-end script engine needs to be installed on every host
along with the
[Figure 9.1 labels: Analyst/developer; Pig script; R script; (1); Pig/Hadoop with R script engine extension (Bridge); JobTracker; (2); MapReduce program; TaskTracker; Pig/Hadoop with R script engine extension (Bridge); R on JVM; R stand-alone; (3)]
FIGURE 9.1
The framework overview.