With regard to fault tolerance, the fault handling happens in two different
layers: the node layer and the R engine layer. The underlying Hadoop
framework provides failure handling on nodes of the cluster. If a node fails during
the execution of an RPig job, Hadoop will restart the failed node's task
on an alternate node. Within a node task, the RPig framework allows the user
to define the fault policy to handle errors from an R engine execution on an R
function. For example, by default func_name.fault.ignore←T, a policy that
ignores any exception and continues. Also, func_name.fault.retry←1
allows at most one retry when an exception occurs. If an R execution
fails during the map task, then a remedy action defined in a failure policy of
the named R function will be applied, and the failure event will be logged by
the RPig framework. The user can still use R's tryCatch() function within
an R function to define the fault-handling mechanisms within the R session,
but the fault policy of RPig allows the user to restart the R function in a brand
new R session.
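
The retry-then-ignore behavior described above can be sketched in a few lines. This is a minimal illustration in Python, not RPig's implementation: the function run_with_policy and its parameters are hypothetical names chosen to mirror the func_name.fault.ignore and func_name.fault.retry settings, returning None for an ignored failure is an assumption, and the sketch does not model restarting the function in a new R session.

```python
def run_with_policy(func, args=(), ignore=True, retry=1, log=None):
    """Apply a retry-then-ignore policy to one call of func(*args).

    `ignore` and `retry` are hypothetical stand-ins for the
    func_name.fault.ignore and func_name.fault.retry settings.
    """
    log = [] if log is None else log
    attempts = 1 + max(retry, 0)  # first try plus the allowed retries
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except Exception as exc:
            last_error = exc
            # Every failure event is recorded, as the text says RPig logs them.
            log.append(f"attempt {attempt}/{attempts} failed: {exc!r}")
    if ignore:
        # Assumption: an ignored failure yields None and processing continues.
        return None
    raise last_error
```

With retry=1 a function that fails once and then succeeds returns its normal result and leaves one entry in the log; a function that always fails is tried twice and then, with ignore=True, yields None instead of aborting the caller.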
R functions may run exceedingly slowly on occasion, and the user would
expect a way to monitor the UDF execution time and terminate its execution
if it runs too long. RPig offers a facility for monitoring long-running
R functions. For example, func_name.monitoredUDF.duration←10
will terminate the named R function if it runs for more than 10 seconds and
return the default value of null.
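
The effect of such a duration limit can be illustrated with a small sketch. It assumes, purely for illustration, that the long-running work is an external process; the function run_monitored and its parameters are hypothetical and are not part of RPig or Pig, which implement monitoring through Pig's annotation-based monitored UDFs as described in Section 9.4.4.

```python
import subprocess
import sys

def run_monitored(command, duration=10.0, default=None):
    """Run `command`, terminating it if it exceeds `duration` seconds.

    Returns the command's standard output, or `default` on timeout.
    `duration` mirrors func_name.monitoredUDF.duration (hypothetical mapping).
    """
    try:
        done = subprocess.run(
            command, capture_output=True, text=True, timeout=duration
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return default
    return done.stdout.strip()

if __name__ == "__main__":
    # A Python child process stands in for a slow R function here.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_monitored(slow, duration=1.0))
```

Running the work in a separate process is what makes real termination possible in this sketch; the default of None plays the role of the null value mentioned above.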
9.4.4 Implementation
There are several libraries used for the RPig implementation. Renjin [10] is
used for the JVM-based R engine. Since the stand-alone R is implemented in
C and Fortran and Pig is written in Java, Rsession [11] is adopted as the Java
interface of R to use the Pig APIs. Pig offers a Java annotation-based
implementation for a monitored Java UDF. To build the same function for R UDFs,
we need to create a new Java class with annotations for each R function at
runtime. Javassist [12] is used to define a new class at runtime and to
modify a class file when the JVM loads it.
9.5 Use Case and Experiment
In this section, we describe the usage of RPig with the examples we discussed
in Section 9.2. To provide valuable comparative experimental results, we
also describe and experiment with one alternative framework or
implementation for each use case. Although the use cases here are from the telecom
domain, the design and the solution we describe are general and applicable
to other domains.
Our experiments are conducted in Amazon Elastic MapReduce (EMR), for
which we have all nodes with the same configuration (m1.medium instance,