JVM. However, this is not the case for the R language, since there is no mature
JVM-based R interpreter. Some preliminary implementations are available,
such as Renjin [10], but Renjin is incompatible with most R packages apart from some
basic libraries. As a result, our implementation supports two types of back-end
R engine: JVM-based R and stand-alone R. JVM-based R can be used
for some basic statistical functions (e.g., standard deviation) without requiring
the installation of a script engine on all nodes. Stand-alone R can use any
R functions and packages; however, R must be preinstalled on all hosts.
9.4.2 Data-Type Conversions
Data must be passed back and forth between Pig and R while an R function
executes. Because Pig and R have very different data models, the data must
go through a conversion process, which is one of the main responsibilities
of the R bridge function. The data-type conversion is done automatically,
based on the set of predefined rules discussed next.
9.4.2.1 From Pig to R
• Simple data type
int: integer; long/float/double: double; chararray/datetime: character;
bytearray: raw; boolean: logical (e.g., null: NULL);
datetime: POSIXlt, POSIXt
• Complex data type
tuple: list, e.g., (19,2): list(19,2); dataBag: nested list,
e.g., {(19,2), (18,1)}: list(list(19,2), list(18,1)); map: named list, e.g.,
[apache#pig]: list(key = "apache", value = "pig")
• Anything else raises an exception
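The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the bridge's actual code: the names SIMPLE_TYPES, r_type_for, and to_r_expr are hypothetical, Pig values are modeled with Python stand-ins (tuple for a tuple, list of tuples for a bag, dict for a map), and only single-entry maps are handled because the list shows no other case.

```python
# Simple Pig types and the R types the list above assigns to them.
# datetime is left out because the list names two targets for it.
SIMPLE_TYPES = {
    "int": "integer",
    "long": "double",
    "float": "double",
    "double": "double",
    "chararray": "character",
    "bytearray": "raw",
    "boolean": "logical",
}


def r_type_for(pig_type):
    """Return the R type name for a simple Pig type name (hypothetical helper)."""
    if pig_type not in SIMPLE_TYPES:
        # Mirrors the rule that anything unlisted raises an exception.
        raise TypeError(f"no conversion rule for Pig type {pig_type!r}")
    return SIMPLE_TYPES[pig_type]


def to_r_expr(value):
    """Render a Python stand-in for a Pig value as an R expression string."""
    if value is None:
        return "NULL"
    # bool is tested before int because Python's bool subclasses int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'
    if isinstance(value, (tuple, list)):
        # tuple -> list; bag (a Python list of tuples here) -> nested list
        return "list(" + ", ".join(to_r_expr(v) for v in value) + ")"
    if isinstance(value, dict):
        # Assumption: only the single-entry form shown in the text is sketched.
        if len(value) != 1:
            raise ValueError("only single-entry maps are sketched here")
        key, val = next(iter(value.items()))
        return f"list(key = {to_r_expr(key)}, value = {to_r_expr(val)})"
    raise TypeError(f"no conversion rule for {type(value).__name__}")


if __name__ == "__main__":
    print(to_r_expr((19, 2)))
    print(to_r_expr([(19, 2), (18, 1)]))
    print(to_r_expr({"apache": "pig"}))
```

The renderer recurses through nested values, so a tuple inside a bag or a tuple inside a tuple becomes a list inside a list, which matches the nesting behavior described for complex types.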
Any nested data objects in Pig, such as nested tuples, are converted to
nested lists in R. Because the two languages serve different purposes, not
every data type has an exact semantic match in the other's data model. For
example, Pig's map[key#value] type is rarely used in statistical computing,
so we convert it to a named list(key = key, value = value),
which is an ordered collection in R. Users can still convert the resulting
R object to other R data types via R operations (inside R functions) if
necessary. For example, a nested list can be converted to a data frame
or a matrix.
9.4.2.2 From R to Pig
When the data must be sent back to Pig after R execution, a user-defined output
schema of the R function is needed. This allows the user to specify what they