aggregate traffic consumption (both up/downlink) on selected VoIP clients
(e.g., Skype, Viber) in a time window on events described in Section 9.2.1.
Events = LOAD '$load_par' USING PigStorage('|') AS
(ID, period_s:LONG,...,);
Events = FILTER Events BY (client == 'Viber' OR...);
Traffics = FOREACH (GROUP Events BY (period_s, period_e, client))
    GENERATE FLATTEN(group),
    (SUM(Events.downlink) + SUM(Events.uplink)) AS links:DOUBLE;
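To make the semantics of the script above concrete, the following is a minimal Python sketch of the same filter, group, and sum steps. It is not part of RPig or Pig. The sample records, the tuple layout, and the function name aggregate_traffic are hypothetical and chosen only for illustration, and the sketch assumes no missing values, so that summing downlink + uplink per record equals SUM(downlink) + SUM(uplink) per group.

```python
from collections import defaultdict

# Hypothetical sample events: (period_s, period_e, client, downlink, uplink).
# The real schema is elided in the source; this layout is an assumption.
events = [
    (0, 60, "Viber", 100.0, 20.0),
    (0, 60, "Viber", 50.0, 10.0),
    (0, 60, "Skype", 30.0, 5.0),
    (0, 60, "OtherApp", 999.0, 1.0),   # dropped by the client filter
    (60, 120, "Skype", 70.0, 15.0),
]


def aggregate_traffic(events, clients=("Viber", "Skype")):
    """Sum downlink + uplink per (period_s, period_e, client) group."""
    totals = defaultdict(float)
    for period_s, period_e, client, downlink, uplink in events:
        # FILTER step: keep only the selected VoIP clients.
        if client not in clients:
            continue
        # GROUP + SUM step: accumulate both directions into one total.
        totals[(period_s, period_e, client)] += downlink + uplink
    return dict(totals)


traffics = aggregate_traffic(events)
for key, links in sorted(traffics.items()):
    print(key, links)
```

Each key of the resulting dictionary corresponds to one flattened group in the Pig output, and each value to the links field.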
9.4 The Framework
An initial version of the RPig framework [9] was implemented as a
proof-of-concept prototype. The framework provides the RPig script language,
in which users write analytic jobs. An RPig script uses Pig script syntax as
its language skeleton but allows inline R scripts to be defined as R
functions. Each R function element is interpreted as the input payload of a
predefined Pig extended function, or Pig UDF (user-defined function), which
handles the payload at the execution stage. This design allowed a quick
implementation that uses only the Pig UDF APIs and does not touch the Pig
source code. However, it is not an optimal approach for integrating Pig and
R: the RPig script has its own constructs, and additional supporting Pig
statements must be generated at execution time. The initial version also
incurs a large performance overhead in the data exchange between R and Pig.
To improve the performance of RPig and to integrate R and Pig in an
optimal way, we completely redesigned and rewrote the source code to over-
come the aforementioned disadvantages of the initial version. By doing so,
we have brought the research prototype to an early production stage. Some of
the main advantages of the current version over the initial proof-of-concept
version are the following:
• There is seamless integration with Apache Pig by having a built-in
R script extension similar to other Pig script extensions, such as
Python and JavaScript.
• Only standard R and Pig language syntax is used, without any
new language constructs. This allows the use of any existing
integrated development environment (IDE) for R and Pig scripts.
• There is support for two types of R engines. R UDFs can be executed
on the Java virtual machine (JVM) or a stand-alone R engine.
• Performance is much faster. Data conversion is optimized, and
verbose XML (Extensible Markup Language) messages are no longer
used as the intermediate data format.