describes the foundation frameworks: R, Hadoop, and Pig. The overall RPig
framework and its components are explained in Section 9.4. Experiments
and results are presented in Section 9.5. Finally, Sections 9.6 and 9.7
discuss related work and give our conclusion.
9.2 Motivating Scenarios
To demonstrate the need for and usefulness of our RPig framework, we describe
two example use cases from network management systems in which scalable
statistical processing is necessary.
9.2.1 Intensive Scenario with Both Input/Output and
Central Processing Unit with Exponential Moving Average
In this first use case, a vast number of events is collected from a given
mobile network and stored in event log files. An event is a report about a
particular service client (e.g., the Viber voice over Internet protocol [VoIP]
service client) and contains information such as
ID|period_start|period_end|IMSI|IMEISV|RAT|...
|packets_downlink|packets_uplink|...
The exponential moving average (EMA) is a simple forecasting algorithm
based on historical sample data. Using the EMA, an analytic feature of a
network management system can, on request, forecast the amount of traffic of
selected service clients in the next time window. Because of the vast number
of events, R cannot load all the data into memory, even for a simple EMA
calculation. Pig, in contrast, can process the data in parallel but lacks the
EMA function that R provides.
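To make the forecasting step concrete, the following is a minimal Python sketch of the commonly used recursive form of the EMA, in which each smoothed value is a weighted mix of the newest sample and the previous smoothed value. The function name, the smoothing factor `alpha`, and the choice to seed the series with the first sample are illustrative assumptions; the EMA function available in R may be parameterized and seeded differently.

```python
def exponential_moving_average(samples, alpha=0.5):
    """Return the EMA series for `samples` (hypothetical helper).

    Assumed recursion: ema[t] = alpha * x[t] + (1 - alpha) * ema[t - 1],
    seeded with ema[0] = x[0]. `alpha` is an assumed smoothing factor.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    ema = samples[0]          # seed with the first observation
    smoothed = [ema]
    for x in samples[1:]:
        # Weight the newest sample by alpha, the history by (1 - alpha).
        ema = alpha * x + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed


# Made-up traffic volumes for three consecutive time windows.
traffic = [100, 200, 300]
series = exponential_moving_average(traffic, alpha=0.5)
forecast = series[-1]         # last smoothed value serves as the next-window forecast
print(series, forecast)
```

The calculation itself is cheap once the per-window totals are known; the difficulty in this scenario lies in producing those totals from the raw logs.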
This problem can be addressed by RPig, which lets Pig efficiently load and
preprocess (filter, aggregate, etc.) the log files in parallel and then
passes the resulting data directly to R for the final EMA calculation. This
scenario is both input/output (I/O) and central processing unit (CPU)
intensive, as it requires loading and preprocessing massive log files
from hard disks.
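The preprocessing stage described above can be sketched on a single machine as follows. This is only an illustration of the filter-and-aggregate logic, not RPig or Pig code. The reduced four-field record layout, the `service` field name, and the helper name are hypothetical, since the full event schema is not given here.

```python
from collections import defaultdict

# Hypothetical, reduced record layout; the real event log has more fields.
FIELDS = ("period_start", "service", "packets_downlink", "packets_uplink")


def packets_per_window(lines, service):
    """Sum downlink + uplink packets per time window for one service client."""
    totals = defaultdict(int)
    for line in lines:
        record = dict(zip(FIELDS, line.strip().split("|")))
        if record["service"] != service:
            continue  # filtering step: keep only the selected service client
        # Aggregation step: accumulate traffic by window start time.
        totals[int(record["period_start"])] += (
            int(record["packets_downlink"]) + int(record["packets_uplink"])
        )
    # Order by window start so the series is ready for a forecasting step.
    return [totals[start] for start in sorted(totals)]


# Made-up sample records in the assumed layout.
log = [
    "0|viber|10|5",
    "0|other|99|99",
    "0|viber|1|1",
    "60|viber|20|10",
]
print(packets_per_window(log, "viber"))
```

In the scenario above, this stage is what Pig runs in parallel over the full log files, leaving only the short per-window series for R to process.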
9.2.2 A CPU-Intensive Scenario with SVM
Support vector machine (SVM) learning algorithms can be used for advanced
classification and regression analysis. An SVM model is built from training
data in the training phase and can then be used to predict unknown data.