An increasing number of phone calls are made by various VoIP clients, such
as Viber and Skype. One approach for monitoring the service quality of VoIP
is using network-level key performance indicators (N-KPIs) at the Internet
protocol (IP) layer, such as packet loss or jitter, to predict the mean opinion
score (MOS), which is a standard speech quality measurement parameter [4].
An SVM-based regression algorithm is used in this case, but it is a complex
algorithm, usually involving long computation times even on a relatively
small amount of data in the training phase. RPig enables us to define and
execute the SVM algorithms in the MapReduce model for both the SVM training
and prediction phases without writing any key-value pair MapReduce functions.
As a result, performance scales with cluster size, and development effort
is reduced.
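To make concrete what "key-value pair MapReduce functions" means, here is a minimal sketch of the kind of hand-written code that RPig is meant to spare the developer. It is not RPig's API, and this section does not say how RPig decomposes SVM training. The sketch assumes one simple scheme: split the samples into partitions, train one model per partition, and average the per-partition predictions. A least-squares line stands in for SVM regression to keep the example dependency-free. The sample values and all function names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Synthetic (packet_loss_percent, mos) samples, invented for illustration only.
SAMPLES = [(0.0, 4.4), (1.0, 4.1), (2.0, 3.8), (3.0, 3.5),
           (4.0, 3.2), (5.0, 2.9), (6.0, 2.6), (7.0, 2.3)]


def map_phase(samples, n_partitions):
    """Emit (partition_id, sample) key-value pairs, assigned round-robin."""
    return [(i % n_partitions, s) for i, s in enumerate(samples)]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def fit_line(points):
    """Least-squares fit y = a*x + b (a stand-in for per-partition SVM training)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    a = sxy / sxx
    return a, y_bar - a * x_bar


def reduce_phase(groups):
    """Train one model per partition key."""
    return {key: fit_line(points) for key, points in groups.items()}


def predict_mos(models, packet_loss):
    """Average the predictions of all per-partition models."""
    return mean(a * packet_loss + b for a, b in models.values())


models = reduce_phase(shuffle(map_phase(SAMPLES, n_partitions=2)))
estimate = predict_mos(models, packet_loss=2.5)
```

Even in this toy form, partitioning, grouping, and combining are all written by hand. That plumbing is the development effort the text refers to.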
This use case deals with a complex machine learning algorithm that is
CPU intensive rather than I/O intensive. R's in-memory computation accounts
for most of the overall computation time, even though an analysis job
involves only a small amount of data. RPig supports parallelism for various
requirements in different scenarios.
9.3 Background
Big data [5] are data in volumes so large and complex that they become
difficult to process using on-hand database management tools or traditional
data-processing applications. Since Google published its MapReduce technology
and Apache started the Hadoop project in 2004 and 2005, respectively,
MapReduce and Hadoop have become a generic and foundational approach for
developing scalable, cost-effective, flexible, fault-tolerant big data
systems [6]. Many frameworks, such as Pig and Hive, have been developed
based on Hadoop, adding features on top of it. As Hadoop systems are more
widely adopted in industry, the requirements of real-world problems are
driving the
Hadoop ecosystem to become even richer. For example, Oozie and Azkaban
provide workflow and scheduling management. Impala and Shark aim at
low-latency real-time queries. Our work, RPig, is one of many frameworks,
such as Mahout and DataFu [7], targeting deep analytics. In the following
sections, we briefly describe the frameworks on which RPig is based.
9.3.1 R and R Packages
R is a programming language and software environment widely used for
statistical computing and deep data analysis, such as classification and
regression. R is extensible through R packages. There are thousands of
R packages that implement a vast range of specialized machine learning and
statistical algorithms.