6 Evaluation
The ultimate goal of our performance evaluation is two-fold: (i) to measure the accuracy of our learning model in ranking Internet videos in order of hotness, and (ii) to evaluate the performance of our replication scheme in meeting viewers' expectations in peer-assisted VoD systems. Further details about the evaluation set-up are available in Sect. 5.
6.1 Performance Evaluation Metrics
We aim to evaluate the performance of the two main WiseReplica modules: the machine-learned ranking and the replication strategy. Hence, we group the evaluation metrics as follows:
Machine-Learned Ranking Accuracy. We adopt the normalized Discounted Cumulative Gain (nDCG) criterion as the main evaluation metric for our learning model. nDCG is a standard quality measure in information retrieval, especially for Web search [19, 22]. We implement the DCG measure proposed by Burges
et al. [8]. Therefore, DCG is defined as $\mathrm{DCG}_L = \sum_{i=1}^{|L|} \frac{2^{F(i)}-1}{\log_2(1+i)}$, where $L$ is the global set of ranked videos, and $F(i)$ is the rank position of the $i$-th video. To compute nDCG, we divide the DCG measure by the idealized DCG obtained with the perfect order of the set $L$. Thus, the perfect model scores 1. Unlike typical information retrieval
problems, our ranking of web content does not have the notion of a query. Instead, we rely on the robustness of nDCG to measure the performance of our learning model as a global ranking problem. Since the ranking problem shares properties with both classification and regression problems, we compare nDCG to three other popular machine learning metrics: the mean square error, a standard metric for regression; precision, for classification; and a less robust, well-known variant of nDCG, referred to in this work as nDCG(2), described by Croft et al. [12]. We evaluate three different state-of-the-art ensemble learning methods available in the Scikit-learn library: Random Forest, Extremely Randomized Trees, and Gradient Tree Boosting (sketched after this paragraph). Moreover, we report briefly on the sample size for learning, the number of estimators or learners of the ensemble methods, the importance of measurements or features, and the computational overhead of our model, including memory usage and computation time for prediction.
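For illustration, the following is a minimal sketch of how the nDCG measure defined above could be computed; the function names and the example relevance values are hypothetical and not part of WiseReplica.

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: sum of (2^F(i) - 1) / log2(1 + i)."""
    relevances = np.asarray(relevances, dtype=float)
    positions = np.arange(1, len(relevances) + 1)
    return np.sum((2.0 ** relevances - 1.0) / np.log2(1.0 + positions))

def ndcg(relevances_in_predicted_order, true_relevances):
    """Normalize the DCG of the predicted order by the DCG of the ideal order."""
    ideal = dcg(sorted(true_relevances, reverse=True))
    return dcg(relevances_in_predicted_order) / ideal if ideal > 0 else 0.0

# Hypothetical example: true hotness scores of five videos, listed in the order
# our model ranked them. A perfect ranking (descending scores) yields nDCG = 1.
scores_in_model_order = [3, 2, 3, 0, 1]
print(ndcg(scores_in_model_order, scores_in_model_order))
```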
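Likewise, the comparison of the three ensemble methods can be outlined as below; this is a sketch under assumed data, where the features, targets, and parameter values are illustrative rather than those used in our experiments.

```python
import numpy as np
from sklearn.ensemble import (RandomForestRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical training data: per-video measurements (features) and hotness targets.
rng = np.random.default_rng(0)
X = rng.random((1000, 6))
y = X @ rng.random(6) + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Extremely Randomized Trees": ExtraTreesRegressor(n_estimators=100, random_state=0),
    "Gradient Tree Boosting": GradientBoostingRegressor(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Mean square error serves as the regression baseline metric; feature
    # importances indicate which measurements drive the predicted hotness.
    print(name, mean_squared_error(y_test, predictions), model.feature_importances_)
```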
Metrics for Replication Strategies in Peer-Assisted VoD Systems. Assuming that content and CDN providers are committed to enforcing bitrate as the main QoS metric through SLA contracts, we consider the SLA violation as the primary performance metric. An SLA violation happens whenever the peer-assisted VoD system does not provide the minimum average bitrate required to prevent rebuffering. This measures WiseReplica's capacity to meet consumers' expectations. We also investigate the impact of our replication scheme using storage domains in peer-assisted VoD systems; to this end, our evaluation metrics are network and storage usage. Finally, we compare WiseReplica results with a non-collaborative caching scheme and the Oracle-like assumption, described in Subsect. 5.3.
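To make the metric concrete, the sketch below shows one way the SLA-violation ratio could be computed from per-session average bitrates; the threshold and session values are hypothetical, not measurements from our experiments.

```python
def sla_violation_ratio(session_bitrates_kbps, min_bitrate_kbps):
    """Fraction of playback sessions whose average bitrate fell below the
    minimum needed to avoid rebuffering (i.e. an SLA violation)."""
    if not session_bitrates_kbps:
        return 0.0
    violations = sum(1 for b in session_bitrates_kbps if b < min_bitrate_kbps)
    return violations / len(session_bitrates_kbps)

# Hypothetical example: average delivered bitrates (kbps) of five sessions,
# with a 1200 kbps minimum enforced through the SLA.
sessions = [1500, 980, 1250, 1100, 1600]
print(sla_violation_ratio(sessions, min_bitrate_kbps=1200))  # 0.4
```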