6 Evaluation
The goal of our performance evaluation is twofold: (i) measure the accuracy of our learning model in ranking Internet videos in order of hotness, and (ii) evaluate the performance of our replication scheme in meeting viewers' expectations in peer-assisted VoD systems. Further details about the evaluation set-up are available in Sect. 5.
6.1 Performance Evaluation Metrics
We aim to evaluate the performance of the two main WiseReplica modules: machine-learned ranking and the replication strategy. Hence, we group the evaluation metrics as follows:
Machine-Learned Ranking Accuracy. We adopt the normalized Discounted Cumulative Gain (nDCG) criterion as the main evaluation metric for our learning model. nDCG is a standard quality measure in information retrieval, especially for Web search [19, 22]. We implement the DCG measure proposed by Burges et al. [8]. DCG is defined as

DCG_L = Σ_{i=1}^{|L|} (2^{F(i)} − 1) / log₂(1 + i),

where L is the global set of ranked videos, and F(i) is the rank position of the i-th video. To compute nDCG, we divide the DCG measure by the idealized DCG obtained with a perfect ordering of the set L. Thus, the perfect model scores 1. Unlike typical information retrieval problems, our model, being a ranking of web content, does not have the notion of a query. Instead, we rely on the robustness of nDCG to measure the performance of our learning model as a global ranking problem. Since the ranking problem shares properties with both classification and regression problems, we compare nDCG to three other popular machine learning metrics: the mean square error, a standard metric for regression; precision, for classification; and a less robust, well-known variant of nDCG, named nDCG(2) in this work and described by Croft et al. in [12]. We evaluate three state-of-the-art ensemble learning methods available in the Scikit-learn library: Random Forest, Extremely Randomized Trees, and Gradient Tree Boosting. Moreover, we report briefly on the sample size for learning, the number of estimators or learners of the ensemble methods, the importance of the measurements or features, and the computational overhead of our model, including memory usage and computation time for prediction.
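The DCG and nDCG measures above can be sketched in a few lines of Python. This is an illustrative implementation of the standard formula, not the paper's own code; the example grades are hypothetical relevance scores F(i) for four videos.

```python
import math

def dcg(relevances):
    """DCG over a ranked list: sum of (2^F(i) - 1) / log2(1 + i),
    where i is the 1-based rank position."""
    return sum((2 ** rel - 1) / math.log2(1 + i)
               for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Normalize DCG by the idealized DCG, i.e. the DCG of the same
    grades sorted in perfect (descending) order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A perfectly ordered ranking scores exactly 1.
print(ndcg([3, 2, 1, 0]))  # 1.0
# A fully reversed ranking scores strictly less than 1.
print(ndcg([0, 1, 2, 3]))
```

As in the paper's setting, nDCG is computed over the single global set of ranked videos rather than per query.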
Metrics for Replication Strategies in Peer-Assisted VoD Systems. Assuming that content and CDN providers are committed to enforcing bitrate as the main QoS metric through SLA contracts, we consider SLA violations as the primary performance metric. Thus, an SLA violation happens whenever the peer-assisted VoD system does not provide the minimum average bitrate required to prevent rebuffering. This measures WiseReplica's capacity to meet consumers' expectations. We also investigate the impact of our replication scheme using storage domains in peer-assisted VoD systems. To this end, our evaluation metrics are network and storage usage. Finally, we compare WiseReplica's results with a non-collaborative caching scheme and the Oracle-like assumption described in Subsect. 5.3.
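The SLA-violation metric can be illustrated with a minimal sketch: a viewing session counts as a violation when its delivered average bitrate falls below the minimum bitrate needed to avoid rebuffering. The function name, inputs, and threshold below are assumptions for illustration, not the paper's implementation.

```python
def sla_violation_rate(session_bitrates_kbps, min_bitrate_kbps):
    """Fraction of viewing sessions whose delivered average bitrate
    fell below the SLA minimum (a proxy for rebuffering risk)."""
    if not session_bitrates_kbps:
        return 0.0
    violations = sum(1 for avg in session_bitrates_kbps
                     if avg < min_bitrate_kbps)
    return violations / len(session_bitrates_kbps)

# Hypothetical example: three sessions against a 1000 kbps SLA target;
# the 800 kbps session is the only violation.
print(sla_violation_rate([1200.0, 800.0, 1500.0], 1000.0))
```

A lower violation rate indicates that the replication scheme is better at meeting consumers' bitrate expectations.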