confirms these results. Therefore, we select the Random Forest method for implementing our prediction model and nDCG as the key accuracy metric for ranking predictions.
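As a point of reference, nDCG compares a predicted ranking against the ideal ordering and equals 1.0 for a perfect ranking. The following is a minimal sketch of computing it with scikit-learn's ndcg_score on made-up relevance values, not on the values from our experiments:

# Illustrative only: nDCG with scikit-learn on made-up relevance scores.
import numpy as np
from sklearn.metrics import ndcg_score

# True relevance of five hypothetical videos (higher = should rank higher).
true_relevance = np.asarray([[10, 7, 5, 2, 1]])
# Scores a prediction model might assign to the same five videos.
predicted_scores = np.asarray([[9, 8, 4, 3, 1]])

print(ndcg_score(true_relevance, predicted_scores))  # close to 1.0 for a good ranking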
Adjusting the Number of Estimators to Learn. According to Friedman et al., Random Forest performs predictions by building a collection of de-correlated trees, namely estimators, and then averaging them. We investigated the impact of the number of estimators on ranking accuracy, memory, and computation time. We varied the number of estimators progressively from 10 to 1000, using the same samples as before. Results show that the number of estimators has a negligible impact on the accuracy of our model: a model with 10 estimators scores 0.9594, while one with 1000 estimators scores 0.9569, slightly worse. One reason might be the relatively small number of inputs, which likely requires only a small number of estimators. Yet, the number of estimators does affect the model's overhead, especially computation time. As depicted in Fig. 6, the computation time ranges from 0.3 ms with 10 estimators to almost 26 ms with 1000. Although even the worst case represents low overhead, the lower the better. Memory overhead is rather negligible, ranging from 30 to 32 MB. Overall, our model has quite low overhead, suitable for going online in large peer-assisted VoD systems. Since there is no evidence that more estimators help, we keep 10 estimators as the default setting.
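For illustration, the kind of sweep described above can be scripted as follows. This is a minimal sketch using scikit-learn's RandomForestRegressor on synthetic data, not on our simulation traces:

# Sketch: vary the number of estimators and time a single prediction.
# Synthetic data stands in for the model's 10 inputs.
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=10_000, n_features=10, random_state=0)

for n_estimators in (10, 100, 1000):
    model = RandomForestRegressor(n_estimators=n_estimators, n_jobs=-1,
                                  random_state=0)
    model.fit(X, y)
    start = time.perf_counter()
    model.predict(X[:1])  # latency of one prediction
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{n_estimators:>5} estimators: {elapsed_ms:.2f} ms per prediction")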
Evaluating Larger Samples for Fitting the Model. Towards higher accuracy, we evaluated larger samples for fitting our prediction model in its learning phase, described in Subsect. 3.3. We collected more information by running longer simulations. As expected, Fig. 7 confirms that larger samples improve accuracy. The improvement was slight, about 0.03, as we used a sample size almost six times larger, i.e. 683,000. It is important to highlight, though, that this has no impact on the computation time of predictions. Thus, we use the largest sample for the remaining evaluations.
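A minimal sketch of this comparison, fitting on progressively larger training samples and scoring the resulting ranking with nDCG, is shown below. The data is synthetic and only the 683,000 figure comes from our experiments; the other sizes are arbitrary placeholders:

# Sketch: fit on increasingly large samples and compare ranking accuracy.
# Synthetic data; sizes other than 683,000 are arbitrary placeholders.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import ndcg_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=700_000, n_features=10, random_state=0)
y = y - y.min()  # ndcg_score expects non-negative relevance values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5_000,
                                                    random_state=0)

for sample_size in (120_000, 350_000, 683_000):
    model = RandomForestRegressor(n_estimators=10, n_jobs=-1, random_state=0)
    model.fit(X_train[:sample_size], y_train[:sample_size])
    score = ndcg_score(y_test.reshape(1, -1),
                       model.predict(X_test).reshape(1, -1))
    print(f"sample size {sample_size:>7,}: nDCG = {score:.4f}")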
Analysing the Relative Importance of the Model's Inputs. We were particularly interested in evaluating the contribution of each input of our model, described in Subsect. 3.2. The scikit-learn library allows us to measure the relative importance of each input for predicting the ranking position with the Random Forest method. Figure 8 highlights the relative importance of all 10 inputs of our ranking model. The two most relevant inputs are the current number of viewers and network availability. These inputs alone account for 99.6% of the model's accuracy. This seems quite reasonable, since the former measures the demand for a video and the latter depicts the supply of network resources, the main system feature for enforcing average bitrate. Based on the analysis of the current datasets, the remaining eight inputs are less important to the ranking model's accuracy. Surprisingly, the number of replicas, current network load, and video