confirms these results. Therefore, we select the Random Forest method for implementing our prediction model and nDCG as the key accuracy metric for ranking predictions.
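As a point of reference, nDCG compares a predicted ranking against the ideal ordering and equals 1.0 for a perfect ranking. The following is a minimal sketch of computing it with scikit-learn's ndcg_score on made-up relevance values, not on the values from our experiments:

# Illustrative only: nDCG with scikit-learn on made-up relevance scores.
import numpy as np
from sklearn.metrics import ndcg_score

# True relevance of five hypothetical videos (higher = should rank higher).
true_relevance = np.asarray([[10, 7, 5, 2, 1]])
# Scores a prediction model might assign to the same five videos.
predicted_scores = np.asarray([[9, 8, 4, 3, 1]])

print(ndcg_score(true_relevance, predicted_scores))  # close to 1.0 for a good ranking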
Adjusting the Number of Estimators to Learn. According to Friedman et al., Random Forest performs predictions by building a collection of de-correlated trees, namely estimators, and then averaging them. We investigated the impact of the number of estimators on ranking accuracy, memory, and computation time. We varied the number of estimators progressively from 10 to 1000, using the same samples as before. Results show that the number of estimators has a negligible impact on the accuracy of our model: a model with 10 estimators scores 0.9594, while one with 1000 estimators scores 0.9569, slightly worse. One reason might be the relatively small number of inputs, which likely requires only a small number of estimators. Yet, the number of estimators does affect the model's overhead, especially computation time. As depicted in Fig. 6, the computation time ranges from 0.3 ms with 10 estimators to almost 26 ms with 1000. Although even the worst case represents low overhead, the lower the better. Memory overhead is rather negligible, ranging from 30 to 32 MB. Overall, our model has quite low overhead, suitable for going online in large peer-assisted VoD systems. Since there is no evidence that more estimators help, we keep 10 estimators as the default setting.
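For illustration, the kind of sweep described above can be scripted as follows. This is a minimal sketch using scikit-learn's RandomForestRegressor on synthetic data, not on our simulation traces:

# Sketch: vary the number of estimators and time a single prediction.
# Synthetic data stands in for the model's 10 inputs.
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=10_000, n_features=10, random_state=0)

for n_estimators in (10, 100, 1000):
    model = RandomForestRegressor(n_estimators=n_estimators, n_jobs=-1,
                                  random_state=0)
    model.fit(X, y)
    start = time.perf_counter()
    model.predict(X[:1])  # latency of one prediction
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"{n_estimators:>5} estimators: {elapsed_ms:.2f} ms per prediction")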
Evaluating Larger Samples for Fitting the Model. Towards higher accuracy, we evaluated larger samples for fitting our prediction model in its learning phase, described in Subsect. 3.3. We collected more information by running longer simulations. As expected, Fig. 7 confirms that larger samples improve accuracy. The improvement was slight, about 0.03, as we used a sample size almost six times larger, i.e. 683,000. It is important to highlight, though, that this has no impact on the computation time of predictions. Thus, we use the largest sample for the remaining evaluations.
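A minimal sketch of this comparison, fitting on progressively larger training samples and scoring the resulting ranking with nDCG, is shown below. The data is synthetic and only the 683,000 figure comes from our experiments; the other sizes are arbitrary placeholders:

# Sketch: fit on increasingly large samples and compare ranking accuracy.
# Synthetic data; sizes other than 683,000 are arbitrary placeholders.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import ndcg_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=700_000, n_features=10, random_state=0)
y = y - y.min()  # ndcg_score expects non-negative relevance values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5_000,
                                                    random_state=0)

for sample_size in (120_000, 350_000, 683_000):
    model = RandomForestRegressor(n_estimators=10, n_jobs=-1, random_state=0)
    model.fit(X_train[:sample_size], y_train[:sample_size])
    score = ndcg_score(y_test.reshape(1, -1),
                       model.predict(X_test).reshape(1, -1))
    print(f"sample size {sample_size:>7,}: nDCG = {score:.4f}")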
Analysing the Relative Importance of the Model's Inputs. We were particularly interested in evaluating the contribution of each input of our model, described in Subsect. 3.2. The scikit-learn library allows us to measure the relative importance of each input for predicting the ranking position with the Random Forest method. Figure 8 highlights the relative importance of all 10 inputs of our ranking model. The two most relevant inputs are the current number of viewers and network availability. These inputs alone account for 99.6% of the model's accuracy. This seems quite reasonable, since the former measures the demand for a video and the latter depicts the supply of network resources, the main system feature for enforcing average bitrate. Based on the analysis of the current datasets, the remaining eight inputs are less important to the ranking model's accuracy. Surprisingly, the number of replicas, current network load, and video