Boosting Streaming Video Delivery with WiseReplica - Transactions on Large-Scale Data- and Knowledge-Centered Systems XX


network usage: $P_{min}$ and $P_{max}$. Our replication strategy works as follows. A video $v$ that has $N$ replicas in peers with network capacity of $b_i$ requires more replicas if the current bandwidth reservation $U(v) > P_{max} \sum_{i=1}^{N} b_i$. Similarly, if $U(v) < P_{min} \sum_{i=1}^{N} b_i$, replicas can be deleted. Otherwise, keep the replication degree.

Although this empirical approach is hard to adopt in a real deployment, our previous results [29] suggest that it achieves near-optimal results, preventing all SLA violations, improving network usage, and dramatically decreasing storage usage.
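The threshold rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `replication_decision`, the string return values, and the assumption that the thresholds are fractions of the aggregate replica capacity are all hypothetical.

```python
from typing import Sequence


def replication_decision(
    reserved_bw: float,                  # U(v): bandwidth currently reserved for video v
    replica_capacities: Sequence[float], # b_i for each of the N peers holding a replica
    p_min: float,                        # lower network-usage threshold (assumed a fraction)
    p_max: float,                        # upper network-usage threshold (assumed a fraction)
) -> str:
    """Return 'add', 'delete' or 'keep' for the replication degree of one video."""
    total_capacity = sum(replica_capacities)
    if reserved_bw > p_max * total_capacity:
        return "add"      # demand exceeds the high threshold: more replicas needed
    if reserved_bw < p_min * total_capacity:
        return "delete"   # demand is below the low threshold: replicas can be removed
    return "keep"         # otherwise, keep the current replication degree


if __name__ == "__main__":
    # Two replicas of capacity 5 each; thresholds at 20% and 80% of total capacity.
    print(replication_decision(9.0, [5.0, 5.0], p_min=0.2, p_max=0.8))
```

With two replicas of capacity 5 and thresholds of 0.2 and 0.8, a reservation above 8 asks for more replicas and one below 2 allows deletions.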

5.4 Collecting the Datasets for Learning

To perform rank predictions of Internet videos, we need training datasets from which we can learn the behaviour of video demand in peer-assisted VoD systems. In this section, we explain the methodology to gather data for these predictions.

The training dataset of our prediction model comes from measurements of the request arrival process on peer-assisted VoD systems, as described in Subsect. 3.2. Each line of our training dataset has 11 values: 10 input measurements about a video's current state, and a rank position. Although the datasets evaluated in this work were collected synthetically by running simulations with the Oracle-like benchmark replication approach (detailed in Subsect. 5.3), similar datasets can be collected from the monitoring systems of running CDNs.
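As a minimal sketch of the record layout described above, the following Python snippet parses one training line into 10 inputs and a rank. The comma-separated encoding, the integer rank, and the names `TrainingRow` and `parse_row` are assumptions made for illustration; this section does not specify the file format or name the 10 measurements, so they are treated here as opaque floats.

```python
from typing import List, NamedTuple

N_INPUTS = 10  # number of input measurements per line, as stated in the text


class TrainingRow(NamedTuple):
    inputs: List[float]  # the 10 measurements of the video's current state
    rank: int            # rank position (integer encoding is an assumption)


def parse_row(line: str) -> TrainingRow:
    """Parse one comma-separated training line (hypothetical format)."""
    fields = line.strip().split(",")
    if len(fields) != N_INPUTS + 1:
        raise ValueError(f"expected {N_INPUTS + 1} values, got {len(fields)}")
    *raw_inputs, raw_rank = fields
    return TrainingRow([float(x) for x in raw_inputs], int(raw_rank))


if __name__ == "__main__":
    print(parse_row("1,2,3,4,5,6,7,8,9,10,2"))
```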

In this work, the Oracle-like benchmark replication approach (Subsect. 5.3) represents the near-optimal way to serve VoD according to video encodings and popularity, and it is this behaviour that we want to learn. In this empirical approach, a video requires additional replicas only if there is a certain number of concurrent accesses, where concurrency is measured by checking the current reserved bandwidth against a high threshold, as detailed in Subsect. 5.3. We assume that popular videos are those that gain additional replicas during their lifetime. Since the popularity of Internet videos follows a Zipf-like distribution [33], concurrent accesses are rare events, and so are the videos classified as popular by this approach; it therefore provides a reasonably fair way to identify popular videos.

Raw data from the Oracle-like technique makes it easy to distinguish between only two ranking positions, non-popular and popular videos: requests to non-popular videos are all those that do not trigger any replica creation, or those that result in deletions. However, it carries no information about the different ranking positions of popular videos. Hence, depending on the frequency of replica creation, we annotate requests to popular videos by classifying them as popular, very popular, or viral. To define these three levels of hotness, we ran simulations with YouTube traces, collected the distribution of inter-creation times of replicas in milliseconds, and split it into three nearly equal parts at the 33rd and 66th percentiles of the inter-creation time for new replicas. This means that the higher the frequency of replica creation, the hotter the video and the higher its ranking position. With these labels, the collected data match the model's definitions well.
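The labelling step can be illustrated with a short Python sketch. It assumes a nearest-rank percentile and a single inter-creation time per request, with `None` standing for requests that triggered no replica creation; the function names and these conventions are hypothetical and not taken from the original simulations.

```python
import math
from typing import Optional, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: int) -> float:
    """Nearest-rank percentile (q in 1..100) of an ascending sequence."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def hotness_thresholds(inter_creation_ms: Sequence[float]) -> Tuple[float, float]:
    """Return the 33rd and 66th percentiles of replica inter-creation times."""
    ordered = sorted(inter_creation_ms)
    return percentile(ordered, 33), percentile(ordered, 66)


def classify(inter_creation_ms: Optional[float], p33: float, p66: float) -> str:
    """Map an inter-creation time to a rank label; shorter time means hotter video."""
    if inter_creation_ms is None:
        return "non-popular"   # no replica creation was triggered
    if inter_creation_ms <= p33:
        return "viral"         # replicas created most frequently
    if inter_creation_ms <= p66:
        return "very popular"
    return "popular"


if __name__ == "__main__":
    samples = [900, 100, 500, 300, 700, 200, 800, 400, 600]
    p33, p66 = hotness_thresholds(samples)
    for t in (None, 250, 450, 800):
        print(t, classify(t, p33, p66))
```

Splitting at the two percentiles gives three classes of roughly equal size among popular requests, which matches the "three nearly equal parts" described above.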
