Boosting Streaming Video Delivery with WiseReplica - Transactions on Large-Scale-Data-and Knowledge-Centered Systems XX

Database Reference

In-Depth Information

3.1 An Overview About How to Learn from Data

Statistical learning is about learning from seen data in order to predict unseen

data with minimal error. Data comprise inputs x represented by a vector with

a fixed number of dimensions

∈X ↂ R p ) from the input space

.In

our problem, x is a video, represented by a vector of measurements from video

sessions' activity.

In supervised learning, each input measurement is coupled with a

( x

, a label

selected by an oracle, from the output space

)

drawn independently and identically distributed (i.i.d.) from a fixed but unknown

joint probability density

. To learn, we take

pairs ( x

). This is true for both training and testing

datasets. For instance, we consider the training dataset

(

X, Y

x i ,y i } i =1 of

{

pairs ( x

). Using this dataset, the supervised learning algorithm searches for a

function

. State-of-the-art algorithms, such

as support vector machines (SVM)[ 11 ]or ensemble methods [ 16 ], aim to find

f in

Xₒ R

in a fixed function class

with the lowest empirical risk defined as:

f ∈

arg min

f ∈F

r emp (

)

(1)

i =1 I {f ( x ) = y i } is computed over the training set, and

)= 1

where

r emp (

I {.} is

the indicator function which returns 1 if the predicate

{.}

is true and 0 otherwise.

In other terms,

r emp is a quality measure relating the label to the prediction

provided by the function

To model our prediction problem, we use a statistical learning approach

called learning-to-rank. This approach has been a hot topic in Machine Learn-

ing community for the last 10 years. It combines properties of two well-known

other approaches: regression, where

on the training dataset

y ∈Yↂ R

; classification where

y ∈Yↂ

{

, ..., K}

with

K ≥

1. In learning-to-rank approach,

gives an indication on

the target order (formally represented by a permutation

σ ∈

ʣ).

3.2 A Ranking Model for Internet Videos

The main purpose of our learning model is to capture popularity growth dynam-

ics and system resources availability of peer-assisted VoD systems. Therefore, we

assume that prediction model must allow us to rank Internet videos in order of

demand. This can be modelled as a learning-to-rank problem.

Given an i.i.d. sample ( x

) such as described in Subsect. 3.1 , we model

inputs and outputs as follows.

Inputs. We represent the input space x is a video described as 10 lightweight

measurements from the request arrival process. These measurements are video

size, network availability, network usage (load), current number of viewers and

replicas, inter-arrival time between requests (delta), aggregate number of views,

mean of time between requests (mtbr), life time, and average bitrate. We com-

pute averages and means from up to the five last requests. Our goal is to gather

as much information about users' interactions as possible in an easy manner to

Transactions on Large-Scale-Data-and Knowledge-Centered Systems XX

Search WWH ::

Custom Search

Home