clear winner, but linear regression is a clear loser. The choice of regression technique is application-dependent.
                      Linear       Quantile     Percentage
                      Regression   Regression   Regression   PCA+MARS
RMSE                  24.95        35.33        22.49        158.46
Relative error        0.261        0.22         0.387        0.169
Average bucket error  0.256        0.231        0.284        0.841
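The first two metrics in the table are standard error measures. The sketch below is a minimal illustration of how they could be computed from actual and predicted IP sizes. It assumes that relative error means the mean of |predicted − actual| / actual. This section does not define its metrics, so that formula is an assumption. Average bucket error is not defined here and is not sketched. The function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between actual and predicted sizes."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of |predicted - actual| / |actual|, skipping zero actual values.

    ASSUMPTION: one common definition of relative error, used for
    illustration only; the chapter's exact definition is not given here.
    """
    if len(actual) != len(predicted):
        raise ValueError("inputs must have equal length")
    ratios = [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("no non-zero actual values")
    return sum(ratios) / len(ratios)
```

Consider actual sizes [10, 20, 40] with predictions [12, 18, 40]. The RMSE is sqrt(8/3) and the mean relative error is 0.1. RMSE is dominated by large absolute misses on large IPs, while relative error weights every IP equally. This is one reason a technique can rank differently under the two metrics.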
14.4 IP SIZE PREDICTION
As discussed in Section 14.2.2, the estimates-tables produced by the Est(·) phase over a window of size w are input to the Prd(·) phase. To reduce the lookahead delay, the prediction algorithm was designed with a strong focus on efficiency.
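The hand-off from Est(·) to Prd(·) can be pictured as a sliding window over estimates-tables. The sketch below is hypothetical and makes two assumptions. First, each estimates-table is modeled as a dictionary mapping an IP address string to its estimated size; the chapter's actual table format is not given here. Second, Prd(·) consumes, for each IP, the sequence of its estimates across the latest w tables. The class EstimatesWindow and its methods are illustrative names only.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set

# ASSUMPTION: one estimates-table per Est(.) run, mapping IP -> estimated size.
EstimatesTable = Dict[str, float]


class EstimatesWindow:
    """Keep the latest w estimates-tables and expose per-IP series."""

    def __init__(self, w: int) -> None:
        if w < 1:
            raise ValueError("w must be positive")
        # A deque with maxlen=w evicts the oldest table automatically.
        self.tables: Deque[EstimatesTable] = deque(maxlen=w)

    def push(self, table: EstimatesTable) -> None:
        """Add the newest estimates-table to the window."""
        self.tables.append(table)

    def series(self, ip: str) -> List[Optional[float]]:
        """Oldest-to-newest estimates for one IP; None where the IP was absent."""
        return [t.get(ip) for t in self.tables]

    def ips(self) -> Set[str]:
        """Every IP that appears in at least one table of the window."""
        return set().union(*self.tables)
```

Pushing four tables into a window with w = 3 keeps only the last three. An IP missing from one of them shows a gap (None) in its series, which a predictor would have to handle.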
14.4.1 The Size Prediction Alternative Approaches
Predicting the size of every possible IP from its previous size estimates is a massive, web-scale time series analysis problem. Typically, a time series prediction is computed as a weighted average of the previous w values, where the weights are usually calculated offline using regression analysis. This is effective for time series with high autocorrelation, where the current state is a function of previous states plus white noise. It is customarily combined with smoothing, such as moving averages, to reduce noise, yielding the commonly known autoregressive moving average (ARMA) models [10]. The ARMA model is used to model time series that have both an autoregressive part and a moving average part.
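The conventional approach just described can be sketched as follows. This is a minimal illustration that assumes a plain autoregressive model of order w, with no intercept and no moving-average term. The weights are fitted offline by ordinary least squares, solved here through the normal equations in pure Python. A prediction is then the weighted combination of the latest w values. The names fit_ar_weights, solve_linear and predict_next are hypothetical, and a production system would more likely rely on a numerical library.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if m[col][col] == 0.0:
            raise ValueError("singular system")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar_weights(series: Sequence[float], w: int) -> List[float]:
    """Least-squares weights a_1..a_w with x_t ~ a_1*x_{t-1} + ... + a_w*x_{t-w}."""
    if len(series) <= w:
        raise ValueError("need more than w observations")
    xtx = [[0.0] * w for _ in range(w)]  # accumulates X^T X
    xty = [0.0] * w                      # accumulates X^T y
    for t in range(w, len(series)):
        lags = [series[t - k] for k in range(1, w + 1)]
        for i in range(w):
            xty[i] += lags[i] * series[t]
            for j in range(w):
                xtx[i][j] += lags[i] * lags[j]
    return solve_linear(xtx, xty)


def predict_next(series: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted combination of the latest len(weights) values, most recent first."""
    return sum(a * series[-k] for k, a in enumerate(weights, start=1))
```

On a noise-free series generated by x_t = 0.6 x_{t-1} + 0.3 x_{t-2}, fit_ar_weights(series, 2) recovers weights close to [0.6, 0.3]. The same weights are then reused at prediction time. This reuse is exactly what breaks down when the series changes its behavior, as discussed next.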
There are several challenges in applying conventional time series procedures to predicting the sizes of IPs. First, the size time series are nonstationary: the time series of each IP does not follow the same distribution over time due to, among other factors, the reassignment of dynamic IPs [27]. Such abrupt and unforeseen changes in size make the series nonstationary, which limits the applicability of some techniques, such as ARMA.
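An abrupt change of this kind shows up as a shift in level within the window. The sketch below is a deliberately crude, hypothetical check and is not the chapter's method. It splits a window into two halves and flags a shift when the gap between the halves' means is large relative to their typical spread. The function name and the threshold of 3 are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def has_level_shift(series: Sequence[float], threshold: float = 3.0) -> bool:
    """Flag an abrupt change in level between the two halves of a window.

    The gap between the halves' means is divided by the average of the two
    halves' population standard deviations; a large ratio suggests the series
    does not keep one distribution over the window (it is nonstationary).
    """
    half = len(series) // 2
    if half < 2:
        raise ValueError("need at least 4 observations")
    first, second = series[:half], series[half:]
    spread = (pstdev(first) + pstdev(second)) / 2
    gap = abs(mean(second) - mean(first))
    if spread == 0:
        return gap > 0
    return gap / spread > threshold
```

A window that fluctuates around 100 is not flagged. A window that drops from about 100 to about 20 halfway through, as an abrupt reassignment might cause, is flagged. Weights fitted on the first half would mispredict the second.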
The second, and bigger, challenge is that the estimates of each IP form a separate time series that must be analyzed to produce a prediction for that IP. Given the high heterogeneity in the behavior of IPs, which depends on numerous factors including their assignments, time zones, and sizes, building a one-size-fits-all predictive model from a sample of IPs becomes impractical. Therefore, a specialized and efficient prediction algorithm is required.
The PredictSizes algorithm employs some concepts from seasonal autoregressive integrated moving average (ARIMA) models [10]. The ARIMA model is a further generalization of the ARMA model and is applied to nonstationary data with some seasonality or trend. At a high level, PredictSizes predicts the size of each IP in isolation, based on that IP's latest w size estimates. For each IP, it performs three main functions. First, it analyzes the periodicity of the size estimates, since the activity of IPs has consistently been observed to be periodic (Section 14.4.2). Second, for each periodicity, PredictSizes analyzes a sliding window of estimates and seeks their representative stable
 