size by doing iterative variance reduction until the estimates lie within an acceptable
confidence interval. Third, it combines the estimates of all periodicities.
14.4.2 Considering Multiple Size Periodicities
Considering the periodicity of IP activity is imperative. The periodicities of IP sizes were discovered by selecting a sample of IPs and applying a discrete Fourier transform to each. The terms with the highest coefficients correspond to the periodicities used by the PredictSizes algorithm [2]. The vast majority of IPs have diurnal and weekly periodicities. These periodicities are especially clear for the IPs of school districts and large institutes.
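As an illustrative sketch of this discovery step (not taken from [2]; the six-hour slot length and the top_k parameter are assumptions), the dominant periodicities of one IP's size series can be read off the strongest DFT terms:

```python
import numpy as np

def dominant_periodicities(sizes, top_k=3):
    """Return the periods (in slots) of the top_k strongest DFT terms.

    sizes holds one activity-size value per time slot (e.g., one six-hour
    slot).  The zero-frequency term is ignored because it carries no
    periodicity information.
    """
    sizes = np.asarray(sizes, dtype=float)
    coeffs = np.fft.rfft(sizes - sizes.mean())      # real-input DFT
    freqs = np.fft.rfftfreq(len(sizes), d=1.0)      # cycles per slot
    magnitudes = np.abs(coeffs)
    magnitudes[0] = 0.0                              # drop the DC term
    order = np.argsort(magnitudes)[::-1]
    strongest = [i for i in order[:top_k] if freqs[i] > 0]
    return [1.0 / freqs[i] for i in strongest]       # periods in slots
```

With six-hour slots, a diurnal periodicity would appear as a period of 4 slots and a weekly periodicity as a period of 28 slots.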
PredictSizes fetches the estimates of several periodicities, for example, diurnal and weekly, for each IP to produce its prediction. For n periodicities, s_1 < s_2 < ⋯ < s_n, PredictSizes considers the most recent w_i estimates spaced s_i periods apart, for 1 ≤ i ≤ n. For example, to estimate the sizes of IPs in six hours with all the sliding windows having length 10, s_1 = 1, s_2 = 4, and s_3 = 28 (in six-hour slots), PredictSizes considers the last 10 contiguous six-hour estimates, as well as the same-slot estimates of the last 10 days and the last 10 weeks.
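A minimal sketch of this window selection, assuming the estimates are kept in a flat list ordered oldest first (the function name and representation are illustrative, not from [2]):

```python
def windows_per_periodicity(estimates, periods, window_lengths):
    """Collect, for each periodicity, the most recent same-slot estimates.

    estimates      : per-slot size estimates, oldest first
    periods        : e.g., [1, 4, 28] in six-hour slots
    window_lengths : e.g., [10, 10, 10]
    """
    series = {}
    last = len(estimates) - 1
    for s, w in zip(periods, window_lengths):
        idx = range(last, last - s * w, -s)          # step back s slots at a time
        series[s] = [estimates[i] for i in idx if i >= 0]
    return series

# With periods [1, 4, 28] and windows of length 10, series[1] holds the last
# 10 contiguous six-hour estimates, series[4] the same six-hour slot of the
# last 10 days, and series[28] that of the last 10 weeks.
```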
14.4.3 Iterative Variance Reduction
PredictSizes handles the size time series of each periodicity of each IP in isolation. It then combines the predictions from all the periodicities of an IP, as discussed in Section 14.4.4.
For time series prediction, it is typical to perform trend analysis using simple linear regression to detect a consistent increase or decrease over time (allowing for some white noise) [10]. The trend is then used for extrapolation. However, based on the analysis of numerous IPs, the time series of size periodicities almost never show strong trends within the window of estimates used for prediction. Moreover, trend analysis hurts IPs whose sizes change drastically, since false trends result in erroneous predictions. Hence, PredictSizes assumes a stable value for each time series. The stable value, the representative statistic of the time series, is calculated using the StableSize function and is produced as the prediction.
For simplicity, StableSize deals with each periodicity time series as a set. For each time series, StableSize performs iterative variance reduction by removing the outliers that contribute the most to the variance until the ratio of the width of the confidence interval to the mean falls below a given bound. The truncated mean of the remaining sizes is declared the stable size.
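A minimal sketch of this loop under stated assumptions (the 95% z-value and the max_ratio bound are illustrative; the original uses a configurable c-confidence interval, and returning None here stands in for the failure case mentioned below). Statistics are recomputed from scratch in this sketch; the constant-time updates are described next.

```python
import statistics

def stable_size(series, max_ratio=0.1, z=1.96):
    """Drop the extreme element until the confidence-interval width divided
    by the mean is within max_ratio, then return the truncated mean."""
    values = sorted(series)
    while len(values) > 2:
        mean = statistics.fmean(values)
        stdev = statistics.pstdev(values)
        ci_width = 2 * z * stdev / len(values) ** 0.5
        if mean > 0 and ci_width / mean <= max_ratio:
            return statistics.fmean(values)          # stable size found
        # The largest contributor to the variance is whichever of the
        # minimum or maximum elements lies farther from the mean.
        if mean - values[0] >= values[-1] - mean:
            values.pop(0)                            # drop the minimum
        else:
            values.pop()                             # drop the maximum
    return None                                      # series too unstable
```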
At each iteration, StableSize calculates the standard deviation and mean of the time series, and the width of its c-confidence interval. The element that contributes the most to the variance is the farthest from the mean, so it can be identified in constant time by checking the maximum and the minimum elements; that extreme element is then deleted. Each time an extreme element is deleted, the mean and variance are updated in constant time. The most costly step is identifying the extreme elements, which can be done efficiently using a min-max heap.
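One way to realize those constant-time updates is to maintain running sums, as in the sketch below (an illustration, not the implementation of [2]; locating the extreme element is the min-max heap's job and is not shown):

```python
class RunningStats:
    """Mean and variance maintained incrementally so that deleting an
    element costs O(1) arithmetic."""

    def __init__(self, values):
        self.n = len(values)
        self.total = sum(values)
        self.total_sq = sum(v * v for v in values)

    @property
    def mean(self):
        return self.total / self.n

    @property
    def variance(self):
        return self.total_sq / self.n - self.mean ** 2

    def remove(self, x):
        """Delete one element x (e.g., the current minimum or maximum)."""
        self.n -= 1
        self.total -= x
        self.total_sq -= x * x
```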
The algorithm fails if the time series exhibits little stability, due to abrupt size changes or due