TABLE 17.1
Results of Regression Analysis for the In-House Cluster and AWS Clusters
        WordCount       Sort            PageRank        TableJoin
        Local   AWS     Local   AWS     Local   AWS     Local   AWS
β0      51.82   0       20.55   0       25.89   37.73   47.53   3.61
β1      28.32   54.30   0.72    21.74   12.24   10.37   12.27   20.07
β2      0.01    0       0       0       0       0.18    0       0
β3      9.24    0       0       0       0       0       0       14.75
β4      0       0       4.09    3.58    6.58    0       1.60    3.01
β5      0       0       0       0       0       26.79   0       0
β6      0.10    0       0.59    0.05    0.51    0       0.19    0
β7      0.38    0       0       0       0       0       0       0
R²      0.9751  0.9524  0.9692  0.9253  0.9847  0.9733  0.9647  0.8432
Note: R² values higher than 0.90 indicate a good fit of the proposed model.
good accuracy, which may imply that the run-time environment is the main reason. We will investigate the cause of this problem in future work.
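The note to Table 17.1 uses R² as its goodness-of-fit criterion. The chapter does not show how its R² values were computed. The sketch below assumes the standard definition R² = 1 − SS_res/SS_tot. The function name `r_squared` and the sample timings are hypothetical.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot (standard definition)."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean = sum(actual) / len(actual)
    # Residual sum of squares: distance of the model's predictions from the observations.
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((y - mean) ** 2 for y in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical running times (seconds) and the corresponding model predictions.
print(r_squared([100, 200, 300, 400], [110, 190, 310, 390]))
```

For these made-up values the result is about 0.992, which is above the 0.90 threshold in the note. A model whose predictions are no better than always guessing the mean running time would score 0.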
17.6.3.2 Prediction Accuracy
We also analyze the prediction accuracy of the models in detail. Leave-one-out cross-validation [6] is used to estimate the average prediction accuracy and to identify outliers with low accuracy. Concretely, if there are n training samples, leave-one-out cross-validation runs n rounds. In each round, one of the n samples is used for testing and the remaining n − 1 samples are used for training.
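The leave-one-out procedure above can be sketched in a few lines. The chapter's cost models have several coefficients (β0 to β7 in Table 17.1), but their input terms are not given in this section. To keep the sketch self-contained, it therefore swaps in a hypothetical one-variable least-squares line as the model. The names `fit_line` and `leave_one_out` and the data are illustrative only.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float]       # (input size x, observed running time y)
Model = Callable[[float], float]   # a fitted model maps x to a predicted running time


def fit_line(train: Sequence[Sample]) -> Model:
    """Hypothetical stand-in model: ordinary least squares for y = b0 + b1 * x.

    Requires at least two distinct x values in the training set.
    """
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return lambda x: b0 + b1 * x


def leave_one_out(samples: Sequence[Sample],
                  fit: Callable[[Sequence[Sample]], Model]) -> List[Tuple[float, float]]:
    """Run n rounds for n samples; return one (actual, predicted) pair per round."""
    results = []
    for i, (x_test, y_test) in enumerate(samples):
        # Train on the other n - 1 samples ...
        train = [s for j, s in enumerate(samples) if j != i]
        model = fit(train)
        # ... and test on the single held-out sample i.
        results.append((y_test, model(x_test)))
    return results


# Hypothetical (input size, running time) measurements.
data = [(1, 10), (2, 19), (3, 32), (4, 41)]
for actual, predicted in leave_one_out(data, fit_line):
    print(f"actual={actual}, predicted={predicted:.2f}")
```

`leave_one_out` does not depend on the model. Any function that trains on a list of samples and returns a predictor can be passed as `fit`, including a multi-term regression like the ones behind Table 17.1. A prediction that lands far from its actual value flags that sample as a low-accuracy outlier.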
Figures 17.3 and 17.4 compare the actual and predicted running times for each sample case. The x-axis represents the actual running time and the y-axis the predicted time. Ideally, all points would lie on the line y = x, which is shown as the solid line. In both figures the points lie very close to this line, indicating excellent prediction accuracy.
We define the average accuracy as the average relative error (ARE) over the n rounds of testing in the cross-validation. Let $C_i$ be the actual cost and $\hat{C}_i$ the cost estimated by the trained model in round $i$. ARE is calculated with the following equation.
$$\mathrm{ARE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert C_i - \hat{C}_i \rvert}{C_i} \qquad (17.19)$$
Intuitively, ARE expresses the prediction error as a percentage of the actual execution time. Table 17.2 shows the AREs from leave-one-out cross-validation. The results confirm that most models are robust and perform well. However, certain models, such as PageRank on the local cluster, are less accurate than others. A more detailed study will be performed to understand the factors that affect the modeling.
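Equation 17.19 can be computed directly from the (actual, predicted) pairs collected over the cross-validation rounds. This minimal sketch assumes each round's relative error uses the absolute difference |C_i − Ĉ_i|, the usual reading of a relative error. The function name `average_relative_error` and the numbers are hypothetical.

```python
from typing import Sequence, Tuple


def average_relative_error(pairs: Sequence[Tuple[float, float]]) -> float:
    """ARE = (1/n) * sum(|C_i - C_hat_i| / C_i) over n (actual, predicted) pairs."""
    if not pairs:
        raise ValueError("need at least one (actual, predicted) pair")
    total = 0.0
    for actual, predicted in pairs:
        if actual <= 0:
            raise ValueError("actual cost C_i must be positive")
        # Relative error of round i: prediction error as a fraction of the actual cost.
        total += abs(actual - predicted) / actual
    return total / len(pairs)


# Hypothetical (actual, predicted) running times from three cross-validation rounds.
print(average_relative_error([(100, 90), (200, 220), (400, 400)]))
```

Here the three rounds have relative errors of 0.10, 0.10, and 0. The ARE is therefore about 0.067: on average, the predictions are off by roughly 6.7% of the actual running time.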
 