and for the Statistics Canada datasets the windowed cross-validation was superior, while for the toner cartridge manufacturer the unordered approach was better. On accuracy alone, then, we are impartial between the two cross-validation (CV) procedures. However, because the standard CV procedure is simpler to implement than its windowed counterpart, allows more models to be tested while using more data for each CV model, and produced error curves that seemed more stable (lower variation and a clearer concave shape), we recommend the standard CV procedure over the windowed one.
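To make the distinction concrete, the following is a minimal sketch of the two CV schemes, assuming scikit-learn's KFold stands in for the standard (unordered) procedure and TimeSeriesSplit for the windowed one; the data, feature construction, and SVR model are placeholders, not the datasets or implementation used in this study.

import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # placeholder lagged features
y = rng.normal(size=200)        # placeholder demand series

model = SVR(C=1.0)

# Standard CV: folds ignore time order, so each fold
# trains on more of the available data.
standard = cross_val_score(model, X, y, cv=KFold(n_splits=5),
                           scoring="neg_mean_squared_error")

# Windowed CV: each fold trains only on observations that
# precede its validation window, mimicking real forecasting.
windowed = cross_val_score(model, X, y, cv=TimeSeriesSplit(n_splits=5),
                           scoring="neg_mean_squared_error")

print("standard CV MSE:", -standard.mean())
print("windowed CV MSE:", -windowed.mean())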
It is interesting to examine further how the cross-validation-based parameter selection behaves. It is this key feature, in combination with the guaranteed optimality of the SVM, that makes it possible to determine the best level of complexity. The cross-validation error curves over the range of complexity constants are presented in Figure 15 for the chocolate manufacturer's dataset, Figure 16 for the toner cartridge manufacturer's dataset, and Figure 17 for the Statistics Canada manufacturing dataset. In both Figure 15 and Figure 16 there is a clear concave pattern, indicating that a complexity constant that generalizes well is identified without ambiguity. The optimal complexity is more difficult to identify in Figure 17 because, as the complexity increases, the error stays relatively low and stable. This is probably a result of the larger amount of data and lower noise, so there is a range of complexities that may generalize well. In all three figures, there is a clear distinction between complexity levels that generalize better than others, thus permitting the selection of a complexity level. By contrast, if these figures presented error lines that moved up and down randomly as the complexity constant varied, this would indicate that the cross-validation (CV) procedure was not providing any value.
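The selection procedure itself amounts to evaluating a grid of complexity constants by cross-validation and choosing the one with the lowest mean validation error. Below is a hedged sketch of that idea; the data, the grid of C values, and the use of scikit-learn's SVR are assumptions for illustration, not the authors' exact setup.

import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X[:, 0] + 0.1 * rng.normal(size=200)   # placeholder target

c_grid = [0.01, 0.1, 1, 10, 100, 1000]     # assumed range of complexity constants
cv_error = []
for c in c_grid:
    scores = cross_val_score(SVR(C=c), X, y, cv=KFold(n_splits=5),
                             scoring="neg_mean_squared_error")
    cv_error.append(-scores.mean())

# A clean concave (U-shaped) error curve, as in Figures 15 and 16,
# makes the minimum unambiguous; a flat tail, as in Figure 17,
# means several complexity constants generalize comparably well.
best_c = c_grid[int(np.argmin(cv_error))]
print("CV error by C:", dict(zip(c_grid, cv_error)))
print("selected complexity constant:", best_c)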
Alternatives
In the case of the chocolate manufacturer's dataset, we found that the next best performing algorithms that were better than exponential smoothing were the Super Wide multiple linear regression (MLR) and the Super Wide artificial neural networks (ANN) with cross-validation-based early stopping. In analyz-
Figure 15. Complexity optimization CV error on the chocolate manufacturer's dataset (y-axis: CV error, roughly 18,400 to 19,100; x-axis: complexity)