set that is accounted for by the statistical model. Usually, the higher the $R^2$, the better the quality of fit (with $0 \le R^2 \le 1$).
Although adding parameters to the model can improve the $R^2$, there is a risk of exceeding the actual information content of the data, leading to arbitrariness in the final model parameters (also called over-fitting). This reduces the capability of the model to generalize beyond the fitting data, while giving very good results on the training data. In particular, this phenomenon occurs when the model is excessively complex in relation to the amount of data available. A model that has been over-fit generally has poor predictive performance, as it can exaggerate minor fluctuations in the training data.
To this end, we introduce an 'adjusted' definition of $R^2$. This measure adjusts for the number of explanatory terms in the model: it increases only if a new term improves the model more than would be expected by chance, and it is always less than or equal to $R^2$. It is defined as:
$$
\bar{R}^2 = 1 - (1 - R^2)\,\frac{N - 1}{N - p} \tag{4.10}
$$
where p is the total number of terms in the linear model (i.e., the set of coefficients a, b, c), while N is the sample set size. The adjusted $R^2$ is particularly useful in the model selection stage of model building.
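For concreteness, the following is a minimal sketch of Eq. (4.10) in Python; the function name and the use of NumPy are illustrative assumptions for this example, not details taken from the original flow.

```python
import numpy as np

def adjusted_r2(y_true, y_pred, p):
    """Adjusted R^2 as in Eq. (4.10): 1 - (1 - R^2) * (N - 1) / (N - p),
    where N is the sample set size and p is the number of model terms."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_true.size
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                       # ordinary R^2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)      # Eq. (4.10)
```

Comparing two fits of different size p on the same data reproduces the behavior described above: adding terms can only raise the ordinary $R^2$, while the adjusted value increases only when the extra terms genuinely improve the fit.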
Linear regression model selection. In order to understand the optimal number of terms of the linear model and the corresponding model order, we analyzed the behavior of the RSM cross-validation error and of the adjusted $R^2$ while varying the number of random training samples (derived from the simulations of the target applications). Equation (4.10) represents an improved measure of the overall quality of fit of the linear regression: its correction factor is inversely proportional to the model's degrees of freedom (i.e., N − p), which, in turn, depend on the order of the chosen polynomial ρ(x). As a matter of fact, higher degrees of freedom increase the chance of reduced variance of the model coefficients, thus improving model stability while avoiding over-fitting [9, 10]. In this context, our heuristic model selection tries to maximize the number of degrees of freedom while, at the same time, keeping the number of simulations needed to build the model small (on the order of 200). Thus, as a "rule of thumb", we set a maximum number of terms (to increase the chance of a good quality of fit) and, finally, we limit the set of considered models to the following configurations (a code sketch follows the list):
1. First order model, without any interaction between parameters.
2. First order model, with interaction between parameters.
3. Second order model, without any interaction between parameters.
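As an illustrative sketch of this heuristic (the leave-one-out scheme, the plain least-squares fitting, and all function names are assumptions made for the example, not details from the original flow), the three configurations can be enumerated and compared on cross-validation error and adjusted $R^2$:

```python
import itertools
import numpy as np

def design_matrix(X, interactions=False, quadratic=False):
    """Term matrix for the three candidate configurations: first order,
    optionally with pairwise interactions, or second order (pure
    quadratic terms) without interactions."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, j] for j in range(d)]
    if interactions:
        cols += [X[:, i] * X[:, j]
                 for i, j in itertools.combinations(range(d), 2)]
    if quadratic:
        cols += [X[:, j] ** 2 for j in range(d)]
    return np.column_stack(cols)

def loo_cv_rmse(A, y):
    """Leave-one-out cross-validation RMSE of a least-squares fit."""
    errs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        errs.append((A[i] @ coef - y[i]) ** 2)
    return float(np.sqrt(np.mean(errs)))

def select_model(X, y):
    """Rank the three candidate configurations by LOO error, reporting
    the adjusted R^2 of Eq. (4.10) alongside."""
    candidates = {
        "1st order, no interactions":   dict(interactions=False, quadratic=False),
        "1st order, with interactions": dict(interactions=True,  quadratic=False),
        "2nd order, no interactions":   dict(interactions=False, quadratic=True),
    }
    scores = {}
    for name, opts in candidates.items():
        A = design_matrix(X, **opts)
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        n, p = A.shape
        r2 = 1.0 - np.sum((y - A @ coef) ** 2) / np.sum((y - y.mean()) ** 2)
        scores[name] = (loo_cv_rmse(A, y), 1.0 - (1.0 - r2) * (n - 1) / (n - p))
    return min(scores.items(), key=lambda kv: kv[1][0])
```

Whichever of the two criteria is taken as primary, the point of the heuristic is the same: candidates are ranked on held-out error and adjusted quality of fit rather than on the raw $R^2$ of the training data.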
4.4.2 Radial Basis Functions
Radial basis functions (RBF) represent a widely used interpolation/approximation model [13]. The interpolation function is built on a set of training configurations $x_k$