}
public double y(double[] x) {
    double y = 0;
    for(int i=0;i<B.length;i++) y += B[i]*x[i];
    return y;
}
}
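As a quick check of the y method above, here is a minimal standalone sketch. The constructor and the coefficient array B are assumptions for illustration, since only the y method is shown here; the dot-product logic is the same:

```java
public class LinearModelSketch {
    // Hypothetical coefficient array; the real class presumably sets B elsewhere.
    private final double[] B;

    public LinearModelSketch(double[] B) { this.B = B; }

    // Same dot-product logic as the y method above.
    public double y(double[] x) {
        double y = 0;
        for (int i = 0; i < B.length; i++) y += B[i] * x[i];
        return y;
    }

    public static void main(String[] args) {
        LinearModelSketch m = new LinearModelSketch(new double[]{1.0, 2.0});
        // 1.0*1.0 + 2.0*3.0 = 7.0
        System.out.println(m.y(new double[]{1.0, 3.0}));
    }
}
```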
“Fitting” these models means finding appropriate values of B given that the values of x are observed along with y, which can be interpreted as a noisy version of y(x). In the original formulation this noise is normally distributed (normal distributions are discussed in Chapter 9) with a mean of zero and a variance of σ². Fitting is done by minimizing the square of the difference between the observed values of y and the values returned, given x, by the y method of the LinearModel class. In other words, the goal is to find an array for B that returns the least sum of squares or “least squares error,” given in the following function:
public double error(double[] y, double[][] x) {
    double error = 0.0;
    for(int i=0;i<y.length;i++) {
        double diff = y[i] - y(x[i]);
        error += diff*diff;
    }
    return error;
}
This error is also known as the residual sum of squares (RSS) and is often used to determine how well a model fits the data. Inspecting the individual elements of the error (usually without squaring) helps determine whether or not the model is well specified. A trend or periodic signal in the residuals is often a sign that the model is missing a term.
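To make the residual check concrete, the sketch below (with invented coefficients and observations, purely for illustration) computes each residual y[i] - y(x[i]) and the RSS; a run of same-sign residuals in sequence would suggest a missing term:

```java
public class ResidualCheck {
    // Per-observation residuals y[i] - B.x[i]; inspect these for trends or periodicity.
    public static double[] residuals(double[] B, double[][] x, double[] y) {
        double[] r = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            double pred = 0;
            for (int j = 0; j < B.length; j++) pred += B[j] * x[i][j];
            r[i] = y[i] - pred;
        }
        return r;
    }

    public static void main(String[] args) {
        double[] B = {0.5, 2.0};                  // hypothetical fitted coefficients
        double[][] x = {{1, 1}, {1, 2}, {1, 3}};  // first column is the intercept term
        double[] y = {2.4, 4.6, 6.5};             // invented observations
        double rss = 0.0;
        for (double r : residuals(B, x, y)) {
            System.out.printf("residual = %.2f%n", r);
            rss += r * r;                          // accumulate the RSS as in error()
        }
        System.out.printf("RSS = %.4f%n", rss);
    }
}
```

Here the residuals alternate near zero with no trend, which is what a well-specified model should produce.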
Simple Linear Regression
In the case that the x array only ever holds two values, with the first value being the constant 1 (known as the intercept term), there is a simple closed-form solution for the two values of the B array. When this happens, the first value