}
public double y(double[] x) {
    double y = 0;
    for(int i=0;i<B.length;i++) y += B[i]*x[i];
    return y;
}
}
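As a quick check of the y method above, here is a minimal standalone sketch. The constructor and the coefficient array B are assumptions for illustration, since only the y method is shown here; the dot-product logic is the same:

```java
public class LinearModelSketch {
    // Hypothetical coefficient array; the real class presumably sets B elsewhere.
    private final double[] B;

    public LinearModelSketch(double[] B) { this.B = B; }

    // Same dot-product logic as the y method above.
    public double y(double[] x) {
        double y = 0;
        for (int i = 0; i < B.length; i++) y += B[i] * x[i];
        return y;
    }

    public static void main(String[] args) {
        LinearModelSketch m = new LinearModelSketch(new double[]{1.0, 2.0});
        // 1.0*1.0 + 2.0*3.0 = 7.0
        System.out.println(m.y(new double[]{1.0, 3.0}));
    }
}
```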
“Fitting” these models means finding appropriate values of B given that the values of x are observed along with y, which can be interpreted as a noisy version of y(x). In the original formulation this noise is normally distributed (normal distributions are discussed in Chapter 9) with a mean of zero and a variance of σ². Fitting is done by minimizing the square of the difference between the observed values of y and the values returned, given x, by the y method of the LinearModel class. In other words, the goal is to find an array for B that returns the least sum of squares or “least squares error,” given in the following function:
public double error(double[] y, double[][] x) {
    double error = 0.0;
    for(int i=0;i<y.length;i++) {
        double diff = y[i] - y(x[i]);
        error += diff*diff;
    }
    return error;
}
This error is also known as the residual sum of squares (RSS) and is often used to determine how well a model fits the data. Inspecting the individual elements of the error (usually without squaring) helps determine whether or not the model is well specified. A trend or periodic signal in the residuals is often a sign that the model is missing a term.
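To make the residual check concrete, the sketch below (with invented coefficients and observations, purely for illustration) computes each residual y[i] - y(x[i]) and the RSS; a run of same-sign residuals in sequence would suggest a missing term:

```java
public class ResidualCheck {
    // Per-observation residuals y[i] - B.x[i]; inspect these for trends or periodicity.
    public static double[] residuals(double[] B, double[][] x, double[] y) {
        double[] r = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            double pred = 0;
            for (int j = 0; j < B.length; j++) pred += B[j] * x[i][j];
            r[i] = y[i] - pred;
        }
        return r;
    }

    public static void main(String[] args) {
        double[] B = {0.5, 2.0};                  // hypothetical fitted coefficients
        double[][] x = {{1, 1}, {1, 2}, {1, 3}};  // first column is the intercept term
        double[] y = {2.4, 4.6, 6.5};             // invented observations
        double rss = 0.0;
        for (double r : residuals(B, x, y)) {
            System.out.printf("residual = %.2f%n", r);
            rss += r * r;                          // accumulate the RSS as in error()
        }
        System.out.printf("RSS = %.4f%n", rss);
    }
}
```

Here the residuals alternate near zero with no trend, which is what a well-specified model should produce.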
Simple Linear Regression
In the case that the x array only ever holds two values, with the first value being the constant 1 (known as the intercept term), there is a simple closed-form solution for the two values of the B array. When this happens, the first value