• The entire dataset is randomly split into N datasets of approximately equal
size.
• A model is trained against N - 1 of these datasets and tested against the
remaining dataset. A measure of the model error is obtained.
• This process is repeated a total of N times, once for each combination of
the N datasets taken N - 1 at a time. Recall that there are exactly N such
combinations, one for each dataset that is held out.
• The N observed model errors are averaged over the N folds.
The averaged error from one model is compared against the averaged error from
another model. This technique can also help determine whether adding more
variables to an existing model is beneficial or possibly overfitting the data.
Other Diagnostic Considerations
Even when a fitted linear regression model satisfies the preceding diagnostic
criteria, it may still be possible to improve the model by including additional
input variables not yet considered. In the previous Income example, only three
possible input variables (Age, Education, and Gender) were considered. Dozens
of other input variables, such as Housing or Marital_Status, may improve the
fitted model. It is important to consider all possible input variables early in the
analytic process.
As mentioned earlier, in reviewing the R output from fitting a linear regression
model, the adjusted R² applies a penalty to the R² value based on the number of
parameters added to the model. The R² value never decreases as more variables
are added to an existing regression model, so on its own it cannot show whether
a new variable is worthwhile. The adjusted R² value, by contrast, may actually
decrease after adding more variables.
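As a rough illustration of that penalty, the sketch below uses the standard textbook formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of input variables. The function name and the R² values are hypothetical, chosen only to show a case where R² rises slightly while the adjusted value falls.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R-squared: penalizes R-squared for each added predictor."""
    degrees_of_freedom = n_obs - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_obs - 1) / degrees_of_freedom

# Hypothetical values: a fourth variable nudges R-squared from 0.800 to 0.801.
adj_three_vars = adjusted_r_squared(0.800, n_obs=50, n_predictors=3)
adj_four_vars = adjusted_r_squared(0.801, n_obs=50, n_predictors=4)

print("3 variables:", round(adj_three_vars, 4))
print("4 variables:", round(adj_four_vars, 4))
print("adjusted value decreased:", adj_four_vars < adj_three_vars)
```

In this made-up case the tiny gain in R² does not offset the lost degree of freedom, which is the pattern that suggests the extra variable adds little.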
The residual plots should be examined for any outliers, observed points that
are markedly different from the majority of the points. Outliers can result from
bad data collection, data processing errors, or an actual rare occurrence. In the
Income example, suppose that an individual with an income of a million dollars
was included in the dataset. Such an observation could affect the fitted regression
model, as seen in one of the examples of Anscombe's Quartet.
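To make the effect concrete, the following sketch fits a one-variable least squares line with and without a single extreme point. The ages and incomes (in thousands of dollars) are invented for illustration and are not taken from the Income example; `fit_slope` is a hypothetical helper.

```python
def fit_slope(xs, ys):
    """Least squares slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Invented data: income (in thousands of dollars) rising steadily with age.
ages = [25, 30, 35, 40, 45, 50, 55, 60]
incomes = [40, 45, 52, 55, 61, 66, 70, 76]

slope_without = fit_slope(ages, incomes)

# Add one individual, aged 58, with an income of a million dollars (1000 thousand).
slope_with = fit_slope(ages + [58], incomes + [1000])

print("slope without the outlier:", round(slope_without, 2))
print("slope with the outlier:   ", round(slope_with, 2))
```

A single observation changes the estimated slope severalfold here, which is why such points deserve a check against the data collection and processing steps before the model is trusted.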