The Recipes - Getting Started with R


1.23 Diagnosing a Linear Regression

Problem

You have performed a linear regression. Now you want to verify the model's quality

by running diagnostic checks.

Solution

Start by plotting the model object, which will produce several diagnostic plots:

> m <- lm(y ~ x)

> plot(m)

Next, identify possible outliers either by looking at the diagnostic plot of the residuals

or by using the outlierTest function of the car package (called outlier.test in older versions):

> library(car)

> outlierTest(m)

Finally, identify any overly influential observations (by using the influence.measures

function, for example).
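
As a rough sketch of that last step, the block below fits a model to simulated data and lists the observations that influence.measures flags. The data, the seed, and the deliberately planted unusual point are illustrative assumptions, not part of the recipe.

```r
# Simulated data for illustration only; x and y are hypothetical.
set.seed(42)
x <- c(rnorm(30), 6)             # append one x value far from the rest
y <- 2 * x + rnorm(31)
y[31] <- y[31] - 10              # pull that point well off the true line

m <- lm(y ~ x)

# influence.measures computes DFBETAS, DFFITS, covariance ratios,
# Cook's distances, and hat values, and marks unusual cases in $is.inf.
infl <- influence.measures(m)
flagged <- which(apply(infl$is.inf, 1, any))   # rows flagged on any measure
print(flagged)

summary(infl)                    # prints only the flagged observations
```

A flagged observation is not necessarily wrong; it is a candidate for a closer look, for example by refitting the model without it and comparing the coefficients.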

Discussion

R fosters the impression that linear regression is easy: just use the lm function. Yet fitting

the data is only the beginning. It's your job to decide whether the fitted model actually

works and works well.

Before anything else, you must have a statistically significant model. Check the F statistic

from the model summary (Recipe 1.22) and be sure that the p-value is small

enough for your purposes. Conventionally, it should be less than 0.05, or else your

model is likely meaningless.
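
The check described above can also be done programmatically. The summary object stores the F statistic and its degrees of freedom but not the p-value itself, so this sketch recomputes it with pf. The simulated data and the 0.05 threshold are assumptions for illustration.

```r
# Simulated data for illustration only; x and y are hypothetical.
set.seed(1)
x <- rnorm(50)
y <- 3 + 2 * x + rnorm(50)
m <- lm(y ~ x)

# fstatistic is a named vector: value, numdf, dendf
f <- summary(m)$fstatistic

# Upper-tail probability of the F distribution gives the model p-value
p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
print(p_value)

if (p_value < 0.05) {
  cat("Model is significant at the 0.05 level\n")
} else {
  cat("Model is not significant at the 0.05 level\n")
}
```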

Simply plotting the model object produces several useful diagnostic plots:

> m <- lm(y ~ x)

> plot(m)

Figure 1-7 shows diagnostic plots for a pretty good regression:

• The points in the Residuals vs Fitted plot are randomly scattered with no particular

pattern.

• The points in the Normal Q-Q plot are more or less on the line, indicating that

the residuals are approximately normally distributed.

• In both the Scale-Location plot and the Residuals vs Leverage plot, the points are

in a group with none too far from the center.
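
To see the four plots discussed above on a single page, one option is to set up a 2-by-2 plotting grid before calling plot. This is a minimal sketch on simulated data (the data and seed are assumptions, not the data behind the figure).

```r
# Simulated, well-behaved data for illustration only.
set.seed(7)
x <- runif(100, 0, 10)
y <- 1 + 0.5 * x + rnorm(100)
m <- lm(y ~ x)

old_par <- par(mfrow = c(2, 2))  # 2-by-2 grid; keep the old settings
plot(m)                          # Residuals vs Fitted, Normal Q-Q,
                                 # Scale-Location, Residuals vs Leverage
par(old_par)                     # restore the previous layout
```

With the grid in place, all four panels are drawn at once instead of one after another, which makes them easier to compare.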
