prediction interval on the Income for a 41-year-old person with 12 years of
education is obtained as follows:
pred_int_pt <- predict(results2, new_pt, level=.95, interval="prediction")
pred_int_pt
       fit      lwr      upr
1 68.69884 44.98867 92.40902
Again, the expected income is $68,699. However, the 95% prediction interval is
($44,989, $92,409). If the reason for this much wider interval is not obvious, recall
that in Figure 6.3, for a particular input variable value, the expected outcome falls
on the regression line, but the individual observations are normally distributed
about the expected outcome. The confidence interval applies to the expected
outcome that falls on the regression line, but the prediction interval applies to an
outcome that may appear anywhere within the normal distribution.
Thus, in linear regression, confidence intervals are used to draw inferences on
the population's expected outcome, and prediction intervals are used to draw
inferences on the next possible outcome.
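The difference between the two intervals can be checked directly in R. The following is a minimal sketch on simulated data, not the Income dataset used in the text: the data frame sim, its coefficients, and the noise level are assumptions chosen only for illustration. It fits a linear model and requests both interval types for the same new point.

```r
# Simulated stand-in for the income data (values are illustrative assumptions)
set.seed(42)
n <- 200
sim <- data.frame(Age = round(runif(n, 18, 70)),
                  Education = round(runif(n, 10, 20)))
sim$Income <- 5 + 0.9 * sim$Age + 1.8 * sim$Education + rnorm(n, sd = 12)

fit <- lm(Income ~ Age + Education, data = sim)
new_pt <- data.frame(Age = 41, Education = 12)

# Same point, same level; only the interval type differs
conf_int <- predict(fit, new_pt, level = 0.95, interval = "confidence")
pred_int <- predict(fit, new_pt, level = 0.95, interval = "prediction")

# Width of each interval (upper bound minus lower bound)
conf_width <- conf_int[, "upr"] - conf_int[, "lwr"]
pred_width <- pred_int[, "upr"] - pred_int[, "lwr"]

print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```

Both rows share the same fit value, because each interval is centered on the regression line. The prediction interval is wider because it also accounts for the scatter of individual observations about that line.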
6.1.3 Diagnostics
The use of hypothesis tests, confidence intervals, and prediction intervals is
dependent on the model assumptions being true. The following discussion
provides some tools and techniques that can be used to validate a fitted linear
regression model.
Evaluating the Linearity Assumption
A major assumption in linear regression modeling is that the relationship between
the input variables and the outcome variable is linear. The most fundamental way
to evaluate such a relationship is to plot the outcome variable against each input
variable. In the Income example, such scatterplots were generated in Figure 6.4.
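As a rough illustration of this check, the sketch below plots the outcome against each input variable and overlays a lowess smoother. The data are simulated, and the curved dependence of Income on Age is an assumption built in so that the nonlinearity is visible; the names sim and inputs are hypothetical.

```r
# Simulated data: Income is curved in Age and linear in Education (assumed)
set.seed(1)
n <- 200
sim <- data.frame(Age = runif(n, 18, 70),
                  Education = runif(n, 10, 20))
sim$Income <- 20 + 3.5 * sim$Age - 0.04 * sim$Age^2 +
  1.5 * sim$Education + rnorm(n, sd = 5)

# One scatterplot per input variable, side by side
inputs <- c("Age", "Education")
op <- par(mfrow = c(1, length(inputs)))
for (v in inputs) {
  plot(sim[[v]], sim$Income, xlab = v, ylab = "Income",
       main = paste("Income vs.", v))
  # The smoother traces the local trend; a clear bend suggests nonlinearity
  lines(lowess(sim[[v]], sim$Income), col = "red", lwd = 2)
}
par(op)
```

A smoother that stays close to a straight line is consistent with the linearity assumption for that input, while a pronounced bend, as in the Age panel here, suggests that a plain linear term may not be adequate.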
If the relationship between Age and Income is represented as illustrated in Figure
6.5, a linear model would not apply. In such a case, it is often useful to do any of
the following:
• Transform the outcome variable.
• Transform the input variables.