If we type in summary(model), where model is the name we gave to this fitted model, the output would be:
summary(model)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-121.17  -52.63   -9.72   41.54  356.27

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -32.083     16.623   -1.93   0.0565 .
x             45.918      2.141   21.45   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 77.47 on 98 degrees of freedom
Multiple R-squared:  0.8244,  Adjusted R-squared:  0.8226
F-statistic:   460 on 1 and 98 DF,  p-value: < 2.2e-16
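For context, output like the above comes from fitting a simple linear model with lm; a minimal sketch, assuming x and y are the predictor and response vectors defined earlier:

model <- lm(y ~ x)   # fit the simple linear regression
summary(model)       # prints the table shown above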
R-squared
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

Here \hat{y}_i is the model's prediction for the ith observation and \bar{y} is the mean of the observed values. This can be interpreted as the proportion of variance explained by our model: the sum of squared residuals is in there getting divided by the total sum of squares, which gives the proportion of variance unexplained by our model, and we calculate 1 minus that.
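As a sanity check, this quantity can be computed by hand in R and compared to the value reported by summary(model); a small sketch, assuming model is the fitted lm object and y is the observed response:

y_hat <- fitted(model)        # the model's predicted values
rss   <- sum((y - y_hat)^2)   # residual sum of squares
tss   <- sum((y - mean(y))^2) # total sum of squares
1 - rss / tss                 # matches summary(model)$r.squared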
p-values
Looking at the output, the estimated βs are in the column marked Estimate. To see the p-values, look at the column marked Pr(>|t|). We can interpret the values in this column as follows: we make the null hypothesis that a given β is zero. The p-value for that β is then the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one we actually observed. This means that if we have a low p-value, such a test statistic would be very unlikely under the null hypothesis, and the coefficient is highly likely to be nonzero and therefore significant.
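If we want these numbers programmatically rather than reading them off the printed output, the coefficient table can be pulled out of the summary object; a small sketch, again assuming model is the fitted lm object:

coefs <- summary(model)$coefficients  # columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs[, "Estimate"]                   # the estimated betas
coefs[, "Pr(>|t|)"]                   # the p-value for each beta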
Cross-validation
Another approach to evaluating the model is as follows. Divide the data into a training set and a test set: 80% in the training set and 20% in the test set. Fit the model on the training set, then look at the mean squared error on the test set and compare it to the mean squared error on the training set; a test error much larger than the training error suggests overfitting. Make this comparison across sample sizes as well.
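A minimal sketch of this train/test comparison in R, assuming the predictor and response are stored in vectors x and y (the names and the 80/20 split are just for illustration):

set.seed(1)                               # make the split reproducible
n   <- length(y)
idx <- sample(1:n, size = floor(0.8 * n)) # indices of the 80% training rows

train <- data.frame(x = x[idx],  y = y[idx])
test  <- data.frame(x = x[-idx], y = y[-idx])

fit <- lm(y ~ x, data = train)            # fit only on the training set

mse_train <- mean((train$y - predict(fit, newdata = train))^2)
mse_test  <- mean((test$y  - predict(fit, newdata = test))^2)
c(train = mse_train, test = mse_test)     # compare the two errors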