level, the null hypothesis would not be rejected. So, dropping the variable Gender
from the linear regression model should be considered. The following R code
provides the modified model results:
results2 <- lm(Income ~ Age + Education, data = income_input)
summary(results2)
Call:
lm(formula = Income ~ Age + Education, data = income_input)

Residuals:
    Min      1Q  Median      3Q     Max
-36.889  -7.892   0.185   8.200  37.740

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.75822    1.92728   3.507 0.000467 ***
Age          0.99603    0.02057  48.412  < 2e-16 ***
Education    1.75860    0.11586  15.179  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.08 on 1497 degrees of freedom
Multiple R-squared: 0.6359, Adjusted R-squared: 0.6354
F-statistic: 1307 on 2 and 1497 DF, p-value: < 2.2e-16
Dropping the Gender variable from the model resulted in a minimal change to the
estimates of the remaining parameters and their statistical significance.
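As a quick sanity check on the summary output above, the reported adjusted R-squared can be reproduced by hand from the reported multiple R-squared and the residual degrees of freedom, using the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The following is a small Python sketch (the book's examples use R; this is just illustrative arithmetic) with the values read from the output: 1497 residual degrees of freedom with p = 2 predictors implies n = 1500 observations.

```python
# Reproducing the adjusted R-squared reported by summary() by hand.
# Values below are read directly from the regression output:
# Multiple R-squared = 0.6359, residual df = 1497, p = 2 predictors,
# so n = 1497 + 2 + 1 = 1500 observations.
r2 = 0.6359
n, p = 1500, 2

# Adjusted R-squared penalizes each additional predictor.
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adj_r2, 4))  # matches the reported 0.6354
```

Because n is large relative to p here, the penalty is tiny and the adjusted value is nearly identical to the raw R-squared.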
The last part of the displayed results provides some summary statistics and tests
on the linear regression model. The residual standard error is the standard
deviation of the observed residuals. This value, along with the associated degrees of
freedom, can be used to examine the variance of the assumed normally distributed
error terms. R-squared (R²) is a commonly reported metric that measures the
variation in the data that is explained by the regression model. Possible values
of R² range from 0 to 1, with values closer to 1 indicating that the model is better
at explaining the data than values closer to 0. An R² of exactly 1 indicates that
the model explains the observed data perfectly (all the residuals are equal to 0).
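To make these two quantities concrete, the following Python sketch (the book's examples use R; the observed and fitted values here are made up for illustration) computes the residual standard error and R² by hand, the same way lm() and summary() derive them from the residuals:

```python
# Illustrative sketch: residual standard error and R-squared by hand,
# for a hypothetical one-predictor model with made-up data.
import math

# Hypothetical observed responses and the model's fitted values
y_obs    = [10.0, 12.0, 15.0, 19.0, 22.0]
y_fitted = [10.5, 12.5, 15.5, 18.0, 21.5]

residuals = [obs - fit for obs, fit in zip(y_obs, y_fitted)]

# Residual standard error: sqrt(RSS / df), with df = n - p - 1
# (n observations, p predictors; here p = 1, so df = 5 - 2 = 3)
n, p = len(y_obs), 1
rss = sum(r ** 2 for r in residuals)
rse = math.sqrt(rss / (n - p - 1))

# R-squared: 1 - RSS / TSS, the fraction of total variation explained
y_mean = sum(y_obs) / n
tss = sum((obs - y_mean) ** 2 for obs in y_obs)
r_squared = 1 - rss / tss

print(round(rse, 3), round(r_squared, 3))
```

Note that if every residual were 0, RSS would be 0, giving a residual standard error of 0 and an R² of exactly 1, matching the discussion above.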
In general, the R² value can be increased by adding more variables to the model.
However, just adding more variables to explain a given dataset but not to improve