The output provides details about the coefficients. The column Estimate provides
the OLS estimates of the coefficients in the fitted linear regression model. In
general, the (Intercept) corresponds to the estimated value of the response variable
when all the input variables equal zero. In this example, the intercept corresponds
to an estimated income of $7,263 for a newborn female with no education. It is
important to note that the available dataset does not include such a person. The
minimum age and education in the dataset are 18 and 10 years, respectively.
Thus, misleading results may be obtained when using a linear regression model to
estimate outcomes for input values that are not represented in the dataset used to
train the model.
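The caution about extrapolation can be made concrete with a small guard. The following is a minimal sketch, not part of the original analysis: the function name `below_training_range` is hypothetical, and only the minimum values quoted above are used because the text does not report the maximums.

```python
# Minimum values the text reports for the training data (maximums are not given).
MIN_AGE = 18
MIN_EDUCATION = 10


def below_training_range(age, education):
    """Return the names of the inputs that fall below the observed minimums."""
    flagged = []
    if age < MIN_AGE:
        flagged.append("Age")
    if education < MIN_EDUCATION:
        flagged.append("Education")
    return flagged


# The intercept describes Age = 0 and Education = 0, which the data never covered.
print(below_training_range(age=0, education=0))    # ['Age', 'Education']
print(below_training_range(age=35, education=16))  # []
```

A check of this kind only flags inputs outside the observed range; it does not say how wrong the extrapolated estimate might be.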
The coefficient for Age is approximately equal to one, with income measured in
thousands of dollars. This coefficient is interpreted as follows: For every
one-unit increase in a person's age, the person's income is expected to increase
by $995. Similarly, for every unit increase in a person's years of education, the
person's income is expected to increase by about $1,758.
Interpreting the Gender coefficient is slightly different. When Gender is equal to
zero, the Gender coefficient contributes nothing to the estimate of the expected
income. When Gender is equal to one, the expected Income is decreased by about
$934.
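These interpretations can be checked by comparing fitted values that differ in a single input. The sketch below uses the dollar amounts quoted in the text as the coefficients; the function name `expected_income` and the example person (age 30, 12 years of education) are assumptions made for illustration only.

```python
# Coefficient estimates in dollars, as quoted in the surrounding text.
INTERCEPT = 7263
AGE_COEF = 995
EDUCATION_COEF = 1758
GENDER_COEF = -934


def expected_income(age, education, gender):
    """Fitted income in dollars; gender is a 0/1 indicator variable."""
    return (INTERCEPT + AGE_COEF * age
            + EDUCATION_COEF * education + GENDER_COEF * gender)


# Hypothetical baseline person, chosen only to illustrate the differences.
base = expected_income(age=30, education=12, gender=0)

print(expected_income(31, 12, 0) - base)  # one more year of age: 995
print(expected_income(30, 13, 0) - base)  # one more year of education: 1758
print(expected_income(30, 12, 1) - base)  # Gender = 1 versus Gender = 0: -934
```

Because the model is linear, each difference equals the corresponding coefficient regardless of the baseline chosen.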
Because the coefficient values are only estimates based on the observed incomes
in the sample, there is some uncertainty or sampling error for the coefficient
estimates. The Std. Error column next to the coefficients provides the sampling
error associated with each coefficient and can be used to perform a hypothesis test,
using the t-distribution, to determine if each coefficient is statistically different
from zero. If a coefficient is not statistically different from zero, the coefficient
and its associated variable should be excluded from the model. In this example, the
associated hypothesis tests' p-values, Pr(>|t|), are very small for the Intercept,
Age, and Education parameters. As seen in Chapter 3, a small p-value corresponds to
a small probability that such a large t value would be observed under the
assumptions of the null hypothesis. In this case, for a given j = 0, 1, 2, …, p - 1,
the null and alternate hypotheses follow:

$$H_0: \beta_j = 0 \quad \text{versus} \quad H_A: \beta_j \neq 0$$
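A test of this kind can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the procedure used to produce the output discussed here: the t statistic is taken as the estimate divided by its standard error, and the two-sided p-value is obtained by integrating the t-distribution density numerically with Simpson's rule instead of calling a statistics library. The function names and the numbers passed in at the end are made-up placeholders, not values from the regression output.

```python
import math


def t_statistic(estimate, std_error):
    """Test statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|) under the null, using Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # probability that T lies in [-|t|, |t|]
    return max(0.0, 1.0 - central_mass)


# Placeholder inputs for illustration only (not taken from the output).
t = t_statistic(estimate=2.0, std_error=0.5)
print(t, two_sided_p_value(t, df=30))
```

The null hypothesis is rejected when the resulting p-value falls below the chosen significance level.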
For small p-values, as is the case for the Intercept, Age, and Education
parameters, the null hypothesis would be rejected. For the Gender parameter, the
corresponding p-value is fairly large at 0.13. In other words, at a 90% confidence