Geoscience Reference
In-Depth Information
Because the calculated F is much greater than the tabular $F_{0.01}$ with 1 and 60 degrees of freedom, the
regression is deemed significant at the 0.01 level.
Before we fitted a regression line to the data, Y had a certain amount of variation about its mean
($\bar{Y}$). Fitting the regression was, in effect, an attempt to explain part of this variation by the linear
association of Y with X. But even after the line had been fitted, some variation remained unexplained:
that of Y about the regression line. When we tested the regression line above, we merely showed that
the part of the variation in Y that is explained by the fitted line is significantly greater than the part
that the line left unexplained. The test did not show that the line we fitted gives the best possible
description of the data (a curved line might be even better), nor did it show that we have found the
true mathematical relationship between the two variables. There is a dangerous tendency to ascribe
more meaning to a fitted regression than is warranted.
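The comparison of explained and unexplained variation can be sketched in a few lines. This is a minimal illustration, not the authors' worked computation: it assumes the reduction and total sums of squares quoted later in this section (2.1115 and 2.7826), takes the residual sum of squares as their difference, and uses 7.08 as the tabular F value for 1 and 60 degrees of freedom at the 0.01 level, a figure taken from standard F tables rather than from this text. The function name `regression_f` is hypothetical.

```python
def regression_f(reduction_ss, total_ss, residual_df, regression_df=1):
    """Return the F ratio: regression mean square over residual mean square."""
    # Assumption: the residual SS is whatever the fitted line left unexplained.
    residual_ss = total_ss - reduction_ss
    reduction_ms = reduction_ss / regression_df
    residual_ms = residual_ss / residual_df
    return reduction_ms / residual_ms


# Sums of squares quoted in this section; 60 residual degrees of freedom.
f_value = regression_f(2.1115, 2.7826, residual_df=60)

# Assumption: tabular F(0.01; 1, 60) from a standard F table.
TABULAR_F_01 = 7.08

print(round(f_value, 1), f_value > TABULAR_F_01)
```

A large F ratio says only that the line explains significantly more variation than it leaves unexplained; it does not show that a straight line is the best description of the data.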
It might be noted that the residual sum of squares is equal to the sum of the squared deviations
of the observed values of Y from the regression line. That is,
$$\text{Residual SS} = \sum (Y - \hat{Y})^2 = \sum (Y - a - bX)^2$$
The principle of least squares says that the best estimates of the regression coefficients ( a and b ) are
those that make this sum of squares a minimum.
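As a rough sketch of the least-squares principle, the snippet below fits a and b with the usual closed-form estimates and then checks that nudging either coefficient increases the residual sum of squares. The data are made up for illustration, and the helper names `fit_line` and `residual_ss` are hypothetical, not taken from the text.

```python
def fit_line(xs, ys):
    """Return least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Corrected sum of squares of X and corrected sum of products.
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sum_xy / sum_x2
    a = y_bar - b * x_bar
    return a, b


def residual_ss(xs, ys, a, b):
    """Sum of squared deviations of the observed Y from the line a + bX."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = residual_ss(xs, ys, a, b)

# Any other pair of coefficients gives a larger residual sum of squares.
print(best < residual_ss(xs, ys, a + 0.1, b))
print(best < residual_ss(xs, ys, a, b - 0.05))
```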
7.17.1.2 Coefficient of Determination
The coefficient of determination, denoted $R^2$, is used in the context of statistical models whose
main purpose is the prediction of future outcomes on the basis of other related information. Stated
differently, the coefficient of determination is a ratio that measures how well a regression fits the
sample data:
$$\text{Coefficient of determination} = \frac{\text{Reduction SS}}{\text{Total SS}} = \frac{2.1115}{2.7826} = 0.758823$$
When someone says, “76% of variation in Y was associated with X,” she means that the coefficient
of determination was 0.76. Note that $R^2$ is most often seen as a number between 0 and 1.0, used to
describe how well a regression line fits a set of data. An $R^2$ near 1.0 indicates that a regression line fits
the data well, while an $R^2$ closer to 0 indicates that a regression line does not fit the data very well.
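The arithmetic behind the ratio is short enough to check directly. The sketch below uses the two sums of squares quoted in this section; the function name `coefficient_of_determination` is a hypothetical helper for illustration.

```python
def coefficient_of_determination(reduction_ss, total_ss):
    """Share of the total variation in Y accounted for by the regression."""
    return reduction_ss / total_ss


# Sums of squares quoted in this section.
r2 = coefficient_of_determination(2.1115, 2.7826)

print(round(r2, 4))           # 0.7588
print(f"{r2:.0%} of the variation in Y was associated with X")
```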
The coefficient of determination is equal to the square of the correlation coefficient:
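$$r^2 = \frac{\left(\sum xy\right)^2}{\left(\sum x^2\right)\left(\sum y^2\right)} = \frac{\left(\sum xy\right)^2 / \sum x^2}{\sum y^2} = \frac{\text{Reduction SS}}{\text{Total SS}}$$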
In fact, most present-day users of regression refer to $R^2$ values rather than to coefficients of
determination.
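The equality between the squared correlation coefficient and the ratio of sums of squares can be verified numerically. The sketch below assumes that lowercase x and y denote deviations from the respective means, so that the reduction sum of squares is (Σxy)²/Σx² and the total sum of squares is Σy². The data are made up for illustration and `corrected_sums` is a hypothetical helper.

```python
import math


def corrected_sums(xs, ys):
    """Return (sum x^2, sum y^2, sum xy), with x and y as deviations from means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_y2 = sum((y - y_bar) ** 2 for y in ys)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sum_x2, sum_y2, sum_xy


# Made-up illustrative data (not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

sum_x2, sum_y2, sum_xy = corrected_sums(xs, ys)

reduction_ss = sum_xy ** 2 / sum_x2   # variation explained by the line
total_ss = sum_y2                     # variation of Y about its mean
ratio = reduction_ss / total_ss

r = sum_xy / math.sqrt(sum_x2 * sum_y2)   # correlation coefficient
r_squared = r ** 2

print(math.isclose(ratio, r_squared))
```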
7.17.1.3 Confidence Intervals
Because it is based on sample data, a regression equation is subject to sampling variation. Confidence
limits (i.e., a pair of numbers used to estimate a characteristic of a population) on the regression line
can be obtained by specifying several values of X over its range, each denoted $X_0$, and computing
$$\hat{Y} \pm t \sqrt{(\text{Residual mean square})\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x^2}\right)}$$
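A minimal sketch of the calculation follows, under several assumptions that this excerpt does not state: the residual mean square is the residual sum of squares divided by n − 2, Σx² is the corrected sum of squares of X, and the t value is read from a standard t table for n − 2 degrees of freedom (3.182 for 3 degrees of freedom at the 0.05 level). The data are made up and `confidence_limits` is a hypothetical helper.

```python
import math


def confidence_limits(xs, ys, x0, t_value):
    """Return (lower, upper) confidence limits on the fitted line at X = x0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    # Least-squares coefficients of Y = a + bX.
    b = sum_xy / sum_x2
    a = y_bar - b * x_bar

    # Assumption: residual mean square uses n - 2 degrees of freedom.
    residual_ss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    residual_ms = residual_ss / (n - 2)

    y_hat = a + b * x0
    half_width = t_value * math.sqrt(
        residual_ms * (1.0 / n + (x0 - x_bar) ** 2 / sum_x2)
    )
    return y_hat - half_width, y_hat + half_width


# Made-up illustrative data (not from the text).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
T_05_3DF = 3.182  # assumption: tabular t, 0.05 level, 3 degrees of freedom

for x0 in (1.0, 3.0, 5.0):
    lower, upper = confidence_limits(xs, ys, x0, T_05_3DF)
    print(x0, round(lower, 3), round(upper, 3))
```

Evaluating the limits at several values of X over its range shows that the band is narrowest at the mean of X and widens toward either end.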