$$\hat{Y} \pm t\sqrt{(\text{Residual mean square})\left(1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x^2}\right)}$$
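The limits above (for a single new observation at X = X₀, as the leading 1 inside the parentheses indicates) can be sketched in Python as follows. This is a minimal illustration rather than the text's own procedure: the function name `individual_prediction_limits` is hypothetical, and because the Python standard library has no Student's t quantile function, the tabled t value for n − 2 degrees of freedom must be supplied by the caller.

```python
import math

def individual_prediction_limits(xs, ys, x0, t):
    """Return (lower, Y-hat, upper) for an individual Y at X = x0 under a
    simple linear regression of ys on xs.

    `t` is the tabled Student's t value for n - 2 degrees of freedom at the
    desired confidence level; it is looked up outside this function.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)                      # corrected SS of X (sum x^2)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # corrected sum of products
    b = sxy / sxx                   # slope
    a = ybar - b * xbar             # intercept
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    rms = residual_ss / (n - 2)     # residual mean square
    y_hat = a + b * x0
    half_width = t * math.sqrt(rms * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))
    return y_hat - half_width, y_hat, y_hat + half_width
```

For example, with the illustrative points xs = [1, 2, 3, 4, 5], ys = [2, 4, 5, 4, 5] and t = 3.182 (5 percent level, 3 degrees of freedom), the limits at X₀ = 3 are centred on a predicted value of 4.0. The interval widens as X₀ moves away from the mean of X, through the (X₀ − X̄)² term.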
In addition to assuming that the relationship of Y to X is linear, the above method of fitting assumes that the variance of Y about the regression line is the same at all levels of X (the assumption of homogeneous variance, or homoscedasticity: the property of having equal variances). The fitting neither assumes nor requires that the variation of Y about the regression line follow the normal distribution. However, the F test does assume normality, and so does the use of t for the computation of confidence limits.
There is also an assumption that the errors (departures from regression) of the sample observations are independent. The validity of this assumption is best ensured by selecting the sample units at random. The requirement of independence may not be met if successive observations are made on a single unit or if the units are observed in clusters. For example, a series of tree-diameter observations made on the same tree by means of a growth band would probably lack independence.
Selecting the sample units so as to get a particular distribution of the X values does not violate any of the regression assumptions, provided the Y values are a random sample of all Y values associated with the selected values of X. Spreading the sample over a wide range of X values will usually increase the precision with which the regression coefficients are estimated. This device must be used with caution, however: if the Y values are not random, the regression coefficients and residual mean square may be improperly estimated.
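The gain from spreading the X values shows up in the standard error of the fitted slope, sqrt(residual mean square / Σx²), where Σx² is the corrected sum of squares of X. For a fixed residual mean square, a wider spread of X inflates Σx² and shrinks the standard error. The sketch below compares two hypothetical five-point designs; the function name `slope_standard_error`, the X values, and the residual mean square of 4.0 are all invented for illustration.

```python
import math

def slope_standard_error(xs, residual_ms):
    """Standard error of the slope b: sqrt(residual mean square / sum x^2),
    where sum x^2 is the corrected sum of squares of the X values."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return math.sqrt(residual_ms / sxx)

# Two hypothetical designs with the same n and the same assumed residual
# mean square: one clustered near the middle, one spread over a wide range.
narrow = [8, 9, 10, 11, 12]   # sum x^2 = 10
wide = [2, 6, 10, 14, 18]     # sum x^2 = 160
rms = 4.0                     # assumed value, for illustration only

print("narrow design SE(b):", slope_standard_error(narrow, rms))
print("wide design SE(b):  ", slope_standard_error(wide, rms))
```

Quadrupling the spread multiplies Σx² by 16, so the wide design's slope standard error is one quarter of the narrow design's. The comparison is only meaningful if the Y values at the chosen X values remain a random sample, as cautioned above.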
7.17.2 Multiple Regression
It frequently happens that a variable (Y) in which we are interested is related to more than one independent variable. If this relationship can be estimated, it may enable us to make more precise predictions of the dependent variable than would be possible with a simple linear regression. This brings us to multiple regression, which describes the changes in a dependent variable associated with changes in two or more independent variables; it takes a little more work but is no more complicated than simple linear regression.
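For reference, the standard least-squares formulation (stated here as a general sketch; the worked layout the text uses may differ) fits a multiple regression on k independent variables by choosing coefficients that minimize Σ(Y − Ŷ)². Writing lowercase letters for deviations from the means (x_i = X_i − X̄_i, y = Y − Ȳ), the coefficients satisfy the normal equations:

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$

$$\sum_{i=1}^{k} b_i \sum x_i x_j = \sum x_j y, \qquad j = 1, \ldots, k$$

$$b_0 = \bar{Y} - \sum_{i=1}^{k} b_i \bar{X}_i$$

With k = 1 the system reduces to b₁ = Σxy / Σx², the simple linear regression slope, which is why the extra work is mainly in accumulating the cross products Σx_i x_j.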
The calculation methods can be illustrated with the following set of hypothetical data from an environmental study relating the growth of even-aged loblolly-shortleaf pine stands to the total basal area (X₁), the percentage of the basal area in loblolly pine (X₂), and loblolly pine site index (X₃).
Y      X₁     X₂     X₃
65     41     79     75
78     90     48     83
85     53     67     74
50     42     52     61
55     57     52     59
59     32     82     73
82     71     80     72
66     60     65     66
113    98     96     99
86     80     81     90
104    101    78     86
92     100    59     88
96     84     84     93
65     72     48     70
81     55     93     85
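As a cross-check on hand calculations, the sketch below fits Y = b₀ + b₁X₁ + b₂X₂ + b₃X₃ to the 15 observations listed above by solving the normal equations in corrected (deviation-from-mean) form with Gaussian elimination. It uses only the Python standard library; the helper names `solve` and `fit_multiple_regression` are hypothetical, and the data arrays assume each group of four listed values gives Y, X₁, X₂, X₃ for one stand.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix (copy)
    for col in range(n):
        # Bring the row with the largest pivot into place to limit round-off.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_multiple_regression(xcols, y):
    """Least-squares coefficients [b0, b1, ..., bk] for Y = b0 + b1*X1 + ... + bk*Xk."""
    n = len(y)
    k = len(xcols)
    means = [sum(col) / n for col in xcols]
    ybar = sum(y) / n
    dev = [[v - mu for v in col] for col, mu in zip(xcols, means)]  # x_i = X_i - mean
    dy = [v - ybar for v in y]                                       # y = Y - mean
    # Corrected sums of squares and cross products: sum(x_i * x_j) and sum(x_j * y).
    sxx = [[sum(p * q for p, q in zip(dev[i], dev[j])) for j in range(k)] for i in range(k)]
    sxy = [sum(p * q for p, q in zip(dev[j], dy)) for j in range(k)]
    b = solve(sxx, sxy)
    b0 = ybar - sum(bj * mu for bj, mu in zip(b, means))
    return [b0] + b

# Observations as listed above: growth (Y), basal area (X1),
# percent of basal area in loblolly (X2), loblolly site index (X3).
Y = [65, 78, 85, 50, 55, 59, 82, 66, 113, 86, 104, 92, 96, 65, 81]
X1 = [41, 90, 53, 42, 57, 32, 71, 60, 98, 80, 101, 100, 84, 72, 55]
X2 = [79, 48, 67, 52, 52, 82, 80, 65, 96, 81, 78, 59, 84, 48, 93]
X3 = [75, 83, 74, 61, 59, 73, 72, 66, 99, 90, 86, 88, 93, 70, 85]

coef = fit_multiple_regression([X1, X2, X3], Y)
print("b0, b1, b2, b3 =", [round(c, 4) for c in coef])
```

A library routine such as NumPy's `numpy.linalg.lstsq` should give the same least-squares coefficients; the explicit elimination is shown only to mirror the normal-equation arithmetic.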