Discussion
Multiple linear regression
is the obvious generalization of simple linear regression. It
allows multiple predictor variables instead of one predictor variable and still uses OLS
to compute the coefficients of a linear equation. The three-variable regression just
shown corresponds to this linear model:
yᵢ = β₀ + β₁uᵢ + β₂vᵢ + β₃wᵢ + εᵢ
R uses the lm function for both simple and multiple linear regression. You simply add
more variables to the righthand side of the model formula. The output then shows the
coefficients of the fitted model:
> lm(y ~ u + v + w)

Call:
lm(formula = y ~ u + v + w)

Coefficients:
(Intercept)            u            v            w
     1.4222       1.0359       0.9217       0.7261
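To see how the fitted coefficients relate to the underlying model, here is a small self-contained sketch (not part of the recipe's data): it simulates predictors and a response with known coefficients chosen to resemble the output above, then lets lm recover them by OLS.

```r
# Sketch: simulate data from the model y = b0 + b1*u + b2*v + b3*w + noise,
# then fit it with lm. The coefficient values are illustrative.
set.seed(42)                              # for reproducibility
n <- 100
u <- rnorm(n)
v <- rnorm(n)
w <- rnorm(n)
y <- 1.4 + 1.0 * u + 0.9 * v + 0.7 * w + rnorm(n, sd = 0.2)

fit <- lm(y ~ u + v + w)
coef(fit)   # estimates should land near 1.4, 1.0, 0.9, and 0.7
```

Because the simulated noise is small, the estimated coefficients come out close to the true values, which is exactly what OLS promises as the sample grows.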
The data parameter of lm is especially valuable when the number of variables increases,
since it's much easier to keep your data in one data frame than in separate variables.
Suppose your data is captured in a data frame, such as the dfrm variable shown here:
> dfrm
           y           u         v        w
1   6.584519  0.79939065 2.7971413 4.366557
2   6.425215 -2.31338537 2.7836201 4.515084
3   7.830578  1.71736899 2.7570401 3.865557
4   2.757777  1.27652888 0.4191765 2.547935
5   5.794566  0.39643488 2.3785468 3.265971
6   7.314611  1.82247760 1.8291302 4.518522
7   2.533638 -1.34186107 2.3472593 2.570884
8   8.696910  0.75946803 3.4028180 4.442560
9   6.304464  0.92000133 2.0654513 2.835248
10  8.095094  1.02341093 2.6729252 3.868573
.
.
(etc.)
When we supply dfrm to the data parameter of lm, R looks for the regression variables
in the columns of the data frame:
> lm(y ~ u + v + w, data=dfrm)
Call:
lm(formula = y ~ u + v + w, data = dfrm)
Coefficients:
(Intercept)            u            v            w
     1.4222       1.0359       0.9217       0.7261
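The printed coefficients are only a summary of the fitted model object. A brief sketch of how you might query the fit further, using a hypothetical data frame with the same column names as dfrm (coef and summary are standard R functions; the data here is simulated, not the recipe's):

```r
# Sketch: build a data frame with columns y, u, v, w and inspect the fit.
set.seed(1)
n <- 30
dfrm <- data.frame(u = rnorm(n), v = rnorm(n), w = rnorm(n))
dfrm$y <- 1.4 + 1.0 * dfrm$u + 0.9 * dfrm$v + 0.7 * dfrm$w +
          rnorm(n, sd = 0.2)

m <- lm(y ~ u + v + w, data = dfrm)
coef(m)      # named vector: (Intercept), u, v, w
summary(m)   # coefficients with std. errors, t-values, and R-squared
```

Keeping the variables together in a data frame means the same formula can be refit on new data simply by changing the data argument.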