Discussion
Multiple linear regression
is the obvious generalization of simple linear regression. It
allows multiple predictor variables instead of one predictor variable and still uses OLS
to compute the coefficients of a linear equation. The three-variable regression just
shown corresponds to this linear model:
yᵢ = β₀ + β₁uᵢ + β₂vᵢ + β₃wᵢ + εᵢ
R uses the lm function for both simple and multiple linear regression. You simply add
more variables to the righthand side of the model formula. The output then shows the
coefficients of the fitted model:
> lm(y ~ u + v + w)

Call:
lm(formula = y ~ u + v + w)

Coefficients:
(Intercept)            u            v            w
     1.4222       1.0359       0.9217       0.7261
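To see how the fitted coefficients relate to the underlying model, here is a small self-contained sketch (not part of the recipe's data): it simulates predictors and a response with known coefficients chosen to resemble the output above, then lets lm recover them by OLS.

```r
# Sketch: simulate data from the model y = b0 + b1*u + b2*v + b3*w + noise,
# then fit it with lm. The coefficient values are illustrative.
set.seed(42)                              # for reproducibility
n <- 100
u <- rnorm(n)
v <- rnorm(n)
w <- rnorm(n)
y <- 1.4 + 1.0 * u + 0.9 * v + 0.7 * w + rnorm(n, sd = 0.2)

fit <- lm(y ~ u + v + w)
coef(fit)   # estimates should land near 1.4, 1.0, 0.9, and 0.7
```

Because the simulated noise is small, the estimated coefficients come out close to the true values, which is exactly what OLS promises as the sample grows.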
The data parameter of lm is especially valuable when the number of variables increases,
since it's much easier to keep your data in one data frame than in separate variables.
Suppose your data is captured in a data frame, such as the dfrm variable shown here:
> dfrm
           y           u         v        w
1   6.584519  0.79939065 2.7971413 4.366557
2   6.425215 -2.31338537 2.7836201 4.515084
3   7.830578  1.71736899 2.7570401 3.865557
4   2.757777  1.27652888 0.4191765 2.547935
5   5.794566  0.39643488 2.3785468 3.265971
6   7.314611  1.82247760 1.8291302 4.518522
7   2.533638 -1.34186107 2.3472593 2.570884
8   8.696910  0.75946803 3.4028180 4.442560
9   6.304464  0.92000133 2.0654513 2.835248
10  8.095094  1.02341093 2.6729252 3.868573
.
.
(etc.)
When we supply dfrm to the data parameter of lm, R looks for the regression variables
in the columns of the data frame:
> lm(y ~ u + v + w, data=dfrm)
Call:
lm(formula = y ~ u + v + w, data = dfrm)
Coefficients:
(Intercept)            u            v            w
     1.4222       1.0359       0.9217       0.7261
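The printed coefficients are only a summary of the fitted model object. A brief sketch of how you might query the fit further, using a hypothetical data frame with the same column names as dfrm (coef and summary are standard R functions; the data here is simulated, not the recipe's):

```r
# Sketch: build a data frame with columns y, u, v, w and inspect the fit.
set.seed(1)
n <- 30
dfrm <- data.frame(u = rnorm(n), v = rnorm(n), w = rnorm(n))
dfrm$y <- 1.4 + 1.0 * dfrm$u + 0.9 * dfrm$v + 0.7 * dfrm$w +
          rnorm(n, sd = 0.2)

m <- lm(y ~ u + v + w, data = dfrm)
coef(m)      # named vector: (Intercept), u, v, w
summary(m)   # coefficients with std. errors, t-values, and R-squared
```

Keeping the variables together in a data frame means the same formula can be refit on new data simply by changing the data argument.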