To resolve the contradiction, recall that the regression coefficient of log(DIS) quantifies the linear effect of the variable after the linear effects of the other variables are accounted for. On the other hand, the correlation of log(DIS) with the response variable ignores the effects of the other variables. Since it is important to take the other variables into consideration, the regression coefficient may be a better measure of the effect of log(DIS). But this conclusion requires that the linear model assumption be correct. Nonetheless, it is hard to explain the negative linear effect of log(DIS) when we are faced with the figure.
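This sign reversal between the marginal correlation and the partial regression coefficient can be reproduced with a few lines of simulated data. The sketch below is hypothetical and does not use the housing data: a confounding variable z drives both the predictor and the response, so the simple correlation of the predictor with the response is positive even though its coefficient in the multiple regression is negative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical confounder that drives both the predictor and the response
z = rng.normal(size=n)
log_dis = 0.9 * z + rng.normal(scale=0.5, size=n)              # predictor correlated with z
y = 2.0 * z - 1.0 * log_dis + rng.normal(scale=0.5, size=n)    # true partial effect is negative

# Marginal (simple) correlation of the predictor with y: comes out positive ...
print("marginal correlation:", np.corrcoef(log_dis, y)[0, 1])

# ... while the multiple-regression coefficient, which adjusts for z, is negative.
X = np.column_stack([np.ones(n), log_dis, z])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("partial regression coefficient:", beta[1])
```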
The problem of contradictory signs vanishes when there is only one regressor variable. Although it can occur with two regressor variables, the difficulty is diminished because the fitted model can be visualized through a contour plot. For datasets that contain more than two predictor variables, we propose a divide-and-conquer strategy. Just as a prospective buyer inspects a house one room at a time, we propose to partition the dataset into pieces such that a visualizable model involving one or two predictors suffices for each piece. One difficulty is that, unlike a house, there are no predefined "rooms" or "walls" in a dataset. Arbitrarily partitioning a dataset makes as much sense as arbitrarily slicing a house into several pieces. We need a method that gives interpretable partitions of the dataset. Further, the number and kind of partitions should be dictated by the complexity of the dataset as well as the type of models to be fitted. For example, if a dataset is adequately described by a nonconstant simple linear regression involving one predictor variable and we fit a piecewise linear model to it, then no partitioning is necessary. On the other hand, if we fit a piecewise constant model to the same dataset, the number of partitions should increase with the sample size.
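The last point can be illustrated with a minimal, hypothetical sketch (not part of the original analysis): for data that are truly linear in one predictor, a simple linear fit needs no partitioning, while a piecewise constant fit needs a number of pieces that grows with the sample size to keep its approximation error in check. The bin-splitting rule used here is only for illustration.

```python
import numpy as np

def piecewise_constant_fit(x, y, n_pieces):
    """Fit a piecewise constant model: split x into equal-count bins and
    use the mean response within each bin."""
    order = np.argsort(x)
    bins = np.array_split(order, n_pieces)
    cuts = np.array([x[idx].max() for idx in bins])
    means = np.array([y[idx].mean() for idx in bins])
    return cuts, means

def predict(cuts, means, x_new):
    # Each point falls into the first bin whose upper cut point is >= the point.
    return means[np.minimum(np.searchsorted(cuts, x_new), len(means) - 1)]

rng = np.random.default_rng(1)
for n in (100, 1000, 10000):
    x = rng.uniform(0, 1, n)
    y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=n)   # truly linear in x
    slope, intercept = np.polyfit(x, y, 1)              # one linear fit, no partitioning needed
    n_pieces = max(1, int(round(n ** (1 / 3))))         # pieces grow with the sample size
    cuts, means = piecewise_constant_fit(x, y, n_pieces)
    mse = np.mean((y - predict(cuts, means, x)) ** 2)
    print(f"n={n}: linear slope={slope:.2f}, pieces={n_pieces}, piecewise-constant MSE={mse:.4f}")
```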
The GUIDE regression tree algorithm (Loh, 2002) provides a ready solution to these problems. GUIDE can recursively partition a dataset and fit a constant, best polynomial, or multiple linear model to the observations in each partition. Like the earlier CART algorithm (Breiman et al., 1984), which fits piecewise constant models only, GUIDE first constructs a nested sequence of tree-structured models and then uses cross-validation to select the smallest one whose estimated mean prediction deviance lies within a short range of the minimum estimate. But unlike CART, GUIDE employs lack-of-fit tests of the residuals to choose a variable to partition at each stage. As a result, it does not have the selection bias of CART and other algorithms that rely solely on greedy optimization.
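The following is a schematic sketch of the residual lack-of-fit idea only; it is not the actual GUIDE procedure. The function name and the specific choice of a chi-squared contingency test of residual signs against predictor quartiles are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency

def split_variable_by_lack_of_fit(X, y):
    """Illustrative residual-based split-variable selection: regress y on all
    predictors, then test the association between the residual signs and each
    predictor (grouped into quartiles) with a chi-squared contingency test.
    The predictor with the smallest p-value is chosen to split the node."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid_positive = (y - design @ beta) > 0

    best_var, best_p = None, np.inf
    for j in range(p):
        groups = np.digitize(X[:, j], np.quantile(X[:, j], [0.25, 0.5, 0.75]))
        table = np.zeros((2, 4))
        for g, s in zip(groups, resid_positive):
            table[int(s), g] += 1
        table = table[:, table.sum(axis=0) > 0]   # drop empty quartile groups
        p_value = chi2_contingency(table)[1]
        if p_value < best_p:
            best_var, best_p = j, p_value
    return best_var, best_p
```

Because every candidate variable is subjected to the same kind of test, rather than to an exhaustive search over its split points, no variable is favored merely for offering more ways to split the data.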
To demonstrate a novel application of GUIDE, we use it to study the linear effect of log(DIS) after controlling for the effects of the other variables, without making the linear model assumption. We do this by constructing a GUIDE regression tree in which log(DIS) is the sole linear predictor in each partition or node of the tree. The effects of the other predictor variables, which need not be transformed, can be observed through the splits at the intermediate nodes. The resulting tree, shown in the accompanying figure, splits the data into several nodes. The regression coefficients of log(DIS) lie between two cutoff values in all but four leaf nodes. These four nodes are colored red (for slopes below the lower cutoff) and blue (for slopes above the upper cutoff). We choose the cutoffs to be plus and minus the coefficient of log(DIS) from the multiple linear regression in the earlier table. The tree shows that the linear effect of log(DIS) is neither always positive nor always negative: it depends on the values of the other predictor variables.