Figure . . GUIDE model for log(MEDV) using log(DIS) as linear predictor in each node. At each
branch, a case goes to the left child node if and only if the given condition is satisfied. The
sample mean of log(MEDV) is printed beneath each leaf node. A blue leaf node indicates a slope
coefficient greater than . . Correspondingly, a red leaf node is associated with a slope
coefficient less than − . .
the other variables. This explains the contradiction between the sign of the multiple
linear regression coefficient of log(DIS) and that of its marginal correlation. Clearly,
a multiple linear regression coefficient is, at best, an average of several conditional
simple linear regression coefficients.
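The sign reversal described above can be reproduced on synthetic data. The following is a minimal sketch, not the Boston housing analysis: the variables `x`, `y` and `z` and all coefficients are hypothetical, chosen only so that `x` is positively correlated with `y` marginally while its coefficient in a multiple linear regression that also includes `z` is negative.

```python
import numpy as np

# Synthetic illustration only; none of these numbers come from the housing data.
rng = np.random.default_rng(0)
n = 2000

z = rng.normal(size=n)                            # hypothetical second predictor
x = z + 0.5 * rng.normal(size=n)                  # predictor of interest, correlated with z
y = 2.0 * z - 0.5 * x + 0.1 * rng.normal(size=n)  # true conditional effect of x is -0.5

# Marginal association: correlation of y with x, ignoring z.
marginal_corr = np.corrcoef(x, y)[0, 1]

# Multiple linear regression of y on an intercept, x and z.
design = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
coef_x = beta[1]

print(f"marginal correlation of x with y: {marginal_corr:+.3f}")
print(f"multiple regression coefficient of x: {coef_x:+.3f}")
```

In this construction the marginal correlation is positive because `x` acts largely as a proxy for `z`, whereas the fitted coefficient of `x` is negative once `z` is held fixed.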
Figure . explains the situation graphically by showing the data and the regression lines
and their associated data points using blue triangles and red circles for observations
associated with slopes greater than . and less than − ., respectively, and green crosses
for the others. The plot shows that, after we allow for the effects of the other variables,
log(DIS) generally has little effect on median house price, except in four groups of census
tracts (triangles and circles) that are located relatively close to employment centers
(log(DIS) < ). According to Fig. ., the groups denoted by blue
triangles are quite similar. They contain a large majority of the lower-priced tracts
and have high values of LSTAT and CRIM. The two groups composed of red circles,
on the other hand, are quite different from each other. One group contains tracts in
Beacon Hill and Back Bay, two high-priced Boston neighborhoods. The other group
contains tracts with DIS lying within a narrow range and with mostly below-average
MEDV values. Clearly, the regression coefficient of log(DIS) in Table . cannot
possibly reveal such details. Unfortunately, this problem is by no means rare. Friedman
and Wall ( ), for example, found a similar problem that involves different variables
in a subset of these data.
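The point that a single pooled coefficient hides group-level detail can be illustrated with a small sketch. It is not the GUIDE algorithm and does not use the housing data: the three groups, their slopes and the cutoff `THRESHOLD` are hypothetical, and the groups are given in advance rather than found by a tree. Each group gets its own simple linear regression, and its slope is labelled blue, red or green in the manner of the colour coding described in the text.

```python
import numpy as np

# Synthetic illustration only; groups, slopes and cutoff are hypothetical.
rng = np.random.default_rng(1)
true_slopes = {"A": 1.0, "B": -1.0, "C": 0.0}
THRESHOLD = 0.5  # hypothetical cutoff, not the value used in the figures


def colour(slope, threshold=THRESHOLD):
    """Label a fitted slope: blue if clearly positive, red if clearly negative."""
    if slope > threshold:
        return "blue"
    if slope < -threshold:
        return "red"
    return "green"


xs, ys, fitted = [], [], {}
for name, slope in true_slopes.items():
    x = rng.uniform(0.0, 2.0, size=200)
    y = 3.0 + slope * x + 0.1 * rng.normal(size=200)
    # np.polyfit returns coefficients from highest degree down, so [0] is the slope.
    fitted[name] = np.polyfit(x, y, deg=1)[0]
    xs.append(x)
    ys.append(y)

# One regression line fitted to all groups at once.
pooled_slope = np.polyfit(np.concatenate(xs), np.concatenate(ys), deg=1)[0]
labels = {name: colour(s) for name, s in fitted.items()}

for name in fitted:
    print(f"group {name}: slope {fitted[name]:+.2f} -> {labels[name]}")
print(f"pooled slope: {pooled_slope:+.2f}")
```

Under these assumptions the pooled slope lands near zero, even though two of the three groups have pronounced slopes of opposite sign.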