Figure . . GUIDE model for log(MEDV) using log(DIS) as linear predictor in each node. At each
branch, a case goes to the left child node if and only if the given condition is satisfied. The sample mean
of log(MEDV) is printed beneath each leaf node. A blue leaf node indicates a slope coefficient greater
than . . Correspondingly, a red leaf node is associated with a slope coefficient less than .
the other variables. This explains the contradiction between the sign of the multiple
linear regression coefficient of log(DIS) and that of its marginal correlation. Clearly,
a multiple linear regression coefficient is, at best, an average of several conditional
simple linear regression coefficients.
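The sign contradiction described above can be reproduced on a toy example. The following is a minimal sketch on synthetic data, not the Boston data: the three groups, their centers, and their slopes are all invented for illustration, and the "other variables" are reduced to plain group indicators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not the Boston data): three groups,
# each with its own center and its own conditional slope of y on x.
slopes = [-2.0, 0.0, -1.0]
x_centers = [0.0, 3.0, 6.0]
y_centers = [0.0, 4.0, 8.0]
n = 200

x_parts, y_parts, g_parts = [], [], []
for k in range(3):
    dx = rng.uniform(-1.0, 1.0, n)
    x_parts.append(x_centers[k] + dx)
    y_parts.append(y_centers[k] + slopes[k] * dx + rng.normal(0.0, 0.1, n))
    g_parts.append(np.full(n, k))
x = np.concatenate(x_parts)
y = np.concatenate(y_parts)
g = np.concatenate(g_parts)

# Marginal correlation of y with x, ignoring the groups.
marginal_corr = np.corrcoef(x, y)[0, 1]

# Multiple linear regression of y on the group indicators and x.
X = np.column_stack([(g == k).astype(float) for k in range(3)] + [x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
multiple_coef = coef[-1]

# Simple linear regression slope of y on x within each group,
# together with the within-group sum of squares of x.
within_slopes, weights = [], []
for k in range(3):
    xc = x[g == k] - x[g == k].mean()
    yc = y[g == k] - y[g == k].mean()
    within_slopes.append(xc @ yc / (xc @ xc))
    weights.append(xc @ xc)
weighted_avg = np.average(within_slopes, weights=weights)

print(marginal_corr, multiple_coef, within_slopes, weighted_avg)
```

In this construction the marginal correlation is positive because the group centers rise together, while the multiple regression coefficient of x is negative. Because the other variables here are only group indicators, that coefficient equals the average of the within-group slopes weighted by each group's sum of squares of x. With general covariates the relationship is looser, which is consistent with the "at best" in the text.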
Figure . explains the situation graphically by showing the data and the regression
lines and their associated data points using blue triangles and red circles for
observations associated with slopes greater than . and less than . , respectively,
and green crosses for the others. The plot shows that, after we allow for the effects of
the other variables, log(DIS) generally has little effect on median house price, except
in four groups of census tracts (triangles and circles) that are located relatively close to
employment centers (log(DIS) < ). According to Fig. . , the groups denoted by blue
triangles are quite similar. They contain a large majority of the lower-priced tracts
and have high values of LSTAT and CRIM. The two groups composed of red circles,
on the other hand, are quite different from each other. One group contains tracts in
Beacon Hill and Back Bay, two high-priced Boston neighborhoods. The other group
contains tracts with DIS lying within a narrow range and with mostly below-average
MEDV values. Clearly, the regression coefficient of log(DIS) in Table . cannot
possibly reveal such details. Unfortunately, this problem is by no means rare. Friedman
and Wall ( ), for example, found a similar problem that involves different
variables in a subset of these data.
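The symbol scheme used in the plot (triangles for clearly positive slopes, circles for clearly negative slopes, crosses otherwise) can be sketched as follows. This is only an illustration under stated assumptions: the function names `slope_symbol` and `node_symbols` are hypothetical, the cutoff is assumed to be symmetric about zero, and the value 0.25 is an arbitrary placeholder rather than the cutoff used in the figure.

```python
import numpy as np

def slope_symbol(slope, threshold):
    """Map a fitted node slope to a plotting symbol (hypothetical helper)."""
    if slope > threshold:
        return "blue triangle"
    if slope < -threshold:
        return "red circle"
    return "green cross"

def node_symbols(x, y, node_ids, threshold):
    """Fit y on x by least squares within each node and label the node."""
    symbols = {}
    for node in np.unique(node_ids):
        mask = node_ids == node
        # np.polyfit with deg=1 returns [slope, intercept].
        slope, _intercept = np.polyfit(x[mask], y[mask], deg=1)
        symbols[int(node)] = slope_symbol(slope, threshold)
    return symbols

# Toy data (assumed): three nodes with slopes 0.5, -0.5 and 0.
x = np.tile(np.linspace(0.0, 1.0, 20), 3)
node_ids = np.repeat([1, 2, 3], 20)
true_slopes = np.repeat([0.5, -0.5, 0.0], 20)
y = true_slopes * x

# The threshold 0.25 is a placeholder, not a value taken from the text.
labels = node_symbols(x, y, node_ids, threshold=0.25)
print(labels)
```

Labelling the observations by the slope of the node they fall in is what lets a single scatterplot show where the predictor matters and where it does not, which a single pooled coefficient cannot do.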