Figure 3.17 (a) Scatterplot and (b) Hexbinplot of household income against
years of education
Although color and transparency can be used in a scatterplot to address this issue,
a hexbinplot is sometimes a better alternative. A hexbinplot combines the ideas of a
scatterplot and a histogram. Like a scatterplot, a hexbinplot plots data along
the x-axis and y-axis. The data is grouped into hexagonal bins (hexbins), and shading
serves as a third dimension, representing the concentration of data in each hexbin.
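To make the binning idea concrete, the sketch below uses hexbin() from the hexbin package to assign points to hexagonal cells and count how many fall in each cell. The data is synthetic, and the choice of 20 bins across the x range is an assumption made only for illustration.

```r
library(hexbin)

# Synthetic, hypothetical data: two correlated variables.
set.seed(7)
n <- 5000
x <- rnorm(n)
y <- 0.6 * x + rnorm(n, sd = 0.8)

# Assign each point to a hexagonal cell; xbins sets the number of cells across x.
hb <- hexbin(x, y, xbins = 20)

# Each non-empty cell stores a count; these counts drive the shading.
summary(hb@count)
total <- sum(hb@count)  # every point is counted in exactly one cell

# Draw the cells, shaded by count.
plot(hb)
```

Cells with higher counts are drawn darker, which is how a hexbinplot reveals density that overlapping points would hide in a scatterplot.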
In Figure 3.17(b), the same data is plotted using a hexbinplot. The hexbinplot
shows that the data is more densely clustered in a streak that runs through the
center of the cluster, roughly along the regression line. The biggest concentration
is around 12 years of education, extending to about 15 years.
In Figure 3.17, note the outliers at MeanEducation=0. These data points may
correspond to missing data that needs further cleansing.
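One way to inspect such records before plotting is sketched below in base R. The zcta data frame here is a synthetic, hypothetical stand-in, and dropping the zero-education rows is only one possible treatment, not necessarily the right one for the real data.

```r
# Hypothetical stand-in for the zcta data frame; all values are synthetic.
set.seed(1)
zcta <- data.frame(
  MeanEducation       = c(rnorm(95, mean = 13, sd = 1.5), rep(0, 5)),
  MeanHouseholdIncome = c(rlnorm(95, meanlog = 11, sdlog = 0.4), rep(50000, 5))
)

# Count the suspicious records before deciding what to do with them.
n_zero <- sum(zcta$MeanEducation == 0, na.rm = TRUE)

# One option (an assumption, not the source's method): set them aside.
zcta_clean <- subset(zcta, MeanEducation > 0)
```

Whether to remove, impute, or investigate these rows depends on why the zeros appear, so the counts are worth checking against the data source first.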
Assuming the two variables MeanHouseholdIncome and MeanEducation are
from a data frame named zcta, the scatterplot of Figure 3.17(a) is plotted by the
following R code.
# plot the data points
plot(log10(MeanHouseholdIncome) ~ MeanEducation, data=zcta)
# add a straight fitted line of the linear regression
abline(lm(log10(MeanHouseholdIncome) ~ MeanEducation,
          data=zcta), col='red')
Using the zcta data frame, the hexbinplot of Figure 3.17(b) is plotted by the
following R code. Running the code requires the use of the hexbin package, which
can be installed by running install.packages("hexbin").
library(hexbin)
# "g" adds the grid, "r" adds the regression line
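A call matching that comment might look like the following sketch. It assumes the formula interface of hexbinplot() from the hexbin package, whose type argument accepts "g" and "r". The zcta data frame is a synthetic, hypothetical stand-in so that the example runs on its own, and any further arguments the original listing may have used are not reproduced here.

```r
library(hexbin)

# Hypothetical stand-in for zcta; values are synthetic, for illustration only.
set.seed(42)
n <- 2000
education <- rnorm(n, mean = 13, sd = 1.5)
zcta <- data.frame(
  MeanEducation       = education,
  MeanHouseholdIncome = 10^(3.6 + 0.08 * education + rnorm(n, sd = 0.15))
)

# "g" adds the grid, "r" adds the regression line
p <- hexbinplot(log10(MeanHouseholdIncome) ~ MeanEducation,
                data = zcta, type = c("g", "r"))
print(p)  # lattice-style plots are drawn when the object is printed
```

Storing the result in p and printing it explicitly makes the sketch behave the same way inside a script or function as it does at the interactive prompt.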