Database Reference
In-Depth Information
library(ggplot2)
# plot the jittered scatterplot w/ boxplot
# color-code points with zip codes
# the outlier.size=0 prevents the boxplot from plotting the
outlier
ggplot(data=DF, aes(x=as.factor(Zip1),
y=log10(MeanHouseholdIncome))) +
geom_point(aes(color=factor(Zip1)), alpha=0.2,
position="jitter") +
geom_boxplot(outlier.size=0, alpha=0.1) +
guides(colour=FALSE) +
ggtitle ("Mean Household Income by Zip Code")
Alternatively, one can create a simple box-and-whisker plot with the boxplot()
function provided by the R base package.
Hexbinplot for Large Datasets
This chapter has shown that scatterplot as a popular visualization can visualize
data containing one or more variables. But one should be careful about using it
on high-volume data. If there is too much data, the structure of the data may
become difficult to see in a scatterplot. Consider a case to compare the logarithm
of household income against the years of education, as shown in Figure 3.17 . The
cluster in the scatterplot on the left (a) suggests a somewhat linear relationship
of the two variables. However, one cannot really see the structure of how the
data is distributed inside the cluster. This is a Big Data type of problem. Millions
or billions of data points would require different approaches for exploration,
visualization, and analysis.
Search WWH ::




Custom Search