# Only keep the premium and ideal cuts of diamonds
niceDiamonds <- diamonds[diamonds$cut=="Premium" |
                         diamonds$cut=="Ideal",]
summary(niceDiamonds$cut)
     Fair      Good Very Good   Premium     Ideal
        0         0         0     13791     21551
# plot density plot of diamond prices
ggplot(niceDiamonds, aes(x=price, fill=cut)) +
  geom_density(alpha = .3, color=NA)
# plot density plot of the log10 of diamond prices
ggplot(niceDiamonds, aes(x=log10(price), fill=cut)) +
  geom_density(alpha = .3, color=NA)
As an alternative to ggplot2, the lattice package provides a function called
densityplot() for making simple density plots.
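As a minimal sketch of that alternative, the log10 price densities could be drawn with densityplot() roughly as follows. It assumes that the lattice and ggplot2 packages are installed (ggplot2 is loaded here only for the diamonds data set). The droplevels() call and the plot.points and auto.key settings are illustrative choices, not part of the original example.

```r
library(lattice)   # provides densityplot()
library(ggplot2)   # assumed installed; used here only for the diamonds data

# Same subset as above. droplevels() is an added step that removes the
# three empty cut levels so the legend lists only Premium and Ideal.
niceDiamonds <- droplevels(
  diamonds[diamonds$cut %in% c("Premium", "Ideal"), ])

# One density curve per cut; plot.points = FALSE suppresses the
# jittered points that lattice draws along the axis by default.
p <- densityplot(~ log10(price), data = niceDiamonds, groups = cut,
                 plot.points = FALSE, auto.key = TRUE)
print(p)           # lattice objects are drawn when printed
```

Unlike the ggplot2 version, this draws unfilled curves. The grouping variable is passed through the groups argument rather than an aesthetic mapping.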
3.2.4 Examining Multiple Variables
A scatterplot (shown previously in Figure 3.1 and Figure 3.5) is a simple and
widely used visualization for finding the relationship among multiple variables.
A scatterplot can represent data with up to five variables using x-axis, y-axis,
size, color, and shape. But usually only two to four variables are portrayed in a
scatterplot to minimize confusion. When examining a scatterplot, one needs to pay
close attention to the possible relationship between the variables. If the functional
relationship between the variables is somewhat pronounced, the data may roughly
lie along a straight line, a parabola, or an exponential curve. If variable y is related
exponentially to x, then the plot of x versus log(y) is approximately linear. If the
plot looks more like a cluster without a pattern, the corresponding variables may
have a weak relationship.
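The log transformation described above can be checked on simulated data. The following is a sketch under stated assumptions: the data are synthetic, and the constants 2 and 0.8, the noise level, and the seed are all hypothetical values chosen for illustration. If y = a * exp(b * x), then log(y) = log(a) + b * x, so a straight-line fit to (x, log(y)) should recover b as its slope and log(a) as its intercept.

```r
set.seed(42)                              # hypothetical seed, for repeatability
x <- seq(0, 5, length.out = 200)

# Synthetic exponential relationship with multiplicative noise
y <- 2 * exp(0.8 * x) * exp(rnorm(length(x), sd = 0.1))

# Fit a straight line to x versus log(y)
fit <- lm(log(y) ~ x)
coef(fit)                                 # slope should be near 0.8,
                                          # intercept near log(2)

# On the log scale the points fall along the fitted line
plot(x, log(y), main = "x versus log(y)")
abline(fit, col = "red")
```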
The scatterplot in Figure 3.13 portrays the relationship of two variables: x and
y. The red line shown on the graph is the fitted line from the linear regression.
Linear regression will be revisited in Chapter 6, “Advanced Analytical Theory and
Methods: Regression.” Figure 3.13 shows that the regression line does not fit the
data well. This is a case in which linear regression cannot adequately model the
relationship between the variables. Alternative methods such as the loess() function
can be used to fit a nonlinear curve to the data. The blue curve shown on the graph
represents the LOESS curve, which fits the data better than linear regression.
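A comparison of this kind can be sketched in base R as follows. The data here are synthetic and are not the data behind Figure 3.13. The parabolic shape, the noise level, and the seed are assumptions made only to produce a case where a straight line fits poorly. loess() is called with its default settings.

```r
set.seed(1)                               # hypothetical seed
x <- seq(0, 10, length.out = 200)
y <- (x - 5)^2 + rnorm(length(x), sd = 2) # synthetic nonlinear relationship

linFit   <- lm(y ~ x)                     # straight-line fit
loessFit <- loess(y ~ x)                  # local regression, default span

plot(x, y)
abline(linFit, col = "red")               # linear regression line
lines(x, predict(loessFit), col = "blue") # LOESS curve

# Residual sum of squares for each fit; smaller means a closer fit
rssLin   <- sum((y - predict(linFit))^2)
rssLoess <- sum((y - predict(loessFit))^2)
c(linear = rssLin, loess = rssLoess)
```

On data shaped like this, the LOESS residual sum of squares should come out far smaller than the linear one. Note that a flexible smoother tends to track any training data more closely than a straight line, so a lower residual sum of squares alone does not show that it is the better model.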