Summarized Data Distributions - R Graphics

library(ggplot2)

h <- ggplot(faithful, aes(x = waiting))  # Save the base object for reuse

# Note: ggplot2 2.1.0 and later deprecate origin in favor of boundary
h + geom_histogram(binwidth = 8, fill = "white", colour = "black", origin = 31)

h + geom_histogram(binwidth = 8, fill = "white", colour = "black", origin = 35)

Figure 6-3. Different appearance of histograms with the origin at 31 and 35

The results look quite different, even though they have the same bin size. The faithful data set is not particularly small, with 272 observations; with smaller data sets, this is even more of an issue. When visualizing your data, it's a good idea to experiment with different bin sizes and boundary points.
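
To see why the two plots differ, it can help to look at the bin counts directly. The following is a minimal sketch in base R rather than ggplot2; the helper name bin_counts is hypothetical, and it assumes bins that are closed on the lower bound and open on the upper bound.

```r
# Count observations per bin for a given origin and bin width (base R only).
# bin_counts is a hypothetical helper written for this illustration.
bin_counts <- function(x, origin, binwidth) {
  # Build breaks from the origin up to just past the largest value
  breaks <- seq(origin, max(x) + binwidth, by = binwidth)
  # right = FALSE makes each bin closed on the left and open on the right: [a, b)
  table(cut(x, breaks = breaks, right = FALSE))
}

counts_31 <- bin_counts(faithful$waiting, origin = 31, binwidth = 8)
counts_35 <- bin_counts(faithful$waiting, origin = 35, binwidth = 8)

print(counts_31)
print(counts_35)
```

Both tables cover the same 272 observations with the same bin width, but the observations are divided among the bins differently, which is what changes the shape of the histogram.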

Also, if your data has discrete values, it may matter that the histogram bins are asymmetrical. They are closed on the lower bound and open on the upper bound. If you have bin boundaries at 1, 2, 3, etc., then the bins will be [1, 2), [2, 3), and so on. In other words, the first bin contains 1 but not 2, and the second bin contains 2 but not 3.
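
The effect of the interval convention on discrete data can be sketched with base R's cut(), whose right argument switches between the two conventions. The vector x below is made-up illustration data, not taken from the text.

```r
# Made-up discrete values that sit exactly on the bin boundaries
x <- c(1, 2, 2, 3, 3, 3)
breaks <- 1:4

# Closed on the lower bound, open on the upper bound: [1,2) [2,3) [3,4)
left_closed <- table(cut(x, breaks = breaks, right = FALSE))

# Open on the lower bound, closed on the upper bound: (1,2] (2,3] (3,4]
# The value 1 falls outside every bin here and is dropped from the table.
right_closed <- table(cut(x, breaks = breaks, right = TRUE))

print(left_closed)   # counts 1, 2, 3
print(right_closed)  # counts 2, 3, 0
```

With values on the boundaries, the convention decides which bar each value lands in. Recent ggplot2 versions expose a closed argument on the binning stat for the same purpose, so it is worth checking the default in your installed version when plotting discrete data.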

It is also possible to use geom_bar(stat="bin") for the same effect, although I find it easier to interpret the code if it uses geom_histogram().