to be on board, he would account for no more than about half a percent of the summed
heights of the group (and that one person would be incredibly uncomfortable, I would
imagine).
To get an empirical view of the normal distribution, let's look at another data set from
baseball: runs batted in.
An Example of “Normal” Data
In baseball, a run batted in (RBI) is granted to a batter every time he enables a runner to
score during his at bat. A batter can earn more than one RBI during a single at bat; a grand
slam home run results in 4 RBI: one for each of the three runners on base, and one for the
batter himself. A player's RBI tally is an important batting statistic; as mentioned in the
previous chapter, it's part of the “triple crown” along with batting average and home runs.
Batting statistics, including RBI, for the 2012 season are available online, and we can
download and connect to the data with Tableau. We can create a histogram as before, in order to
visualize the distribution of qualifying players' RBI during the 2012 season, as shown in
Figure 6-2.
At a quick glance, this distribution seems to approximate the normal curve quite well. It has
a mean of 73.69 and a median of 74, two measures nearly identical in magnitude. In this case,
both of these measures are effective at communicating the “typical” number of RBI earned
by qualifying batters (those who have roughly 500 at bats in a season).
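For readers who want to check this kind of comparison outside of Tableau, here is a minimal Python sketch of the same idea: compute the mean and the median, then bin the values into a rough text histogram. The RBI tallies below are invented for illustration and are not the 2012 data; the `histogram` helper and the bin width of 10 are likewise assumptions made for this example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical RBI tallies, invented for illustration (NOT the 2012 data).
rbi = [45, 52, 58, 61, 64, 68, 70, 72, 74, 75, 77, 80, 83, 86, 90, 95, 103]

BIN_WIDTH = 10  # assumed bin size; Tableau may choose a different one


def histogram(values, bin_width=BIN_WIDTH):
    """Count values per bin; each key is the lower edge of its bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


rbi_mean = mean(rbi)
rbi_median = median(rbi)
bins = histogram(rbi)

# When the mean and median sit close together, the data is roughly symmetric.
print(f"mean={rbi_mean:.2f} median={rbi_median}")
for lower, count in bins.items():
    print(f"{lower:3d}-{lower + BIN_WIDTH - 1:3d} {'#' * count}")
```

If the mean were pulled well away from the median, that would be a hint that the distribution is skewed and that the median is probably the better “typical” value.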
The histogram is a helpful way to see the overall distribution of all of the players, but what if
we were interested in comparing the RBI tallies for players who play different positions on
the field? Players play one of nine positions on each baseball team. Which position had the
most RBI, on average? First basemen? Center fielders?
To make this comparison, we'll create a box plot.
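A box plot is built from a handful of summary numbers per group: the minimum, the quartiles, and the maximum. As a rough sketch of what sits behind such a chart, the Python below groups RBI by position and computes those numbers plus the mean. The records are invented for illustration, `summarize_by_group` is a hypothetical helper, and the "inclusive" quartile method is an assumption; tools differ in how they compute quartiles, so Tableau's values may not match exactly.

```python
from statistics import mean, quantiles

# Hypothetical (position, RBI) records, invented for illustration only.
records = [
    ("1B", 85), ("1B", 100), ("1B", 108), ("1B", 64), ("1B", 93),
    ("CF", 57), ("CF", 83), ("CF", 62), ("CF", 96), ("CF", 47),
    ("SS", 50), ("SS", 73), ("SS", 45), ("SS", 68), ("SS", 59),
]


def summarize_by_group(rows):
    """Return the box plot ingredients (and the mean) for each position."""
    groups = {}
    for position, rbi in rows:
        groups.setdefault(position, []).append(rbi)

    summary = {}
    for position, values in groups.items():
        # Assumed quartile convention: "inclusive" treats the data as the
        # whole population rather than a sample.
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        summary[position] = {
            "min": min(values), "q1": q1, "median": q2,
            "q3": q3, "max": max(values), "mean": mean(values),
        }
    return summary


summary = summarize_by_group(records)
top_position = max(summary, key=lambda p: summary[p]["mean"])

for position, stats in summary.items():
    print(position, stats)
print("Highest average RBI:", top_position)
```

Comparing the `mean` entries answers the "which position had the most RBI, on average" question directly, while the quartiles show how spread out each position's players are.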