An Example of “Non-Normal” Data
The world we live in is full of non-normal data sets that are much more challenging to deal
with than those that approximate the Gaussian. If the Gaussian displays a mild form of variation,
then non-normal data sets display a wild form of variation: skewed to either side, multimodal,
or having fat tails or distant outliers.
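To put a number on "mild" versus "wild," here is a minimal Python sketch that compares the skewness of a bell-shaped sample with that of a long-tailed one. Both samples are synthetic, and the parameters (a mean of 170 with a spread of 10, and a log-normal with mu 0 and sigma 1) are assumptions picked purely for illustration; they do not come from any real data set.

```python
import random
from statistics import mean, median, pstdev

def skewness(xs):
    """Sample skewness: the average cubed z-score (near 0 for a symmetric sample)."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic samples (assumed parameters, not real measurements)
mild = [rng.gauss(170, 10) for _ in range(10_000)]        # roughly bell-shaped
wild = [rng.lognormvariate(0, 1) for _ in range(10_000)]  # long right tail

mild_skew = skewness(mild)
wild_skew = skewness(wild)

for name, xs, sk in (("mild", mild, mild_skew), ("wild", wild, wild_skew)):
    print(f"{name}: mean={mean(xs):.2f} median={median(xs):.2f} skewness={sk:.2f}")
```

In the bell-shaped sample the mean and median should land close together and the skewness should sit near zero. In the long-tailed sample the mean gets pulled above the median and the skewness is clearly positive.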
Distribution of wealth is notoriously non-normal. If the next flight overseas collected net
worth data from each of the passengers onboard, instead of height measurements, a small
number of passengers would account for a lion's share of the overall wealth. If the richest
person in the world were on board (as unlikely as that may seem), he would account for virtually
all of the wealth of the group. It may not be fair, but that's how it is. Very different
from the distribution of height, wouldn't you say?
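The arithmetic behind that claim is easy to check. The sketch below assumes a hypothetical flight of 299 passengers worth $100,000 each plus one passenger worth $100 billion. Every figure here is made up for illustration, and `top_share` is just a helper name invented for this example.

```python
from statistics import mean, median

def top_share(values, k=1):
    """Fraction of the total held by the k largest values."""
    ordered = sorted(values, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical net worths in dollars (illustrative only, not real data)
ordinary_passengers = [100_000] * 299
richest_passenger = 100_000_000_000
flight = ordinary_passengers + [richest_passenger]

share = top_share(flight)
print(f"Richest passenger's share of total wealth: {share:.2%}")
print(f"Mean net worth:   ${mean(flight):,.0f}")
print(f"Median net worth: ${median(flight):,.0f}")
```

Under these assumptions a single passenger holds well over 99% of the group's wealth, and the mean ends up thousands of times larger than the median. With heights, no single passenger could move the average anywhere near that much.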
Let's consider an example of the distribution of money from the world of sports. Salary data
for all of the 550 players in the U.S. professional soccer league during 2012 can be found online.
Figure 6-10 shows a histogram of guaranteed compensation.
Figure 6-10. Histogram of 2012 U.S. professional soccer player salaries
Not exactly a bell-shaped curve, is it? The length of the x-axis scale isn't a mistake: those
tiny bars along the baseline are the nine players who earned over $1M in guaranteed com-