Database Reference
In-Depth Information
These variables are fundamentally different in the type of distribution their values form. The
former demonstrates the Gaussian properties of the bell-shaped curve, while the latter is
characterized by extreme outliers that make up the bulk of the total amount.
The importance of this distinction and what it means to the way we deal with data has been
articulated by Benoit Mandelbrot, and more recently by Nicholas Nassim Taleb. In a nut-
shell, applying measures of variance like standard deviation to describe and predict phenom-
ena that are decidedly non-normal—such as just about any parameter in the world of eco-
nomics and finance—is fraught with error and should be avoided. It's the wrong tool for the
job.
Visualizing Variation
One way to respect the variation inherent in our data is to show it. Merely showing averages
yields an overly simplistic picture of the world. Not every person in a country possesses the
most common physical traits in that country. So, too, not every value in a data set is equal to
the mean, median, or mode. If we only show the most typical value, then we rob our audien-
ce of an appreciation of the rich texture to be found in the subject at hand.
If we consider once again the number of strikeouts per nine innings in professional baseball
over the past 100 years, we can show a simple line plot of average strikeouts per nine in-
nings, as shown in Figure 7-3 .
Search WWH ::




Custom Search