In introductory statistics courses, you typically learn about analysis methods,
such as hypothesis testing, regression, and modeling, in a vacuum, because the
goal is to learn the math and concepts. But when you get to real-world data, the
goal shifts to information gathering. You shift from, “What is in the numbers?” to
“What does the data represent in the world; does it make sense; and how does
this relate to other data?”
A major mistake is to treat every dataset the same and use the same canned
methods and tools. Don't do that.
When: Most data is linked to time in some way. It might be a time
series, or it might be a snapshot from a specific period. In both cases, you have to
know when the data was collected. An estimate made decades ago does not
equate to one in the present. This seems obvious, but it's a common mistake
to take old data and pass it off as new because it's what's available. Things
change, people change, and places change, and so naturally, data changes.
Where: Things can change across cities, states, and countries just as they do
over time. For example, it's best to avoid global generalizations when the data
comes from only a few countries. The same logic applies to digital locations.
Data from websites, such as Twitter or Facebook, encapsulates the behavior
of their users and doesn't necessarily translate to the physical world.
Although the gap between digital and physical continues to shrink, the space
between is still evident. For example, an animated map that represented the
“history of the world” based on geotagged Wikipedia entries showed a popping dot
for each entry in geographic space. The end of the video is shown in Figure 1-26.
The result is impressive, and there is a correlation to the real-life timeline for
sure, but it's clear that because Wikipedia content is more prominent in
English-speaking countries, the map shows more in those areas than anywhere else.
Why: Finally, you must know the reason data was collected, mostly as a sanity
check for bias. Sometimes data is collected, or even fabricated, to serve an
agenda, and you should be wary of these cases. Government and elections might
be the first things that come to mind, but so-called information graphics around
the web, filled with keywords and published by sites trying to grab Google
juice, have also become a common culprit. (I fell for these a couple of
times in my early days of blogging for FlowingData, but I learned my lesson.)
Learn all you can about your data before anything else, and your analysis
and visualization will be better for it. You can then pass what you know on
to readers.