4.1 Statistics
Statistics is a well-grounded field composed of several subfields, such
as descriptive statistics, classical statistics (also called confirmatory statistics),
Bayesian statistics, and Exploratory Data Analysis. Information Visualization
is sometimes considered a descendant and expansion of Exploratory Data
Analysis.
These subfields differ in their methods and in the nature of the answer
they seek. All of them start with a problem and with data gathered about
that problem. Classical analysis starts by designing a model of
the data, then uses mathematical analysis to test whether the data refute
the model, and concludes positively or negatively. The main challenge for
classical statistics is to find a model.
Exploratory Data Analysis uses visual exploration methods to gain insight
into what the data looks like, usually in order to find a model.
So why is visualization useful before modeling? Because there are cases
where we have no clear idea of the nature of the data and have no model.
To show why visualization can help to find a model, Anscombe in [1]
designed four datasets that exhibit the same statistical profile but are quite
different in shape, as shown in Figure 6. They share the following characteristics (see note 12):
- mean of the x values = 9.0
- mean of the y values = 7.5
- equation of the least-squares regression line: y = 3 + 0.5x
- sum of squares of the x values (about their mean) = 110.0
- regression sum of squares (variance accounted for by x) = 27.5
- residual sum of squares (about the regression line) = 13.75
- correlation coefficient = 0.82
- coefficient of determination = 0.67.
Visualization shows the differences between these datasets far more
effectively than the summary statistics do. Although the datasets are synthetic,
Anscombe's Quartet demonstrates that looking at the shape of the data is
sometimes better than relying on statistical characterizations alone.
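The shared statistical profile listed above can be checked numerically. The following is a minimal sketch, not part of the original text: the data values are those commonly reproduced from Anscombe's 1973 table (an assumption, since the chapter does not list them), and the names `QUARTET` and `summarize` are illustrative. It uses only the Python standard library and computes the summary quantities for each of the four datasets.

```python
from math import sqrt

# Anscombe's four datasets as commonly reproduced from the 1973 paper
# (assumption: these values are not given in the chapter itself).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy        # variation in y explained by x
    ss_residual = syy - ss_regression  # variation left about the fitted line
    r = sxy / sqrt(sxx * syy)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "slope": slope,
        "intercept": intercept,
        "ss_x": sxx,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "r": r,
        "r_squared": r * r,
    }


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        stats = summarize(x, y)
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four rows are nearly indistinguishable, which is the point: none of these numbers reveals that one dataset is curved, one has a single outlier on an otherwise perfect line, and one has all but one point at the same x value. Only plotting the points shows that.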
4.2 Data Mining
Even more than statistics, data mining aims to find interesting facts in
large datasets automatically. It is thus legitimate to wonder whether data mining,
as a competitor of InfoVis, can overcome and replace the visual capacity of humans.
This question has been addressed by Spence and Garrison in [22], where
they describe a simple plot called the Hertzsprung-Russell Diagram (Figure 7a).
It represents the temperature of stars on the X axis and their magnitude on
the Y axis. Asking a person to summarize the diagram produces Figure 7b. It
12 See http://astro.swarthmore.edu/astro121/anscombe.html for details