The Value of Information Visualization - Information Visualization: Human-Centered Issues and Perspectives


4.1 Statistics

Statistics is a well-grounded field composed of several subfields, such as descriptive statistics, classical statistics (also called confirmatory statistics), Bayesian statistics, and Exploratory Data Analysis. Information Visualization is sometimes considered a descendant and extension of Exploratory Data Analysis.

These subfields differ in their methods and in the nature of the answers they seek. All of them start with a problem and with data gathered about that problem. Classical analysis starts by designing a model of the data, then uses mathematical analysis to test whether the data refute the model, leading to a positive or negative conclusion. The main challenge for classical statistics is to find a model.

Exploratory Data Analysis uses visual exploration methods to gain insight into what the data look like, usually in order to find a model.

So why is visualization useful before modeling? Because in some cases we have no clear idea of the nature of the data and no model at all.

To show how visualization can help find a model, Anscombe [1] designed four datasets that share the same statistical profile but are quite different in shape, as shown in Figure 6. They have the following characteristics (see footnote 12):

- mean of the x values = 9.0
- mean of the y values = 7.5
- equation of the least-squares regression line: y = 3 + 0.5x
- sum of squared deviations of x (about its mean) = 110.0
- regression sum of squares (variance accounted for by x) = 27.5
- residual sum of squares (about the regression line) = 13.75
- correlation coefficient = 0.82
- coefficient of determination = 0.67
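The figures in the list can be checked directly. The following Python sketch recomputes them from Anscombe's data as it is commonly reproduced from [1]; the names QUARTET and summarize are illustrative choices, not from the text, and only the standard library is used. Small deviations (for example, a correlation of about 0.816 that rounds to 0.82) are expected, since the list gives rounded values.

```python
# Anscombe's quartet as commonly reproduced from Anscombe (1973); sets I-III share x.
# QUARTET and summarize() are hypothetical names chosen for this sketch.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed in the text for one dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # squared deviations of x about its mean
    syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares of y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                         # least-squares slope
    intercept = mean_y - slope * mean_x       # line passes through (mean_x, mean_y)
    ss_reg = slope * sxy                      # variance accounted for by x
    ss_res = syy - ss_reg                     # residual sum of squares
    r = sxy / (sxx * syy) ** 0.5              # Pearson correlation coefficient
    return {"mean_x": mean_x, "mean_y": mean_y, "slope": slope,
            "intercept": intercept, "sxx": sxx, "ss_reg": ss_reg,
            "ss_res": ss_res, "r": r, "r2": r * r}


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        s = summarize(xs, ys)
        print(f"{name:>3}: y = {s['intercept']:.2f} + {s['slope']:.3f}x, "
              f"r = {s['r']:.3f}, r^2 = {s['r2']:.2f}")
```

Running it prints nearly the same line for all four datasets, which is precisely why the summary alone cannot tell them apart.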

Visualization shows the differences between these datasets far more effectively than the statistics do. Although the datasets are synthetic, Anscombe's Quartet demonstrates that looking at the shape of the data is sometimes better than relying on statistical characterizations alone.
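As a minimal sketch of that point, assuming a plain-text terminal rather than a plotting library, the hypothetical helper ascii_scatter below draws each of Anscombe's datasets (as commonly reproduced from [1]) on the same character grid. Dataset IV, for instance, appears as a single vertical column of points plus one distant outlier, a shape that none of the statistics in the list above reveal.

```python
# Anscombe's quartet as commonly reproduced from Anscombe (1973); sets I-III share x.
# QUARTET and ascii_scatter() are hypothetical names chosen for this sketch.
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=30, height=10, x_range=(3, 20), y_range=(2, 14)):
    """Draw points as '*' on a character grid; shared ranges keep plots comparable."""
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map data coordinates to grid cells, clamping anything outside the ranges.
        col = min(width - 1, max(0, round((x - x_lo) / (x_hi - x_lo) * (width - 1))))
        row = min(height - 1, max(0, round((y - y_lo) / (y_hi - y_lo) * (height - 1))))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top (largest y)
    lines = ["|" + "".join(cells) for cells in grid]  # "|" is the y axis
    lines.append("+" + "-" * width)                    # x axis
    return "\n".join(lines)


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        print(f"Dataset {name}")
        print(ascii_scatter(xs, ys))
```

Sharing the axis ranges across all four plots is a deliberate choice: with per-dataset scaling, differences in spread and in the position of outliers would be partly hidden.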

4.2 Data Mining

Even more than statistics, data mining aims to find interesting facts in large datasets automatically. It is thus legitimate to ask whether data mining, as a competitor of InfoVis, can surpass and replace the visual capacity of humans.

Spence and Garrison [22] addressed this question using a simple plot called the Hertzsprung-Russell Diagram (Figure 7a), which represents the temperature of stars on the X axis and their magnitude on the Y axis. Asking a person to summarize the diagram produces Figure 7b. It

12 See http://astro.swarthmore.edu/astro121/anscombe.html for details
