munication. Thus, for example, the tasks of detecting, classifying, and measuring a particular phenomenon each have their own accuracy demands. The tasks to be performed also have implications for the types of information that the visualization must be able to convey: categorization and ranking imply that the visualization must have high selective information content, while identifying characteristics and boundaries are part of building a mental model and thus require good descriptive information content.
Returning to our dataset and the simplistic features and relations that are contained in it, we can try to quantify the volume of information and then measure how much of this volume a visualization technique is capable of effectively conveying. If we assume a table of scalar values (M records, N dimensions or variables), the number of individual values to be communicated is M*N, and the maximum resolution required is the number of significant digits. Often, however, the available visual resolution is far less than that of the data. We can then count all the pairwise relations between records, or dimensions, or even values. For records, this would be M*(M−1)/2, and similarly for dimensions and values. Then there are relations that are 3-way, 4-way, or even among an arbitrary number of elements, e.g., in clustering tasks. Clearly, there are too many possibilities to consider them all, so perhaps we need a different tactic.
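The counts above grow quickly even for a modest table. A minimal sketch, using illustrative values of M and N (the table itself is hypothetical):

```python
# Counting the candidate "information volume" of an M x N table of scalars.
M, N = 1000, 8                       # records, dimensions (illustrative)

values = M * N                       # individual scalar values to convey
record_pairs = M * (M - 1) // 2      # pairwise relations between records
dim_pairs = N * (N - 1) // 2         # pairwise relations between dimensions

print(values, record_pairs, dim_pairs)  # 8000 499500 28
```

Even before considering 3-way and higher-order relations, the half-million record pairs dwarf the 8,000 raw values, which is why enumerating all relations is impractical.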
3.3 Measuring Information Loss
Perhaps it is easier to measure loss of information (entropy) during the visualization process than the total information content of a dataset. Several techniques in common use for transforming data for visualization provide an implicit measure of information loss. For example, multidimensional scaling, a process commonly used for dimensionality reduction, provides a measure of stress, which is the difference between the distances between points in the original high-dimensional space and the corresponding distances in the reduced-dimension space. Similarly, when using principal component analysis for performing this reduction, the loss can be measured from the dropped components. Cui et al. [11] developed measures of representativeness when using processes such as sampling and clustering to reduce the number of data records in the visualization. These measures, based on nearest-neighbor computations, histogram comparisons, and statistical properties, give analysts control over what was termed abstraction quality, so that they are aware of the trade-offs between speed of rendering, display clutter, and information loss. They did not, however, consider perceptual issues, which depend heavily on the particular visual encoding used.
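Both loss measures mentioned above can be sketched with basic linear algebra. The following is an illustrative example on synthetic data, not a definitive implementation: PCA loss is taken as the fraction of variance in the dropped components, and a Kruskal-style stress compares pairwise distances before and after projection.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))       # toy data: 200 records, 5 dimensions
X -= X.mean(axis=0)                 # center before PCA

# PCA via SVD: variance carried by each principal component
_, s, Vt = np.linalg.svd(X, full_matrices=False)
var = s**2 / (len(X) - 1)
k = 2                               # keep the first 2 components
loss = var[k:].sum() / var.sum()    # fraction of variance in dropped components

# Stress-style measure: pairwise distances before vs. after reduction
Y = X @ Vt[:k].T                    # records projected into 2-D

def pairwise(A):
    d = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=-1)
    return d[np.triu_indices(len(A), 1)]

d_full, d_red = pairwise(X), pairwise(Y)
stress = np.sqrt(((d_full - d_red) ** 2).sum() / (d_full ** 2).sum())
```

For isotropic random data like this, roughly 3/5 of the variance lies in the dropped components; real datasets with strong correlations lose far less. Note that an orthogonal projection can only shrink pairwise distances, never enlarge them.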
Distortion techniques such as lens effects and occlusion reduction also offer the analyst trade-offs between accuracy and visual clarity. Each results in a transformation (typically of an object's position on the screen) that is meant to improve local interpretability at the cost of accuracy of global relations. It would be interesting to see measures of these competing processes to gauge the overall implications.
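The local-magnification/global-distortion trade-off can be illustrated with a one-dimensional fisheye lens in the style of Furnas; the distortion factor `d` and the normalization to [0, 1] are assumptions of this sketch, not prescribed by the text.

```python
# Sketch of a 1-D fisheye lens: positions near `focus` are magnified,
# while the periphery is compressed. `d` is a hypothetical distortion
# factor; positions are normalized to [0, 1].
def fisheye(x, focus=0.5, d=3.0):
    sign = 1 if x >= focus else -1
    span = (1 - focus) if x >= focus else focus  # room to the nearer edge
    t = abs(x - focus) / span if span else 0.0   # distance to focus in [0, 1]
    g = (d + 1) * t / (d * t + 1)                # magnification near the focus
    return focus + sign * g * span
```

The endpoints and the focus stay fixed, but a point at 0.6 maps to 0.75: distances near the focus are exaggerated (good for local reading), so distances between arbitrary points no longer reflect the original layout, which is exactly the global accuracy being traded away.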
Another transformation that can impact the information being communicated in a visualization is the ordering of records and dimensions. Ordering can reveal trends, associations, and other types of relations, and is useful for many tasks. There are many possible orderings of a table of M records and N dimensions; the key is to determine which are the most useful. An ordering can convey many pairwise relations. If there are M records, an ordering can communicate M−1 of the M*(M−1)/2 possible pair-