Theoretical Foundations of Information Visualization - Information Visualization: Human-Centered Issues and Perspectives

3.2 Measuring the Amount of Information

There have been many efforts to date to quantify the amount of information in a communication stream. If we think of plain text, there are numerous quantifiable features, including:

- The total number of words per minute

- The occurrence of specific words

- The frequency of occurrence for each word

- The occurrence of word pairs, triples, phrases, and sentences.
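
As a rough illustration, the syntax-only features listed above could be computed as in the following sketch. The function name `text_features`, the tokenization rule, and the sample sentence are assumptions made for this example and do not come from the text.

```python
import re
from collections import Counter


def text_features(text, duration_minutes):
    """Compute simple syntax-only features of a text stream.

    The tokenizer is an assumption: lowercase runs of letters and
    apostrophes are treated as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Frequency of occurrence for each word.
    word_counts = Counter(words)
    # Occurrence of adjacent word pairs.
    pair_counts = Counter(zip(words, words[1:]))
    return {
        "words_per_minute": len(words) / duration_minutes,
        "word_counts": word_counts,
        "pair_counts": pair_counts,
    }


# Hypothetical sample: 11 words delivered over half a minute.
sample = "the cat sat on the mat and the dog sat too"
features = text_features(sample, duration_minutes=0.5)

print(features["words_per_minute"])            # rate of the stream
print("dog" in features["word_counts"])        # occurrence of a specific word
print(features["word_counts"].most_common(2))  # most frequent words
print(features["pair_counts"][("the", "cat")]) # occurrence of a word pair
```

Counts of this kind describe only the surface form of the text; they say nothing about what the words mean to a given reader.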

There are problems, however, with such simplistic, syntax-only measurement. Words can have variable significance; some are unnecessary or redundant. Many words can encode the same concept. In fact, reading text or hearing speech may have no effect on one's uncertainty regarding the subject of the text, e.g., you may already have known it, or you may not understand the meaning of the words or their implied concepts. This implies that the measurement of information content or volume can be specific to the individual receiver and, as we'll see later, to the task that is being performed based on the communication.

Can we perform a similar analysis on a dataset? Consider a table of numeric values. Features of potential interest in the dataset include:

- The number of entries or dimensions

- The values

- Clusters and their attributes (number, size, relations, …)

- Trends and their attributes (size, rate of change, …)

- Outliers and their attributes (number, degree of outlierness, relation to dense regions, …)

- Associations, correlations, and any features between records, dimensions, or individual values.
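
A few of these dataset features can be sketched in code. The example below is a minimal illustration under stated assumptions: the table contents, the function names `column_outliers` and `pearson`, and the z-score threshold of 2.0 are all hypothetical choices, and a z-score test is only one of many possible ways to define "outlierness".

```python
from statistics import mean, pstdev


def column_outliers(values, z_threshold=2.0):
    """Return values whose z-score magnitude exceeds the threshold.

    The threshold is an assumed, illustrative choice.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a constant column has no outliers
    return [v for v in values if abs(v - mu) / sigma > z_threshold]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical table: two dimensions, six entries each.
table = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "y": [2.1, 3.9, 6.2, 8.0, 9.8, 30.0],
}

n_dimensions = len(table)
n_entries = len(table["x"])
outliers_y = column_outliers(table["y"])
r_xy = pearson(table["x"], table["y"])

print(n_dimensions, n_entries)  # size of the table
print(outliers_y)               # values far from the main body of data
print(round(r_xy, 3))           # strength of the linear association
```

Clusters and trends would need further machinery (for example, a clustering algorithm or a regression fit), which this sketch deliberately leaves out.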

In fact, we can observe that a featureless dataset is indistinguishable from random noise: all values are equally likely. Features and relations can also vary in their magnitude, certainty, complexity, and importance. Clusters may differ in size; outliers may vary in their distance to the main body of data; features may be composed of many sub-features; in many cases, a feature that is significant to one observer may be considered noise by another. Recently, researchers have proposed measuring and counting insights [9], which are new knowledge gained during visual analysis. These insights are generally specific to a particular task; such tasks include [10]:

- Identify data characteristics

- Locate boundaries, critical points, other features

- Distinguish regions of different characteristics

- Categorize or classify

- Rank based on some order

- Compare to find similarities and differences

- Associate into relations

- Correlate by classifying relations.

For each of these tasks, we might have different accuracy requirements as well, which

can influence the resolution at which feature extraction is accomplished during com-
