The construction of a probabilistic model begins with a set of training instances for
a given domain. Each instance consists of a vector of attribute values (for a fixed set
of attributes that are associated with that particular domain) and a class label (which
may be a binary “yes/no” label, or may be one of a set of categorical values). We
are currently using the Weka machine learning toolkit [ 6 ] to construct models. The
model maps from an unlabeled instance (attribute vector) to a probability distribution
over class values (i.e., an assignment of a real-valued probability in the range [0, 1]
to each class value, such that the sum of the probabilities is one). Once the model has
been built, it can be used to generate predictions for both the training data and a set of
previously unseen test instances. The test instances also have associated class labels,
so they can be used to understand prediction errors on previously unseen instances.
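As a concrete illustration of this step, the sketch below builds a Weka classifier and queries it for a per-class probability distribution. It is a minimal sketch only: the ARFF file name and the choice of NaiveBayes as the learner are illustrative assumptions, not details taken from the text.

// Minimal sketch of building a probabilistic model with Weka.
// The file name and learner are assumptions for illustration.
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.bayes.NaiveBayes;

public class ModelSketch {
    public static void main(String[] args) throws Exception {
        // Load training instances: one attribute vector plus a class label each.
        Instances train = DataSource.read("census-train.arff");   // hypothetical file
        train.setClassIndex(train.numAttributes() - 1);

        // Build a probabilistic model over the training data.
        NaiveBayes model = new NaiveBayes();
        model.buildClassifier(train);

        // For any instance, the model returns one probability per class value;
        // the entries sum to one.
        Instance first = train.instance(0);
        double[] dist = model.distributionForInstance(first);
        for (int i = 0; i < dist.length; i++) {
            System.out.printf("P(%s) = %.3f%n",
                    train.classAttribute().value(i), dist[i]);
        }
    }
}

The same distributionForInstance call can be applied to previously unseen test instances, which is what produces the predictions that are later visualized.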
To begin the visualization process, a dimension reduction method is applied to
a set of instances. This process results in a mapping from the high-dimensional
attribute space to a two-dimensional display space. Ideally, the dimension reduction
process will preserve important properties of the instance distribution, so that similar
instances appear near each other in the display space. Finally, a set of instances (which
could be the training instances, the test instances, both of these sets, or a new set
of sample data generated using the model) is displayed in the display space, using
glyph-based representations to show the probabilistic class predictions associated
with each instance. We have developed and are currently evaluating two alternative
glyph representations: pie charts and a “speckled” texturing.
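One plausible way to render the pie-chart glyphs is to draw, for each instance, a small disc whose slices are proportional to the predicted class probabilities. The Java 2D sketch below is an assumption about how such a glyph could be drawn; the class and method names are hypothetical and are not the authors' implementation.

import java.awt.Color;
import java.awt.Graphics2D;

public class PieGlyph {
    // Draws a circular glyph at (x, y) whose slices are proportional to the
    // class probabilities; colors assigns one color per class value.
    public static void draw(Graphics2D g, int x, int y, int diameter,
                            double[] classProbs, Color[] colors) {
        int startAngle = 90;                                  // start at "12 o'clock"
        for (int c = 0; c < classProbs.length; c++) {
            int sweep = (int) Math.round(-360.0 * classProbs[c]);  // clockwise slice
            g.setColor(colors[c]);
            g.fillArc(x, y, diameter, diameter, startAngle, sweep);
            startAngle += sweep;
        }
        g.setColor(Color.BLACK);
        g.drawOval(x, y, diameter, diameter);                 // outline the glyph
    }
}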
6.3.1 Dimension Reduction
The first step in developing a model visualization is to project the high-dimensional
instance space into a two-dimensional display space. The most effective dimen-
sion reduction methods for continuous spaces, such as the ones we are interested
in, produce clusters or projections in two-dimensional space that are based on the
distribution and similarity of data instances in the higher dimensions. These meth-
ods include principal components analysis, multi-dimensional scaling [ 3 ], relevance
maps [ 1 ], and self-organizing maps [ 10 , 13 ].
The figures in this paper show visualizations that use two dimension reduction
methods: feature selection (orthogonal projection using two selected attributes as
axes) and principal components analysis (a statistical method for computing an
orthogonal projection using linear combinations of the original attributes). We are
also implementing multidimensional scaling (a similarity-preserving iterative dimen-
sion reduction technique) and self-organizing maps (an iterative method based on
neural network learning).
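For example, a principal components projection of this kind can be obtained from Weka's PrincipalComponents filter, taking the first two transformed attributes as the display coordinates. The sketch below is an assumption about how such a projection might be computed; the file name is illustrative, and the text does not state which implementation the authors use.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.PrincipalComponents;

public class ProjectTo2D {
    public static void main(String[] args) throws Exception {
        // Hypothetical ARFF file of instances; the last attribute is the class label.
        Instances data = DataSource.read("census-train.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Compute principal components over the attribute space.
        PrincipalComponents pca = new PrincipalComponents();
        pca.setInputFormat(data);
        Instances projected = Filter.useFilter(data, pca);

        // The first two components serve as the x and y display coordinates.
        for (int i = 0; i < projected.numInstances(); i++) {
            double x = projected.instance(i).value(0);
            double y = projected.instance(i).value(1);
            System.out.printf("instance %d -> (%.3f, %.3f)%n", i, x, y);
        }
    }
}

Feature selection, by contrast, needs no computation at all: the values of the two chosen attributes are used directly as the display coordinates.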
Figure 6.1 shows two projections of an income prediction model in the census
domain. Test instances are shown with circular glyphs. In both images, individuals
who are predicted to make a high income are colored white, while those predicted to
make a low income are colored green. In the left image, the model is projected using
feature selection, with education level on the x axis and hours worked per week on
the y axis.