Visualizing Uncertainty in Predictive Models - Scientific Visualization: Uncertainty, Multifield, Biomedical, and Scalable Visualization

these attributes interact during a single application or execution of the model. The

internal model parameters are the numeric controls on these behaviors, and are

typically inferred from observations. In the census domain, the parameters might specify

the ways in which education level influences one's occupation. Finally, model outputs

refer to the values of summary measurements of interest, such as predicted income

level or the probability that an individual is in a particular occupation.

Predictive models can be constructed manually or can be learned from a

collection of example instances, identifying potentially complex relationships between

input attributes and output probabilities. Once these relationships are understood, a

predictive model can give the probability of different outcomes, given the known

values of input attributes. A predictive model might output the probability that an

individual with particular attributes will be in a high-income bracket. Predictive

models can be constructed using classification mechanisms (which assign each observation

to one of a small number of discrete classes), regression techniques (which fit

mathematical relationships between attributes and continuous outcomes), or density

estimation methods (which build probabilistic models that capture the distribution

and relationships among objects within a domain of interest).
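
As a rough illustration of learning such a model from example instances, the sketch below estimates the probability of a high-income bracket given a single attribute. The toy data, the function names `learn_model` and `predict_proba`, and the Laplace smoothing constant `alpha` are all assumptions made for this example; they are not taken from the framework described in this chapter.

```python
from collections import Counter, defaultdict

# Hypothetical toy instances: (education level, income bracket).
EXAMPLES = [
    ("bachelors", "high"),
    ("bachelors", "high"),
    ("bachelors", "low"),
    ("high_school", "low"),
    ("high_school", "low"),
    ("high_school", "high"),
    ("high_school", "low"),
]


def learn_model(examples):
    """Count outcomes per attribute value (a simple frequency-based classifier)."""
    counts = defaultdict(Counter)
    for attribute_value, outcome in examples:
        counts[attribute_value][outcome] += 1
    return counts


def predict_proba(model, attribute_value, outcomes=("low", "high"), alpha=1.0):
    """Return smoothed outcome probabilities for one attribute value.

    alpha is a Laplace smoothing constant (an assumption of this sketch), so
    attribute values never seen in training fall back to a uniform distribution.
    """
    seen = model.get(attribute_value, Counter())
    total = sum(seen.values()) + alpha * len(outcomes)
    return {outcome: (seen[outcome] + alpha) / total for outcome in outcomes}


model = learn_model(EXAMPLES)
probs = predict_proba(model, "bachelors")
print(probs)
```

A real classifier would condition on many attributes at once; a single attribute is used here only to keep the relationship between example instances and output probabilities easy to follow.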

It is straightforward to compute and then visualize a single model output for a

particular set of attribute values. In many cases, however, it is important to understand

model predictions more broadly. Seeing how the model behaves

across the range of possible attribute values is essential to understanding the model

as a whole. Inspecting single predictions is a slow and inefficient way to develop

this broader understanding. Rather, a summary analysis or visualization that can

convey individual predictions or probabilistic distributions of predictions across all

sample locations would provide valuable insight into the overall model behavior.

We have identified four core discovery tasks, corresponding to four categories of

questions that an analyst may wish to answer:

What are the predicted outcomes associated with specific input attribute values, or with a region of the input space?

What predictions and errors does the model make in input regions in which little training data is available?

Which input values or regions result in low-confidence and/or incorrect predictions?

Where and how should model refinement efforts (e.g., data gathering or label correction) be concentrated?
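
One way these questions might be approached computationally is to summarize each input region by how much training data supports it and how uncertain its predicted class distribution is. The sketch below is only an illustration under stated assumptions: the region labels, the thresholds `min_support` and `max_entropy`, and the use of Shannon entropy as the confidence measure are hypothetical choices, not the method of the framework described here.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training instances: (input region, observed class label).
TRAINING = [
    ("region_a", "high"), ("region_a", "high"), ("region_a", "high"),
    ("region_a", "high"), ("region_a", "low"),
    ("region_b", "high"), ("region_b", "low"),
    ("region_b", "high"), ("region_b", "low"),
    ("region_c", "low"),
]


def entropy(probabilities):
    """Shannon entropy in bits; 0 means all mass is on a single class."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def summarize_regions(instances, min_support=3, max_entropy=0.9):
    """Report support, predicted class, and uncertainty flags per region."""
    by_region = defaultdict(Counter)
    for region, label in instances:
        by_region[region][label] += 1

    summary = {}
    for region, counts in by_region.items():
        support = sum(counts.values())
        probs = [n / support for n in counts.values()]
        h = entropy(probs)
        summary[region] = {
            "support": support,                       # amount of training data
            "predicted": counts.most_common(1)[0][0],  # most frequent class
            "entropy": h,                             # spread of the prediction
            "sparse": support < min_support,          # little training data
            "low_confidence": h > max_entropy,        # near-uniform prediction
        }
    return summary


summary = summarize_regions(TRAINING)
for region, info in sorted(summary.items()):
    print(region, info)
```

Support is reported separately from entropy because a region with a single training instance has zero entropy and so looks confident, even though almost no data backs the prediction. Regions flagged as sparse or low-confidence are natural candidates for the data gathering or label correction mentioned in the last question.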

6.3 Approach

The framework that we are developing is implemented as a pipeline, constructed of

a series of computational steps that “flow” from training data, through model

construction, to visualization and interaction. The framework is intended to be

domain-independent and applicable to a wide range of classification problems.
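
The pipeline idea can be sketched as a chain of functions in which each stage consumes the output of the previous one. Everything in this sketch is an assumption made for illustration: the stage names, the count-based model, and the text bars standing in for a visualization are hypothetical, and the interaction stage is omitted entirely.

```python
from collections import Counter, defaultdict


def build_model(training_data):
    """Stage 1: model construction from (attribute value, label) pairs."""
    counts = defaultdict(Counter)
    for value, label in training_data:
        counts[value][label] += 1
    return counts


def predict_distributions(model):
    """Stage 2: turn counts into a predicted class distribution per value."""
    distributions = {}
    for value, counts in model.items():
        total = sum(counts.values())
        distributions[value] = {label: n / total for label, n in counts.items()}
    return distributions


def render_text_view(distributions, width=20):
    """Stage 3: a stand-in for visualization, one text bar per input value."""
    lines = []
    for value in sorted(distributions):
        label, p = max(distributions[value].items(), key=lambda item: item[1])
        bar = "#" * round(p * width)
        lines.append(f"{value:<12} {label:<5} {bar} {p:.2f}")
    return lines


def run_pipeline(training_data, stages):
    """Feed the output of each stage into the next, in order."""
    result = training_data
    for stage in stages:
        result = stage(result)
    return result


# Hypothetical census-style training data.
TRAINING = [
    ("bachelors", "high"), ("bachelors", "high"), ("bachelors", "low"),
    ("high_school", "low"), ("high_school", "low"),
]

view = run_pipeline(TRAINING, [build_model, predict_distributions, render_text_view])
print("\n".join(view))
```

Keeping each stage a plain function of its input is one way to stay domain-independent: a different data set or classifier can be swapped in by replacing a single stage without touching the others.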
