Image Processing Reference
these attributes interact during a single application or execution of the model. The
internal model parameters are the numeric controls on these behaviors, and are typically
inferred from observations. In the census domain, the parameters might specify
the ways in which education level influences one's occupation. Finally, model outputs
refer to the values of summary measurements of interest, such as predicted income
level or the probability that an individual is in a particular occupation.
Predictive models can be constructed manually or can be learned from a collection
of example instances, identifying potentially complex relationships between
input attributes and output probabilities. Once these relationships are understood, a
predictive model can give the probability of different outcomes, given the known
values of input attributes. A predictive model might output the probability that an
individual with particular attributes will be in a high-income bracket. Predictive models
can be constructed using classification mechanisms (which assign each observation
to one of a small, discrete number of classes), regression techniques (which fit
mathematical relationships between attributes and continuous outcomes), or density
estimation methods (which build probabilistic models that capture the distribution
and relationships among objects within a domain of interest).
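
As a minimal sketch of a predictive model learned from example instances, the following Python code estimates the probability of an income bracket given an education level by counting outcomes. The example data, the names `learn` and `predict_proba`, and the use of add-alpha smoothing are all assumptions made for illustration; they are not taken from the framework described in this chapter.

```python
from collections import Counter, defaultdict

# Hypothetical example instances: (education level, income bracket).
EXAMPLES = [
    ("bachelors", "high"), ("bachelors", "high"), ("bachelors", "low"),
    ("high_school", "low"), ("high_school", "low"), ("high_school", "high"),
    ("doctorate", "high"),
]
CLASSES = ("high", "low")


def learn(examples):
    """Count how often each outcome occurs for each input attribute value."""
    counts = defaultdict(Counter)
    for education, income in examples:
        counts[education][income] += 1
    return counts


def predict_proba(counts, education, alpha=1.0):
    """Return P(income | education), using add-alpha (Laplace) smoothing.

    An attribute value never seen in the examples falls back to a
    uniform distribution over the classes.
    """
    seen = counts.get(education, Counter())
    total = sum(seen.values()) + alpha * len(CLASSES)
    return {c: (seen[c] + alpha) / total for c in CLASSES}


model = learn(EXAMPLES)
probs = predict_proba(model, "bachelors")
print(probs)
```

Here the counts play the role of internal model parameters inferred from observations, the education level is the input attribute, and the returned distribution is the model output.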
It is straightforward to compute and then visualize a single model output for a
particular set of attribute values. In many cases, however, it is important to understand
model predictions more broadly, across the range of possible attribute values.
Inspecting single predictions one at a time is a slow and inefficient way to develop
this broader understanding. Rather, a summary analysis or visualization that can
convey individual predictions or probabilistic distributions of predictions across all
sample locations would provide valuable insight into the overall model behavior.
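
One way to move from single predictions to a broader summary is to evaluate the model over a grid of attribute values and then aggregate. The sketch below does this for a hypothetical logistic model; the weights, the attribute names `education_years` and `hours_per_week`, and the grid ranges are invented for illustration only.

```python
import math
from itertools import product

# Hypothetical model parameters, assumed purely for illustration.
WEIGHTS = {"bias": -6.0, "education_years": 0.3, "hours_per_week": 0.04}


def p_high_income(education_years, hours_per_week):
    """Probability of the high-income class under the assumed logistic model."""
    z = (WEIGHTS["bias"]
         + WEIGHTS["education_years"] * education_years
         + WEIGHTS["hours_per_week"] * hours_per_week)
    return 1.0 / (1.0 + math.exp(-z))


EDUCATION = range(8, 21, 4)   # 8, 12, 16, 20 years
HOURS = range(20, 61, 20)     # 20, 40, 60 hours

# One prediction per sample location in the input space.
grid = {(e, h): p_high_income(e, h) for e, h in product(EDUCATION, HOURS)}


def summarize_by_education(grid):
    """Return (min, mean, max) predicted probability per education level."""
    summary = {}
    for e in EDUCATION:
        row = [p for (ge, _), p in grid.items() if ge == e]
        summary[e] = (min(row), sum(row) / len(row), max(row))
    return summary


summary = summarize_by_education(grid)
# Sample locations where the model is close to undecided.
uncertain = [cell for cell, p in grid.items() if 0.4 < p < 0.6]

for e, (lo, mean, hi) in summary.items():
    print(f"education={e:2d}  min={lo:.2f}  mean={mean:.2f}  max={hi:.2f}")
```

The per-level summary and the list of near-undecided cells are the kind of aggregate that a visualization could then display in place of individual predictions.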
We have identified four core discovery tasks, corresponding to four categories of
questions that an analyst may wish to answer:
• What are the predicted outcomes associated with specific input attribute values,
  or with a region of the input space?
• What predictions and errors does the model make in input regions in which little
  training data is available?
• Which input values or regions result in low-confidence and/or incorrect predictions?
• Where and how should model refinement efforts (e.g., data gathering or label
  correction) be concentrated?
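
To make these questions concrete, the following sketch flags regions of the input space that have little training data, low-confidence predictions, or known errors, and then orders them as candidates for refinement. The `Region` record, the thresholds, and the sample values are hypothetical, chosen only to illustrate the four tasks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    name: str
    n_training: int          # training instances that fall in this region
    predicted: str           # most likely class according to the model
    confidence: float        # probability assigned to the predicted class
    actual: Optional[str]    # known label, or None if unlabeled


def flags(region, min_training=5, min_confidence=0.7):
    """Return the set of issues found for one region (thresholds are assumed)."""
    found = set()
    if region.n_training < min_training:
        found.add("sparse")
    if region.confidence < min_confidence:
        found.add("low_confidence")
    if region.actual is not None and region.actual != region.predicted:
        found.add("incorrect")
    return found


def refinement_order(regions):
    """Order regions so that those with the most flags come first."""
    return sorted(regions, key=lambda r: len(flags(r)), reverse=True)


# Hypothetical regions of a census-style input space.
REGIONS = [
    Region("age<25, bachelors", 40, "low", 0.91, "low"),
    Region("age>60, doctorate", 2, "high", 0.55, "low"),
    Region("age 25-60, high_school", 120, "low", 0.62, None),
]

ranked = refinement_order(REGIONS)
for region in ranked:
    print(region.name, sorted(flags(region)))
```

Counting flags is a deliberately simple ranking; an analyst might instead weight the flags differently depending on whether data gathering or label correction is the cheaper remedy.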
6.3 Approach
The framework that we are developing is implemented as a pipeline, constructed of
a series of computational steps that “flow” from training data, through model construction,
to visualization and interaction. The framework is intended to be domain-independent
and applicable to a wide range of classification problems.
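
The pipeline idea can be sketched as a list of steps, each consuming the output of the previous one. This is an illustration under stated assumptions, not the framework's actual implementation: the step names `load_training_data`, `build_model`, and `to_view`, the toy data, and the text-based "visualization" are all hypothetical.

```python
from functools import reduce


def load_training_data(_):
    """Stage 1: produce (attribute value, binary label) pairs (toy data)."""
    return [("a", 1), ("a", 1), ("b", 0), ("b", 1)]


def build_model(rows):
    """Stage 2: estimate the fraction of positive labels per attribute value."""
    totals, positives = {}, {}
    for value, label in rows:
        totals[value] = totals.get(value, 0) + 1
        positives[value] = positives.get(value, 0) + label
    return {v: positives[v] / totals[v] for v in totals}


def to_view(model, width=10):
    """Stage 3: render each estimate as a text bar, standing in for a plot."""
    return [f"{v}: {'#' * round(p * width)} {p:.2f}"
            for v, p in sorted(model.items())]


def run_pipeline(steps, data=None):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda acc, step: step(acc), steps, data)


view = run_pipeline([load_training_data, build_model, to_view])
print("\n".join(view))
```

Because each step depends only on the output of the one before it, a different data source or model builder can be substituted without changing the rest of the pipeline, which is one way to keep such a design domain-independent.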
 