Which attributes most affect the outcome of my prediction?
Which attributes contribute most to defining good clusters? Which
attributes should I eliminate when building a model? These are some
of the questions answered by attribute importance for both supervised
and unsupervised learning.

Business analysts often want to know what factors, or predictor
attributes, most influence an outcome, such as a customer's decision to
churn, buy a product, or respond to a campaign. Knowing which
attributes matter most lets analysts focus on the data most relevant
to their problem, whether they are querying data, manipulating it in
an OLAP cube, or building a
model. Attribute importance can also show where extra effort should go
into ensuring the accuracy of particular data. Similarly, it can identify
attributes that contribute no useful information to model building, so
that they can be eliminated from the build data. Attribute importance
results may influence what data is maintained in the data warehouse,
or what data is purchased from third-party data providers.

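As an illustration of ranking predictor attributes by how much they tell us about an outcome, the following Python sketch scores each attribute by its mutual information with a churn target. Mutual information is one common measure swapped in here for illustration; it is not necessarily the algorithm any particular data mining product uses for attribute importance. The customer records, attribute names, and the functions `mutual_information` and `rank_attributes` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_attributes(rows, target):
    """Return (attribute, score) pairs, most informative about `target` first."""
    ys = [row[target] for row in rows]
    predictors = [attr for attr in rows[0] if attr != target]
    scores = {attr: mutual_information([row[attr] for row in rows], ys)
              for attr in predictors}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy customer data; "churn" is the outcome to explain.
customers = [
    {"contract": "monthly", "region": "north", "churn": "yes"},
    {"contract": "monthly", "region": "south", "churn": "yes"},
    {"contract": "annual", "region": "north", "churn": "no"},
    {"contract": "annual", "region": "south", "churn": "no"},
]

for attr, score in rank_attributes(customers, "churn"):
    print(f"{attr}: {score:.3f}")
```

On this toy data, `contract` ranks first with 1 bit because it determines churn exactly, while `region` scores zero because it is independent of churn.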
Consider a company that purchases data from a third-party
supplier. This data may be quite rich, consisting of hundreds if not
thousands of attributes. But which ones are most useful for data
mining? Since data can be expensive, instead of purchasing as much as
possible, a business analyst may first buy a relatively small sample of
cases that covers a wide range of attributes. Using attribute
importance, the analyst can determine which of the attributes are
most useful for building models to solve particular problems. Then,
only the attributes that improve the accuracy of the models need to
be purchased for the remaining cases.

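The analyst's purchasing decision can be sketched as a cut-off applied to a ranking computed on the sample. This Python sketch assumes the ranking is a list of (attribute, importance) pairs sorted from most to least important, and keeps the smallest set of top attributes whose combined importance reaches a chosen share of the total. The 90% coverage default, the function name `attributes_to_purchase`, and the scores below are hypothetical illustration values, not results from a real study.

```python
def attributes_to_purchase(ranking, coverage=0.9):
    """Return the top attributes whose summed importance reaches `coverage` of the total.

    ranking:  (attribute, importance) pairs sorted by importance, highest first.
    coverage: hypothetical cut-off between 0 and 1; choose it per problem.
    """
    # Attributes with zero or negative importance add nothing, so never buy them.
    useful = [(attr, score) for attr, score in ranking if score > 0]
    total = sum(score for _, score in useful)

    chosen, running = [], 0.0
    for attr, score in useful:
        chosen.append(attr)
        running += score
        if running >= coverage * total:
            break  # enough of the total importance is covered
    return chosen


# Hypothetical ranking computed on the purchased sample.
sample_ranking = [
    ("income", 0.40),
    ("age", 0.30),
    ("tenure", 0.20),
    ("zip_code", 0.05),
    ("hobby", 0.0),
]

print(attributes_to_purchase(sample_ranking))
```

With these made-up scores, `income`, `age`, and `tenure` together cover 0.90 of the 0.95 total, so only those three would be bought for the remaining cases. A fixed top-k or a minimum-score threshold would be equally reasonable cut-off rules; cumulative coverage is just one choice.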
As noted above, attribute importance can help determine which
attributes are most relevant for building a model. Eliminating
unnecessary attributes from the build data can reduce model build
time. And because a model built on fewer attributes needs fewer
attributes when it is applied, scoring is faster as well. Studies
have shown that eliminating “noise” attributes from the data can also
improve model accuracy or quality. Noise attributes are those that
attribute importance reports as contributing nothing to the model, or
as actually reducing model quality.

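Dropping noise attributes from both the build data and the data to be scored can be sketched as a simple projection. This Python sketch assumes importance scores have already been computed and that a score at or below zero marks an attribute as noise (some importance measures, such as permutation-based ones, can produce negative scores for attributes that hurt the model). The function `drop_noise_attributes`, the threshold, and the sample records are hypothetical.

```python
def drop_noise_attributes(records, importance, target=None, threshold=0.0):
    """Remove attributes whose importance is at or below `threshold`.

    records:    list of dicts, one per case.
    importance: dict mapping predictor attribute -> importance score
                (assumed: <= 0 means no contribution or a harmful one).
    target:     optional outcome column to keep in build data (it has no score).
    Attributes missing from `importance` are also dropped.
    """
    keep = {attr for attr, score in importance.items() if score > threshold}
    if target is not None:
        keep.add(target)
    return [{attr: value for attr, value in rec.items() if attr in keep}
            for rec in records]


importance = {"income": 0.40, "age": 0.30, "favorite_color": 0.0, "random_id": -0.02}

build_data = [
    {"income": 52000, "age": 34, "favorite_color": "blue", "random_id": 17, "churn": "no"},
]
apply_data = [
    {"income": 61000, "age": 45, "favorite_color": "red", "random_id": 3},
]

# The same reduced attribute set is used to build the model and to score new cases.
print(drop_noise_attributes(build_data, importance, target="churn"))
print(drop_noise_attributes(apply_data, importance))
```

Because the model is built only on `income` and `age`, only those two attributes need to be supplied at scoring time, which is where the faster apply step comes from.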
Attribute importance produces a model that ranks attributes
according to how each attribute contributes to model quality, for