Mining Functions and Algorithms - Java Data Mining: Strategy, Standard, and Practice

4.4 Attribute Importance

Which attributes most affect the outcome of my prediction? Which attributes contribute most to defining good clusters? Which attributes should I eliminate when building a model? These are some of the questions answered by attribute importance for both supervised and unsupervised learning.

Business analysts often want to know what factors, or predictor attributes, most influence an outcome, such as a customer's decision to churn, buy a product, or respond to a campaign. Knowing which attributes most influence an outcome enables business analysts to focus their attention on the data most relevant to their problem, perhaps when querying data, manipulating it in an OLAP cube, or building a model. Attribute importance can identify where greater effort should be made to ensure the accuracy of certain data. Similarly, it can identify the attributes that do not contribute useful information to model building, so that these attributes can be eliminated from the build data. Attribute importance results may influence what data is maintained in the data warehouse, or what data is purchased from third-party data providers.

Consider a company that purchases data from a third-party supplier. This data may be quite rich, consisting of hundreds if not thousands of attributes. But which ones are most useful for data mining? Since data can be expensive to purchase, instead of purchasing as much as possible, a business analyst may choose a relatively small sample of data with a wide range of attributes. Using attribute importance, the analyst can determine which of the attributes are most useful for building models to solve particular problems. Then, only those attributes that add value to the accuracy of the models need to be purchased for the remaining cases.

As noted above, attribute importance can assist in determining which attributes are most relevant for building a model. Eliminating unnecessary attributes from the build data can reduce model building time. If fewer attributes are used to build a model, fewer are required to apply that model, so scoring will be faster as well. Studies have shown that eliminating “noise” attributes from data can also improve model accuracy or quality. Noise attributes are those reported by attribute importance as not contributing to the model, or as actually reducing model quality.
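To make the idea of ranking attributes and spotting noise attributes concrete, here is a minimal sketch in plain Java. It is not the JDM API and not necessarily the measure any particular vendor uses: it assumes categorical attributes and scores each one by information gain against the target, which is only one of several possible importance measures. The class name, the method names, the sample customer data, and the 0.01 threshold are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: ranks categorical attributes by information gain (an assumed measure). */
class AttributeImportanceSketch {

    /** Shannon entropy, in bits, of the value distribution in the list. */
    static double entropy(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / values.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Reduction in target entropy obtained by splitting the cases on one attribute. */
    static double informationGain(List<String> attribute, List<String> target) {
        Map<String, List<String>> partitions = new HashMap<>();
        for (int i = 0; i < attribute.size(); i++) {
            partitions.computeIfAbsent(attribute.get(i), k -> new ArrayList<>())
                      .add(target.get(i));
        }
        double conditional = 0.0;
        for (List<String> part : partitions.values()) {
            conditional += ((double) part.size() / target.size()) * entropy(part);
        }
        return entropy(target) - conditional;
    }

    /** Returns attribute names ordered from most to least important. */
    static List<String> rank(Map<String, List<String>> attributes, List<String> target) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, List<String>> e : attributes.entrySet()) {
            scores.put(e.getKey(), informationGain(e.getValue(), target));
        }
        List<String> names = new ArrayList<>(attributes.keySet());
        names.sort(Comparator.comparingDouble((String n) -> scores.get(n)).reversed());
        return names;
    }

    /** Hypothetical sample: six customers described by three categorical attributes. */
    static Map<String, List<String>> sampleAttributes() {
        Map<String, List<String>> attrs = new LinkedHashMap<>();
        attrs.put("color", List.of("red", "blue", "blue", "red", "blue", "blue"));
        attrs.put("region", List.of("north", "south", "north", "south", "north", "south"));
        attrs.put("contract", List.of("monthly", "monthly", "monthly", "annual", "annual", "annual"));
        return attrs;
    }

    /** Hypothetical target: whether each of the six customers churned. */
    static List<String> sampleTarget() {
        return List.of("yes", "yes", "yes", "no", "no", "no");
    }

    public static void main(String[] args) {
        Map<String, List<String>> attrs = sampleAttributes();
        List<String> target = sampleTarget();
        double noiseThreshold = 0.01; // assumed cutoff, not taken from the text
        for (String name : rank(attrs, target)) {
            double gain = informationGain(attrs.get(name), target);
            String label = gain < noiseThreshold ? " (candidate noise attribute)" : "";
            System.out.printf("%-10s %.3f%s%n", name, gain, label);
        }
    }
}
```

In this made-up sample, the contract attribute separates churners from non-churners perfectly and so ranks first, while color carries no information about the target and falls below the assumed threshold. An analyst following the approach described above would drop such an attribute from the build data before building the model.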

Attribute importance produces a model that ranks attributes according to how each attribute contributes to model quality, for
