2 denote high voice usage, in terms of both frequency and duration of calls. The
generated scores can then be used in subsequent modeling tasks.
The interpretation of the derived components is an essential part of the data
reduction procedure. Since the derived components will be used in subsequent
tasks, it is important to fully understand the information they convey. Although
there are many formal criteria for selecting the number of factors to be retained,
analysts should also examine their business meaning and only keep those that
comprise interpretable and meaningful measures.
Simplicity is the key benefit of data reduction techniques, since they drastically
reduce the number of fields under study to a core set of composite measures.
Some data mining techniques may run too slowly or not at all if they have to handle
a large number of inputs. Situations like these can be avoided by using the derived
component scores instead of the original fields. An additional advantage of data
reduction techniques is that they can produce uncorrelated components. This is
one of the main reasons for applying a data reduction technique as a preparatory
step before other models. Many predictive modeling techniques can suffer from
the inclusion of correlated predictors, a problem referred to as multicollinearity.
By substituting the correlated predictors with the extracted components we can
eliminate collinearity and substantially improve the stability of the predictive
model. Clustering solutions can likewise be biased if the inputs are
dominated by correlated "variants" of the same attribute. By using a data reduction
technique we can unveil the true data dimensions and ensure that they are of equal
weight in the formation of the final clusters.
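
The claim that derived components are uncorrelated can be checked on a small case. The sketch below is illustrative only: the field names (calls, minutes) and the simulated data are assumptions, not taken from the text. It relies on the fact that for exactly two standardized fields the principal components are the scaled sum and difference of the fields; with more fields, a full eigen-decomposition of the correlation matrix would be needed.

```python
import math
import random
from statistics import mean, pstdev


def standardize(values):
    """Rescale a field to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Pearson correlation of two equally long fields."""
    zx, zy = standardize(xs), standardize(ys)
    return mean([a * b for a, b in zip(zx, zy)])


def two_field_components(xs, ys):
    """Principal component scores for exactly two standardized fields.

    The 2x2 correlation matrix [[1, r], [r, 1]] always has eigenvectors
    (1, 1)/sqrt(2) and (1, -1)/sqrt(2), so no eigen-solver is needed here.
    """
    zx, zy = standardize(xs), standardize(ys)
    c1 = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]
    c2 = [(a - b) / math.sqrt(2) for a, b in zip(zx, zy)]
    return c1, c2


# Hypothetical data: number of calls and total minutes, strongly related.
random.seed(0)
calls = [random.gauss(100, 20) for _ in range(500)]
minutes = [3 * c + random.gauss(0, 15) for c in calls]

c1, c2 = two_field_components(calls, minutes)

print("correlation of original fields:", round(correlation(calls, minutes), 3))
print("correlation of components:     ", round(correlation(c1, c2), 6))
```

The two original fields are highly correlated, while the two component scores have a correlation of zero up to rounding error, so they could replace the original fields as inputs to a later model without introducing collinearity.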
In the next chapter, we will revisit data reduction techniques and present
PCA in detail.
FINDING "WHAT GOES WITH WHAT" WITH ASSOCIATION
OR AFFINITY MODELING TECHNIQUES
When browsing a bookstore on the Internet you may have noticed recommendations
that pop up and suggest additional, related products for you to consider:
"Customers who have bought this topic have also bought the following topics."
Most of the time these recommendations are quite helpful, since they take into
account the recorded preferences of past customers. Usually they are based on
association or affinity data mining models.
These models analyze past co-occurrences of events, purchases, or attributes
and detect associations. They associate a particular outcome category, for instance
a product, with a set of conditions, for instance a set of other products. They
are typically used to identify purchase patterns and groups of products purchased
together.
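
As a rough illustration of how such a model detects co-occurrences, the sketch below counts how often pairs of products appear in the same basket and turns frequent pairs into rules of the form "condition -> outcome". The function name pair_rules, the thresholds, and the sample baskets are all hypothetical. Support and confidence are the standard association-rule measures; restricting conditions to a single product is a simplification made here for brevity.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Return (condition, outcome, support, confidence) for product pairs.

    support    = share of baskets containing both products
    confidence = share of baskets with the condition that also
                 contain the outcome
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for condition, outcome in ((a, b), (b, a)):
            confidence = count / item_counts[condition]
            if confidence >= min_confidence:
                rules.append((condition, outcome, support, confidence))
    # Strongest rules first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical purchase histories, one list per past customer.
baskets = [
    ["novel", "bookmark"],
    ["novel", "bookmark", "reading light"],
    ["novel", "cookbook"],
    ["cookbook", "apron"],
    ["novel", "bookmark"],
]

rules = pair_rules(baskets)
for condition, outcome, support, confidence in rules:
    print(f"{condition} -> {outcome}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data only the novel and bookmark pair is frequent enough to pass the support threshold, and it produces one rule in each direction with different confidence values, which shows why the direction of a rule matters when choosing what to recommend.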