REDUCING THE DIMENSIONALITY OF DATA WITH DATA REDUCTION TECHNIQUES
As their name implies, data reduction techniques aim to reduce the dimensionality
of the data and remove redundant information. They do so by replacing the
initial set of fields with a core set of compound measures that simplify subsequent
modeling while retaining most of the information in the original attributes.
Factor analysis and principal components analysis (PCA) are among the most popular
data reduction techniques. They are unsupervised statistical techniques that work on
continuous input attributes. These attributes are analyzed and mapped to a smaller
set of representative fields, named factors or components. The procedure is
illustrated in Figure 2.12.
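As a rough illustration of mapping fields to a component, the sketch below finds the first principal component of two continuous fields. It assumes PCA is computed on the sample covariance matrix of the unstandardized fields (many tools standardize the inputs first), and it uses the closed-form eigen-decomposition of a 2x2 symmetric matrix. The function name and the usage figures are hypothetical, not taken from the text.

```python
import math


def first_principal_component(xs, ys):
    """Return (direction, explained_ratio) for the first principal
    component of two continuous fields (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    syy = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    half_trace = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    largest = half_trace + spread

    # Matching eigenvector; fall back to an axis if the fields are uncorrelated.
    if sxy != 0:
        vx, vy = sxy, largest - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # Share of the total variance captured by this single component.
    explained_ratio = largest / (sxx + syy)
    return direction, explained_ratio


# Hypothetical monthly usage figures for five customers.
sms = [10, 20, 30, 40, 50]
mms = [1, 2, 3, 4, 6]
direction, explained_ratio = first_principal_component(sms, mms)
print(direction, explained_ratio)
```

When the two fields are strongly correlated, the explained ratio is close to 1, meaning one compound measure carries most of the information held in the two original fields.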
Factor analysis and PCA are based on the concept of linear correlation. If
certain continuous fields/attributes tend to covary, they are correlated. If their
relationship is adequately described by a straight line, they have a strong
linear correlation. The scatterplot in Figure 2.13 depicts the monthly average SMS
and MMS (Multimedia Messaging Service) usage for a group of mobile telephony
customers.
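The strength of a linear relationship can be quantified. The text does not name a specific measure; the sketch below assumes the Pearson correlation coefficient, which ranges from -1 to 1, with values near either extreme indicating that the points lie close to a straight line. The function name and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)

    # Undefined (division by zero) if either field is constant.
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical monthly averages for six customers.
sms_usage = [12, 25, 31, 44, 58, 70]
mms_usage = [1, 3, 3, 5, 6, 8]
print(pearson_r(sms_usage, mms_usage))
```

A value close to 1 for these two fields would correspond to points clustering around an upward-sloping line in a scatterplot.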
As seen in the scatterplot, most customer points cluster around a straight line
with a positive slope that slants upward to the right. Customers with increased SMS
Figure 2.12 Data reduction techniques.