To avoid the aforementioned problems, a very typical transformation used
for DM methods is to map each nominal attribute to a set of newly generated attributes.
If N is the number of different values the nominal attribute has, we will substitute
the nominal variable with a new set of binary attributes, each one representing one of
the N possible values. For each instance, only one of the N newly created attributes
will have a value of 1, while the rest will have the value of 0. The variable having
the value 1 is the variable related to the original value that the old nominal attribute
had. This transformation is also referred to in the literature as the 1-to-N transformation.
As [30] and [28] state, the new set of attributes is linearly dependent. That means
that one of the attributes can be dismissed without loss of information, as we can infer
the value of one of the new attributes by knowing the values of the rest of them. A
problem with this kind of transformation appears when the original nominal attribute
has a large cardinality. In this case, the number of attributes generated will be large as
well, resulting in a very sparse data set which will lead to numerical and performance
problems.
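As an illustration of the 1-to-N transformation, the following is a minimal sketch in plain Python, not a reference implementation. The function name one_to_n, the drop_last flag and the sample colors list are hypothetical names chosen for this example. It builds one binary attribute per distinct value and then shows that, because the new attributes are linearly dependent, dropping one of them loses no information.

```python
def one_to_n(values, drop_last=False):
    """Map a list of nominal values to rows of binary indicators (1-to-N coding)."""
    # The N distinct values, sorted so that the attribute order is fixed.
    categories = sorted(set(values))
    # Optionally dismiss the last indicator; it can be inferred from the others.
    kept = categories[:-1] if drop_last else categories
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]

# Full coding: exactly one indicator per instance is set to 1.
categories, full = one_to_n(colors)

# Reduced coding with N - 1 attributes.
_, reduced = one_to_n(colors, drop_last=True)

# The dismissed indicator equals 1 minus the sum of the remaining ones.
recovered = [1 - sum(row) for row in reduced]
```

Here recovered matches the last column of full, which is the linear dependence mentioned above. Note that the number of generated attributes grows with N, so a nominal attribute with thousands of distinct values yields thousands of mostly zero columns.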
3.5.9 Transformations via Data Reduction
In the previous sections, we have analyzed the processes to transform or create new
attributes from the existing ones. However, when the data set is very large, performing
complex analysis and DM can take a long computing time. Data reduction techniques
are applied in these domains to reduce the size of the data set while trying to maintain
the integrity and the information of the original data set as much as possible. In this
way, mining on the reduced data set will be much more efficient, and its results will
resemble those that would have been obtained using the original data set.
The main strategies to perform data reduction are Dimensionality Reduction (DR)
techniques. They aim to reduce the number of attributes or instances available in
the data set. Well known attribute reduction techniques are Wavelet transforms or
Principal Component Analysis (PCA). Chapter 7 is devoted to attribute DR. Many
techniques can be found for reducing the dimensionality in the number of instances,
like the use of clustering techniques, parametric methods and so on. The reader
will find a complete survey of IS techniques in Chap. 8. The use of binning and
discretization techniques is also useful to reduce the dimensionality and complexity
of the data set. They convert numerical attributes into nominal ones, thus drastically
reducing the cardinality of the attributes involved. Chapter 9 gives a thorough
presentation of these discretization techniques.
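To make the effect of binning on cardinality concrete, the following is a minimal sketch of equal-width binning in plain Python. Equal-width binning is only one simple discretization scheme, chosen here for illustration; the function name equal_width_bins and the sample ages list are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Assign each numerical value the index of its equal-width interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [18, 22, 25, 31, 38, 47, 52, 64, 70]
labels = equal_width_bins(ages, 3)

# Cardinality before and after discretization.
cardinality_before = len(set(ages))
cardinality_after = len(set(labels))
```

In this example the nine distinct numerical values are mapped to three interval labels, which can then be treated as a nominal attribute.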
References
1. Agrawal, R., Srikant, R.: Searching with numbers. IEEE Trans. Knowl. Data Eng. 15(4),
855-870 (2003)
2. Berry, M.J., Linoff, G.: Data Mining Techniques: For Marketing, Sales, and Customer Support.
Wiley, New York (1997)
 