Clustering as dimensionality reduction
The clustering models we covered in the previous chapter can also be used for a form of
dimensionality reduction. This works in the following way:
• Assume that we cluster our high-dimensional feature vectors using a K-means
clustering model with k clusters. The result is a set of k cluster centers.
• We can represent each of our original data points in terms of how far it is from
each of these cluster centers. That is, we compute the distance from a data point
to each cluster center. The result is a set of k distances for each data point.
• These k distances form a new vector of dimension k. Provided that k is smaller than
the original feature dimension, we can now represent each original data point as a
new vector of lower dimension.
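The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a full pipeline: it assumes the cluster centers have already been fitted by a K-means model, and the helper names (euclidean, distance_features) and the sample centers and points are hypothetical values chosen for the example.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_features(point, centers):
    # Map one data point to its k distances, one per cluster center.
    return [euclidean(point, c) for c in centers]

# Hypothetical output of a K-means fit: k = 2 centers in a 5-dimensional space.
centers = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0, 0.0],
]

# Hypothetical 5-dimensional data points.
points = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0, 0.0],
]

# Each 5-dimensional point becomes a 2-dimensional vector of distances.
reduced = [distance_features(p, centers) for p in points]
print(reduced)
```

Here each point goes from 5 features to k = 2 features. The reduction only occurs because k is smaller than the original dimension.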
Depending on the distance function used, this can result in both dimensionality reduction
and a form of nonlinear transformation of the data. A linear model trained on the
transformed features can then capture more complex structure while still benefiting from
the speed and scalability of a linear model. For example, passing the distances through a
Gaussian or exponential function can approximate a very complex nonlinear feature
transformation.
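As one possible reading of a Gaussian distance function, the sketch below maps the squared Euclidean distance d^2 to each center through exp(-d^2 / (2 * sigma^2)). The function name gaussian_features, the bandwidth parameter sigma, and the sample centers are assumptions made for illustration; the text does not specify a particular form.

```python
import math

def gaussian_features(point, centers, sigma=1.0):
    # For each center, compute exp(-d^2 / (2 * sigma^2)), where d is the
    # Euclidean distance from the point to that center. Values lie in (0, 1],
    # with 1.0 meaning the point coincides with the center.
    feats = []
    for c in centers:
        sq_dist = sum((x - y) ** 2 for x, y in zip(point, c))
        feats.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return feats

# Hypothetical cluster centers (k = 2) in a 3-dimensional space.
centers = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]

# A point sitting exactly on the first center.
features = gaussian_features([0.0, 0.0, 0.0], centers, sigma=1.0)
print(features)
```

The resulting features are a nonlinear function of the inputs, so a linear model fitted on them is no longer linear in the original feature space. The choice of sigma controls how quickly a feature decays with distance and would normally need tuning.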