4.3 Additional Algorithms
The k-means clustering method is easily applied to numeric data, for which a
distance between objects is naturally defined. However, it may be necessary or desirable
to use an alternative clustering algorithm. As discussed at the end of the previous
section, k-means does not handle categorical data. In such cases, k-modes [3] is
a commonly used method: it clusters categorical data by taking the distance
between two objects to be the number of attributes on which they differ. For
example, if each object has four attributes, the distance from (a, b, e, d) to
(d, d, d, d) is 3, because the objects differ in the first three attributes and
agree only in the last. In R, the function kmodes() is provided by the klaR package.
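
The distance used in the example above can be sketched in a few lines. The snippet is written in Python purely for illustration; the function name matching_distance is hypothetical and is not part of the klaR package.

```python
def matching_distance(x, y):
    """Count the attribute positions at which two categorical objects differ."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return sum(1 for a, b in zip(x, y) if a != b)


# The example from the text: the first three attributes differ, the last matches.
print(matching_distance(("a", "b", "e", "d"), ("d", "d", "d", "d")))  # prints 3
```

Because the measure only tests attributes for equality, it needs no ordering or arithmetic on the attribute values, which is what makes it usable for categorical data.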
Because k-means and k-modes divide the entire dataset into distinct groups, both
approaches are considered partitioning methods. A third partitioning method is
known as Partitioning around Medoids (PAM) [4]. In general, a medoid is a
representative object in a set of objects. In clustering, the medoid of a cluster
is the object in that cluster that minimizes the sum of the distances from itself
to the other objects in the cluster. The advantage of using PAM is that the “center” of
each cluster is an actual object in the dataset. PAM is implemented in R by the
pam() function included in the cluster R package. The fpc R package includes a
function, pamk(), which calls pam() over a range of values of k and selects the
best one according to a cluster-quality criterion (by default, the average
silhouette width).
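
The definition of a medoid given above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not how pam() is implemented; the helper names medoid and euclidean, and the sample points, are assumptions made for this example.

```python
def euclidean(p, q):
    """Euclidean distance between two points given as equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def medoid(points, dist):
    """Return the member of `points` with the smallest total distance to the rest."""
    return min(points, key=lambda p: sum(dist(p, q) for q in points))


# Hypothetical cluster with one outlying object at (9, 9).
cluster = [(1, 1), (2, 1), (3, 1), (2, 0), (9, 9)]
print(medoid(cluster, euclidean))  # prints (2, 1)
```

The mean of these five points is (3.4, 2.4), which is not one of the objects, whereas the medoid (2, 1) is an actual member of the cluster.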
Other clustering methods include hierarchical agglomerative clustering and density
clustering methods. In hierarchical agglomerative clustering, each object is initially
placed in its own cluster. At each step, the two most similar clusters are merged.
This process is repeated until a single cluster, which includes all the objects,
remains. The R stats package includes the hclust() function for performing
hierarchical agglomerative clustering. In density-based clustering methods, the
clusters are identified by the concentration of points. The fpc R package includes
a function, dbscan(), to perform density-based clustering analysis. Density-based
clustering can be useful to identify irregularly shaped clusters.
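
The agglomerative procedure described above can be sketched as follows. This is a simplified Python illustration, not the algorithm used by hclust(). It assumes single linkage (the similarity of two clusters is judged by their closest pair of members), which is only one of several possible choices, and the function names and sample data are hypothetical.

```python
def single_linkage(c1, c2, dist):
    """Cluster dissimilarity: smallest distance between any pair of members."""
    return min(dist(p, q) for p in c1 for q in c2)


def agglomerate(points, dist):
    """Repeatedly merge the two closest clusters; return the list of merges."""
    clusters = [[p] for p in points]  # each object starts in its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters (i < j) with the smallest linkage distance.
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = min(
            pairs,
            key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]], dist),
        )
        merged = clusters[i] + clusters[j]
        history.append(merged)
        # Delete j first (j > i) so that index i remains valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history


points = [1.0, 2.0, 6.0, 7.5, 20.0]
history = agglomerate(points, lambda a, b: abs(a - b))
for step, cluster in enumerate(history, start=1):
    print(step, sorted(cluster))
```

With n objects the loop performs n - 1 merges, and the last merge contains every object. The recorded merge order is the information a dendrogram displays.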