The k-means algorithm to find k clusters can be described in the following four
steps.
1. Choose the value of k and the k initial guesses for the centroids.
In this example, k = 3, and the initial centroids are indicated by the points
shaded in red, green, and blue in Figure 4.2.
2. Compute the distance from each data point to each centroid.
Assign each point to the closest centroid. This association defines the first
k clusters.
In two dimensions, the distance, d, between any two points, (x_1, y_1)
and (x_2, y_2), in the Cartesian plane is typically expressed by using the
Euclidean distance measure provided in Equation 4.1.
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \tag{4.1}$$
In Figure 4.3, the points closest to a centroid are shaded the
corresponding color.
3. Compute the centroid, the center of mass, of each newly defined cluster
from Step 2.
In Figure 4.4, the computed centroids in Step 3 are the lightly shaded
points of the corresponding color. In two dimensions, the centroid (x_c, y_c)
of the m points in a k-means cluster is calculated as follows in Equation
4.2.
$$(x_c, y_c) = \left( \frac{\sum_{i=1}^{m} x_i}{m}, \frac{\sum_{i=1}^{m} y_i}{m} \right) \tag{4.2}$$
Thus, (x_c, y_c) is the ordered pair of the arithmetic means of the
coordinates of the m points in the cluster. In this step, a centroid is
computed for each of the k clusters.
4. Repeat Steps 2 and 3 until the algorithm converges to an answer.
a. Assign each point to the closest centroid computed in Step 3.
b. Compute the centroid of newly defined clusters.
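
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (euclidean, assign, update, kmeans) and the sample data are hypothetical, convergence is taken to mean that the cluster assignments stop changing between iterations, and a cluster that ends up with no points simply keeps its previous centroid.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points in the plane (Equation 4.1)."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def assign(points, centroids):
    """Step 2: for each point, return the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Step 3: recompute each centroid as the mean of its points (Equation 4.2)."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        m = len(members)
        x_c = sum(x for x, _ in members) / m
        y_c = sum(y for _, y in members) / m
        new_centroids.append((x_c, y_c))
    return new_centroids

def kmeans(points, initial_centroids, max_iter=100):
    """Step 4: repeat Steps 2 and 3 until the assignments stop changing."""
    centroids = list(initial_centroids)  # Step 1: caller supplies k guesses
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # assumed convergence test: no point changed cluster
        labels = new_labels
        centroids = update(points, labels, centroids)
    return centroids, labels

# Hypothetical data: two well-separated groups, k = 2.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, [(0, 0), (10, 10)])
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

With this sample data the assignments settle after a single update, because the initial guesses already separate the two groups. Different initial centroids can lead to different final clusters, so the choice made in Step 1 matters.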
 