Data Mining Techniques for Segmentation - Data Mining Techniques in CRM: Inside Customer Segmentation

center and the record's input values. This portion, and thus the magnitude of

the change in the weights, is determined by a change or learning rate parameter

referred to as eta. Typically, the first phase has a relatively large eta to learn the

overall data structure and the second phase incorporates a smaller eta to fine-tune

the cluster centers.
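
The role of eta can be illustrated with a minimal sketch. It assumes the common update rule in which a center moves a fraction eta of the way toward the record; the function name `update_center` and the two eta values are hypothetical, chosen only for illustration, and are not taken from any particular product.

```python
def update_center(center, record, eta):
    """Move a cluster center a fraction eta of the way toward a record."""
    return [w + eta * (x - w) for w, x in zip(center, record)]

center = [0.0, 0.0]
record = [1.0, 2.0]

# Phase 1 (assumed eta = 0.5): large step to learn the overall structure.
coarse = update_center(center, record, eta=0.5)

# Phase 2 (assumed eta = 0.05): small step to fine-tune the center.
fine = update_center(center, record, eta=0.05)

print(coarse, fine)
```

With the larger eta the center travels halfway to the record in one step, while the smaller eta moves it only 5% of the distance.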

Although quite similar, Kohonen networks and K-means also have significant

differences. First of all, clusters in Kohonen networks are spatially arranged in a

grid map. Moreover, the "winning" of records by a neuron/cluster also affects the

weights of the surrounding neurons. Output neurons located symmetrically around the

"winning" neuron comprise a "neighborhood" of nearby units. Record assignment

adjusts the weights of all neighboring neurons. Because of this neighborhood

adaptation, the topology of the output map has a practical meaning, with similar

clusters appearing close together as nearby neurons.
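
The neighborhood adaptation described above can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of any specific tool: the neighborhood is taken to be a square of fixed `radius` around the winner, neighbors receive the same eta as the winner, and the names `find_winner` and `train_step` are hypothetical. Real implementations typically shrink the neighborhood as training proceeds.

```python
def find_winner(weights, record):
    """Return the grid position whose weight vector is closest to the record."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(w, record))
    return min(weights, key=lambda pos: sq_dist(weights[pos]))

def train_step(weights, record, eta, radius=1):
    """Update the winner and every unit within `radius` grid steps of it."""
    wx, wy = find_winner(weights, record)
    for (x, y), w in list(weights.items()):
        # Assumed square neighborhood, symmetric around the winner.
        if max(abs(x - wx), abs(y - wy)) <= radius:
            weights[(x, y)] = [wi + eta * (ri - wi) for wi, ri in zip(w, record)]
    return (wx, wy)

# A 3 x 3 output grid; each unit starts with weights equal to its position.
weights = {(x, y): [float(x), float(y)] for x in range(3) for y in range(3)}
winner = train_step(weights, record=[0.0, 0.0], eta=0.5)
print(winner, weights[(1, 1)], weights[(2, 2)])
```

In this run the unit at (1, 1) is pulled toward the record because it neighbors the winner, while the far corner at (2, 2) is left unchanged. Repeating such steps is what makes adjacent units end up representing similar clusters.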

Output units with no winning records are removed from the solution. The

retained output units represent the probable clusters. Users can specify the

topology of the solution, that is, the maximum width and length dimensions of the

output grid map. Selecting the right number of rows and columns for the output

map requires trial and error.
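
The removal of output units with no winning records can be sketched as a simple count over the winners. The helper `retained_units` and the sample data are hypothetical; the grid dimensions stand for the maximum width and length chosen by the user.

```python
from collections import Counter

def retained_units(grid_width, grid_height, winners):
    """Return the units that won at least one record, with their record counts."""
    counts = Counter(winners)
    return {
        (x, y): counts[(x, y)]
        for x in range(grid_width)
        for y in range(grid_height)
        if counts[(x, y)] > 0
    }

# Winning unit for each of five illustrative records on a 3 x 3 grid.
winners = [(0, 0), (0, 0), (2, 1), (1, 2), (2, 1)]
clusters = retained_units(3, 3, winners)
print(clusters)
```

Here only three of the nine possible units are retained, which is why the grid dimensions act as an upper bound on the number of clusters rather than a fixed count.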

Analysts should also evaluate the geometry/similarity and the density/frequency

of the proposed clusters. Kohonen networks involve many iterations

and weight adjustments and, consequently, they are considerably slower than

TwoStep and K-means. Nevertheless, they are worth trying as a clustering

alternative, especially because of the geometrical representation of the cluster

similarity that they provide.

In Kohonen network models, cluster assignment is represented by two

generated fields which denote the grid map co-ordinates (for instance, X = 1,

Y = 3) of each record. These two fields should normally be concatenated into a

single cluster membership field. A common and useful graphical representation

of the geometry of the derived solution is through a simple scatterplot, with

all records placed in the two-dimensional space defined by the grid co-ordinate

fields. A scatterplot like that for a nine-cluster (3 × 3) solution is presented

in Figure 3.10, depicting the values of the cluster membership field. This plot

visually represents the density and the relative position, and hence similarity, of

the resulting clusters.
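
Concatenating the two co-ordinate fields and summarizing cluster density might look like the sketch below. The label format "X-Y", the function name `cluster_label` and the sample records are assumptions made for illustration; the generated field names in an actual tool will differ.

```python
from collections import Counter

def cluster_label(x, y):
    """Concatenate the two grid co-ordinates into one cluster membership value."""
    return f"{x}-{y}"

# Illustrative (X, Y) grid co-ordinates for five scored records.
records = [(1, 3), (1, 3), (2, 1), (1, 3), (2, 2)]

labels = [cluster_label(x, y) for x, y in records]

# Records per cluster: the density that the scatterplot shows visually.
density = Counter(labels)
print(labels)
print(density.most_common())
```

The counts per label correspond to how crowded each grid position would appear in the scatterplot, and labels that differ by one step in X or Y mark neighboring, and hence similar, clusters.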

Recommended Kohonen Network/SOM Options

Figures 3.11 and 3.12 and Table 3.11 explain the settings of the IBM SPSS

Modeler Kohonen network/SOM model and provide suggestions for fine-tuning

the algorithm.
