Using the JDM API - Java Data Mining: Strategy, Standard, and Practice

Table 9-20 (continued)

mapClusters(ClusteringApplyContent content, String baseDestPhysAttrName)

Maps all clusters in the model and the specified content value to a set of named destination attributes. When this method is used, the apply output data contains the apply contents for all leaf clusters. The base attribute name specified by the user is used to generate the column names in the apply output data. For example, when a user calls the following methods and the input model has four leaf clusters, i.e., {1, 2, 3, 4}, the apply task creates apply output data with the columns ClusterId_1, ClusterId_2, ClusterId_3, ClusterId_4, Probability_1, Probability_2, Probability_3, and Probability_4. The column Probability_1 holds the probability associated with the cluster id in column ClusterId_1. Similarly, each remaining Probability column holds the probability for the cluster id in the ClusterId column with the same suffix.

mapClusters(ClusteringApplyContent.clusterIdentifier, "ClusterId");
mapClusters(ClusteringApplyContent.probability, "Probability");
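The column naming described for mapClusters can be illustrated with a small stand-alone sketch. It does not use the JDM API: the class ApplyOutputColumns and the helper columnsFor are hypothetical names introduced here, and the sketch assumes the suffix is a 1-based position appended to the base name with an underscore, as the column names in the example suggest.

```java
import java.util.ArrayList;
import java.util.List;

public class ApplyOutputColumns {

    // Hypothetical helper (not a JDM method): builds the column names that
    // one mapClusters call would contribute, assuming a suffix of "_1".."_n"
    // for a model with n leaf clusters.
    static List<String> columnsFor(String baseName, int leafClusterCount) {
        List<String> columns = new ArrayList<>();
        for (int position = 1; position <= leafClusterCount; position++) {
            columns.add(baseName + "_" + position);
        }
        return columns;
    }

    public static void main(String[] args) {
        int leafClusterCount = 4; // the four leaf clusters from the example

        // One group of columns per mapClusters call, in call order.
        List<String> columns = new ArrayList<>();
        columns.addAll(columnsFor("ClusterId", leafClusterCount));
        columns.addAll(columnsFor("Probability", leafClusterCount));

        System.out.println(columns);
    }
}
```

Under these assumptions, the two calls yield eight columns in total, with each Probability column paired to the ClusterId column that carries the same suffix.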

Listing 9-17 illustrates the use of the clustering interfaces for the customer segmentation problem discussed in Section 7.5. Lines 34 to 41 create the clustering settings object, which specifies euclidean as the aggregation function and absolute difference in values as the attribute comparison function for the age attribute. All other attributes use the DME's default attribute comparison function. In addition, the maximum number of clusters is set to 50, and each cluster must contain between 500 and 100,000 cases. Building this segmentModel is similar to building the other types of models, as shown in lines 69 to 71. Once the segmentModel is built, we apply it to the apply input data to find the most probable cluster id using the ClusteringApplySettings.mapTopCluster method. Lines 47 to 53 create the apply settings object, and lines 74 to 79 execute the dataset (batch) apply task. As with classification and regression, clustering models can also support real-time, single-record apply operations. Lines 95 to 119 retrieve the clustering model and each cluster's details. In this example, we retrieve statistics for the age attribute, such as frequencies, and show how applications can obtain further cluster details from the model.
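To make "the most probable cluster id" concrete, the following stand-alone sketch picks the top cluster for a single record from a set of per-cluster probabilities. It is not the JDM implementation and calls no JDM interfaces: the class TopClusterSketch, the helper topCluster, and the probability values are all hypothetical, chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TopClusterSketch {

    // Hypothetical helper (not a JDM method): returns the cluster id with
    // the highest probability. Ties resolve to the first id encountered.
    static int topCluster(Map<Integer, Double> probabilityByClusterId) {
        if (probabilityByClusterId.isEmpty()) {
            throw new IllegalArgumentException("no cluster probabilities given");
        }
        int bestId = -1;
        double bestProbability = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Integer, Double> entry : probabilityByClusterId.entrySet()) {
            if (entry.getValue() > bestProbability) {
                bestProbability = entry.getValue();
                bestId = entry.getKey();
            }
        }
        return bestId;
    }

    public static void main(String[] args) {
        // Made-up probabilities for one customer record.
        Map<Integer, Double> probabilities = new LinkedHashMap<>();
        probabilities.put(1, 0.10);
        probabilities.put(2, 0.55);
        probabilities.put(3, 0.25);
        probabilities.put(4, 0.10);

        System.out.println("Top cluster id: " + topCluster(probabilities));
    }
}
```

A batch apply would repeat this selection for every record in the apply input data, whereas a single-record apply would perform it once for the record supplied.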
