OLAP tools can be used to get basic counts and frequencies for the different populations of customers, but they do not really help when the user is looking for the most distinctive attributes. We will show code that computes the profiles of the populations “belonging” and “not belonging” to a cluster and returns the distance between these two distributions. This distance can be used to sort the attributes: those with the largest distance can be considered distinctive for the specified cluster.
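The text does not say which distance measure is used. As a minimal sketch, assuming each profile is available as a map from attribute value to count, the Java code below uses the total variation distance (half the sum of absolute differences between the two normalized distributions) and then sorts attribute names by that distance. The class ProfileDistance and its methods are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ProfileDistance {

    // Total variation distance between two discrete distributions given as
    // raw counts per attribute value (an assumed choice of metric).
    // Result is in [0, 1]: 0 = identical distributions, 1 = no shared values.
    public static double totalVariationDistance(Map<String, Long> inCluster,
                                                Map<String, Long> outCluster) {
        double inTotal = sum(inCluster);
        double outTotal = sum(outCluster);
        if (inTotal == 0 || outTotal == 0) {
            throw new IllegalArgumentException("both populations must be non-empty");
        }
        // Every value seen in either population contributes to the distance.
        Set<String> values = new HashSet<>(inCluster.keySet());
        values.addAll(outCluster.keySet());
        double l1 = 0.0;
        for (String v : values) {
            double p = inCluster.getOrDefault(v, 0L) / inTotal;
            double q = outCluster.getOrDefault(v, 0L) / outTotal;
            l1 += Math.abs(p - q);
        }
        return l1 / 2.0;
    }

    private static double sum(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            total += c;
        }
        return total;
    }

    // Sorts attribute names so that the most distinctive (largest distance) come first.
    public static List<String> rankAttributes(Map<String, Double> distanceByAttribute) {
        List<String> names = new ArrayList<>(distanceByAttribute.keySet());
        names.sort((a, b) -> Double.compare(distanceByAttribute.get(b),
                                            distanceByAttribute.get(a)));
        return names;
    }
}
```

For instance, if an attribute takes value A in 75% of the cluster's records but in only 25% of the other records (value B covering the rest), the distance is 0.5. Another measure, such as a chi-square statistic, could be substituted without changing the ranking step.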
In the next section, we will show how to compare the distributions of discrete variables for one cluster with those of all the other clusters. This is done through SQL expressions using the “group by” clause, which is the design we have chosen for the method computeProfile. In this method, the user specifies a table name and the name of the attribute to be profiled. Again, this table does not have to be the one used for model building, as long as it has an identifier attribute that can be used to merge its information with the generated cluster assignments.
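As a minimal in-memory sketch of the same logic, assuming the profiled attribute and the cluster assignments have already been loaded into maps keyed by the identifier attribute, the Java code below merges the two sources on the identifier and counts attribute values separately for records inside and outside one cluster, which is what a “group by” on the attribute computes. The class ClusterProfiler, its nested Profiles type, and the method countByCluster are hypothetical names; records with no cluster assignment are simply skipped.

```java
import java.util.HashMap;
import java.util.Map;

public class ClusterProfiler {

    // The two frequency profiles of one attribute: value -> count.
    public static final class Profiles {
        public final Map<String, Long> inCluster = new HashMap<>();
        public final Map<String, Long> outOfCluster = new HashMap<>();
    }

    // In-memory analogue of a "select count(*), attr ... group by attr" query,
    // evaluated once for records in cluster clusterIdx and once for the rest.
    public static <K> Profiles countByCluster(Map<K, String> attributeById,
                                              Map<K, Integer> clusterById,
                                              int clusterIdx) {
        Profiles profiles = new Profiles();
        for (Map.Entry<K, String> row : attributeById.entrySet()) {
            Integer cluster = clusterById.get(row.getKey()); // merge on the identifier
            if (cluster == null) {
                continue; // no assignment for this identifier: excluded from both profiles
            }
            Map<String, Long> target =
                cluster == clusterIdx ? profiles.inCluster : profiles.outOfCluster;
            target.merge(row.getValue(), 1L, Long::sum); // group by value + count(*)
        }
        return profiles;
    }
}
```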
In the following code for method computeProfile, iInputDataSet is the table containing the attribute to be profiled, and iAttributeName is the attribute to be profiled. The argument iClusterCount identifies the model of interest, because each model was built with a different number of clusters, and iClusterIdx is the identifier of the cluster to be profiled against all the others. In this code, we assume that cluster identifiers are integers, which is normally the case.
public double computeProfile(String iInputDataSet,
                             String iAttributeName,
                             int iClusterCount,
                             int iClusterIdx)
    throws JDMException, InterruptedException, SQLException {
  double lDistance = 0.0;
  // Apply-output table holding the cluster assignments,
  // optionally suffixed with the number of clusters of the model
  String lClusterTableName = mApplyOutputPrefix;
  // Column holding the cluster identifier for the model with iClusterCount clusters
  String lClusterAttributeName = mModelPrefix + "_" + iClusterCount;
  if (mUseApplyOutPrefix)
    lClusterTableName += "_" + iClusterCount;
  // Count of each attribute value among the records assigned to cluster iClusterIdx
  String lSQLCountQuery = "select count(*), a." + iAttributeName + " from "
      + iInputDataSet + " a left outer join "
      + lClusterTableName + " b on a."
      + mIdentifierColumnName + " = b." + mIdentifierColumnName
      + " where (b." + lClusterAttributeName + " = " + iClusterIdx + ")"
      + " group by a." + iAttributeName;
  Statement lStatement = mJDBCConnection.createStatement();
  ResultSet lResultSetCount =
      lStatement.executeQuery(lSQLCountQuery);
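Each row returned by the query above pairs a count with one value of the profiled attribute. One way to turn such a result set into a value-to-count map that a distance function can consume is sketched below; the class ProfileReader and the method readCounts are hypothetical, and the sketch assumes the column order of the query's select list (count in column 1, attribute value in column 2).

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

public class ProfileReader {

    // Converts rows of the form (count(*), attributeValue) into a map from
    // attribute value to count. A SQL NULL attribute value ends up under the
    // null key, which HashMap permits.
    // The caller remains responsible for closing the ResultSet and Statement.
    public static Map<String, Long> readCounts(ResultSet rs) throws SQLException {
        Map<String, Long> counts = new HashMap<>();
        while (rs.next()) {
            // "group by" yields one row per value, so put() never overwrites
            counts.put(rs.getString(2), rs.getLong(1));
        }
        return counts;
    }
}
```

Called as readCounts(lResultSetCount), this would produce the in-cluster profile. One way to obtain the complementary profile is a second query with the opposite condition on the cluster column.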