Data Analytics: Exploiting the Data Warehouse - Data Warehouse Systems: Design and Implementation - page 351


FROM [NWDW Clustering].CONTENT
WHERE NODE_TYPE = 5 AND NODE_SUPPORT > 75

The above query yields the following result:

Name  Caption    Support  Description
001   Cluster 1  102      City Key=66, City Key=74, City Key=17, City Key=7, City Key=75, City Key=33, City Key=70, City Key=3, City Key=108, City Key=59, Permanent Employees=0, 1971 <= Year Established <= 1992, Business Type=Delicatessen, [ ... ]
002   Cluster 2  89       City Key=30, City Key=49, City Key=12, City Key=2, City Key=28, Business Type=Pub, Business Type=Restaurant, [ ... ]
006   Cluster 6  77       City Key=95, City Key=53, City Key=54, 1950 <= Year Established <= 1957, Permanent Employees=6, City Key=11, Business Type=Minimart, [ ... ]
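The following minimal Python sketch makes the filter concrete by applying the same predicate to an in-memory list of content rows. It assumes the content-schema codes 1 (model root node) and 5 (cluster node). The ContentRow class and the large_clusters function are hypothetical names. Two rows are also hypothetical and exist only so that the filter has something to discard: a root node and a small "Cluster 3". The three remaining rows reuse the names and supports from the result above. The sketch only mimics the WHERE clause and does not connect to Analysis Services.

```python
from dataclasses import dataclass

# Assumed NODE_TYPE codes: 1 = model (root) node, 5 = cluster node
MODEL_NODE = 1
CLUSTER_NODE = 5


@dataclass
class ContentRow:
    node_name: str
    node_caption: str
    node_type: int
    node_support: int


# Clusters 001, 002 and 006 come from the query result above;
# the rows marked "hypothetical" are invented for illustration only.
content = [
    ContentRow("000", "All", MODEL_NODE, 1000),        # hypothetical
    ContentRow("001", "Cluster 1", CLUSTER_NODE, 102),
    ContentRow("002", "Cluster 2", CLUSTER_NODE, 89),
    ContentRow("003", "Cluster 3", CLUSTER_NODE, 40),  # hypothetical
    ContentRow("006", "Cluster 6", CLUSTER_NODE, 77),
]


def large_clusters(rows, min_support):
    """Mimic WHERE NODE_TYPE = 5 AND NODE_SUPPORT > min_support."""
    return [r for r in rows
            if r.node_type == CLUSTER_NODE and r.node_support > min_support]


for row in large_clusters(content, 75):
    print(row.node_name, row.node_caption, row.node_support)
```

Raising the threshold to 100 keeps only Cluster 1. The root node is never returned, because its node type is not 5.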

We can also ask for the discriminating factors of clusters. The following query returns a table that indicates the primary discriminating factors between the two clusters with node IDs 001 and 002:

CALL System.Microsoft.AnalysisServices.System.DataMining.Clustering.GetClusterDiscrimination('NWDW Clustering', '001', '002', 0.0005, true)

The query uses a system stored procedure, although the same comparison could also be performed manually. Attributes with a positive score favor the cluster with ID 001, whereas attributes with a negative score favor the cluster with ID 002. For example, if we analyze the table below with respect to Fig. 9.10, we can see that Cluster 2 contains a considerably larger proportion of records with BusinessType = 'Restaurant' than Cluster 1. This is reflected in the score of -63.7781 for that value. The result of this query is given next:

Attributes       Values                   Score
Store Surface    142 - 1,924              100
Store Surface    1,925 - 7,857            -99.9999
Date Last Order  12/03/1998 - 06/05/1998  76.1429
Date Last Order  18/07/1996 - 12/03/1998  -68.3848
Business Type    Restaurant               -63.7781
...              ...                      ...
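To show how the sign convention can be used when reading such a result, here is a small Python sketch. It groups the rows above by the cluster they favor (positive scores for 001, negative for 002) and ranks them by absolute score. The function split_by_cluster is a hypothetical helper. It does not reproduce how GetClusterDiscrimination computes its scores. The truncated "..." rows are left out.

```python
# Rows copied from the discrimination result above (truncated rows omitted).
rows = [
    ("Store Surface", "142 - 1,924", 100.0),
    ("Store Surface", "1,925 - 7,857", -99.9999),
    ("Date Last Order", "12/03/1998 - 06/05/1998", 76.1429),
    ("Date Last Order", "18/07/1996 - 12/03/1998", -68.3848),
    ("Business Type", "Restaurant", -63.7781),
]


def split_by_cluster(rows, first="001", second="002"):
    """Group rows by favored cluster: positive -> first, negative -> second.

    Rows with a score of exactly 0 favor neither cluster and are skipped.
    Within each cluster, the strongest discriminators (largest |score|)
    come first.
    """
    favored = {first: [], second: []}
    for attribute, value, score in rows:
        if score > 0:
            favored[first].append((attribute, value, score))
        elif score < 0:
            favored[second].append((attribute, value, score))
    for key in favored:
        favored[key].sort(key=lambda r: abs(r[2]), reverse=True)
    return favored


for cluster, items in split_by_cluster(rows).items():
    print(cluster, [f"{a}={v}" for a, v, _ in items])
```

In this grouping, Business Type = Restaurant ends up on the side of cluster 002, which matches the reading of Fig. 9.10 given above.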

We can use the model to make predictions through its predictable attributes, which are handled differently depending on whether the attribute is set to Predict or PredictOnly (Fig. 9.7). In the first case, the values of the attribute are added to the clustering model and appear as attributes in the finished model. In the second case, the values are not used to create the clusters. Instead, after the model is completed, the clustering algorithm creates new values for the PredictOnly attribute based on the cluster to which each case belongs. This is the setting used in our example, as can be seen in Fig. 9.7.
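The PredictOnly behavior can be illustrated with a deliberately simplified Python sketch. Clusters are defined from the input attributes only. Each case is assigned to its nearest centroid, and the PredictOnly value for a new case is the most frequent value among the training cases in its cluster. Everything here is hypothetical: the training cases, the centroids, and the functions assign and predict. This hard, majority-vote scheme is not the Microsoft Clustering algorithm, which by default uses an Expectation-Maximization method and works with probabilities.

```python
import math
from collections import Counter

# Hypothetical training cases: (scaled input features, PredictOnly value).
train = [
    ((0.10, 0.20), "Restaurant"),
    ((0.20, 0.10), "Restaurant"),
    ((0.15, 0.25), "Pub"),
    ((0.90, 0.80), "Minimart"),
    ((0.85, 0.95), "Minimart"),
]

# Hypothetical centroids, assumed to come from a clustering run that used
# only the input features; the PredictOnly value played no part in them.
centroids = {"001": (0.15, 0.18), "002": (0.88, 0.88)}


def assign(x):
    """Return the ID of the centroid nearest to x (Euclidean distance)."""
    return min(centroids, key=lambda c: math.dist(x, centroids[c]))


# After the clusters exist, tally the PredictOnly values of each cluster's members.
members = {c: Counter() for c in centroids}
for features, label in train:
    members[assign(features)][label] += 1


def predict(x):
    """Assign x to a cluster and return (cluster ID, most frequent value)."""
    cluster = assign(x)
    label, _ = members[cluster].most_common(1)[0]
    return cluster, label


print(predict((0.12, 0.22)))
print(predict((0.80, 0.90)))
```

The PredictOnly column never affects which cluster a case falls into. It only labels the clusters afterwards, which is the distinction the text draws between Predict and PredictOnly.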
