Data Analytics: Exploiting the Data Warehouse - Data Warehouse Systems: Design and Implementation - page 349


[NWDW Decision Tree].[Permanent Employees] = T.[PermanentEmployees] AND
[NWDW Decision Tree].[Year Established] = T.[YearEstablished] AND
[NWDW Decision Tree].[Store Surface] = T.[StoreSurface] AND
[NWDW Decision Tree].[Parking Surface] = T.[ParkingSurface]
WHERE [High Value Cust] = 1

This query results in the following table, where the values in the column Prob=1 indicate the probability of being a high-valued customer:

CompanyName    BusinessType  PE   YE    AP   SS    PS    Prob=1  Prob=0
L'Amour Fou    Restaurant    4    1955  2    1178  918   0.7551  0.2448
Le Tavernier   Pub           1    1984  1    2787  438   0.5326  0.4673
Potemkine      Restaurant    5    1956  1    773   460   0.7551  0.2448
Flamingo       Restaurant    3    1960  2    2935  1191  0.6041  0.3958
Pure Bar       Pub           3    1989  2    1360  307   0.5326  0.4673
···            ···           ···  ···   ···  ···   ···   ···     ···
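As a small illustration of how the Prob=1 column can be read, the following Python sketch re-enters the five rows shown in the table and ranks the prospects whose probability of being high-valued reaches a cutoff. The 0.6 cutoff and the function names are assumptions made for this example only; they are not part of the original query.

```python
# Rows copied from the result table: (CompanyName, BusinessType, Prob=1, Prob=0).
rows = [
    ("L'Amour Fou", "Restaurant", 0.7551, 0.2448),
    ("Le Tavernier", "Pub", 0.5326, 0.4673),
    ("Potemkine", "Restaurant", 0.7551, 0.2448),
    ("Flamingo", "Restaurant", 0.6041, 0.3958),
    ("Pure Bar", "Pub", 0.5326, 0.4673),
]

# Hypothetical cutoff, chosen only for illustration.
THRESHOLD = 0.6


def likely_high_value(rows, threshold=THRESHOLD):
    """Return (name, Prob=1) pairs with Prob=1 >= threshold, highest first."""
    selected = [(name, p1) for name, _btype, p1, _p0 in rows if p1 >= threshold]
    # sorted() is stable, so ties keep their order from the table.
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


def probabilities_consistent(rows, tol=1e-3):
    """Check that Prob=1 and Prob=0 sum to 1, allowing for truncated digits."""
    return all(abs(p1 + p0 - 1.0) <= tol for _name, _btype, p1, p0 in rows)


if __name__ == "__main__":
    for name, prob in likely_high_value(rows):
        print(f"{name}: {prob:.4f}")
```

The consistency check reflects the fact that the two probability columns are complementary: each pair in the table sums to 1 up to the last printed digit.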

Clustering

We will now show how to build a clustering model that uncovers the structure of customer profiles, using the view TargetCustomers and the parameters depicted in the right-hand side of Fig. 9.7. Then, given the table of prospective customers (NewCustomers), we can predict to which profile each new customer is likely to belong. Figure 9.9 shows the result of the clustering algorithm. The shading and thickness of the lines linking the clusters indicate the strength of the relationship between them: the darker and thicker the line, the stronger the link between the two clusters. The profiles of some of the clusters are given in Fig. 9.10. These profiles indicate, for example, the number of elements in each cluster and the distribution of the attribute values within each cluster. We can see, for example, that Cluster 5 contains few high-valued customers.

In clustering models, a content query asks for details about the clusters that were found. A prediction query may ask to which cluster a new data point is most likely to belong.
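The difference between the two kinds of query can be sketched in a few lines of Python. This is a toy model, not the Analysis Services algorithm: the cluster names, centroids, and support counts are invented, the two features are assumed to be scaled to the range 0 to 1, and nearest-centroid assignment stands in for whatever membership rule the real clustering model applies.

```python
import math

# Hypothetical toy "model": each cluster has a centroid and a support
# (number of training cases). All values are invented for illustration.
MODEL = {
    "Cluster 1": {"centroid": (0.2, 0.3), "support": 40},
    "Cluster 2": {"centroid": (0.7, 0.8), "support": 25},
    "Cluster 3": {"centroid": (0.9, 0.1), "support": 8},
}


def content_query(model, min_support=0):
    """Describe the clusters themselves: (name, support), filtered by support."""
    return [
        (name, cluster["support"])
        for name, cluster in model.items()
        if cluster["support"] >= min_support
    ]


def prediction_query(model, point):
    """Assign a new data point to the cluster with the nearest centroid.

    Nearest-centroid assignment is a simplification used for this sketch.
    """
    return min(model, key=lambda name: math.dist(model[name]["centroid"], point))


if __name__ == "__main__":
    print(content_query(MODEL, min_support=10))
    print(prediction_query(MODEL, (0.8, 0.2)))
```

The content query only inspects what the model learned, whereas the prediction query takes data the model has not seen and returns a cluster for it.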

Once the model is built (in this case, called NWDW Clustering), we can find out the characteristics of the clusters produced. In Analysis Services, the clustering structure is a tree in which, below the root (NODE_TYPE = 1), there is a flat collection of nodes (NODE_TYPE = 5). Since all clusters have a node type of 5, we can easily retrieve a list of the clusters by querying the model content for only the nodes of that type. We can also filter the nodes by support. The query shown below displays the identifier, the name, the support (the number of elements in the cluster), and the description (the
