Data Analytics: Exploiting the Data Warehouse - Data Warehouse Systems: Design and Implementation

Database Reference

In-Depth Information

9.1.7 Data Mining in Analysis Services

In the previous section, we studied how data mining algorithms create a

mining model by analyzing a data set to find hidden patterns and trends.

Once the parameters of the mining model are defined, they are applied

across the entire data set to extract interesting knowledge. As we have seen,

the mining model can take various forms, like a decision tree that predicts

whether a particular customer will buy a product, a model that forecasts

sales, a set of clusters describing the customers' profiles, a set of rules that

describe which products are ordered together, and so on.

The two tools that we have used to implement our examples, namely,

Microsoft Analysis Services and Pentaho Business Analytics, provide data

mining capabilities that implement the concepts studied above. The former

also provides a query language denoted DMX (Data Mining eXtensions).

This language can be used for querying data mining models in Analysis

Services. There are two kinds of DMX queries: content queries ,which

provide details about the patterns discovered during the analysis, and

prediction queries , which use the patterns in the model, for example, to

make predictions for new data. On the other hand, Pentaho Data Mining is

based on Weka (Waikato Environment for Knowledge Analysis). Weka is a

machine learning and data mining software written in Java. Weka includes

classification, regression, association rules, and clustering algorithms. In this

section and to avoid redundancy, we use Analysis Services to illustrate how

data mining concepts can be used in practice.

We next study three algorithms provided by Analysis Services, namely,

decision trees, clustering, and association rules. We will present these

algorithms using the Northwind case study, extended with the demographic

and new customer data described in Sect. 9.1 .SincetheoriginalNorthwind

database is not big enough to obtain meaningful results (in particular,

because of the low number of customers), we have extended such data set

ten times with generated data.

To create a decision tree and cluster models, we will use the Customer

and CustomerDemographics dimension tables, together with the Sales fact

table, to produce a view TargetCustomers . This view includes a class Boolean

attribute HighValueCust which indicates that the customer had placed an

order with a total amount higher than $3,500. This attribute represents the

class variable whose value the model will forecast, as we explain below. The

view TargetCustomers is defined as follows:

CREATE VIEW TargetCustomers AS

WITH CustOrderTotal AS (

SELECT S.CustomerKey, S.OrderNo,

SUM(S.SalesAmount) AS TotalAmount

FROM Sales S

GROUP BY S.CustomerKey, S.OrderNo ),

CustMaxAmount AS (

Data Warehouse Systems: Design and Implementation

Search WWH ::

Custom Search

Home