Database Reference
In-Depth Information
9.1.7 Data Mining in Analysis Services
In the previous section, we studied how data mining algorithms create a
mining model by analyzing a data set to find hidden patterns and trends.
Once the parameters of the mining model are defined, they are applied
across the entire data set to extract interesting knowledge. As we have seen,
the mining model can take various forms, like a decision tree that predicts
whether a particular customer will buy a product, a model that forecasts
sales, a set of clusters describing the customers' profiles, a set of rules that
describe which products are ordered together, and so on.
The two tools that we have used to implement our examples, namely,
Microsoft Analysis Services and Pentaho Business Analytics, provide data
mining capabilities that implement the concepts studied above. The former
also provides a query language denoted DMX (Data Mining eXtensions).
This language can be used for querying data mining models in Analysis
Services. There are two kinds of DMX queries: content queries ,which
provide details about the patterns discovered during the analysis, and
prediction queries , which use the patterns in the model, for example, to
make predictions for new data. On the other hand, Pentaho Data Mining is
based on Weka (Waikato Environment for Knowledge Analysis). Weka is a
machine learning and data mining software written in Java. Weka includes
classification, regression, association rules, and clustering algorithms. In this
section and to avoid redundancy, we use Analysis Services to illustrate how
data mining concepts can be used in practice.
We next study three algorithms provided by Analysis Services, namely,
decision trees, clustering, and association rules. We will present these
algorithms using the Northwind case study, extended with the demographic
and new customer data described in Sect. 9.1 .SincetheoriginalNorthwind
database is not big enough to obtain meaningful results (in particular,
because of the low number of customers), we have extended such data set
ten times with generated data.
To create a decision tree and cluster models, we will use the Customer
and CustomerDemographics dimension tables, together with the Sales fact
table, to produce a view TargetCustomers . This view includes a class Boolean
attribute HighValueCust which indicates that the customer had placed an
order with a total amount higher than $3,500. This attribute represents the
class variable whose value the model will forecast, as we explain below. The
view TargetCustomers is defined as follows:
CREATE VIEW TargetCustomers AS
WITH CustOrderTotal AS (
SELECT S.CustomerKey, S.OrderNo,
SUM(S.SalesAmount) AS TotalAmount
FROM Sales S
GROUP BY S.CustomerKey, S.OrderNo ),
CustMaxAmount AS (
Search WWH ::




Custom Search