• An understanding of the quality and meaning of the data in the warehouse.
• Business insight gained using other tools and the warehouse.
• An understanding of a business issue driven by too many variables to model
outcomes in any other way.
In other words, data mining tools are not a replacement for the analytical skills of data
warehouse users.
The data mining tools themselves can rely on a number of techniques to produce the
relationships, such as:
• Extended statistical algorithms, such as those provided by R and other statistical
tools, to highlight statistical variations in the data.
• Clustering techniques that show how business outcomes can fall into certain
groups, such as insurance claims versus time for various age brackets. In this
example, once a low-risk group is found or classified, further research into
influencing factors or “associations” might take place.
• Logic models (if A occurs, then B or C is a possible outcome), commonly known
as decision trees, validated against small sample sets and then applied to larger
data models for prediction.
• Neural networks “trained” against small sets with known results, to be applied
later against a much larger set.
• Anomaly detection used to detect outliers and rare events.
• Visualization techniques used to graphically plot variables and understand which
variables are key to a particular outcome.
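As a rough illustration of the anomaly detection technique listed above, the following Python sketch flags values that lie far from the mean of a sample. The function name find_outliers, the threshold of 2.0 standard deviations, and the claim amounts are all assumptions made for this example; they do not come from any Oracle product, and real tools use considerably more robust methods.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds threshold.

    Hypothetical helper: a z-score test is only one simple way to
    detect outliers, and the threshold here is an arbitrary choice.
    """
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []          # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Invented claim amounts: nine ordinary values and one rare event.
claims = [120.0, 135.0, 110.0, 128.0, 140.0,
          125.0, 118.0, 132.0, 122.0, 2500.0]
outliers = find_outliers(claims)
print(outliers)
```

A single extreme value inflates the standard deviation it is measured against, which is one reason production systems tend to prefer median-based or model-based detectors over a plain z-score.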
Data mining is often used to solve difficult business problems such as fraud detection
and churn in micro-opportunity marketing, as well as in other areas where many
variables can influence an outcome. Companies servicing credit cards use data mining to
track unusual usage; for example, the unexpected charging to a credit card of expensive
jewelry in a city not normally traveled to by the cardholder. Discovering clusters of
unusual buying patterns within certain small groups might also drive
micro-opportunity market campaigns aimed at small audiences with a high probability of
purchasing products or services.
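The credit card scenario above can be sketched as a simple rule, shown here in Python. This is a deliberately simplified illustration, not how any card issuer or Oracle tool actually scores fraud: the Charge class, the is_unusual function, the amount_factor of 3.0, and the sample data are all hypothetical. A real data mining model would learn such patterns from many variables rather than rely on a hand-written rule.

```python
from dataclasses import dataclass

@dataclass
class Charge:
    city: str
    category: str
    amount: float

def is_unusual(charge, history, amount_factor=3.0):
    """Flag a charge made in a city the cardholder has not used before
    whose amount is well above the cardholder's average charge.

    Hypothetical rule for illustration only.
    """
    if not history:
        return False  # no baseline to compare against
    known_cities = {c.city for c in history}
    typical = sum(c.amount for c in history) / len(history)
    return (charge.city not in known_cities
            and charge.amount > amount_factor * typical)

# Invented history: everyday purchases in one city.
history = [
    Charge("Boston", "groceries", 82.50),
    Charge("Boston", "fuel", 45.00),
    Charge("Boston", "dining", 67.25),
]

away = Charge("Reno", "jewelry", 4200.00)
home = Charge("Boston", "jewelry", 4200.00)
print(is_unusual(away, history), is_unusual(home, history))
```

Requiring both conditions keeps the sketch from flagging every large purchase at home or every small purchase while traveling; the right balance between the two signals is exactly what a trained model would estimate from data.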
Oracle first embedded data mining algorithms, packaged as the Data Mining Option, in the
Oracle9i database. Algorithms now in the Advanced Analytics Option include Naïve
Bayes, Associations, Adaptive Bayes Networks, Clustering, Expectation Maximization
(EM), Support Vector Machines (SVM), Nonnegative Matrix Factorization (NMF),
Decision Trees, Generalized Linear Models (supporting Binary Logistic Regression and
Multivariate Linear Regression), Principal Component Analysis (PCA), and Singular