strategy in commercial applications of machine learning. Moreover, PMML combines
these definitions into an expression of business process for a complex data workflow.
Overall, that maps to Cascading quite closely: input and output variables in PMML
correspond to tuple flows, with the Cascading flow planners providing parallelization
for predictive model algorithms on Hadoop clusters.
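As a rough illustration of that correspondence, the sketch below reads the MiningSchema of a PMML document and splits its fields into inputs and outputs, which is the information a flow would need to name the fields of its incoming and outgoing tuples. The PMML fragment, the field names, and the split_fields helper are hypothetical and written only for this example; this is not how Pattern itself parses PMML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML fragment (illustration only).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <DataDictionary numberOfFields="3">
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
    <DataField name="petal_length" optype="continuous" dataType="double"/>
    <DataField name="species" optype="categorical" dataType="string"/>
  </DataDictionary>
  <RegressionModel functionName="classification">
    <MiningSchema>
      <MiningField name="sepal_length" usageType="active"/>
      <MiningField name="petal_length" usageType="active"/>
      <MiningField name="species" usageType="predicted"/>
    </MiningSchema>
  </RegressionModel>
</PMML>
"""

# Namespace of the PMML 4.1 schema used by the fragment above.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def split_fields(pmml_text):
    """Return (input_names, output_names) from the MiningSchema.

    Hypothetical helper: 'active' fields are treated as inputs,
    'predicted' or 'target' fields as outputs.
    """
    root = ET.fromstring(pmml_text)
    inputs, outputs = [], []
    for field in root.iterfind(".//pmml:MiningSchema/pmml:MiningField", NS):
        # usageType defaults to "active" when the attribute is omitted.
        usage = field.get("usageType", "active")
        if usage in ("predicted", "target"):
            outputs.append(field.get("name"))
        elif usage == "active":
            inputs.append(field.get("name"))
    return inputs, outputs


inputs, outputs = split_fields(PMML_DOC)
print("input fields: ", inputs)
print("output fields:", outputs)
```

In a Cascading setting, the two lists would roughly play the role of the argument fields and the result fields of the tuple stream that the model scores.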
Currently there are several companies collaborating on the Pattern project. Besides the
Random Forest and Logistic Regression algorithms, other PMML implementations include
the following:
Linear Regression
K-Means Clustering
Hierarchical Clustering
Support Vector Machines
Linear regression is probably the most common form of predictive model, used for
example in Microsoft Excel spreadsheets. K-means is widely used for customer
segmentation, document search, and other kinds of predictive modeling.
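To make the first of these concrete, here is a minimal sketch of a one-variable linear regression fitted by ordinary least squares. The fit_line function and the sample data are assumptions made for illustration; they are not taken from Pattern or from any PMML tool.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Hypothetical helper for illustration; assumes xs has at least
    two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope, intercept)
```

A fitted model of this kind reduces to a small set of coefficients, which is the sort of definition a PMML regression model records.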
Other good PMML resources include the following:
Data Mining Group: XML standards and supported vendors
Zementis PMML validator
PMML group on LinkedIn
“Representing predictive solutions in PMML” by Alex Guazzelli
Books Related to Pattern
For more information about PMML and predictive models in general, check out these
books:
PMML in Action by Alex Guazzelli, Wen-Ching Lin, and Tridivesh Jena (CreateSpace)
Mining of Massive Datasets by Anand Rajaraman and Jeffrey Ullman (Cambridge
University Press)
 