The MADlib version 1.6 modules [5] are described in Table 11.4.
Table 11.4 MADlib Modules

Module                     Description
-------------------------  ----------------------------------------------------------
Generalized Linear Models  Includes linear regression, logistic regression, and
                           multinomial logistic regression
Cross Validation           Evaluates the predictive power of a fitted model
Linear Systems             Solves dense and sparse linear system problems
Matrix Factorization       Performs low-rank matrix factorization and singular value
                           decomposition
Association Rules          Implements the Apriori algorithm to identify frequent
                           item sets
Clustering                 Implements k-means clustering
Topic Modeling             Provides a Latent Dirichlet Allocation predictive model
                           for a set of documents
Descriptive Statistics     Simplifies the computation of summary statistics and
                           correlations
Inferential Statistics     Conducts hypothesis tests
Support Modules            Provides general array and probability functions that can
                           also be used by other MADlib modules
Dimensionality Reduction   Enables principal component analyses and projections
Time Series Analysis       Conducts ARIMA analyses

http://doc.madlib.net/latest/modules.html
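
To make the Clustering entry in Table 11.4 more concrete, the following is a minimal sketch of the standard k-means iteration (Lloyd's algorithm) in plain Python. It is not MADlib code and does not use MADlib's API. The function name kmeans, the choice of the first k points as starting centroids, and the six (age, total sales) pairs are all assumptions made up for illustration.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points with Lloyd's algorithm.

    Assumption for this sketch: the first k points serve as the initial
    centroids so that the result is deterministic. Practical
    implementations usually pick starting centroids randomly or with a
    seeding scheme such as k-means++.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignments


# Hypothetical (age, total sales) pairs, invented for this illustration.
customers = [
    (25, 120.0), (31, 150.0), (28, 130.0),
    (52, 980.0), (47, 1010.0), (55, 940.0),
]

centroids, assignments = kmeans(customers, k=2)
print(assignments)
print(centroids)
```

Because the two attributes are on very different scales, the sales figures dominate the Euclidean distance in this sketch. In practice, the attributes would often be rescaled before clustering.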
In the following example, MADlib is used to perform a k-means clustering analysis,
as described in Chapter 4, “Advanced Analytical Theory and Methods: Clustering,”
on the web retailer's customers. Two customer attributes—age and total sales
since 2013—have been identified as variables of interest for the purposes of the
clustering analysis. The customer's age is available in the
customer_demographics table. The total sales for each customer can be
computed from the orders_recent table. Because it was decided to include