• Determine whether the situation warrants a single model or a series of
techniques as part of a larger analytic workflow. A few example models
include association rules (Chapter 5, “Advanced Analytical Theory and
Methods: Association Rules”) and logistic regression (Chapter 6,
“Advanced Analytical Theory and Methods: Regression”). Other tools,
such as Alpine Miner, enable users to set up a series of steps and analyses
and can serve as a front-end user interface (UI) for manipulating Big Data
sources in PostgreSQL.
In addition to the considerations just listed, it is useful to research and understand
how other analysts generally approach a specific kind of problem. Given the kind
of data and resources that are available, evaluate whether similar, existing
approaches will work or whether the team will need to create something new. Teams
can often get ideas from analogous problems that other people have solved in
different industry verticals or domain areas. Table 2.2 summarizes the results of
an exercise of this type: after researching churn models in multiple industry
verticals, the team listed several domain areas and the types of models previously
used for this classification problem. Performing this sort of diligence shows the
team how others have solved similar problems and gives it a list of candidate
models to try during the model planning phase.
Table 2.2 Research on Model Planning in Industry Verticals
Market Sector | Analytic Techniques/Methods Used
Consumer Packaged Goods | Multiple linear regression, automatic relevance determination (ARD), and decision tree
Retail Banking | Multiple regression
Retail Business | Logistic regression, ARD, decision tree
Wireless Telecom | Neural network, decision tree, hierarchical neurofuzzy systems, rule evolver, logistic regression
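One simple way to turn research like Table 2.2 into a starting list of candidate models is to tally how many verticals reported each technique. The Python sketch below does this with only the standard library. The RESEARCH mapping is a hand transcription of the table, and candidate_models is a hypothetical helper, not part of Alpine Miner or any other tool named in the text. How often a technique appears across sectors is only a rough prior for what to try first; it says nothing about how well a model will fit the team's own data.

```python
from collections import Counter

# Hand transcription of Table 2.2: market sector -> techniques reported for
# churn (classification) problems. "Multiple regression" and "multiple linear
# regression" are kept separate because the table words them differently.
RESEARCH = {
    "Consumer Packaged Goods": [
        "multiple linear regression",
        "automatic relevance determination (ARD)",
        "decision tree",
    ],
    "Retail Banking": ["multiple regression"],
    "Retail Business": [
        "logistic regression",
        "automatic relevance determination (ARD)",
        "decision tree",
    ],
    "Wireless Telecom": [
        "neural network",
        "decision tree",
        "hierarchical neurofuzzy systems",
        "rule evolver",
        "logistic regression",
    ],
}


def candidate_models(research):
    """Return (technique, sector_count) pairs, most widely used first.

    Hypothetical helper for illustration. Ties are broken alphabetically
    so the ordering is deterministic.
    """
    counts = Counter()
    for techniques in research.values():
        # set() so a technique listed twice for one sector is counted once
        counts.update(set(techniques))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for technique, n in candidate_models(RESEARCH):
        print(f"{technique}: {n} sector(s)")
```

Run as a script, this lists decision tree first (reported in three sectors), followed by ARD and logistic regression (two each). Whether to merge near-synonyms such as the two regression entries is a judgment call the team would make while reviewing the research.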
2.4.1 Data Exploration and Variable Selection
Although some data exploration takes place in the data preparation phase, those
activities focus mainly on data hygiene and on assessing the quality of the data
itself. In Phase 3, the objective of data exploration is to understand the
relationships among the variables, which informs the selection of variables and
methods, and to understand the problem domain. As with earlier phases of the Data
Analytics Lifecycle, it is important to spend time and focus attention on this