Figure 2.6 Model building phase
Although the modeling techniques and logic required to develop models can be
highly complex, the actual duration of this phase can be short compared to the
time spent preparing the data and defining the approaches. In general, plan to
spend more time preparing and learning the data (Phases 1-2) and crafting a
presentation of the findings (Phase 5). Phases 3 and 4 tend to move more quickly,
although they are more complex from a conceptual standpoint.
As part of this phase, the data science team needs to execute the models defined in
Phase 3.
During this phase, users run models from analytical software packages, such as
R or SAS, on file extracts and small datasets for testing purposes. On a small
scale, assess the validity of the model and its results. For instance, determine if the
model accounts for most of the data and has robust predictive power. At this point,
refine the models to optimize the results, such as by modifying variable inputs
or reducing correlated variables where appropriate. In Phase 3, the team may
have suspected that some variables were correlated or that some data attributes
were problematic; executing the models confirms or refutes those suspicions. When
immersed in the details of constructing models and transforming data, many small
decisions are often made about the data and the approach for the modeling. These
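The small-scale checks described in this phase (reducing correlated variables, then assessing predictive power on a small extract) might be sketched as follows. This is a minimal illustration in plain Python rather than R or SAS. The data is synthetic, and the column names, the 0.9 correlation threshold, and the 75/25 holdout split are all assumptions made for the example, not values from the text.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep columns whose |r| with every kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line on (x, y)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical small extract: synthetic data, not taken from the source.
rng = random.Random(0)
size = [rng.uniform(50, 200) for _ in range(200)]
size_sqft = [s * 10.764 for s in size]  # same information as size, rescaled
age = [rng.uniform(0, 40) for _ in range(200)]
price = [3.0 * s + rng.gauss(0, 20) for s in size]

# Step 1: remove redundant inputs before modeling.
kept = drop_correlated({"size": size, "size_sqft": size_sqft, "age": age})

# Step 2: fit on 75% of the rows, then score on the held-out 25%.
split = int(0.75 * len(size))
slope, intercept = fit_line(size[:split], price[:split])
holdout_r2 = r_squared(size[split:], price[split:], slope, intercept)
print(kept, round(holdout_r2, 3))
```

Scoring on rows the model never saw is what separates "accounts for most of the data" from "has robust predictive power": a model can fit its training extract well and still predict poorly. The greedy pruning keeps whichever column of a correlated pair appears first, so in practice the team would order the columns by how interpretable or reliable each one is.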