details can be easily forgotten once the project is completed. Therefore, it is vital to
record the results and logic of the model during this phase. In addition, one must
take care to record any operating assumptions that were made in the modeling
process regarding the data or the context.
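One lightweight way to capture the results, logic, and operating assumptions described above is to write them into a structured record as the model is built. The sketch below is only an illustration: the `ModelRecord` class, its field names, and the sample values are all hypothetical and are not taken from the text.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ModelRecord:
    """Hypothetical container for what was built, how it performed, and what was assumed."""
    model_name: str
    technique: str
    results: dict
    assumptions: list = field(default_factory=list)


# Illustrative values only; a real record would hold the team's actual findings.
record = ModelRecord(
    model_name="example_model_v1",
    technique="logistic regression",
    results={"test_accuracy": 0.75},
    assumptions=[
        "Missing values in the input data were treated as zero.",
        "Training data is representative of the production context.",
    ],
)

# Serialize to JSON so the record can be stored alongside the project artifacts.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
```

Keeping the assumptions in the same record as the results makes it harder for either to be lost once the project is completed.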
Creating robust models that are suitable to a specific situation requires thoughtful
consideration to ensure the models being developed ultimately meet the objectives
outlined in Phase 1. Questions to consider include these:
• Does the model appear valid and accurate on the test data?
• Does the model output/behavior make sense to the domain experts? That
is, does it appear as if the model is giving answers that make sense in this
context?
• Do the parameter values of the fitted model make sense in the context of
the domain?
• Is the model sufficiently accurate to meet the goal?
• Does the model avoid intolerable mistakes? Depending on context, false
positives may be more serious or less serious than false negatives, for
instance. (False positives and false negatives are discussed further in
Chapter 3 and Chapter 7, “Advanced Analytical Theory and Methods:
Classification.”)
• Are more data or more inputs needed? Do any of the inputs need to be
transformed or eliminated?
• Will the kind of model chosen support the runtime requirements?
• Is a different form of the model required to address the business problem?
If so, go back to the model planning phase and revise the modeling
approach.
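Several of the questions above, in particular accuracy on the test data and the relative seriousness of false positives and false negatives, can be checked with a few lines of code. The following is a minimal sketch, assuming a binary classifier whose labels are 1 (positive) and 0 (negative). The function names, the sample labels, and the cost weights are hypothetical and chosen only for illustration.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def evaluate(actual, predicted, fp_cost=1.0, fn_cost=1.0):
    """Summarize accuracy and a cost that weights the two error types differently."""
    c = confusion_counts(actual, predicted)
    total = sum(c.values())
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "false_positives": c["fp"],
        "false_negatives": c["fn"],
        "weighted_cost": fp_cost * c["fp"] + fn_cost * c["fn"],
    }


# Illustrative test-set labels and model predictions (not real data).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# Assumption for this example: a false negative is five times as costly as a false positive.
report = evaluate(actual, predicted, fp_cost=1.0, fn_cost=5.0)
print(report)
```

The cost weights are where the domain context enters: the team sets them according to which mistakes are intolerable for the problem at hand, rather than relying on accuracy alone.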
Once the data science team can determine whether the model is sufficiently robust
to solve the problem or whether the team has failed, it can move to the next phase
in the Data Analytics Lifecycle.
2.5.1 Common Tools for the Model Building Phase
Many tools are available to assist in this phase, primarily statistical analysis
and data mining software. Common tools in this space include, but are not limited
to, the following: