2.5 Phase 4: Model Building
In Phase 4, the data science team needs to develop datasets for training, testing,
and production purposes. These datasets enable the data scientist to develop the
analytical model and train it (“training data”), while holding aside some of the data
(“hold-out data” or “test data”) for testing the model. (These topics are addressed
in more detail in Chapter 3.) During this process, it is critical to ensure that the
training and test datasets are sufficiently robust for the model and analytical
techniques. A simple way to think of these datasets is to view the training dataset as
the data for conducting the initial experiments and the test sets as the data for
validating an approach once the initial experiments and models have been run.
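The separation of training data from hold-out data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name split_holdout, the 80/20 ratio, the fixed seed, and the placeholder records are all hypothetical and do not come from the text.

```python
import random


def split_holdout(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and hold out a fraction of it as test data.

    Hypothetical helper for illustration; the 20% default is an assumption.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    test_rows = shuffled[:n_test]          # hold-out data, set aside for validation
    train_rows = shuffled[n_test:]         # training data, used to fit the model
    return train_rows, test_rows


# Placeholder records standing in for a real dataset.
records = list(range(100))
train_rows, test_rows = split_holdout(records)
```

Shuffling before splitting avoids holding out only the most recent or last-loaded records, and fixing the seed lets the team reproduce the same split across experiments.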
In the model building phase, shown in Figure 2.6, an analytical model is developed
and fit on the training data and evaluated (scored) against the test data. The phases
of model planning and model building can overlap quite a bit, and in practice one
can iterate back and forth between the two phases for a while before settling on a
final model.
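To make the fit-then-score step concrete, the following minimal sketch fits a straight line to training data by ordinary least squares and scores it on test data with mean squared error. The choice of model, the function names fit_line and mean_squared_error, and the data points are assumptions made for illustration only; the text does not prescribe a particular technique.

```python
def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(model, points):
    """Score a fitted (slope, intercept) model against held-out points."""
    slope, intercept = model
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return sum(errors) / len(errors)


# Made-up (x, y) pairs standing in for real training and test datasets.
train_points = [(0, 1.0), (1, 3.1), (2, 4.9), (3, 7.2)]
test_points = [(4, 9.0), (5, 11.1)]

model = fit_line(train_points)                       # fit on the training data
test_mse = mean_squared_error(model, test_points)    # score on the test data
print(f"slope={model[0]:.3f} intercept={model[1]:.3f} test MSE={test_mse:.4f}")
```

If the test error is unsatisfactory, the team would typically return to model planning, adjust the variables or technique, and repeat the fit and score, which is the iteration between the two phases described above.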