iii. Drag the newly created clustering up to the display and drop it.
iv. Select SOM Viewer.
v. Right-click any cluster cell and select “Make dataset from cluster”.
Creating training, validation, and test sets
A dataset to be used by a modeler can be split into training and validation
partitions. Rows in the training partition are used to build the model. Rows in
the validation partition are used during
model construction to avoid overtraining, and after construction to assess the
model's generalizability. Once a model is constructed, other datasets having the
same structure (same column names and types) may be applied to the model to
generate predictions. These datasets, applied later, are herein referred to as
“test” sets.
Use the Control Center to partition into training and validation sets:
1. Right-click the dataset to be partitioned.
2. Select “Create derived dataset”.
3. Under “Columns to Include”, check the columns to include.
4. Specify the number of training rows in “Rows for new derived set”.
5. Specify the number of validation rows in “Rows for validation set”.
6. Once the derived dataset is created, drag a modeler over it and drop to
build the model.
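Outside the Control Center, the same partitioning can be sketched in a few lines of Python. This is only an illustration: the function name partition_rows, the toy dataset, and the random shuffle are assumptions, and the text does not say how the tool itself selects rows for each partition.

```python
import random

def partition_rows(rows, n_train, n_valid, seed=0):
    """Split rows into disjoint training and validation partitions.

    n_train and n_valid play the role of "Rows for new derived set" and
    "Rows for validation set". Random selection is an assumption here.
    """
    if n_train + n_valid > len(rows):
        raise ValueError("requested more rows than the dataset contains")
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(rows)       # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    return train, valid

# Hypothetical ten-row dataset with an id, one feature, and a label.
dataset = [{"id": i, "x": i * 0.5, "label": i % 2} for i in range(10)]
train, valid = partition_rows(dataset, n_train=7, n_valid=3)
```

Because the two partitions are slices of one shuffled copy, no row can appear in both, which is what keeps the validation rows usable for assessing generalizability.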
Apply a test dataset:
1. Drag a compatible dataset (same column names and types) over the model
and drop.
2. Select “Test model performance” to create a test set that can be viewed in
the confusion viewer, ROC viewer or other model viewers.
3. Select “Generate Predictions” to create a new dataset with a new “predicted
values” column.
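The compatibility requirement (same column names and types) and the effect of “Generate Predictions” can be illustrated with a short Python sketch. The helper names, the toy threshold model, and the rule of inferring types from the first row are all assumptions made for the example; they do not describe how the tool works internally.

```python
def schema(rows):
    """Map each column name to the type name of its value in the first row."""
    first = rows[0]
    return {name: type(value).__name__ for name, value in first.items()}

def is_compatible(train_rows, test_rows):
    """True when both datasets have the same column names and types."""
    return schema(train_rows) == schema(test_rows)

def generate_predictions(model, test_rows, column="predicted values"):
    """Return copies of the test rows with one extra prediction column."""
    return [{**row, column: model(row)} for row in test_rows]

def toy_model(row):
    # Hypothetical stand-in for a built model: a simple threshold on x.
    return 1 if row["x"] > 0.5 else 0

train_rows = [{"x": 0.2, "label": 0}, {"x": 0.9, "label": 1}]
test_rows = [{"x": 0.7, "label": 1}, {"x": 0.1, "label": 0}]

if is_compatible(train_rows, test_rows):
    scored = generate_predictions(toy_model, test_rows)
```

The sketch returns new rows rather than modifying the test set in place, mirroring the description of predictions being written to a new dataset.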
Balancing/stratified sampling
In many datasets used for classification, the frequency of positive response
observations is small relative to the frequency of the negative response
observations. These imbalances can make it difficult for the modeler to successfully
extract rules for positive response prediction. To get around this problem, it is