VisMiner Reference by Task - Visual Data Mining: The VisMiner Approach

iii. Drag the newly created clustering up to a display and drop it.

iv. Select SOM Viewer.

v. Right-click on any cluster cell and select “Make dataset from cluster”.

Creating training, validation, and test sets

Datasets used by modelers can be partitioned into training and validation partitions. Observations in the training partition are used to build the model. Rows in the validation partition are used during model construction to avoid overtraining, and after construction to assess the model's generalizability. Once a model is constructed, other datasets having the same structure (same column names and types) may be applied to the model to generate predictions. These datasets, applied later, are referred to here as “test” sets.

Use the Control Center to partition into training and validation sets:

1. Right-click on the dataset to be partitioned.

2. Select “Create derived dataset”.

3. Check the “Columns to Include”.

4. The number of training rows is specified in “Rows for new derived set”.

5. The number of validation rows is specified in “Rows for validation set”.

6. Once the derived dataset is created, drag a modeler over it and drop to build the model.
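
For readers who want to see what such a partition amounts to outside the tool, the following is a minimal Python sketch. It is not VisMiner code: the function name `partition_rows`, its parameters, and the use of a random shuffle are assumptions made for illustration, and the source does not say how VisMiner chooses which rows go into each partition.

```python
import random


def partition_rows(rows, n_train, n_validation, seed=0):
    """Split rows into disjoint training and validation partitions.

    Assumption: rows are assigned at random. The two counts play the
    role of the "Rows for new derived set" and "Rows for validation set"
    entries described in the steps above.
    """
    if n_train + n_validation > len(rows):
        raise ValueError("requested more rows than the dataset contains")
    rng = random.Random(seed)   # fixed seed so the split is repeatable
    shuffled = list(rows)       # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    return training, validation


# Hypothetical ten-row dataset, one dict per observation.
rows = [{"id": i, "x": i * 0.5} for i in range(10)]
training, validation = partition_rows(rows, n_train=6, n_validation=3)
```

Rows left over after both partitions are filled (one row in this example) are simply not used.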

Apply a test dataset:

1. Drag a compatible dataset (same column names and types) over the model and drop.

2. Select “Test model performance” to create a test set that can be viewed in the confusion viewer, ROC viewer or other model viewers.

3. Select “Generate Predictions” to create a new dataset with a new “predicted values” column.
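
The compatibility requirement and the “Generate Predictions” step can be sketched in Python as follows. This is an illustration under stated assumptions, not VisMiner's implementation: the helpers `schema`, `is_compatible`, and `generate_predictions`, the toy threshold model, and the column name `predicted` are all hypothetical.

```python
def schema(rows):
    """Return {column name: type name}, read from the first row."""
    return {name: type(value).__name__ for name, value in rows[0].items()}


def is_compatible(model_schema, rows):
    """True when rows have the same column names and types as the schema."""
    return schema(rows) == model_schema


def generate_predictions(predict, rows, column="predicted"):
    """Return copies of rows with one extra column holding the prediction."""
    return [{**row, column: predict(row)} for row in rows]


# Hypothetical data the model was built on, and a stand-in "model".
training_rows = [{"age": 34, "income": 52000.0}, {"age": 29, "income": 31000.0}]
model_schema = schema(training_rows)


def toy_model(row):
    # Assumption: a fixed income threshold stands in for a trained model.
    return "yes" if row["income"] > 50000 else "no"


test_rows = [{"age": 51, "income": 61000.0}, {"age": 23, "income": 18000.0}]
compatible = is_compatible(model_schema, test_rows)
scored = generate_predictions(toy_model, test_rows) if compatible else []
```

The original rows are left unchanged; the predictions live in a new dataset, mirroring the description in step 3.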

Balancing/stratified sampling

In many datasets used for classification, the frequency of positive response observations is small relative to the frequency of the negative response observations. These imbalances can make it difficult for the modeler to successfully extract rules for positive response prediction. To get around this problem, it is
