VisMiner Reference by Task - Visual Data Mining: The VisMiner Approach

Model performance

Most measures of model performance may be computed using any of the three

applicable datasets - training, validation, and test. These sets can also be used to

compare actual outputs to predicted outputs.

For the test set only, the performance measures are not automatically computed.

The dataset must first be applied to the model. (Drag and drop test dataset on

model, then choose “Test model performance”.)

Classification

1. Classification error rate

a. Compare the model's error rate to the baseline error rate, which is one minus

the rate of the most frequently occurring class. For example, if the most

frequently occurring class is found in 52% of the observations, then a

model prediction error rate of 40% would be an improvement over the

baseline error rate of 48%. However, if the rate of the most frequently

occurring class is 95%, then a model error rate of 10% would be worse

than the baseline error rate of 5%.

2. View model error rates using the confusion viewer, the ROC viewer, and

the class model viewer.

3. False positive and false negative error rates - available in the confusion

viewer. Depending on the intended model application, the costs of the

different types of errors may be quite different. If one error type is more

costly than another, focus on that type of error.

4. Area under curve (AUC) - available in ROC curve viewer. Maximum

AUC is 1.0. The closer to 1.0 the better.

5. Model lift - available within ROC curve viewer. Represents error rate

found when only the top n% of the observations are chosen.

6. Model application costs - available within ROC curve viewer. Allows

user to apply monetary costs to compute benefits of model with respect to

false positive and false negative errors.
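
The classification measures above can be sketched outside VisMiner in a few lines of Python. This is a minimal illustration, not VisMiner's implementation: the function names, the toy labels and scores, and the monetary costs are all assumptions made for the example. AUC is computed here with the standard rank-based formulation (the chance that a positive observation outscores a negative one, with ties counted as half), which may differ from how the ROC viewer derives it.

```python
from collections import Counter


def baseline_error_rate(actual):
    """One minus the share of the most frequently occurring class."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return 1.0 - most_common_count / len(actual)


def error_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the actual class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


def error_counts(actual, predicted, positive):
    """Return (false positives, false negatives) for the chosen positive class."""
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return fp, fn


def error_cost(actual, predicted, positive, cost_fp, cost_fn):
    """Total cost of errors when each error type has its own (hypothetical) cost."""
    fp, fn = error_counts(actual, predicted, positive)
    return fp * cost_fp + fn * cost_fn


def auc(actual, scores, positive):
    """Rank-based AUC: share of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumed for illustration): 6 "yes" and 4 "no" observations.
actual = ["yes"] * 6 + ["no"] * 4
predicted = ["yes"] * 5 + ["no"] + ["no"] * 3 + ["yes"]
scores = [0.9, 0.8, 0.8, 0.7, 0.6, 0.4, 0.1, 0.2, 0.3, 0.7]

print("baseline error rate:", baseline_error_rate(actual))
print("model error rate:   ", error_rate(actual, predicted))
print("FP, FN counts:      ", error_counts(actual, predicted, "yes"))
print("error cost:         ", error_cost(actual, predicted, "yes", cost_fp=10, cost_fn=50))
print("AUC:                ", auc(actual, scores, "yes"))
```

A model is only worth keeping if its error rate falls below the baseline; when the two error types carry different costs, comparing models on total cost rather than raw error rate reflects that difference.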

Regression

1. R² - measure of regression fit. Any value greater than zero is an

improvement on the baseline model (output attribute mean). Available

in the regression model viewer and the regression summary where

applicable.

2. F-statistic and P-value - statistical measures of goodness-of-fit with

respect to the regression as a whole and to input coefficients. Available

for linear and polynomial regressions only via regression summary.
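
To see why zero is the break-even point for R², it helps to compute it directly against the mean baseline. The sketch below uses the usual definition, one minus the ratio of the residual sum of squares to the total sum of squares; the function name and the sample values are assumptions for illustration and are not taken from VisMiner.

```python
def r_squared(actual, predicted):
    """R² = 1 - SS_res / SS_tot, where SS_tot is measured around the mean of actual."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy output attribute values (assumed for illustration).
actual = [1.0, 2.0, 3.0, 4.0]

# Baseline model: always predict the mean of the output attribute.
mean_prediction = [sum(actual) / len(actual)] * len(actual)

# A hypothetical model whose predictions track the actual values closely.
model_prediction = [1.1, 1.9, 3.2, 3.8]

print("baseline R²:", r_squared(actual, mean_prediction))
print("model R²:   ", r_squared(actual, model_prediction))
```

Predicting the mean for every observation yields an R² of zero, so any positive value means the model explains some variance that the baseline does not. On a validation or test set, R² can also be negative, which indicates a model that does worse than the mean.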
