Table 8.3. Classification Results

                        Optimized                         Not optimized
                 precision  recall  F1-value      precision  recall  F1-value
Micro averages      0.95     0.95     0.95           0.90     0.90     0.90
Macro averages      0.94     0.86     0.87           0.82     0.79     0.78
Support Vector Machines solve a binary classification problem. The SVM score
associated with an instance of the considered events is its signed distance to the
separating hyperplane in units of the SVM margin. In order to solve multiclass
problems, a series of Support Vector Machines has to be trained; in the case of
a one-vs-all training schema, for example, the number of SVMs trained equals the
number of classes. The scores produced by these different machines are not directly
comparable and must be calibrated so that, at least for a given classification
instance, they lie on a common scale. In this application, the scores not only
must be comparable between classes for a given classification instance (page), but
also between different classification instances (pages), i.e., the SVM scores must be
mapped to probabilities. Platt [13] calibrates SVM scores to class membership
probabilities by interpreting the score as proportional to the logarithm of the
ratio of class membership probabilities. He determines the class membership
probability as a function of the SVM score by fitting a sigmoid function to the
empirically observed class membership probabilities as a function of the SVM
score. The fit parameters are the slope of the sigmoid function and a translational
offset. Given the interpretation of the SVM scores discussed above, the latter
parameter is the logarithm of the ratio of the class prior probabilities. The
method used here [14] fixes the translational offset and fits only the slope
parameter. In addition, the Support Vector Machines are trained using cost factors
for the positive as well as the negative class, and the two cost factors are
optimized independently. Empirical studies performed by the authors showed that
cost factor optimization in conjunction with fitting the slope parameter of the
mapping function from SVM scores to probabilities yields better probability
estimates than fitting the slope and the translational offset without cost factor
optimization, fitting the slope and the translational offset with cost factor
optimization, or fitting the slope only.
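As an illustration only, the slope-only calibration can be sketched roughly as
follows. The sigmoid parametrization, the sign convention for the fixed offset,
and the use of NumPy/SciPy are assumptions made for this sketch, not the
implementation of [14]:

import numpy as np
from scipy.optimize import minimize_scalar

# Sketch of Platt-style calibration with a fixed translational offset, assuming
# the sigmoid form P(y = +1 | s) = 1 / (1 + exp(A*s + B)). Only the slope A is
# fitted; B is fixed from the class priors (the sign convention is an assumption).

def fit_slope(scores, labels):
    """scores, labels: NumPy arrays; labels in {+1, -1}."""
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == -1)
    B = np.log(n_neg / n_pos)        # fixed offset: log ratio of class priors
    t = (labels + 1) / 2             # targets in {0, 1}

    def nll(A):
        # negative log-likelihood of the calibrated probabilities
        p = 1.0 / (1.0 + np.exp(A * scores + B))
        p = np.clip(p, 1e-12, 1.0 - 1e-12)
        return -np.sum(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

    result = minimize_scalar(nll)    # one-dimensional fit of the slope only
    return result.x, B

def to_probability(score, A, B):
    # map a raw SVM score to a class membership probability
    return 1.0 / (1.0 + np.exp(A * score + B))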
Table 8.3 summarizes the classification results for different loan forms. The re-
sults shown under the Optimized heading are the classification results obtained
with the class membership probabilities, using cost factor optimization and fitting
the slope of the sigmoid function. Using SVM scores directly, without calibration
or cost factor optimization, yields the results under the heading Not optimized.
The macro averages in particular illustrate the effectiveness of the chosen method.
The observed
improvement is a combined effect of using probabilities instead of SVM scores and
cost factor optimization. An added benefit of optimizing the positive and negative
cost factors is improved handling of OCR noise. As discussed in section 8.3, OCR
increases the feature space considerably, and cost factor optimization becomes
important to avoid overfitting to the training corpus.
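The independent optimization of the positive and negative cost factors can likewise
be illustrated with a minimal sketch. The grid values, the validation F1 criterion,
and the use of scikit-learn's per-class class_weight mechanism are assumptions for
this sketch, not the authors' actual setup:

from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

# Sketch: independent grid search over positive- and negative-class cost
# factors, realized here via per-class C weighting (class_weight). The grid
# and the selection criterion (validation F1) are illustrative assumptions.
# Labels are assumed to be 0/1, with 1 the positive class.

def optimize_cost_factors(X_train, y_train, X_val, y_val,
                          grid=(0.25, 0.5, 1.0, 2.0, 4.0)):
    best_weights, best_f1 = None, -1.0
    for c_pos in grid:
        for c_neg in grid:
            clf = LinearSVC(C=1.0, class_weight={1: c_pos, 0: c_neg})
            clf.fit(X_train, y_train)
            score = f1_score(y_val, clf.predict(X_val))
            if score > best_f1:
                best_weights, best_f1 = (c_pos, c_neg), score
    return best_weights, best_f1    # best (positive, negative) cost factors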
In summary, the effects of cost factor optimization can be interpreted as follows:
The ratio of positive to negative cost factors determines the right class prior prob-