7.4 SIMULATION
To provide a more comprehensive insight into the algorithms introduced in
Section 7.2, simulations are conducted to compare their performance on
both synthetic and real-world benchmark datasets. The compared algorithms
are configured as follows.
• The REA algorithm uses k-nearest neighbors to determine the qualification
of previous minority class examples for making up the minority class ratio
f in post-balanced training data chunks. k is set to 10 and f is set to
0.5.
• The SERA algorithm uses the Mahalanobis distance to determine the qualifica-
tion of previous minority class examples for making up the minority class
ratio f in post-balanced training data chunks. f is set to 0.5.
• The UB algorithm uses all previous minority class examples to balance the
training data chunk.
• The SMOTE algorithm employs the synthetic minority over-sampling tech-
nique to create a number of synthetic minority class instances for making
up the minority class ratio f in post-balanced training data chunks. f is
set to 0.5.
• The Normal approach learns directly on the training data chunk; it serves as
the baseline of this simulation.
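The k-NN qualification step used by REA can be sketched as follows. This is a simplified illustration, not the reference implementation: the function and variable names are hypothetical, minority examples are assumed to carry label 1, and a previous minority example is scored by the fraction of its k nearest neighbors in the current chunk that are minority.

```python
import numpy as np

def balance_chunk_rea(X_chunk, y_chunk, X_prev_min, f=0.5, k=10):
    """REA-style post-balancing sketch (hypothetical helper).

    Ranks previous minority class examples by how many of their k
    nearest neighbors in the current chunk are minority, then appends
    the best-qualified ones until the minority ratio reaches f.
    """
    min_mask = y_chunk == 1
    n_min, n_maj = int(min_mask.sum()), int((~min_mask).sum())
    # Smallest n_add with (n_min + n_add) / (n_min + n_add + n_maj) >= f
    n_add = max(0, int(np.ceil((f * n_maj - (1 - f) * n_min) / (1 - f))))
    n_add = min(n_add, len(X_prev_min))
    if n_add == 0:
        return X_chunk, y_chunk

    scores = []
    for x in X_prev_min:
        d = np.linalg.norm(X_chunk - x, axis=1)
        nn = np.argsort(d)[:k]
        scores.append(min_mask[nn].mean())  # fraction of minority neighbors
    order = np.argsort(scores)[::-1][:n_add]  # best-qualified first

    X_bal = np.vstack([X_chunk, X_prev_min[order]])
    y_bal = np.concatenate([y_chunk, np.ones(n_add, dtype=int)])
    return X_bal, y_bal
```

With f = 0.5, the derivation inside the sketch reduces to adding n_maj − n_min previous minority examples (or as many as are available), which matches the intended post-balanced ratio.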
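The synthetic instance generation used by SMOTE can likewise be sketched in a few lines. This is a minimal illustration under simplifying assumptions (Euclidean distance, a hypothetical function name, no class-boundary handling): each synthetic point is a random interpolation between a minority example and one of its k nearest minority neighbors.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=None):
    """Minimal SMOTE-style oversampling sketch (not the reference
    implementation).

    Generates n_new synthetic minority instances by interpolating
    between existing minority examples and their nearest neighbors.
    """
    rng = np.random.default_rng(seed)
    X_new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]      # skip the point itself
        j = rng.choice(nn)
        gap = rng.random()               # interpolation factor in [0, 1)
        X_new.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(X_new)
```

Because every synthetic point lies on a segment between two existing minority examples, the generated instances stay within the region already occupied by the minority class.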
7.4.1 Metrics
Following the routine of imbalanced learning studies, the minority class data
and the majority class data belong to the positive and negative classes,
respectively. Let {p, n} denote the positive and negative true class labels
and {Y, N} denote the predicted positive and negative class labels; the
confusion matrix for the binary classification problem can then be defined
as in Figure 7.4.
By manipulating the confusion matrix, the overall prediction accuracy (OA)
can be defined as

OA = (TP + TN) / (TP + TN + FP + FN)    (7.31)

Figure 7.4 Confusion matrix for binary classification.
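Equation (7.31) can be computed directly from the four confusion matrix entries. The sketch below assumes label 1 for the positive (minority) class and label 0 for the negative (majority) class; the function name is illustrative.

```python
def overall_accuracy(y_true, y_pred):
    """Overall prediction accuracy (OA), Eq. (7.31).

    Positive class = minority (label 1), negative = majority (label 0).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp + tn) / (tp + tn + fp + fn)
```

Note that under class imbalance, OA is dominated by the majority (negative) class, which is why it is used here alongside, rather than instead of, imbalance-aware metrics.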