Similarly, for an "x-vote union" we take the x-th smallest ranking as the score
of the ensemble, e.g., for Union-2:

A: min_2(1, 3, 1) = 1
B: min_2(2, 2, 3) = 2
C: min_2(3, 1, 2) = 2

A point with score n appeared in the top-n outlier list of at least x detectors.
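A minimal sketch of this rank aggregation in Python (the function name and array layout are our own illustration, not the paper's code):

```python
import numpy as np

def x_vote_union(rankings, x):
    """x-vote union: the ensemble score of each point is the x-th smallest
    of its per-detector ranks. `rankings` is an (n_detectors, n_points)
    array where rankings[d, i] is the rank of point i under detector d
    (rank 1 = most outlying)."""
    # Sort each point's ranks across detectors and take the x-th smallest.
    sorted_ranks = np.sort(rankings, axis=0)
    return sorted_ranks[x - 1, :]

# Toy example matching the text: three detectors ranking points A, B, C.
rankings = np.array([[1, 2, 3],
                     [3, 2, 1],
                     [1, 3, 2]])
print(x_vote_union(rankings, x=2))  # -> [1 2 2], i.e. A=1, B=2, C=2
```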
The ALOI outlier dataset was used for the experiment. Some details are given
below:
1. The ALOI [17] dataset is a set of 110250 color images of 1000 small
objects taken under varying conditions (i.e., roughly 110 pictures per
object).
2. In order to be suitable for use as an outlier dataset, each ALOI image was
converted into an RGB histogram representation with 3 bins per color channel
(a sketch of this conversion follows the list), and the number of images was
reduced to 50000, of which 1508 are outliers.
3. To create these outliers, 1-5 images were taken from the photo galleries of
each of 562 objects, giving a total of 1508 images to be used as outliers,
while the remaining image galleries were left intact to serve as non-outliers.
The result was a dataset of 50000 instances with a dimensionality of 27.
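The paper does not spell out the histogram construction; a joint 3 × 3 × 3 RGB histogram is one reading that is consistent with the stated 27 dimensions. A hedged sketch:

```python
import numpy as np

def rgb_histogram(image, bins=3):
    """Joint RGB histogram with `bins` bins per channel (3**3 = 27 features).
    `image` is an (H, W, 3) uint8 array. This is an illustrative guess at the
    preprocessing; the text only states 3 bins per channel and 27 dimensions."""
    pixels = image.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256),) * 3)
    hist = hist.flatten()
    return hist / hist.sum()  # normalize to a distribution over the 27 bins
```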
For our candidate algorithms we used KNN, Aggregated KNN, LOF [24],
LDOF [25] and LoOP [11], all of which have a single parameter k. The value of
k was varied from 3 to 30, for a total of 5 × 28 = 140 candidates.
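The exact implementations used in the paper are not reproduced here; as an illustration, a score matrix for part of the pool (KNN, Aggregated KNN and LOF; LDOF and LoOP are not available in scikit-learn and are omitted, so this sketch yields 3 × 28 = 84 candidates rather than 140) could be generated along these lines. All function names are ours:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def knn_scores(X, k, aggregate=False):
    """Outlier score = distance to the k-th neighbour (KNN), or the mean of
    the k nearest-neighbour distances (Aggregated KNN)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: the point itself
    dists, _ = nn.kneighbors(X)
    dists = dists[:, 1:]                              # drop the self-distance
    return dists.mean(axis=1) if aggregate else dists[:, -1]

def lof_scores(X, k):
    """LOF scores: larger means more outlying."""
    lof = LocalOutlierFactor(n_neighbors=k).fit(X)
    return -lof.negative_outlier_factor_

def build_candidates(X, k_values=range(3, 31)):
    """Return an (n_candidates, n_points) score matrix for the detector pool."""
    pool = []
    for k in k_values:
        pool.append(knn_scores(X, k, aggregate=False))
        pool.append(knn_scores(X, k, aggregate=True))
        pool.append(lof_scores(X, k))
    return np.vstack(pool)
```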
In the comparison we included our proposed "1-vote union" (at least
one ADA has marked the instance as an outlier), the "140-vote union" (all of
the 140 ADA versions agree that a certain instance is an outlier), the greedy
fusion proposed by [6], and the simple average of all ADAs, termed the "Mean
Ensemble". We also include the result of initializing the greedy ensemble
method using the labels themselves. This is obviously not possible in practice
and is done only to obtain an upper bound on performance for benchmarking.
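Continuing the sketch above, the mean ensemble and the rank matrix fed to x_vote_union could look as follows; whether scores are normalized before averaging is our assumption, and the greedy fusion of [6] is not sketched here:

```python
import numpy as np
from scipy.stats import rankdata

def mean_ensemble(scores):
    """Average of min-max normalized detector scores (the normalization step
    is our assumption; the text only calls this the 'Mean Ensemble')."""
    lo = scores.min(axis=1, keepdims=True)
    rng = np.ptp(scores, axis=1, keepdims=True) + 1e-12
    return ((scores - lo) / rng).mean(axis=0)

def to_ranks(scores):
    """Per-detector ranks (rank 1 = most outlying), as used by x_vote_union."""
    return np.vstack([rankdata(-s, method='ordinal') for s in scores])

# scores: the (140, n_points) matrix from build_candidates above.
# one_vote   = x_vote_union(to_ranks(scores), x=1)
# all_agree  = x_vote_union(to_ranks(scores), x=scores.shape[0])  # 140-vote union
# mean_score = mean_ensemble(scores)
```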
The performance of the various methods is measured by the ROC curve. The
receiver operating characteristic (ROC) curve graphically displays a classifier's
true-positive rate (TPR) against its false-positive rate (FPR) as the
discrimination threshold is varied. It is often used to compare the quality of
the rankings, scores, or probabilities produced by different classifiers.
The curve always starts at the bottom left (0,0) and ends at (1,1), representing
the two extremes: a threshold so high that no instances are considered positive,
and a threshold so low that all instances become positive.
The ROC of an ideal classifier reaches TPR=1 when FPR=0 (it hugs the
y-axis and the top-left corner), implying that there exists a decision threshold
at which the classes are split perfectly. In terms of rankings, this means all
instances of the positive class are ordered before all instances of the negative
class. The area under this ideal ROC curve equals the area of the entire plot
and is normalized to 1. A classifier that randomly labels instances as positive
or negative will have an ROC curve approaching the diagonal and an AUC of 0.5.
Since there may not be a pre-specified acceptable rate of false positives or a
decision threshold, the area under the curve (AUC) is often used as a crude way
to summarize and compare classifiers across all possible thresholds.
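For completeness, the ROC curve and AUC of an outlier ranking can be computed directly from the ensemble scores and the ground-truth outlier labels, e.g. with scikit-learn; the data below are a toy illustration, not the ALOI results:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# y_true marks the labelled outliers (1) vs. inliers (0); y_score is any
# ensemble score where higher means "more outlying" (rank-based union
# scores should be negated first).
y_true  = np.array([0, 0, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.9, 0.2, 0.7, 0.3])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)
print(f"AUC = {auc:.3f}")  # 1.0 here: both outliers are ranked before all inliers
```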