Table 1.1. Information gain of the metrics

Metric               Biological data set   Neurological data set
Number of Colors     0.909                 0.382
Contour Sharpness    0.854                 0.473
Smallest Dimension   0.829                 0.496
Prevalent Color      0.68                  0.663
Dimension Ratio      0.663                 0.595
Saturation           0                     0.133

are counted as C. The sharpness value is then C/S. The threshold is set halfway
between black and white. While photos of experimental results tend to have
fuzzy borders with gradual colour changes, graphs and text have many abrupt
changes between black and white.
3. Smallest Dimension: the actual size of the picture, more precisely the
minimum of its width and height in centimetres. This is used to sort out very
big pictures, which might indicate a full text page, and very small pictures,
which might indicate a logo.
4. Prevalent Colour: the percentage of the picture covered by the prevalent
colour, which is usually the background. The idea is that strong backgrounds
indicate graphics. Unfortunately, pure gel pictures are mostly gray and mixed
pictures are mostly white, so the information gain for the biological data set
is limited. Still, it is the most important metric for the neurological data set.
5. Dimension Ratio: the ratio between the height and width of the picture. In
our test sets, it was mostly between 1 and 2, due to the standard dimensions of
figures. Outliers weakly indicate gel pictures, while whole pages and groups of
logos have a characteristic dimension ratio.
6. Saturation: the percentage of gray-scale pixels, compared to coloured ones.
This is no help at all in the biological data set, which completely lacks
coloured pictures, but it is of some help in the more modern neurological
data set.
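
As an illustration, metrics 3 to 6 could be computed roughly as follows. This
is a minimal sketch, not the implementation used for the experiments: the
function name picture_metrics, the pixel representation (rows of RGB tuples),
the dpi parameter used to convert pixels to centimetres, the orientation of the
ratio (height divided by width) and the rule that a pixel counts as gray-scale
when its three channels are equal are all assumptions made for the example.

```python
from collections import Counter

CM_PER_INCH = 2.54


def picture_metrics(pixels, dpi=300):
    """Compute metrics 3-6 for a picture.

    pixels: list of rows, each row a list of (r, g, b) tuples in 0..255.
    dpi:    assumed resolution, needed to express the size in centimetres.
    """
    height = len(pixels)
    width = len(pixels[0])
    flat = [p for row in pixels for p in row]
    total = len(flat)

    # 3. Smallest dimension: minimum of width and height, in centimetres.
    smallest_cm = min(width, height) / dpi * CM_PER_INCH

    # 4. Prevalent colour: share of the single most frequent colour.
    most_common_count = Counter(flat).most_common(1)[0][1]
    prevalent_pct = 100.0 * most_common_count / total

    # 5. Dimension ratio: height divided by width (assumed orientation).
    ratio = height / width

    # 6. Saturation: share of gray-scale pixels (assumed: r == g == b).
    gray_count = sum(1 for r, g, b in flat if r == g == b)
    grayscale_pct = 100.0 * gray_count / total

    return {
        "smallest_dimension_cm": smallest_cm,
        "prevalent_colour_pct": prevalent_pct,
        "dimension_ratio": ratio,
        "grayscale_pct": grayscale_pct,
    }
```

A real pipeline would read the pixels and the resolution from the image file;
plain lists are used here only to keep the sketch free of dependencies.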
Table 1.1 also shows that the importance of the first three metrics declines
over time. In fact, in the newer data set the pictures in the model category
are more similar to the raw data pictures. We assume that this is a result of
newer technologies, which make it possible to produce models that are more
detailed and colourful.
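
For readers unfamiliar with the measure in Table 1.1, the following sketch
shows how the information gain of a single metric can be computed from labelled
pictures. It uses the standard entropy-based definition and, as an assumption
for the example, a single threshold to split the metric into two groups; the
text does not state how the metric values were discretised, and the function
names and class labels below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the pictures at `threshold`.

    values: one metric value per picture.
    labels: the category of each picture, in the same order.
    """
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    # Weighted entropy of the two groups; empty groups contribute nothing.
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (left, right)
        if part
    )
    return entropy(labels) - remainder
```

A metric that separates the categories perfectly yields a gain equal to the
entropy of the labels, while a metric that does not separate them at all
yields a gain of 0, as Saturation does for the biological data set.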
1.5.2 Classification Results
We tested the method on both a neurological and a biological data set. The
results for the neurological data set (cf. Table 1.2) are about the same as for
the biological data set (up to 94.5%). This score can be considered almost
perfect, since the estimated error rate in the data set is in the same range.
Compared to the other data set, the classification task was de facto harder, as
there was an additional category.
 