Fig. 1.4. Sample pictures from the neurological data set, one for each category
(raw data, full page, model, logo). The category is written below the image.
In the neurological test set, we used four categories: full page, raw data, model
and logo. The biological test set contained no logos, hence it had only three
categories. The distribution in the neurological set was also quite different:
instead of a large number of full pages, we had many logos. We expect the error
rate on this set to be higher than on the biological data set because of the
increased number of categories. Unlike in the biological data set, the distribution
is also much more biased, with only 13 instances of full pages and over 1000 models.
1.5.1 Method for Classification
For image classification, a feature-based approach seems best, because we do not
classify based on the object seen in the image, but on the representation of that
object, e.g. gel blots vs. graph points. Other algorithms, like the random window
approach, tend to suppress those representation details. We base our method on
[26], a method originally used to distinguish between computer-made images and
real-life photos, since that is a closely related problem.
To classify the pictures, we calculate six metrics, or features, from each
picture. All of the calculations run in linear time, so computing them takes
less than a second for an average picture. The small number of attributes
allows fast learning and classification. An information gain estimate is given in
Table 1.1. The features are explained below, together with an interpretation
of how useful each feature was for our task.
1. Number of Colours: counts the number of colours occurring in the picture.
We assume that many colours indicate slow colour changes typical for
photos of experimental results, while graphs are usually black and white.
2. Contour Sharpness: measures the occurrence of hard changes in the
colour values. First, it compares each pixel with its neighbouring pixels to find
the biggest colour difference between them. Then, all pixels with a maximum
difference bigger than 0 are counted as S and those bigger than a threshold t
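The first feature and the two pixel counts described so far can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation from [26]: the function names, the 4-pixel neighbourhood, the use of single grey values instead of full colour values, and the threshold passed as `t` are all hypothetical choices. How the two counts are combined into the final sharpness value is not covered here.

```python
def number_of_colours(image):
    """Count distinct colour values; `image` is a list of rows of hashable pixels."""
    return len({pixel for row in image for pixel in row})


def max_neighbour_difference(grey):
    """For each pixel, the largest absolute difference to its 4 direct neighbours.

    Assumption: `grey` is a list of rows of integer grey values, and only the
    left, right, upper and lower neighbours are compared.
    """
    height, width = len(grey), len(grey[0])
    diff = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width:
                    delta = abs(grey[y][x] - grey[ny][nx])
                    diff[y][x] = max(diff[y][x], delta)
    return diff


def sharpness_counts(grey, t):
    """Return (S, count above t): pixels whose maximum difference exceeds 0 and t."""
    flat = [d for row in max_neighbour_difference(grey) for d in row]
    s = sum(1 for d in flat if d > 0)
    above_t = sum(1 for d in flat if d > t)
    return s, above_t
```

Each pixel is visited once and compared with a fixed number of neighbours, so the cost of both functions grows linearly with the number of pixels.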