with the following notation: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))} is the set of (input, output) training pairs, h_θ(x) ∈ R^K is the hypothesis function, L is the number of layers, s_l is the number of neurons in layer l, K is the number of classes (so that y ∈ R^K), and Θ is the matrix storing the weights of each layer.
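To make the notation concrete, a forward pass computing h_θ(x) can be sketched as below. This is a generic sigmoid feed-forward sketch: the layer sizes, the activation function, and the random weights are illustrative assumptions, not the exact configuration used in this work.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(x, thetas):
    """Forward pass through L layers.

    x      : input vector of length s_1
    thetas : list of weight matrices; Theta for layer l has shape
             (s_{l+1}, s_l + 1), the extra column multiplying the bias unit
    returns h_theta(x) in R^K, one activation per class
    """
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))   # prepend the bias unit
        a = sigmoid(theta @ a)           # activations of the next layer
    return a

# Toy example: 4 inputs, one hidden layer of 5 neurons, K = 3 classes.
rng = np.random.default_rng(0)
thetas = [rng.standard_normal((5, 5)), rng.standard_normal((3, 6))]
h = hypothesis(rng.standard_normal(4), thetas)
```

The output h is a vector of K per-class activations in (0, 1); prediction then amounts to taking the index of the largest component.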
4 Experimental setup
We aimed to create a scalable application, portable across architectures and oper-
ating systems. As the programming language we therefore chose Python over Matlab or
Octave, for practical reasons, chiefly its libraries for user-interface generation, socket
management, and data processing. The operating system is a 32-bit Ubuntu 12.04 LTS,
running on a machine with two hyper-threaded cores and 4 GB of RAM. The software
architecture is modular to facilitate subsequent refactoring; each sub-module is implemented
as a class. To make better use of the hardware, the resource-intensive modules use
multi-threading and multi-processing techniques. The data are stored in a MySQL database
using the MyISAM engine. The application follows the standard client-server architecture,
so that its functionality can be exposed to multiple users at once. So far, we have
disregarded user-management issues.
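A resource-intensive sub-module of the kind described above might fan its per-image work out over a worker pool roughly as follows. This is a generic sketch: the class and function names are illustrative, not taken from the application, and the placeholder workload stands in for real decoding or descriptor extraction.

```python
from concurrent.futures import ThreadPoolExecutor
import os

def preprocess(image_id):
    # Placeholder for CPU-heavy per-image work (decoding, resizing,
    # descriptor extraction); here it just squares the id so the
    # fan-out/fan-in flow is observable.
    return image_id, image_id * image_id

class BatchPreprocessor:
    """Illustrative sub-module: distributes per-image jobs over workers."""

    def __init__(self, workers=None):
        self.workers = workers or os.cpu_count()

    def run(self, image_ids):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # map() preserves input order; collect (id, result) pairs.
            return dict(pool.map(preprocess, image_ids))

results = BatchPreprocessor(workers=4).run(range(8))
```

For purely CPU-bound work a `multiprocessing.Pool` with the same `map` interface sidesteps the interpreter lock; the thread pool shown here suits I/O-heavy stages such as database access.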
We have so far restricted the number of recognized classes to 10. Training was conducted
on the CIFAR data set provided by Ref. [ 14 ] ; it includes 60,000 small (32 × 32 pixel) color im-
ages. The authors' tests also involved feed-forward neural networks, with accuracies of
around 87%.
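For reference, the Python version of the CIFAR batches stores each image as a flat 3,072-byte row: 1,024 red values, then 1,024 green, then 1,024 blue, each plane in row-major order. Recovering a 32 × 32 RGB image is therefore a reshape, sketched here on a synthetic row (loading the real pickled batch files from Ref. [14] is assumed to have happened already):

```python
import numpy as np

# Synthetic stand-in for one batch row: 32 * 32 * 3 = 3072 bytes.
row = np.arange(3072, dtype=np.uint8)

# Channels are stored plane by plane (R, then G, then B), so reshape
# to (3, 32, 32) and move the channel axis last for display/processing.
image = row.reshape(3, 32, 32).transpose(1, 2, 0)
```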
The document-scan data consisted of 1,380 images, obtained from two sources:
• scans of old, degraded documents, used as a benchmark at the ICDAR 2007 conference [ 15 ] ;
• high-quality copies, consisting mostly of manuals and documentation for the Ubuntu 12.04
operating system. To be able to use them, we first converted them from the PDF format
to JPEG.
Initially, we tried to replicate the CIFAR benchmark results. We likewise used a neural-
network approach, based on RGB descriptors only, and obtained a similar performance (85%).
However, when we tried to classify an image originating from a different image set (document
scans), the accuracy dropped significantly, by more than 10%. We therefore started experi-
menting with various combinations of RGB/c1c2c3/l1l2l3 descriptors. The results are presented
in Table 1 [ 16 ] .
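The c1c2c3 and l1l2l3 descriptors mentioned above are commonly associated with the photometric-invariant color models of Gevers and Smeulders. Under their usual per-pixel definitions they can be computed as sketched below; the small epsilon guarding against division by zero is an implementation assumption, not part of the canonical formulas.

```python
import numpy as np

EPS = 1e-12  # guards the divisions on achromatic pixels

def c1c2c3(rgb):
    """c_i = arctan(channel_i / max of the other two), per pixel."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    c1 = np.arctan(r / (np.maximum(g, b) + EPS))
    c2 = np.arctan(g / (np.maximum(r, b) + EPS))
    c3 = np.arctan(b / (np.maximum(r, g) + EPS))
    return np.stack([c1, c2, c3], axis=-1)

def l1l2l3(rgb):
    """Normalized squared channel differences; the three values sum to 1."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg, rb, gb = (r - g) ** 2, (r - b) ** 2, (g - b) ** 2
    denom = rg + rb + gb + EPS
    return np.stack([rg / denom, rb / denom, gb / denom], axis=-1)

# Apply both descriptors to a random 32 x 32 RGB image in [0, 1].
rgb = np.random.default_rng(1).random((32, 32, 3))
c = c1c2c3(rgb)
l = l1l2l3(rgb)
```

Both transforms discard intensity information, which is one plausible reason a classifier trained on them generalizes differently across image sets than one trained on raw RGB.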