with the following notation: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))} is the set of (input, output) training pairs, h_θ(x) ∈ R^K is the hypothesis function, L is the number of layers, s_l is the number of neurons in layer l, K is the number of classes (so that y ∈ R^K), and Θ is the matrix storing the weights of each layer.
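To make the notation concrete, a forward pass computing h_θ(x) can be sketched as below. This is a generic sigmoid feed-forward sketch: the layer sizes, the activation function, and the random weights are illustrative assumptions, not the exact configuration used in this work.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, applied element-wise.
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(x, thetas):
    """Forward pass through L layers.

    x      : input vector of length s_1
    thetas : list of weight matrices; Theta for layer l has shape
             (s_{l+1}, s_l + 1), the extra column multiplying the bias unit
    returns h_theta(x) in R^K, one activation per class
    """
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))   # prepend the bias unit
        a = sigmoid(theta @ a)           # activations of the next layer
    return a

# Toy example: 4 inputs, one hidden layer of 5 neurons, K = 3 classes.
rng = np.random.default_rng(0)
thetas = [rng.standard_normal((5, 5)), rng.standard_normal((3, 6))]
h = hypothesis(rng.standard_normal(4), thetas)
```

The output h is a vector of K per-class activations in (0, 1); prediction then amounts to taking the index of the largest component.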
4 Experimental setup
We aimed to create a scalable application, portable across architectures and oper-
ating systems. As the programming language we therefore chose Python over Matlab or
Octave, for practical reasons, chiefly its libraries for user-interface generation, socket
management, and data processing. The operating system is a 32-bit Ubuntu 12.04 LTS,
running on a machine with two hyper-threaded cores and 4 GB of RAM. The software
architecture is modular to facilitate subsequent refactoring; each sub-module is implemented
as a class. To make better use of the hardware, the resource-intensive modules use
multi-threading and multi-processing techniques. The data are stored in a MySQL database
using the MyISAM engine. The application follows the standard client-server architecture,
so that its functionality can be exposed to multiple users at once. So far, we have
disregarded user-management issues.
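A resource-intensive sub-module of the kind described above might fan its per-image work out over a worker pool roughly as follows. This is a generic sketch: the class and function names are illustrative, not taken from the application, and the placeholder workload stands in for real decoding or descriptor extraction.

```python
from concurrent.futures import ThreadPoolExecutor
import os

def preprocess(image_id):
    # Placeholder for CPU-heavy per-image work (decoding, resizing,
    # descriptor extraction); here it just squares the id so the
    # fan-out/fan-in flow is observable.
    return image_id, image_id * image_id

class BatchPreprocessor:
    """Illustrative sub-module: distributes per-image jobs over workers."""

    def __init__(self, workers=None):
        self.workers = workers or os.cpu_count()

    def run(self, image_ids):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # map() preserves input order; collect (id, result) pairs.
            return dict(pool.map(preprocess, image_ids))

results = BatchPreprocessor(workers=4).run(range(8))
```

For purely CPU-bound work a `multiprocessing.Pool` with the same `map` interface sidesteps the interpreter lock; the thread pool shown here suits I/O-heavy stages such as database access.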
We have so far restricted the number of recognized classes to 10. Training was conducted
on the CIFAR data set provided by Ref. [ 14 ] ; it includes 60,000 small (32 × 32 pixel) color im-
ages. The authors' tests also involved feed-forward neural networks, with accuracies of
around 87%.
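For reference, the Python version of the CIFAR batches stores each image as a flat 3,072-byte row: 1,024 red values, then 1,024 green, then 1,024 blue, each plane in row-major order. Recovering a 32 × 32 RGB image is therefore a reshape, sketched here on a synthetic row (loading the real pickled batch files from Ref. [14] is assumed to have happened already):

```python
import numpy as np

# Synthetic stand-in for one batch row: 32 * 32 * 3 = 3072 bytes.
row = np.arange(3072, dtype=np.uint8)

# Channels are stored plane by plane (R, then G, then B), so reshape
# to (3, 32, 32) and move the channel axis last for display/processing.
image = row.reshape(3, 32, 32).transpose(1, 2, 0)
```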
The document-scan data consisted of 1,380 images, obtained from two sources:
• scans of old, degraded documents, used as a benchmark at the ICDAR 2007 conference [ 15 ] ;
• high-quality copies, consisting mostly of manuals and documentation for the Ubuntu 12.04
operating system. To be able to use them, we first converted them from the PDF format
to JPEG.
Initially, we tried to replicate the CIFAR benchmark results. We likewise used a neural-
network approach, based on RGB descriptors only, and obtained a similar performance (85%).
However, when we tried to classify an image originating from a different image set (document
scans), the accuracy dropped significantly, by more than 10%. We therefore started experi-
menting with various combinations of RGB/c1c2c3/l1l2l3 descriptors. The results are presented
in Table 1 [ 16 ] .
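The c1c2c3 and l1l2l3 descriptors mentioned above are commonly associated with the photometric-invariant color models of Gevers and Smeulders. Under their usual per-pixel definitions they can be computed as sketched below; the small epsilon guarding against division by zero is an implementation assumption, not part of the canonical formulas.

```python
import numpy as np

EPS = 1e-12  # guards the divisions on achromatic pixels

def c1c2c3(rgb):
    """c_i = arctan(channel_i / max of the other two), per pixel."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    c1 = np.arctan(r / (np.maximum(g, b) + EPS))
    c2 = np.arctan(g / (np.maximum(r, b) + EPS))
    c3 = np.arctan(b / (np.maximum(r, g) + EPS))
    return np.stack([c1, c2, c3], axis=-1)

def l1l2l3(rgb):
    """Normalized squared channel differences; the three values sum to 1."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg, rb, gb = (r - g) ** 2, (r - b) ** 2, (g - b) ** 2
    denom = rg + rb + gb + EPS
    return np.stack([rg / denom, rb / denom, gb / denom], axis=-1)

# Apply both descriptors to a random 32 x 32 RGB image in [0, 1].
rgb = np.random.default_rng(1).random((32, 32, 3))
c = c1c2c3(rgb)
l = l1l2l3(rgb)
```

Both transforms discard intensity information, which is one plausible reason a classifier trained on them generalizes differently across image sets than one trained on raw RGB.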