A content-based image retrieval approach based on document queries - Emerging Trends in Image Processing, Computer Vision, and Pattern Recognition

problem, the image processing techniques have been improved, combined, and extended across a vast number of fields, such as duplicate detection and copyright, creating image collections, medical applications, video surveillance and security, document analysis, face and print recognition, industrial and military applications, and so on. The term “content” implies that the images are deconstructed into descriptors, which are analyzed and interpreted as image features, as opposed to image metadata, such as annotations, geo-tags, file name, or camera properties (flash light on/off, exposure, etc.).

The traditional CBIR approaches try to solve this problem by extracting a set of characteristics from one image and comparing it with the set extracted from another image. The results obtained so far are promising but still far from covering all the requirements raised by a real-world scenario. Also, the current approaches target specific problems in the image processing context. Because of that, most CBIR implementations work in a rather similar way, on homogeneous data. This causes significant performance drops whenever the test data originate from a different area than the training set.

This chapter proposes a CBIR architecture model with descriptors originating in different search areas. In order to classify images originating in document scans, we have added an extra module, responsible for the document image segmentation stage. The user is offered the possibility of querying the engine with both document scans and regular images in order to retrieve the best N matches.
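
The text above does not say how the best N matches are ranked (it only mentions a supervised machine learning module later on), so the following is a minimal sketch under an assumed scheme: every indexed image is represented by a fixed-length descriptor vector, and the engine returns the N entries with the smallest Euclidean distance to the query descriptor. The function names, the image names, and the three-value descriptors are hypothetical.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptor lengths differ")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matches(query, index, n):
    """Return the n (name, distance) pairs closest to the query descriptor."""
    scored = ((name, euclidean(query, desc)) for name, desc in index.items())
    return heapq.nsmallest(n, scored, key=lambda pair: pair[1])


# Hypothetical descriptors; real ones would come from feature extraction.
index = {
    "scan_a": [0.9, 0.1, 0.0],
    "photo_b": [0.2, 0.5, 0.3],
    "photo_c": [0.1, 0.6, 0.3],
}

top = best_matches([0.1, 0.6, 0.3], index, n=2)
for name, distance in top:
    print(f"{name}: {distance:.3f}")
```

A document scan and a regular image can share the same index in this sketch, provided both are reduced to descriptors of the same length before the comparison.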

During the implementation stage we have faced multiple problems, as specified below:

• image preprocessing;

• extraction of characteristics from various spaces;

• implementation of a supervised machine learning module;

• document image segmentation;

• benchmarking the overall performance.

We have reached the conclusion that a CBIR engine can obtain better results in the presence of multiple sets of descriptors, from different search spaces or from the same one, even if the test images originate in very different areas.
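
One simple way to work with multiple sets of descriptors is to merge them into a single vector. The sketch below assumes a fusion scheme that the text does not describe: each descriptor set is L2-normalized on its own, so that no set dominates because of its scale, and the results are concatenated. The names `l2_normalize` and `fuse` and the sample values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit Euclidean length (all-zero vectors are kept as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(*descriptor_sets):
    """Concatenate independently normalized descriptor sets into one vector."""
    fused = []
    for vec in descriptor_sets:
        fused.extend(l2_normalize(vec))
    return fused


# Hypothetical color and texture descriptors with very different scales.
color = [3.0, 4.0]
texture = [0.0, 10.0, 0.0]
combined = fuse(color, texture)
print(combined)
```

Other fusion strategies, such as weighting each set or training one classifier per set, are equally plausible; this sketch only illustrates the idea of combining descriptor sets.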

2 Related Work

CBIR engines try to mimic human behavior when executing a classification process. This task is very difficult to accomplish because of a large number of factors.

CBIR queries may take place at different levels [1]:

• feature level (find images with X % red and Y % blue);

• semantic level (find images with churches);

• affective level (find images where a certain mood prevails). There is no complete solution for the affective queries.
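
A feature-level query of the kind listed above ("X % red and Y % blue") can be sketched as a pixel count. The thresholds that decide whether a pixel counts as red or blue are assumptions made for this illustration, as are the function names; a real engine would more likely work on a quantized color histogram.

```python
def color_share(pixels, is_color):
    """Percentage of (r, g, b) pixels for which is_color(r, g, b) is true."""
    if not pixels:
        return 0.0
    hits = sum(1 for r, g, b in pixels if is_color(r, g, b))
    return 100.0 * hits / len(pixels)


def is_red(r, g, b):
    # Assumed rule: the red channel is strong and clearly dominates the others.
    return r > 150 and r > 2 * g and r > 2 * b


def is_blue(r, g, b):
    # Same assumed rule, applied to the blue channel.
    return b > 150 and b > 2 * r and b > 2 * g


def matches_query(pixels, min_red, min_blue):
    """True if the image has at least min_red % red and min_blue % blue pixels."""
    return (color_share(pixels, is_red) >= min_red
            and color_share(pixels, is_blue) >= min_blue)


# Hypothetical 10-pixel image: 3 red, 5 blue, 2 gray pixels.
image = [(200, 10, 10)] * 3 + [(10, 10, 220)] * 5 + [(128, 128, 128)] * 2
print(color_share(image, is_red), color_share(image, is_blue))
print(matches_query(image, min_red=25, min_blue=40))
```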

All CBIR implementations use a vector of (global or local) characteristics that originate in different search spaces: color, texture, or shape.

In the color space there are many models, but recently the focus has been on various normalizations of the RGB model in an attempt to obtain invariant elements. Two of the most interesting ones are c1c2c3 (which eliminates both shadows and highlight areas) and l1l2l3 (which eliminates only the shadows, but keeps the highlight areas) [2].
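
The text does not spell out the two models, so the sketch below uses the definitions commonly attributed to Gevers and Smeulders, which is an assumption about what reference [2] contains: c1 = arctan(R / max(G, B)), with c2 and c3 defined analogously, and l1 = (R - G)^2 / ((R - G)^2 + (R - B)^2 + (G - B)^2), with l2 and l3 using the other two squared differences as numerators. The handling of black and gray pixels is a choice made for this illustration.

```python
import math


def c1c2c3(r, g, b):
    """c1 = arctan(R / max(G, B)), and analogously for c2 and c3.

    atan2 is used so that a zero denominator does not raise an error.
    """
    return (
        math.atan2(r, max(g, b)),
        math.atan2(g, max(r, b)),
        math.atan2(b, max(r, g)),
    )


def l1l2l3(r, g, b):
    """Squared channel differences, each divided by their sum."""
    rg, rb, gb = (r - g) ** 2, (r - b) ** 2, (g - b) ** 2
    total = rg + rb + gb
    if total == 0:
        # Achromatic pixel: the ratio is undefined, zero is assumed here.
        return (0.0, 0.0, 0.0)
    return (rg / total, rb / total, gb / total)


bright = (60, 120, 30)
dark = (30, 60, 15)  # same color at half the intensity
print(c1c2c3(*bright), c1c2c3(*dark))
print(l1l2l3(*bright), l1l2l3(*dark))
```

Because both models are built from channel ratios, scaling R, G, and B by the same factor leaves the output unchanged, which is how a uniform change in intensity is factored out.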

There are four large categories for determining the texture descriptors [3]:

• statistical (Tamura features, Haralick's co-occurrence matrices);

• geometrical (Voronoi tessellation features);

• spectral (wavelets [4], Gabor and ICA filters);
