Image Processing Reference
problem, the image processing techniques have been improved, combined, and extended
across a vast number of fields, such as duplicate detection and copyright, creating image
collections, medical applications, video surveillance and security, document analysis, face and
print recognition, industrial, military, and so on. The term “content” implies that the images
are deconstructed into descriptors, which are analyzed and interpreted as image features, as
opposed to image metadata, such as annotations, geo-tags, file names, or camera properties
(flash on/off, exposure, etc.).
The traditional CBIR approaches try to solve this problem by extracting a set of
characteristics from one image and comparing it with another set, representing a different
image. The results obtained so far are promising but still far from covering all the
requirements raised by a real-world scenario. Also, the current approaches target specific
problems in the image processing context. Because of that, most CBIR implementations work in
a rather similar way, on homogeneous data. This causes significant performance drops whenever
the test data originate from a different area than the training set.
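The extract-and-compare step described above can be sketched as follows. This is only a minimal illustration, not the architecture proposed in this chapter: the per-channel color histogram used as the descriptor, the L1 distance, and the names color_histogram and l1_distance are assumptions made for the example.

```python
import numpy as np


def color_histogram(image, bins=8):
    """Global descriptor (assumed for illustration): one histogram per RGB
    channel, concatenated and normalized so that the entries sum to 1."""
    image = np.asarray(image, dtype=np.uint8)
    channels = [
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(channels).astype(np.float64)
    return hist / hist.sum()


def l1_distance(a, b):
    """Sum of absolute differences between two descriptors (0 = identical)."""
    return float(np.abs(np.asarray(a) - np.asarray(b)).sum())
```

Two images are then compared through their descriptors, for example l1_distance(color_histogram(img_a), color_histogram(img_b)); a smaller value means more similar color content under this particular descriptor.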
This chapter proposes a CBIR architecture model with descriptors originating in different
search areas. In order to be able to classify images originating in document scans, we have
added an extra module, responsible for the document image segmentation stage. The user is
offered the possibility of querying the engine with both document scans and regular images
in order to retrieve the best N matches.
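Retrieving the best N matches can be illustrated with a short ranking sketch. It assumes that every image in the collection has already been reduced to a fixed-length descriptor vector and that Euclidean distance is an acceptable similarity measure; the function name top_n_matches and both of these choices are hypothetical, not taken from the engine described here.

```python
import numpy as np


def top_n_matches(query_descriptor, database_descriptors, n=5):
    """Return the n database entries closest to the query.

    database_descriptors is a 2-D array with one descriptor per row.
    The result is a list of (row_index, distance) pairs, nearest first.
    """
    query = np.asarray(query_descriptor, dtype=np.float64)
    database = np.asarray(database_descriptors, dtype=np.float64)
    # Euclidean distance from the query to every row of the database.
    distances = np.linalg.norm(database - query, axis=1)
    # Stable sort keeps the original order among equally distant entries.
    order = np.argsort(distances, kind="stable")[:n]
    return [(int(i), float(distances[i])) for i in order]
```

A brute-force scan like this is linear in the size of the collection; a real engine would likely need an index structure once the collection grows.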
During the implementation stage we have faced multiple problems, as specified below:
• image preprocessing;
• extraction of characteristics from various spaces;
• implementation of a supervised machine learning module;
• document image segmentation;
• benchmarking the overall performance.
We have reached the conclusion that a CBIR engine can obtain better results in the presence
of multiple sets of descriptors, from different search spaces or from the same one, even if the
test images originate in very different areas.
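One simple way to use several descriptor sets together is to normalize each set and concatenate them into a single vector. The sketch below shows that idea only; it is an assumption made for illustration, not necessarily the fusion scheme used by the engine in this chapter, and the name fuse_descriptors and the optional weights are hypothetical.

```python
import numpy as np


def fuse_descriptors(descriptor_sets, weights=None):
    """Concatenate several descriptors into one vector.

    Each descriptor is scaled to unit L2 norm first, so that a set with
    large raw values (e.g. pixel counts) does not dominate the others.
    Optional per-set weights control the relative influence of each set.
    """
    if weights is None:
        weights = [1.0] * len(descriptor_sets)
    parts = []
    for vec, weight in zip(descriptor_sets, weights):
        vec = np.asarray(vec, dtype=np.float64).ravel()
        norm = np.linalg.norm(vec)
        if norm > 0:  # leave all-zero descriptors unchanged
            vec = vec / norm
        parts.append(weight * vec)
    return np.concatenate(parts)
```

The fused vector can then be compared with any vector distance, exactly like a single descriptor.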
2 Related Work
CBIR engines try to mimic human behavior when executing a classification process. This task
is very difficult to accomplish due to a large number of factors.
• feature level (find images with X % red and Y % blue);
• semantic level (find images with churches);
• affective level (find images where a certain mood prevails). There is no complete solution
for the affective queries.
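A feature-level query such as "find images with X % red and Y % blue" can be sketched as below. The text does not define what counts as a red or blue pixel, so this example assumes that a pixel belongs to the channel with the highest value; the function names are hypothetical as well.

```python
import numpy as np


def dominant_channel_percentages(image):
    """Percentage of pixels whose strongest channel is R, G, or B.

    Assumption: a pixel counts as "red" when R is its largest component.
    Ties (e.g. gray pixels) fall to the first channel, i.e. red.
    """
    image = np.asarray(image)
    dominant = image[..., :3].argmax(axis=-1)
    counts = np.bincount(dominant.ravel(), minlength=3)
    pct = 100.0 * counts / dominant.size
    return {"red": float(pct[0]), "green": float(pct[1]), "blue": float(pct[2])}


def matches_feature_query(image, min_red_pct, min_blue_pct):
    """True if the image has at least the requested share of red and blue."""
    pct = dominant_channel_percentages(image)
    return pct["red"] >= min_red_pct and pct["blue"] >= min_blue_pct
```

Semantic and affective queries cannot be answered by such direct pixel statistics, which is why they are considerably harder.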
All the CBIR implementations use a vector of (global or local) characteristics which originate
in different search spaces—color, texture, or shape.
In the color space there are many models, but recently the focus has been on various
normalizations of the RGB one, in an attempt to obtain invariant elements. Two of the most
interesting ones are c1c2c3 (which eliminates both shadows and highlight areas) and l1l2l3 (which
• statistical (Tamura features, Haralick's co-occurrence matrices);
• geometrical (Voronoi tessellation features);