7.2 Background
Image mining has been the subject of much research in data mining and
information retrieval in recent years. A major challenge in the field is to
effectively relate low-level features (automatically extracted from image
pixels) to high-level semantics based on human perception.
According to [11], research in image mining can be broadly classified into two
main directions: domain-specific and general-purpose. The domain-specific
direction focuses on image processing techniques, where the goal is to process
the image and extract the features that best differentiate images of different
types. The general-purpose direction focuses on mining algorithms that aim to
reduce the semantic gap between high-level human perception of images and
low-level image feature representation. The two directions are complementary:
general-purpose techniques improve the accuracy of domain-specific techniques.
Mining images requires extracting their main features according to specific
criteria. Once extracted, the feature vectors and the image descriptions are
submitted to the mining process.
When working with image databases, high-level data manually supplied by
domain experts can also be employed together with low-level features in image
mining processes. However, with the growth of large-scale image repositories,
manual annotation of images has become infeasible because of its inherent
problems of subjectivity, non-scalability, and non-uniformity of vocabulary.
Content-based image retrieval (CBIR) systems have been proposed to overcome
these limitations: they retrieve the images most similar to a given one based
on comparisons of visual features (automatically extracted from the images).
The retrieved images can be used to label a new image or, in the case of
medical images, to support the decision-making process of diagnosing a new
image.
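
The retrieval step described above can be illustrated with a minimal sketch.
It assumes that each image is already represented by a numeric feature vector,
that similarity is measured with the Euclidean distance, and that a new image
is labeled by majority vote among its nearest neighbors. None of these choices
is stated in the text; the function names, image ids, labels, and feature
values are hypothetical toy data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_query(query, database, k):
    """Return the ids of the k images whose feature vectors are closest to query."""
    ranked = sorted(database, key=lambda image_id: euclidean(query, database[image_id]))
    return ranked[:k]


def majority_label(neighbor_ids, labels):
    """Suggest a label for the query image by majority vote (an assumed strategy)."""
    counts = {}
    for image_id in neighbor_ids:
        counts[labels[image_id]] = counts.get(labels[image_id], 0) + 1
    return max(counts, key=counts.get)


# Hypothetical toy database: image id -> low-level feature vector.
database = {
    "img_a": [0.10, 0.80, 0.30],
    "img_b": [0.12, 0.79, 0.33],
    "img_c": [0.90, 0.10, 0.70],
    "img_d": [0.88, 0.15, 0.65],
}
# Hypothetical expert-supplied labels for the stored images.
labels = {"img_a": "normal", "img_b": "normal", "img_c": "lesion", "img_d": "lesion"}

query = [0.11, 0.80, 0.31]
neighbors = knn_query(query, database, k=3)
suggested = majority_label(neighbors, labels)
print(neighbors, suggested)
```

In a real CBIR system the linear scan would typically be replaced by an index
structure, and the distance function would be chosen to suit the extracted
features.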
In this section, two data mining tasks are discussed: feature selection and
association rule mining. We concentrate on applying these techniques to extract
patterns from images and to improve content-based search in medical image
databases. We also describe how to evaluate the results of a CBIR system using
precision versus recall (P&R) curves, which we employed to evaluate our
experiments.
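
As a rough illustration of how the points of a P&R curve can be obtained, the
sketch below uses the standard definitions: precision is the fraction of
retrieved images that are relevant, and recall is the fraction of all relevant
images that have been retrieved. Both are computed after each position of a
ranked result list. The function name and the example ranking are hypothetical,
and this is not necessarily the exact procedure used in the experiments.

```python
def precision_recall_points(ranked_ids, relevant_ids):
    """Return one (recall, precision) pair for each prefix of the ranked list."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("at least one relevant image is required")
    points = []
    hits = 0
    for rank, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
        precision = hits / rank          # relevant retrieved / retrieved so far
        recall = hits / len(relevant)    # relevant retrieved / all relevant
        points.append((recall, precision))
    return points


# Hypothetical query result: images ranked by similarity, three of them relevant.
ranked = ["a", "b", "c", "d", "e"]
relevant = ["a", "c", "e"]
points = precision_recall_points(ranked, relevant)
for recall, precision in points:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Plotting precision against recall for these points gives the curve; in
general, the closer a curve stays to the top of the graph, the better the
retrieval.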
7.2.1 Feature Selection
Dimensionality reduction (also called dimension reduction) is the process of
reducing the number of features (attributes) used to represent a dataset under
consideration. Dimensionality reduction approaches diminish the feature vector
size by removing redundant, correlated and noisy data. In most cases,
dimensionality reduction speeds up the processing of data mining algorithms and
improves their accuracy.
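
To make the idea of removing correlated features concrete, here is a minimal
sketch of one simple filter: a feature is kept only if its absolute Pearson
correlation with every feature already kept is below a threshold. The
threshold value of 0.95, the function names, and the toy data are assumptions
made for illustration; this is not the specific method discussed in this
chapter.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(rows, threshold=0.95):
    """Return indices of features kept after discarding highly correlated ones."""
    columns = list(zip(*rows))  # one tuple of values per feature
    kept = []
    for j, column in enumerate(columns):
        # Keep feature j only if it is not strongly correlated with any kept feature.
        if all(abs(pearson(column, columns[i])) < threshold for i in kept):
            kept.append(j)
    return kept


# Hypothetical feature vectors: feature 1 is almost exactly twice feature 0.
rows = [
    [1.0, 2.0, 0.3],
    [2.0, 4.1, 0.9],
    [3.0, 5.9, 0.1],
    [4.0, 8.0, 0.7],
]
kept = drop_correlated(rows)
reduced = [[row[j] for j in kept] for row in rows]
print(kept, reduced)
```

Here the second feature is discarded because it carries nearly the same
information as the first, so each feature vector shrinks from three values to
two.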
The dimensionality reduction approaches are also classified into feature
selection and feature transformation approaches. There is no agreement about the
 