Mining Statistical Association Rules to Select the Most Relevant Medical Image Features - Mining Complex Data

multiple databases was dealt with. In [25] an algorithm for mining association rules in data warehouses was presented.

Mining association rules in image datasets has been a great challenge. Association rule mining procedures do not produce interesting results by themselves: the images must first be pre-processed by image processing algorithms to produce the image data that is submitted to the mining process.

7.2.3 Content-Based Retrieval Evaluation

When working with content-based retrieval, performing exact searches on image datasets is not useful, since searching for the same data already under analysis has very few applications. Therefore, the retrieval of complex data is mainly performed by similarity. The best-known and most useful types of similarity queries are the k-nearest neighbor query (for instance: “given the Thorax-XRay of John Doe, find the five images most similar to it from the image database”) and the range query (for instance: “given the Thorax-XRay of John Doe, find the images that differ from it by up to three units”). Similarity search is performed by comparing the feature vectors using a distance function that quantifies how close (or similar) each pair of vectors is.
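
The two query types described above can be sketched in a few lines of Python. This is a minimal illustration and not the chapter's implementation: the Euclidean distance, the linear scan over the dataset, the function names `knn_query` and `range_query`, and the toy feature vectors are all assumptions made for the example.

```python
import math


def knn_query(query, dataset, k, distance=math.dist):
    """Return the indices of the k feature vectors closest to `query`."""
    ranked = sorted(range(len(dataset)),
                    key=lambda i: distance(query, dataset[i]))
    return ranked[:k]


def range_query(query, dataset, radius, distance=math.dist):
    """Return the indices of all feature vectors within `radius` of `query`."""
    return [i for i, vector in enumerate(dataset)
            if distance(query, vector) <= radius]


# Toy 2-D feature vectors (hypothetical values, for illustration only).
features = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (6.0, 8.0)]
query_vector = (0.0, 0.0)

nearest = knn_query(query_vector, features, k=2)          # two closest vectors
within = range_query(query_vector, features, radius=2.5)  # all within 2.5 units
```

Any other distance function over feature vectors can be passed through the `distance` parameter; real systems would usually replace the linear scan with an index structure.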

This chapter focuses on medical images, more specifically on the feature vectors employed to compare and retrieve the images by similarity. The motivation is to reduce the usually large number of extracted features: PACS and CAD systems typically gather as many image characteristics as possible, leading to high-dimensional feature vectors that encompass much redundant information. Consequently, it is necessary to select the features that keep the most meaningful information. Notice that the proposed approach can be straightforwardly extended to other types of complex data beyond images, since similarity queries are generally the most suitable for complex data.

In this chapter, we present a technique that uses association rules to improve content-based image retrieval in the medical domain. One important issue related to CBIR systems is how to evaluate their efficacy. A standard approach to evaluating the accuracy of similarity queries is the precision and recall (P&R) graph [5]. Precision and recall are defined in Equation 7.3 and Equation 7.4.

$$\text{Precision} = \frac{TRS}{TS} \tag{7.3}$$

$$\text{Recall} = \frac{TRS}{TR} \tag{7.4}$$

In Equations 7.3 and 7.4, TR is the total number of relevant images for a given query, TRS is the number of relevant images actually returned by the query, and TS is the total number of images returned by the query. In our experiments we use precision and recall (P&R) curves to analyze our proposed algorithm, StARMiner.
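
As a small worked sketch of Equations 7.3 and 7.4, the Python code below computes precision and recall for one query. The image identifiers and relevance judgments are invented for illustration. The `pr_curve` helper assumes that the points of a P&R curve are obtained by growing the answer set one image at a time along the similarity ranking; this is a common construction, but the text does not state how its curves were built.

```python
def precision_recall(returned, relevant):
    """Compute (precision, recall) for one query.

    returned: ids of the images returned by the query (TS = number returned)
    relevant: ids of all images relevant to the query (TR = number relevant)
    """
    returned_set, relevant_set = set(returned), set(relevant)
    trs = len(returned_set & relevant_set)  # relevant images actually returned
    precision = trs / len(returned_set) if returned_set else 0.0
    recall = trs / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def pr_curve(ranking, relevant):
    """(precision, recall) pairs for answer sets of size 1, 2, ..., len(ranking)."""
    return [precision_recall(ranking[:k], relevant)
            for k in range(1, len(ranking) + 1)]


# Hypothetical query result, ordered from most to least similar.
ranking = ["img3", "img7", "img1", "img9", "img4"]
# Hypothetical ground truth: every image relevant to the query.
relevant = {"img3", "img1", "img4", "img8"}

p, r = precision_recall(ranking, relevant)  # TRS = 3, TS = 5, TR = 4
curve = pr_curve(ranking, relevant)
```

Here three of the five returned images are relevant, so precision is 3/5 = 0.6, and three of the four relevant images were retrieved, so recall is 3/4 = 0.75.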
