Mining Statistical Association Rules to Select the Most Relevant Medical Image Features - Mining Complex Data


mechanisms to store and retrieve them. Therefore, content-based image retrieval

(CBIR) techniques have been intensively investigated in recent years [16].

CBIR techniques rely on image processing algorithms to extract relevant char-

acteristics (features) from the images. The characteristics are grouped into fea-

ture vectors, which are stored and organized by indexing structures aiming at

achieving fast and efficient image retrieval. Generally, CBIR techniques use

intrinsic visual features of images, such as color, shape and texture [13], yielding

vectors with hundreds or even thousands of features. Contrary to what one might expect,

having a large number of features actually represents a problem. As the number

of the extracted features grows, the process of storing, indexing, retrieving, and

comparing them becomes more and more time consuming. Moreover, in several

situations, many features are correlated, meaning that they bring redundant in-

formation about the images that can deteriorate the ability of the system to

correctly distinguish them. The large number of features leads CBIR systems to

face the problem known as the “dimensionality curse” [17]. Beyer [7] has proved that,

as the number of features increases, the significance of each feature tends to

diminish. Hence, it is important to keep the number of features as low as pos-

sible, establishing a tradeoff between the representation power and the feature

vector size.
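
The loss of feature significance described above can be observed empirically. The following Python sketch is only an illustration under stated assumptions: it uses points drawn uniformly at random (synthetic data, not real image features), and the function name relative_contrast, the sample size, and the chosen dimensionalities are hypothetical. It measures how much farther the farthest point lies from a query than the nearest one. In experiments of this kind, that gap tends to shrink as the number of features grows, so distances discriminate less well.

```python
import math
import random


def relative_contrast(num_points, num_features, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances
    from one random query vector to num_points random vectors."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    query = [rng.random() for _ in range(num_features)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(num_features)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Assumed dimensionalities, chosen only to show the trend.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(200, dims), 3))
```

A small relative contrast means the nearest and farthest vectors are almost equally distant from the query, which is one reason for keeping feature vectors compact.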

Image features are also commonly employed in the classification task. A signif-

icant example is the classification of tumor masses detected in mammograms as

benign or malignant. Initially, the radiologist classifies the images based on the

shape of the lesion. Malignant tumors generally infiltrate the surrounding tissue,

resulting in an irregular or barely distinguishable contour, while benign masses

have a smooth contour. Figure 7.1 illustrates two examples of tumor masses.

This chapter discusses how to apply techniques for mining statistical association

rules to improve content-based image retrieval in the medical domain. We

present a new algorithm (StARMiner, the Statistical Association Rule Miner)

to determine a minimal set of representative features. The algorithm uses sta-

tistical measurements, which describe the behavior of the features considering

the image categories, to find representative rules. We compare the efficacy of

StARMiner with that of two other well-known feature selection algorithms, Relief-F and

DTM (Decision Tree Method), in a feature selection case study.
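
To make the idea of statistical measurements per image category concrete, the sketch below computes the mean and standard deviation of each feature inside a category and compares them with the remaining images. It is a hypothetical illustration and not the StARMiner algorithm itself, whose rules are not given in this passage: the function name discriminative_features, the two thresholds, and the toy feature values are all assumptions made up for the example.

```python
from statistics import mean, pstdev


def discriminative_features(vectors, labels, min_mean_gap, max_std):
    """Return sorted indices of features that, for at least one category,
    have a mean differing from the other images by at least min_mean_gap
    and a within-category standard deviation of at most max_std.
    Both thresholds are assumed, illustrative parameters."""
    selected = set()
    for category in set(labels):
        inside = [v for v, lab in zip(vectors, labels) if lab == category]
        outside = [v for v, lab in zip(vectors, labels) if lab != category]
        # zip(*rows) transposes the rows into one tuple per feature.
        columns = zip(zip(*inside), zip(*outside))
        for j, (col_in, col_out) in enumerate(columns):
            gap = abs(mean(col_in) - mean(col_out))
            if gap >= min_mean_gap and pstdev(col_in) <= max_std:
                selected.add(j)
    return sorted(selected)


# Made-up toy data: two features per image, two categories.
toy_vectors = [[0.10, 0.50], [0.12, 0.90], [0.90, 0.55], [0.88, 0.85]]
toy_labels = ["benign", "benign", "malignant", "malignant"]

if __name__ == "__main__":
    print(discriminative_features(toy_vectors, toy_labels, 0.3, 0.1))
```

In the toy data, the first feature separates the two categories with little spread, while the second has the same mean in both, so only the first would be kept under these assumed thresholds.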

Fig. 7.1. Typical breast tumor masses: benign (left) and malignant (right)
