Mining Statistical Association Rules to Select the Most Relevant Medical Image Features - Mining Complex Data

In [12], the chi-square distribution is used to infer the data distribution and to drive feature selection; other statistical tests can serve the same purpose. In this chapter, as described in the next section, we use a statistical test to generate association rules. A performance comparison of feature selection methods can be found in [22].

7.2.2 Association Rules

In this chapter we show how to use association rules for feature selection. Association rule mining is a descriptive data mining task whose goal is to find relevant relationships among data items. It was initially motivated by business applications such as catalog design, store layout, and customer categorization [3]. However, finding associations has since been widely used in many other applications, such as data classification and summarization [20, 24].

The problem of mining association rules was first stated in [1] as follows. Let $I = \{i_1, \ldots, i_n\}$ be a set of literals called items. A set $X \subseteq I$ is called an itemset. Let $R$ be a table with transactions $t$ involving elements that are subsets of $I$. An association rule is an expression of the form $X \rightarrow Y$, where $X$ and $Y$ are itemsets. $X$ is called the body or antecedent of the rule, and $Y$ is called the head or consequent of the rule.

Let $|R|$ be the number of transactions in relation $R$. Let $|Z|$ be the total number of occurrences of the itemset $Z$ in transactions of relation $R$. Support and confidence measures (Equations 7.1 and 7.2) are used to determine the rules returned by the mining process.

$$\text{Support} = \frac{|X \cup Y|}{|R|} \tag{7.1}$$

$$\text{Confidence} = \frac{|X \cup Y|}{|X|} \tag{7.2}$$
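As a minimal sketch of Equations 7.1 and 7.2, the Python snippet below computes both measures on a toy relation. The transactions and the helper names (`count`, `support`, `confidence`) are hypothetical and chosen only for illustration; the sketch also assumes the usual reading of |Z| as the number of transactions that contain every item of Z.

```python
# Toy relation R (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def count(itemset, transactions):
    """|Z|: number of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Equation 7.1: |X u Y| / |R|."""
    return count(set(x) | set(y), transactions) / len(transactions)


def confidence(x, y, transactions):
    """Equation 7.2: |X u Y| / |X| (assumes X occurs at least once)."""
    return count(set(x) | set(y), transactions) / count(x, transactions)


# Rule {bread} -> {milk}: 2 of 4 transactions contain both items,
# and 3 transactions contain bread.
s = support({"bread"}, {"milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
print(f"support={s:.2f}, confidence={c:.2f}")
```

For the rule {bread} → {milk}, the joint itemset occurs in 2 of the 4 transactions and the antecedent in 3, so the support is 2/4 and the confidence is 2/3.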

The problem of mining association rules, as it was first stated, involves finding

rules that satisfy the restrictions of minimum support and minimum confidence

specified by the user.
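The problem statement above can be illustrated with a deliberately naive, brute-force sketch: enumerate every itemset, keep those that reach the minimum support, and split each one into an antecedent and a consequent that reach the minimum confidence. The function name `mine_rules`, the toy data, and the assumption that X and Y are disjoint and non-empty are illustrative choices, not part of the original formulation in [1]; practical algorithms avoid this exhaustive enumeration.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Return (X, Y, support, confidence) for every rule X -> Y that
    meets both user-specified thresholds (exhaustive search)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(itemset):
        # Number of transactions containing every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            z = frozenset(combo)
            z_count = count(z)
            # Skip itemsets that never occur or fall below minimum support.
            if z_count == 0 or z_count / n < min_support:
                continue
            # Split Z into a non-empty antecedent X and consequent Y.
            for k in range(1, size):
                for body in combinations(combo, k):
                    x = frozenset(body)
                    y = z - x
                    conf = z_count / count(x)
                    if conf >= min_confidence:
                        rules.append((x, y, z_count / n, conf))
    return rules


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = mine_rules(transactions, min_support=0.5, min_confidence=0.6)
for x, y, sup, conf in rules:
    print(sorted(x), "->", sorted(y), f"support={sup:.2f} confidence={conf:.2f}")
```

The number of candidate itemsets grows exponentially with the number of items, which is why dedicated mining algorithms are needed in practice.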

Apriori [2] is one of the earliest and most widely used association rule mining algorithms. One drawback of Apriori is its low performance, caused by the successive dataset scans it carries out. Several algorithms were developed to speed up association rule mining, including Partition [26], FP-Growth [10], and Eclat [28]. These algorithms

were developed to mine the first and the simplest type of association rules, the

Boolean association rules, which are rules that correlate categorical (nominal)

data items. In [4, 23, 27] procedures for mining quantitative association rules

are presented. Quantitative association rules relate continuous-valued attributes.
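To make the successive dataset scans attributed to Apriori more concrete, the following simplified sketch performs a level-wise search for frequent itemsets over Boolean items and counts one full pass over the data per level. It is an illustration in the style of Apriori under stated assumptions, not the implementation from [2]; the function name `apriori_frequent_itemsets` and the toy data are hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise search for frequent itemsets.
    Returns ({itemset: count}, number_of_dataset_scans)."""
    min_count = min_support * len(transactions)
    frequent = {}
    scans = 0
    # Level 1 candidates: every single item seen in the data.
    candidates = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while candidates:
        # One full pass over the dataset per level.
        scans += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets
        # were frequent at the previous level.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, scans


# Hypothetical toy data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent, scans = apriori_frequent_itemsets(transactions, min_support=0.5)
print(f"{len(frequent)} frequent itemsets found in {scans} scans")
```

Each iteration of the `while` loop rereads the whole dataset, so the number of scans grows with the size of the largest frequent itemset.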

In [29, 30] the problem of generating association rules correlating items from
