7.3 Feature Selection through Statistical Association Rule Mining
A statistical association rule is a rule that reveals an interesting relationship
among subsets of data based on the distribution of their quantitative values.
The term applies to any association rule whose generation process uses
statistical tests to confirm its validity. The main advantage of statistical
association rules is that they do not require data discretization, a process
that often leads to a loss of information and can distort the results of a
mining algorithm.
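The information loss caused by discretization can be illustrated with a small sketch. It assumes equal-width binning over the range [0, 1] with two bins; the text does not name a particular discretization scheme, and the function name `equal_width_bin` and the sample values are hypothetical.

```python
def equal_width_bin(value, low=0.0, high=1.0, n_bins=2):
    """Map a value in [low, high] to the index of an equal-width bin."""
    width = (high - low) / n_bins
    # Clamp so that value == high falls into the last bin.
    return min(int((value - low) / width), n_bins - 1)


# Hypothetical feature values: 0.49 and 0.51 are nearly identical,
# while 0.51 and 0.99 are far apart.
for v in (0.49, 0.51, 0.99):
    print(v, "-> bin", equal_width_bin(v))
```

Under these assumptions the two nearly identical values land in different bins, while the two distant values share a bin, so an algorithm that sees only bin labels cannot recover the original distances.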
Feature vectors describe the images quantitatively. Hence, a suitable approach
to finding association rules should handle quantitative data. In this section we
present StARMiner (Statistical Association Rule Miner), a new algorithm for
statistical association rule mining. The goal of StARMiner is to find statistical
association rules that select a minimal set of features preserving the ability
to discern images according to their types. The method proposed here extends
the techniques of statistical association rule mining proposed in [4].
Let $x_j$ be a category of an image and $f_i$ an image feature (attribute). The
rules returned by the StARMiner algorithm have the format $x_j \rightarrow f_i$.
StARMiner only returns rules that satisfy Condition 1 and Condition 2, as follows.
Condition 1. The feature $f_i$ must have a behavior in images from category
$x_j$ different from its behavior in images from all the other categories.
Condition 2. The feature $f_i$ must present a uniform behavior in every image
from category $x_j$.
Conditions 1 and 2 are implemented in the StARMiner algorithm by incorporating
restrictions of interest into the mining process, as described below.
Let $T$ be a dataset of medical images, $x_j$ an image category,
$T_{x_j} \subset T$ the subset of images of category $x_j$, $f_i$ the $i$th
feature of the feature vector $F$, and $f_{i_k}$ the value of feature $f_i$ in
image $k$. Let $\mu_{f_i}(Z)$ and $\sigma_{f_i}(Z)$ be, respectively, the mean
and standard deviation of the values of feature $f_i$ in the subset of images $Z$.
The algorithm uses three thresholds defined by the user: $\Delta\mu_{min}$, the
minimum allowed difference between the average of feature $f_i$ in images from
category $x_j$ and the average of $f_i$ in the remaining dataset;
$\sigma_{max}$, the maximum standard deviation of $f_i$ values allowed in a
category; and $\gamma_{min}$, the minimum confidence to reject the hypothesis
$H_0$. StARMiner mines rules of the form $x_j \rightarrow f_i$ if the
conditions given in Equations 7.7, 7.8 and 7.9 are satisfied.
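$$\mu_{f_i}(V) = \frac{\sum_{k \in V} f_{i_k}}{|V|} \qquad (7.5)$$

$$\sigma_{f_i}(V) = \sqrt{\frac{\sum_{k \in V} \left(f_{i_k} - \mu_{f_i}(V)\right)^2}{|V|}} \qquad (7.6)$$

$$\mu_{f_i}(T_{x_j}) - \mu_{f_i}(T - T_{x_j}) \geq \Delta\mu_{min} \qquad (7.7)$$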
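As an illustration, Equations 7.5 to 7.7 could be computed for a single feature roughly as follows. This is a minimal sketch, not the StARMiner implementation: the function names, the sample values and the threshold are hypothetical, and the standard deviation is assumed to be the population form (division by |V|) that Equation 7.6 suggests.

```python
import math


def mean(values):
    """Eq. 7.5: mean of one feature over a set of images V."""
    return sum(values) / len(values)


def std(values):
    """Eq. 7.6: population standard deviation (divides by |V|)."""
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def satisfies_mean_gap(in_category, rest, delta_mu_min):
    """Eq. 7.7 as printed: a signed difference of the two means.

    Wrapping the difference in abs() would accept a gap in either
    direction; that variant is an assumption, not stated in the text.
    """
    return mean(in_category) - mean(rest) >= delta_mu_min


# Hypothetical values of one feature f_i.
t_xj = [0.80, 0.82, 0.78, 0.80]    # images of category x_j
t_rest = [0.30, 0.45, 0.25, 0.40]  # all remaining images, T - T_xj

print(mean(t_xj), std(t_xj))
print(satisfies_mean_gap(t_xj, t_rest, delta_mu_min=0.2))
```

With these sample values the category mean exceeds the mean of the remaining images by about 0.45, so the check passes for a threshold of 0.2 and would fail for a threshold of 0.5.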