7.3 Feature Selection through Statistical Association Rule Mining
A statistical association rule is a rule that exposes an interesting relationship
among subsets of data based on the distribution of their quantitative values.
The term applies to any association rule whose generation process uses
statistical tests to confirm its validity. The main advantage of statistical
association rules is that they do not require data discretization, a process
that often loses information and can distort the results of a mining algorithm.
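To make the information loss concrete, the minimal sketch below places a few feature values into equal-width bins. The values, the bin width, and the helper name `equal_width_bin` are illustrative assumptions and are not part of StARMiner.

```python
import math

def equal_width_bin(value, width):
    """Return the index of the equal-width bin that contains value."""
    return math.floor(value / width)

# Hypothetical feature values (not taken from any real image set).
values = [4.9, 5.1, 9.9]
bins = [equal_width_bin(v, width=5.0) for v in values]

# 4.9 and 5.1 differ by only 0.2 yet land in different bins,
# while 5.1 and 9.9 differ by 4.8 yet share a bin.
print(bins)
```

After binning, a mining algorithm sees only the bin indices, so the closeness of the first two values and the distance between the last two are no longer visible to it.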
Feature vectors describe the images quantitatively. Hence, a suitable approach
to finding association rules should handle quantitative data directly. In this
section we present StARMiner (Statistical Association Rule Miner), a new
algorithm for statistical association rule mining. The goal of StARMiner is to
find statistical association rules that select a minimal set of features while
preserving the ability to distinguish images according to their types. The
method proposed here extends the techniques of statistical association rule
mining proposed in [4].
Let $x_j$ be an image category and $f_i$ an image feature (attribute). The rules
returned by the StARMiner algorithm have the format $x_j \rightarrow f_i$.
StARMiner only returns rules that satisfy Condition 1 and Condition 2, as follows.
Condition 1. The feature $f_i$ must have a behavior in images from category $x_j$
different from its behavior in images from all the other categories.
Condition 2. The feature $f_i$ must present a uniform behavior in every image
from category $x_j$.
Conditions 1 and 2 are implemented in StARMiner by incorporating restrictions
of interest into the mining process, as described below.
Let $T$ be a dataset of medical images, $x_j$ an image category,
$T_{x_j} \subset T$ the subset of images of category $x_j$, $f_i$ the $i$-th
feature of the feature vector $F$, and $f_{i_k}$ the value of feature $f_i$ in
image $k$. Let $\mu_{f_i}(Z)$ and $\sigma_{f_i}(Z)$ be, respectively, the mean
and standard deviation of the values of feature $f_i$ in the subset of images $Z$.
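The notation above can be sketched in code as follows. This is only an illustration of how a dataset might be split into the images of one category and the remaining images; the category names, the feature values, and the helper names `split_by_category` and `feature_values` are hypothetical.

```python
# Hypothetical dataset T: each image k is a (category, feature_vector) pair.
T = [
    ("tumor",  [0.82, 0.10]),
    ("tumor",  [0.80, 0.55]),
    ("normal", [0.30, 0.12]),
    ("normal", [0.35, 0.50]),
]

def split_by_category(dataset, category):
    """Return the feature vectors inside `category` and those outside it."""
    inside = [vec for cat, vec in dataset if cat == category]
    outside = [vec for cat, vec in dataset if cat != category]
    return inside, outside

def feature_values(vectors, i):
    """Collect the value of feature i in every image of a subset."""
    return [vec[i] for vec in vectors]

T_x, T_rest = split_by_category(T, "tumor")
print(feature_values(T_x, 0), feature_values(T_rest, 0))
```

Here `T_x` plays the role of the category subset and `T_rest` the role of the remaining dataset; the mean and standard deviation of a feature are then computed over each list of values separately.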
The algorithm uses three user-defined thresholds: $\Delta\mu_{\min}$, the minimum
allowed difference between the average of feature $f_i$ in images from category
$x_j$ and the average of $f_i$ in the remaining dataset; $\sigma_{\max}$, the
maximum standard deviation of $f_i$ values allowed within a category; and
$\gamma_{\min}$, the minimum confidence to reject the hypothesis $H_0$. StARMiner
mines rules of the form $x_j \rightarrow f_i$ if the conditions given in
Equations 7.7, 7.8 and 7.9 are satisfied.
$$\mu_{f_i}(V) = \frac{\sum_{k \in V} f_{i_k}}{|V|} \tag{7.5}$$
$$\sigma_{f_i}(V) = \sqrt{\frac{\sum_{k \in V} \left( f_{i_k} - \mu_{f_i}(V) \right)^2}{|V|}} \tag{7.6}$$
$$\left| \mu_{f_i}(T_{x_j}) - \mu_{f_i}(T - T_{x_j}) \right| \geq \Delta\mu_{\min} \tag{7.7}$$
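A minimal sketch of Equations 7.5 to 7.7 follows. It assumes that Eq. 7.6 is the usual population standard deviation (dividing by |V| under a square root) and that Eq. 7.7 compares the absolute difference of the two means against the threshold. The function names and sample values are hypothetical, and the conditions of Equations 7.8 and 7.9 are not covered here.

```python
import math

def mean(values):
    """Eq. 7.5: sum of the feature values divided by |V|."""
    return sum(values) / len(values)

def std(values):
    """Eq. 7.6, assumed to be the population form (divide by |V|)."""
    mu = mean(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

def satisfies_eq_7_7(in_category, rest, delta_mu_min):
    """Eq. 7.7: the two means must differ by at least delta_mu_min."""
    return abs(mean(in_category) - mean(rest)) >= delta_mu_min

# Hypothetical values of one feature (made up for illustration).
f_in_category = [0.80, 0.82, 0.78]   # images of the category
f_rest = [0.30, 0.35, 0.25]          # images of all other categories

print(satisfies_eq_7_7(f_in_category, f_rest, delta_mu_min=0.2))
print(satisfies_eq_7_7(f_in_category, f_rest, delta_mu_min=0.6))
```

With these values the two means differ by about 0.5, so the feature passes the check for a threshold of 0.2 and fails it for a threshold of 0.6. In this sense the threshold controls how distinct a feature must be in a category before a rule is considered.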