In [12], the chi-square distribution is used to infer data distribution and to
promote feature selection. Indeed, other statistical tests can be used to infer data
distribution. In this chapter, as we will describe in the next section, we use a
statistical test to generate association rules. A performance comparison among
feature selection methods can be found in [22].
7.2.2 Association Rules
In this chapter we show how to use association rules for feature selection.
Association rule mining is a descriptive task of data mining, where the goal is
to find relevant relationships among data items. It was initially motivated by
business applications, such as catalog design, store layout, and customer
categorization [3]. However, finding associations has also been widely used in many
other applications, such as data classification and summarization [20, 24].
The problem of mining association rules was first stated in [1] as follows. Let
$I = \{i_1, \ldots, i_n\}$ be a set of literals called items. A set $X \subseteq I$
is called an itemset.
Let $R$ be a table with transactions $t$ involving elements that are subsets of $I$.
An association rule is an expression of the form $X \rightarrow Y$, where $X$ and $Y$ are
itemsets. $X$ is called the body or antecedent of the rule, and $Y$ is called the head or
consequent of the rule.
Let $|R|$ be the number of transactions in relation $R$. Let $|Z|$ be the total
number of occurrences of the itemset $Z$ in transactions of relation $R$. Support
and confidence measures (Equations 7.1 and 7.2) are used to determine the rules
returned by the mining process.

\[ \mathrm{Support} = \frac{|X \cup Y|}{|R|} \qquad (7.1) \]

\[ \mathrm{Confidence} = \frac{|X \cup Y|}{|X|} \qquad (7.2) \]
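As an illustration of Equations 7.1 and 7.2, the following minimal Python sketch computes support and confidence for a rule X -> Y over a toy relation. The function names (`count`, `support`, `confidence`) and the sample transactions are assumptions introduced here for illustration; they do not come from the chapter. The sketch also assumes that an itemset Z "occurs" in a transaction t when Z is a subset of t.

```python
def count(itemset, transactions):
    """|Z|: number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """Equation 7.1: |X u Y| / |R|."""
    return count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """Equation 7.2: |X u Y| / |X|. Returns 0.0 if X never occurs (a choice made here)."""
    body_count = count(x, transactions)
    if body_count == 0:
        return 0.0
    return count(x | y, transactions) / body_count


# Hypothetical toy relation R with four transactions.
R = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
X = frozenset({"bread"})
Y = frozenset({"milk"})

print(support(X, Y, R))     # 2 of 4 transactions contain both items: 0.5
print(confidence(X, Y, R))  # 2 of the 3 transactions with bread also have milk: about 0.667
```

Representing each transaction as a `frozenset` keeps the subset test (`<=`) and the union (`|`) close to the set notation used in the definitions.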
The problem of mining association rules, as it was first stated, involves finding
rules that satisfy the restrictions of minimum support and minimum confidence
specified by the user.
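The problem statement above can be sketched as a brute-force search: enumerate every candidate itemset, keep those that reach the minimum support, and split each into a body and a head that reach the minimum confidence. This is only an illustrative sketch, not Apriori or any algorithm from the cited works; the function name `mine_rules`, the thresholds, and the sample data are assumptions made for the example. Its cost grows exponentially with the number of items.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence):
    """Return (body, head, support, confidence) for every rule meeting both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def count(z):
        # Number of transactions containing every item of z.
        return sum(1 for t in transactions if z <= t)

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            z = frozenset(combo)
            cz = count(z)
            if cz == 0 or cz / n < min_support:
                continue  # itemset is not frequent enough
            # Try every non-empty proper subset of z as the rule body.
            for k in range(1, size):
                for body in combinations(combo, k):
                    x = frozenset(body)
                    conf = cz / count(x)  # count(x) >= cz > 0
                    if conf >= min_confidence:
                        rules.append((x, z - x, cz / n, conf))
    return rules


# Hypothetical toy relation and user-specified thresholds.
R = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
for body, head, sup, conf in mine_rules(R, min_support=0.5, min_confidence=0.7):
    print(sorted(body), "->", sorted(head), sup, conf)
# Only one rule passes both thresholds here: ['butter'] -> ['bread'] 0.5 1.0
```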
Apriori [2] is one of the first and most widely used association rule mining
algorithms. One drawback of the Apriori algorithm is its low performance, caused by
the successive dataset scans it carries out. Several algorithms were
developed to speed up association rule mining. Examples of such algorithms
include Partition [26], FP-Growth [10] and Eclat [28]. These algorithms
were developed to mine the first and the simplest type of association rules, the
Boolean association rules, which are rules that correlate categorical (nominal)
data items. In [4, 23, 27] procedures for mining quantitative association rules
are presented. Quantitative association rules relate continuous-valued attributes.
In [29, 30] the problem of generating association rules correlating items from