Example 1. Consider the database shown in Table 1. At a high min_sup, say min_sup =
4, we will miss the frequent patterns involving the rare items {221} and {222}. To
discover the frequent patterns containing {221} and {222}, we have to specify a
low min_sup.
2.2 Extended Version of Frequent Patterns
To address the rare item problem, B. Liu et al. [7] proposed an extended version
of frequent pattern mining, the “multiple min_sup framework”. In this extended
version, each item in the transaction database is assigned a support constraint
known as its minimum item support (MIS, in short). The min_sup of an itemset is
then the minimum MIS value among all its items.
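$$\mathit{min\_sup}(X) = \min\bigl(\mathit{MIS}(i_1), \mathit{MIS}(i_2), \ldots, \mathit{MIS}(i_k)\bigr) \qquad (1)$$
where $X = \{i_1, \ldots, i_k\}$, $1 \le k \le n$, is a pattern and $\mathit{MIS}(i_j)$, $1 \le j \le k$, means the
MIS of an item $i_j \in X$.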
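The rule that an itemset's min_sup is the smallest MIS among its items can be sketched in a few lines of Python. This is only an illustration: the function names (`itemset_min_sup`, `support`, `is_frequent`), the toy transactions, and the MIS values are all hypothetical and are not taken from Table 1 or from [7]. Supports are treated as absolute counts, as in Example 1.

```python
from typing import Dict, Iterable, List, Set


def itemset_min_sup(itemset: Iterable[str], mis: Dict[str, float]) -> float:
    """min_sup of an itemset = minimum MIS value among its items."""
    return min(mis[item] for item in itemset)


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of the itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t)


def is_frequent(itemset: Iterable[str],
                transactions: List[Set[str]],
                mis: Dict[str, float]) -> bool:
    """An itemset is frequent if its support reaches its own min_sup."""
    items = list(itemset)
    return support(items, transactions) >= itemset_min_sup(items, mis)


# Hypothetical toy data: "caviar" is a rare item with a low MIS.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "caviar"},
    {"bread"},
    {"milk", "caviar"},
]
mis = {"bread": 4, "milk": 4, "caviar": 1}

print(is_frequent({"milk", "caviar"}, transactions, mis))  # rare item is kept
print(is_frequent({"bread", "milk"}, transactions, mis))   # held to the high MIS
```

With a single min_sup of 4, the pattern containing the rare item would be lost; with per-item MIS values it is retained, while the pattern made only of frequent items is still judged against the higher threshold.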
This extended model enables users to generate rules involving rare items without
causing frequent items to generate too many meaningless patterns. However, in
real-world applications, users cannot specify an applicable min_sup at the first
attempt and must tune the MIS of each item repeatedly. This is time-consuming and
costly because each adjustment requires rescanning the database. Therefore, in this
paper, an approach that tunes MIS automatically is proposed. The Central Limit
Theorem is used to specify MIS values automatically. We state the Central Limit
Theorem in Theorem 1.
Theorem 1. (Central Limit Theorem)
Given certain conditions, the arithmetic mean of a sufficiently large number of
iterates of independent random variables, each with a well-defined expected value and
variance, will be approximately normally distributed. [8]
That is, if a sample containing a large number of observations is obtained, the
central limit theorem says that the computed values of the average will be
approximately normally distributed. Moreover, our ontology can be divided into
different taxonomies, each independent of the others. According to the 68-95-99.7
rule, nearly all values in a normal distribution lie within three standard
deviations of the mean, and about 68% lie within one. It follows that if the MIS
value is set to the mean minus one standard deviation, approximately 84% of item
combinations will be found (the 68% within one standard deviation plus the 16% in
the upper tail). This avoids a combinatorial explosion that would produce too many
meaningless frequent patterns. We define the MIS value of each item as follows:
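[Equation (2) is not recoverable from the source.]
$$\mathit{MIS}(i_j) = \bar{f} - \sigma \qquad (3)$$
where $\bar{f}$ means the average frequency of the items $i_1, \ldots, i_m$ and $\sigma$ their
standard deviation, and the items $i_1, \ldots, i_m$ belong to the same-level nodes of
each category in the ontology.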
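As a rough sketch of the automatic tuning described above, the snippet below sets the MIS of every item in a group of same-level ontology nodes to the group's mean frequency minus one standard deviation. It also checks the 84% figure against the standard normal distribution. The function name `auto_mis`, the input layout, and the sample frequencies are hypothetical. Using the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which estimator is meant.

```python
from statistics import NormalDist, mean, pstdev
from typing import Dict


def auto_mis(frequencies_by_level: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Assign MIS = mean - standard deviation within each same-level group.

    frequencies_by_level maps a (hypothetical) ontology level or category
    name to the frequencies of the items found at that level.
    """
    mis: Dict[str, float] = {}
    for level, freqs in frequencies_by_level.items():
        values = list(freqs.values())
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        for item in freqs:
            mis[item] = mu - sigma
    return mis


# Hypothetical item frequencies grouped by ontology category.
frequencies_by_level = {
    "fruit": {"apple": 10, "banana": 20, "cherry": 30},
    "dairy": {"milk": 50, "cheese": 50},
}
mis = auto_mis(frequencies_by_level)
print(mis)

# Share of a normal distribution lying above (mean - 1 standard deviation).
share_above = 1 - NormalDist().cdf(-1)
print(round(share_above, 4))
```

Note that for strongly skewed frequencies the mean minus one standard deviation can fall at or below zero. The text does not say how such a case is handled, so the sketch leaves the value unchanged.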