Mining Frequent Patterns with Multiple Item Support
Thresholds in Tourism Information Databases
Yi-Chun Chen, Grace Lin, Ya-Hui Chan, and Meng-Jung Shih *
Advanced Research Institute, Institute for Information Industry, Taipei, 105 Taiwan, R.O.C.
{divienchen,gracelin,yhchan,mengjungshih}@iii.org.tw
Abstract. Frequent pattern mining is an important model in data mining. Certain
frequent patterns with low minimum support can provide useful information in
many real datasets. However, the predefined minimum support threshold must be
set properly, or it may cause the rare item problem: a threshold that is too
high misses rare items, whereas one that is too low causes a combinatorial
explosion of patterns. In this paper, we propose an improved FP-growth-based
approach to solve the rare item problem with multiple item supports, where
each item has its own minimum support. Because setting appropriate thresholds
for all items is difficult, we also propose an automatic tuning approach for
multiple item support (MIS), which is based on the Central Limit Theorem.
A series of experimental results on various tourism information datasets shows
that the proposed approach can enhance frequent pattern mining with better
efficiency and efficacy.
Keywords: frequent pattern mining, multiple item support, automatic tuning
minimum support.
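The rare item problem described in the abstract can be illustrated with a small sketch. The transactions, item names, and MIS values below are hypothetical and are not taken from the tourism datasets. The rule that an itemset's threshold is the smallest MIS among its items is an assumption borrowed from the common multiple-minimum-support formulation, not a statement of this paper's method, and patterns are enumerated by brute force rather than with FP-growth.

```python
from itertools import combinations

# Hypothetical toy transactions (illustration only, not the paper's data).
TRANSACTIONS = [
    {"museum", "cafe"},
    {"museum", "cafe"},
    {"museum", "cafe", "park"},
    {"museum", "park"},
    {"cafe", "park"},
    {"museum", "cafe"},
    {"museum", "cafe", "park"},
    {"hot_spring", "ryokan"},
    {"hot_spring", "ryokan", "museum"},
    {"museum", "cafe"},
]

# Hypothetical per-item minimum supports (MIS); rare items get lower values.
MIS = {
    "museum": 0.5,
    "cafe": 0.5,
    "park": 0.5,
    "hot_spring": 0.2,
    "ryokan": 0.2,
}


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_patterns(transactions, threshold_of):
    """Brute-force enumeration of itemsets whose support meets their threshold.

    threshold_of maps a tuple of items to the minimum support it must reach.
    """
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), transactions)
            if s > 0 and s >= threshold_of(combo):
                found[combo] = s
    return found


# Single high threshold: few patterns, rare items are lost.
high = frequent_patterns(TRANSACTIONS, lambda combo: 0.5)
# Single low threshold: rare items survive, but so do many other patterns.
low = frequent_patterns(TRANSACTIONS, lambda combo: 0.1)
# Assumed MIS rule: an itemset's threshold is the lowest MIS among its items.
mis = frequent_patterns(TRANSACTIONS, lambda combo: min(MIS[i] for i in combo))

rare_pair = ("hot_spring", "ryokan")
print(len(high), rare_pair in high)
print(len(low), rare_pair in low)
print(len(mis), rare_pair in mis)
```

On this toy data, the single high threshold yields 3 patterns and misses the rare pair, the single low threshold yields 13 patterns, and the per-item thresholds yield 6 patterns that include the rare pair.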
1 Introduction
Frequent patterns are an important class of regularities that exist in databases. Since it
was first introduced in [1], the problem of mining frequent patterns has received a great
deal of attention [3]. Most frequent pattern mining algorithms (e.g., Apriori [2] and
FP-growth [4]) use the single minimum support framework to discover the complete set of
frequent patterns, where the setting of minimum support (min_sup) plays a key role in
this model's success. The frequent patterns discovered with this framework satisfy the
downward closure property. That is, “all non-empty subsets of a frequent pattern must
also be frequent.” This property holds the key for minimizing the search space in all of
the single min_sup based frequent pattern mining algorithms [2, 3]. However, this framework
makes a strong assumption that all items in the data are of the same nature and/or have
similar frequencies in the database. This is generally far from the characteristics of data
in real applications. In many applications, some items appear very frequently in the data,
while others rarely appear. Much of the value lies in these rare items, which appear with low
frequency and may be missed easily. Frequent patterns containing rare items usually can
* Corresponding author.
 