Mining Frequent Patterns with Multiple Item Support Thresholds in Tourism Information Databases - Technologies and Applications of Artificial Intelligence


Mining Frequent Patterns with Multiple Item Support Thresholds in Tourism Information Databases

Yi-Chun Chen, Grace Lin, Ya-Hui Chan, and Meng-Jung Shih *

Advanced Research Institute, Institute for Information Industry, Taipei, 105 Taiwan, R.O.C.

{divienchen,gracelin,yhchan,mengjungshih}@iii.org.tw

Abstract. Frequent pattern mining is an important model in data mining. In many real datasets, certain frequent patterns with low support can provide useful information. However, the predefined minimum support threshold must be set properly, or it may cause the rare item problem: too high a threshold misses patterns involving rare items, whereas too low a threshold causes combinatorial explosion. In this paper, we propose an improved FP-growth-based approach that solves the rare item problem with multiple item supports, where each item has its own minimum support. Because setting appropriate thresholds for all items is difficult, we also propose an automatic tuning approach for multiple item support (MIS), which is based on the Central Limit Theorem. A series of experiments on various tourism information datasets shows that the proposed approach improves both the efficiency and the efficacy of frequent pattern mining.

Keywords: frequent pattern mining, multiple item support, automatic tuning minimum support.
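The trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the FP-growth-based method proposed in the paper: it enumerates patterns by brute force over a hypothetical toy database. The item names, the MIS values, and the rule that a pattern's threshold is the lowest MIS among its items are all assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy transactions (not taken from the paper's tourism datasets).
transactions = [
    {"museum", "night_market"},
    {"museum", "night_market"},
    {"museum", "night_market", "hot_spring"},
    {"museum"},
    {"night_market"},
    {"museum", "night_market"},
    {"hot_spring", "tea_farm"},
    {"museum", "night_market"},
    {"hot_spring", "tea_farm"},
    {"museum"},
]


def support(pattern, db):
    """Fraction of transactions that contain every item of `pattern`."""
    return sum(1 for t in db if pattern <= t) / len(db)


def frequent_patterns(db, threshold_of):
    """Brute-force enumeration of all item combinations.

    `threshold_of(pattern)` returns the minimum support required for
    that pattern. This is an illustrative stand-in, not FP-growth.
    """
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            s = support(pattern, db)
            if s > 0 and s >= threshold_of(pattern):
                result[pattern] = s
    return result


# Single min_sup framework: one threshold for every pattern.
high = frequent_patterns(transactions, lambda p: 0.5)  # rare items are lost
low = frequent_patterns(transactions, lambda p: 0.1)   # many more patterns

# Multiple item supports (assumed values): each item has its own minimum
# support; here a pattern must reach the lowest MIS among its items.
mis = {"museum": 0.5, "night_market": 0.5, "hot_spring": 0.2, "tea_farm": 0.2}
multi = frequent_patterns(transactions, lambda p: min(mis[i] for i in p))

rare = frozenset({"hot_spring", "tea_farm"})
print(len(high), rare in high)
print(len(low), rare in low)
print(len(multi), rare in multi)
```

In this toy setting, the high single threshold drops the rare pair of hot_spring and tea_farm. The low single threshold keeps it but also admits every incidental combination of frequent and rare items. The per-item thresholds keep the rare pair while rejecting those incidental combinations.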

1 Introduction

Frequent patterns are an important class of regularities that exist in databases. Since it was first introduced in [1], the problem of mining frequent patterns has received a great deal of attention [3]. Most frequent pattern mining algorithms (e.g., Apriori [2] and FP-growth [4]) use the single minimum support framework to discover the complete set of frequent patterns, where the setting of the minimum support (min_sup) plays the key role in the model's success. The frequent patterns discovered with this framework satisfy the downward closure property. That is, “all non-empty subsets of a frequent pattern must also be frequent.” This property holds the key to minimizing the search space in all single min_sup based frequent pattern mining algorithms [2, 3]. However, this framework makes the strong assumption that all items in the data are of the same nature and/or have similar frequencies in the database. This is generally far from the characteristics of data in real applications. In many applications, some items appear very frequently in the data, while others rarely appear. Much of the value lies in these rare items, which appear with low frequency and may easily be missed. Frequent patterns containing rare items usually can

* Corresponding author.
