Information Technology Reference
In-Depth Information
reveal information with high profitable potential for further decision support. This kind of
phenomenon is often seen in consumer market. The necessities, consumables and low-
price products such as bread or jam are bought frequently, but luxury goods, electric
appliance and high-price such as bed or fridge are generally bought once for a long pe-
riod. In such a situation, if the min_sup is set too high, all of the frequent patterns con-
taining rare items will be missed. On the other hand, if the min_sup is set too low, many
meaningless frequent patterns will be included and misleading focus. The problem to find
mining frequent patterns containing both frequent and rare items with “single min_sup
framework” is so called as the rare item problem .
For addressing rare item problem, B. Liu et al.[7] proposed mining frequent patterns
with “multiple min_sup framework”, with which each pattern can satisfy a different
min_sup depending upon the items within it. The frequent patterns discovered through
“multiple min_sup framework” do not satisfy downward closure property and therefore
deemed not applicable for minimizing the search space in multiple min_sup based fre-
quent pattern mining algorithms. An Apriori-based algorithm known as Multiple Support
Apriori (MSApriori) was proposed in this literature to find frequent patterns with “mul-
tiple min_sup framework”[7]. Also, two FP-growth based algorithms, Conditional Fre-
quent Pattern-growth (CFP-growth) [5] and CFP-growth++ [6], have been proposed to
mine frequent patterns. Since downward closure property no longer holds in “multiple
min_sup framework,” the CFP-growth algorithm [5] has to carry out exhaustive search in
the constructed tree structure. In [6], it proposed an improved CFP-growth approach,
CFP-growth++, by introducing four pruning techniques to reduce the search space. Mul-
tiple min_sup framework is reasonable but brings another question: users generally have
trouble specifying “good” support value for each item--they need constantly tune the
support value. Updating items' min_sup is a costly work—it takes time and efforts to
scan database and then to execute the mining algorithm once again—and thus not appro-
priate to rerun so frequently. Therefore, it is necessary to design a maintenance approach
for specifying min_sup automatically.
This study proposes an automatic tuning multiple item support (MIS) approach,
with an illustrative example on tourism information database. We firstly categorized
the data from the tourism information database to five classes, who, what, when,
where and why, according to predefined ontology structure which constructed by
domain experts. Fig. 1 illustrates the ontology schema applied in this case. As a gen-
eral database in real world, tourism information database contains abundant data of
popular sites on holidays, with significantly less records on items not so popular—
frequencies of items vary widely and it is a rare item problem. Moreover, the cost of
tuning multiple minimum supports is far higher than those of tuning single minimum
support, because it is to tune all min_sup thresholds for all different items, which is
large in number. Therefore, we developed an improved CFP-growth++ approach to
solve the rare item problem with multiple item supports. An automatic tuning MIS
approach is designed based on the Central Limit Theorem [8]. A series of experimen-
tal results on various tourism information datasets shows that the proposed approach
can enhance frequent pattern mining with better efficiency and efficacy.
The remaining of the paper is organized as follows. In section 2, we explain the
necessary background. In section 3, we proposed improved CFP-growth++ approach
Search WWH ::




Custom Search