Databases Reference
In-Depth Information
A Compressed Vertical Binary Algorithm
for Mining Frequent Patterns
J. Hdez. Palancar, R. Hdez. Leon, J. Medina Pagola, and A. Hechavarrıa
Advanced Technologies Application Center (CENATAV), 7a 21812 e/ 218 y 222,
Rpto. Siboney, Playa, C.P. 12200, Ciudad de la Habana, Cuba
jpalancar@cenatav.co.cu , rhernandez@cenatav.co.cu , jmedina@cenatav.co.cu ,
ahechavarria@cenatav.co.cu
Summary. A new algorithm named Compressed Binary Mine (CBMine) for mining
association rules and frequent patterns is presented in this chapter. Its e ciency is
based on a compressed vertical binary representation of the database. CBMine was
compared with several a priori implementations, like Bodon's a priori algorithm, and
MAFIA, another vertical binary representation method. The experimental results
have shown that CBMine has significantly better performance, especially for sparse
databases.
1 Introduction
Mining association rules in transaction databases have been demonstrated to
be useful and technically feasible in several application areas [14,18,21] partic-
ularly in retail sales, and it becomes every day more important in applications
that use document databases [11, 16, 17]. Although research in this area has
been going on for more than one decade; today, mining such rules is still one
of the most popular methods in knowledge discovery and data mining.
Various algorithms have been proposed to discover large itemsets [2, 3, 6,
9, 11, 19]. Of all of them, a priori has had the biggest impact [3], since its
general conception has been the base for the development of new algorithms
to discover association rules.
Most of the previous algorithms adopt an a priori-like candidate set
generation-and-test approach. However, candidate set generation is still costly,
especially when there are many items, when the quantity of items by transac-
tion is high, or the minimum support threshold is quite low. These algorithms
need to scan the database several times to check the support of each can-
didate, and it is a time-consuming task for very sparse and huge databases.
The weak points of the a priori algorithm are these aspects: the candidate
generation and the count of each candidate support.
Performance of the algorithm could be significantly improved if we find a
way to reduce the computational cost of the tasks above mentioned.
J.H. Palancar et al.: A Compressed Vertical Binary Algorithm for Mining Frequent Patterns ,
Search WWH ::




Custom Search