A Compressed Vertical Binary Algorithm for Mining Frequent Patterns - Data Mining: Foundations and Practice

Databases Reference

In-Depth Information

A Compressed Vertical Binary Algorithm

for Mining Frequent Patterns

J. Hdez. Palancar, R. Hdez. Leon, J. Medina Pagola, and A. Hechavarrıa

Advanced Technologies Application Center (CENATAV), 7a 21812 e/ 218 y 222,

Rpto. Siboney, Playa, C.P. 12200, Ciudad de la Habana, Cuba

jpalancar@cenatav.co.cu , rhernandez@cenatav.co.cu , jmedina@cenatav.co.cu ,

ahechavarria@cenatav.co.cu

Summary. A new algorithm named Compressed Binary Mine (CBMine) for mining

association rules and frequent patterns is presented in this chapter. Its e ciency is

based on a compressed vertical binary representation of the database. CBMine was

compared with several a priori implementations, like Bodon's a priori algorithm, and

MAFIA, another vertical binary representation method. The experimental results

have shown that CBMine has significantly better performance, especially for sparse

databases.

1 Introduction

Mining association rules in transaction databases have been demonstrated to

be useful and technically feasible in several application areas [14,18,21] partic-

ularly in retail sales, and it becomes every day more important in applications

that use document databases [11, 16, 17]. Although research in this area has

been going on for more than one decade; today, mining such rules is still one

of the most popular methods in knowledge discovery and data mining.

Various algorithms have been proposed to discover large itemsets [2, 3, 6,

9, 11, 19]. Of all of them, a priori has had the biggest impact [3], since its

general conception has been the base for the development of new algorithms

to discover association rules.

Most of the previous algorithms adopt an a priori-like candidate set

generation-and-test approach. However, candidate set generation is still costly,

especially when there are many items, when the quantity of items by transac-

tion is high, or the minimum support threshold is quite low. These algorithms

need to scan the database several times to check the support of each can-

didate, and it is a time-consuming task for very sparse and huge databases.

The weak points of the a priori algorithm are these aspects: the candidate

generation and the count of each candidate support.

Performance of the algorithm could be significantly improved if we find a

way to reduce the computational cost of the tasks above mentioned.

J.H. Palancar et al.: A Compressed Vertical Binary Algorithm for Mining Frequent Patterns ,

Search WWH ::

Custom Search

Home