An Introduction to Frequent Pattern Mining - Frequent Pattern Mining

the typical frequent pattern mining methods are multi-pass methods. Multiple

passes are clearly not possible in the context of data streams [22, 39].

The streaming scenario also presents numerous challenges in the context of data of

advanced types. For example, graph streams are often encountered in the context

of network data. In such cases, methods need to be designed for determining dense

groups of nodes in real time [16]. Methods for mining frequent items and itemsets

in data streams are discussed in Chap. 9.
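
To make the single-pass constraint concrete, the following is a minimal sketch of one well-known one-pass technique for frequent items, the Misra-Gries summary. The text above does not name a specific algorithm, so this choice, the function name misra_gries, and the parameter k are assumptions made for illustration only. The summary keeps at most k - 1 counters, and any item occurring more than n/k times in a stream of length n is guaranteed to survive in it, although the stored counts may underestimate the true frequencies.

```python
def misra_gries(stream, k):
    """One-pass summary of a stream using at most k - 1 counters.

    Illustrative sketch only: the algorithm choice and the names used here
    are assumptions, not something specified by the surrounding text.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all counters, dropping any at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


summary = misra_gries("abacadaeaa", k=3)
```

Each element is inspected exactly once and memory stays bounded by k, which is why a summary of this kind suits a stream that cannot be re-read.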

3.2 Frequent Pattern Mining with Big Data

The big data scenario poses numerous challenges for the problem of frequent pattern

mining. A major problem arises when the data is large enough to be stored in a

distributed way. Therefore, significant costs are incurred in shuffling around data or

intermediate results of the mining process across the distributed nodes. These costs

are also referred to as data transfer costs. When data sets are very large, the

algorithms need to be designed to take into account both the disk access constraint and

the data transfer costs. In addition, many distributed frameworks such as MapReduce

[28] require specialized algorithms for frequent pattern mining. The focus of the big-

data framework is somewhat different from streams, in that it is closely related to the

issue of shuffling large amounts of data around for the mining process. Interestingly,

it is sometimes easier to process the data in a single pass in streaming fashion

than when it has already been stored in distributed frameworks where access

costs become a major issue. Algorithms for frequent pattern mining with big data are

discussed in detail in Chap. 10. That chapter discusses both the parallel algorithms

and the big-data algorithms that are based on the MapReduce framework.
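
The data transfer issue can be illustrated with a small map-and-reduce style sketch in plain Python. It simulates the two phases locally and does not use any real MapReduce API. The function names map_partition and reduce_counts, the toy partitions, and the support threshold are all hypothetical. The point it shows is that each node can count itemsets over its own partition, so that only the compact count tables, rather than the raw transactions, need to be moved between nodes.

```python
from collections import Counter
from itertools import combinations


def map_partition(transactions, size):
    """Map step (hypothetical): count itemsets of a given size in one partition."""
    local = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), size):
            local[itemset] += 1
    return local


def reduce_counts(partials, min_support):
    """Reduce step (hypothetical): sum per-partition counts, keep frequent itemsets."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {s: c for s, c in total.items() if c >= min_support}


# Two partitions standing in for data stored on two distributed nodes.
partitions = [
    [["bread", "milk"], ["bread", "butter", "milk"]],
    [["milk", "butter"], ["bread", "milk"]],
]
partials = [map_partition(p, size=2) for p in partitions]
frequent_pairs = reduce_counts(partials, min_support=3)
```

Enumerating all itemsets of a given size per transaction is only practical for small sizes; a realistic design would prune candidates first, which is one reason specialized algorithms are needed in this setting.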

4 Frequent Pattern Mining with Advanced Data Types

Although the frequent pattern mining problem is naturally defined on sets, it can be

extended to various advanced data types. The most natural extension of frequent

pattern mining algorithms is to the case of temporal data. This was one of the earliest

proposed extensions and is referred to as sequential pattern mining. Subsequently,

the problem has been generalized to other advanced data types, such as spatiotemporal

data, graphs, and uncertain data. Many of the developed algorithms are basic

variations of the frequent pattern mining problem. In general, the basic frequent

pattern mining algorithms need to be modified carefully to address the variations

required by the advanced data types.
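
As a small illustration of how the set-based definition changes for temporal data, the sketch below counts the support of a sequential pattern. It assumes a simplified setting in which each sequence is an ordered list of single items and a pattern is supported when its items appear in the same order, not necessarily contiguously. The function names and the toy database are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test advances the iterator, which enforces the ordering.
    return all(item in remaining for item in pattern)


def sequence_support(database, pattern):
    """Count the sequences in the database that contain the pattern."""
    return sum(contains_subsequence(s, pattern) for s in database)


database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["c", "a"],
    ["b", "a", "c"],
]
support = sequence_support(database, ["a", "c"])
```

Under the set-based definition all four records would support the itemset {a, c}, whereas the ordered test rejects the third sequence, so even the support-counting step has to change when moving to sequences.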
