the typical frequent pattern mining methods are multi-pass methods. Multiple
passes are clearly not possible in the context of data streams [22, 39].
The streaming scenario also presents numerous challenges in the context of data of
advanced types. For example, graph streams are often encountered in the context
of network data. In such cases, methods need to be designed for determining dense
groups of nodes in real time [16]. Methods for mining frequent items and itemsets
in data streams are discussed in Chap. 9.
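
To make the single-pass constraint concrete, the following is a minimal sketch of one well-known one-pass technique for frequent items, the Misra-Gries summary. It is offered only as an illustration and is not necessarily one of the methods covered in Chap. 9; the function name and the parameter k are assumptions made for this example. With at most k - 1 counters, every item that occurs more than n/k times in a stream of length n is guaranteed to remain in the summary, although the stored counts are underestimates.

```python
def misra_gries(stream, k):
    """One-pass summary that keeps at most k - 1 counters.

    Any item occurring more than len(stream) / k times is guaranteed
    to be among the returned keys; counts are lower bounds.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical toy stream: 'a' occurs 7 times out of 12 items.
toy_stream = list("aabacadaaeab")
summary = misra_gries(toy_stream, k=3)
```

Each element is inspected once and the memory used is bounded by k, which is what makes the approach suitable when a second pass over the data is not available.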
3.2 Frequent Pattern Mining with Big Data
The big data scenario poses numerous challenges for the problem of frequent pattern
mining. A major problem arises when the data is large enough to be stored in a
distributed way. Therefore, significant costs are incurred in shuffling around data or
intermediate results of the mining process across the distributed nodes. These costs
are also referred to as data transfer costs. When data sets are very large, the
algorithms need to be designed to take into account both the disk access constraint and
the data transfer costs. In addition, many distributed frameworks such as MapReduce
[28] require specialized algorithms for frequent pattern mining. The focus of the
big-data framework is somewhat different from that of streams, in that it is closely
related to the issue of shuffling large amounts of data around for the mining process.
Interestingly, it is sometimes easier to process the data in a single pass in streaming
fashion than when it has already been stored in a distributed framework, where access
costs become a major issue. Algorithms for frequent pattern mining with big data are
discussed in detail in Chap. 10. This chapter discusses both the parallel algorithms
and the big-data algorithms that are based on the MapReduce framework.
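
As a rough illustration of why data transfer costs matter, the sketch below simulates a MapReduce-style count of frequent 2-itemsets in plain Python. It does not use any real MapReduce API; the function names, the partition layout, and the toy transactions are all assumptions made for this example. Each partition is counted locally first (the role of a combiner), so that only compact per-partition counts, rather than raw transactions, would need to be shuffled to the reduce step.

```python
from collections import Counter
from itertools import combinations


def map_partition(transactions, size):
    """Map and combine: count itemsets of the given size on one node."""
    local = Counter()
    for transaction in transactions:
        # Sort so that the same itemset always yields the same key.
        for itemset in combinations(sorted(set(transaction)), size):
            local[itemset] += 1
    return local


def reduce_counts(partials, min_support):
    """Reduce: merge per-partition counts and keep the frequent itemsets."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return {s: c for s, c in total.items() if c >= min_support}


# Hypothetical data split across two nodes.
partitions = [
    [["bread", "milk"], ["bread", "beer", "milk"]],
    [["milk", "beer"], ["bread", "milk", "eggs"]],
]
partials = [map_partition(p, size=2) for p in partitions]
frequent_pairs = reduce_counts(partials, min_support=2)
```

Enumerating every candidate itemset in this way grows quickly with transaction length, so the sketch only shows the shape of the map and reduce steps, not a scalable algorithm.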
4 Frequent Pattern Mining with Advanced Data Types
Although the frequent pattern mining problem is naturally defined on sets, it can be
extended to various advanced data types. The most natural extension of frequent
pattern mining algorithms is to the case of temporal data. This was one of the earliest
proposed extensions and is referred to as sequential pattern mining. Subsequently,
the problem has been generalized to other advanced data types, such as
spatiotemporal data, graphs, and uncertain data. Many of the developed algorithms are basic
variations of the frequent pattern mining problem. In general, the basic frequent
pattern mining algorithms need to be modified carefully to address the variations
required by the advanced data types.
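
As one example of such a modification, the sketch below shows how support counting changes when moving from sets to sequences: a sequence supports a pattern only if the pattern's items appear in the same order, though not necessarily contiguously. This is a simplified illustration that assumes each sequence element is a single item rather than an itemset; the function names and the toy database are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """Return True if pattern occurs in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the match,
    # so later items must be found after earlier ones.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Count the sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


# Hypothetical toy database of event sequences.
database = [
    ["login", "search", "cart", "pay"],
    ["login", "cart", "search"],
    ["search", "login", "cart", "pay"],
]
login_then_cart = support(["login", "cart"], database)
cart_then_login = support(["cart", "login"], database)
```

Unlike the set-based case, the two patterns above contain the same items but receive different support, because order is part of the pattern.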