Information Technology Reference
In-Depth Information
Similar to above discuss, FDM-LUP and FDM-LPP all can obtain candidate
itemset less than FDM-LP. However, they request more storage and
communication messages for local support counts. Their efficiency mainly depen
on the data distribution.
12.7 Parallel Mining of Association Rules
Data mining researchers expect parallelism to relieve mining methods from the
sequential bottleneck, providing scalability to massive data sets and improving
response time. The main challenges include synchronization and communication
minimization, workload balancing, finding good data layout and data
decomposition, and disk I/O minimization The parallel design space spans three
main components: the hardware platform, the type of parallelism, and the
load-balancing strategy.
Table 12.3 Parallel algorithm for association rules (Zaki, 1999)
ALGORITHM S
CHARACTERISTIC
Count Distribution(CD)
Apriori-base
PEAR
Candidate prefix tree
Hash table for 2-itemsets, parallel candidate
generation
PDM
NPA
Only master does sum reduction
FDM
Local and global pruning, count polling
FPM
Local and global pruning, skewness handling
CCPD
Shared memory
Data Distribution
Exchange full database per iteration
SPA
Same as Data Distribution
IDD
Ring-based broadcast, item-based candidate
partitioning
PCCD
Shared memory (logical database exchange)
Hybrid Distribution
Combines Count and Data Distribution
Candidate Distribution
Selectively replicated database, asynchronous
HPA
No database replication, exchange itemsets
HPA-ELD
Replicate frequent itemsets
ParEclat
Eclat-based, asynchronous, hierarchical
ParMaxEclat
MaxEclat-based, asynchronous, hierarchical
ParClique
Clique-based, asynchronous, hierarchical
ParMaxClique
MaxClique-based, asynchronous, hierarchical
APM
DIC-based, shared memory, asynchronous
PPAR
Partition-based, horizontal database
Search WWH ::




Custom Search