5.7 Diagnostics
Although the Apriori algorithm is easy to understand and implement, some of the
rules generated are uninteresting or practically useless. Additionally, some of the
rules may be generated due to coincidental relationships between the variables.
Measures like confidence, lift, and leverage should be used along with human
insights to address this problem.
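As a minimal sketch of how these measures can be computed for a single rule X -> Y, the following uses the standard definitions based on relative support. The function name `rule_metrics` and the small basket dataset are hypothetical and serve only as illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return support, confidence, lift, and leverage for the rule X -> Y."""
    x, y = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)

    def count(items):
        # Number of transactions that contain every item in `items`.
        return sum(1 for t in transactions if items <= t)

    c_x, c_y, c_xy = count(x), count(y), count(x | y)
    if c_x == 0 or c_y == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    return {
        "support": c_xy / n,                                 # Support(X and Y)
        "confidence": c_xy / c_x,                            # Support(X and Y) / Support(X)
        "lift": (c_xy * n) / (c_x * c_y),                    # Support(X and Y) / (Support(X) * Support(Y))
        "leverage": c_xy / n - (c_x / n) * (c_y / n),        # Support(X and Y) - Support(X) * Support(Y)
    }


# Hypothetical toy dataset: five market baskets.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "eggs"}),
]
metrics = rule_metrics(transactions, {"bread"}, {"milk"})
```

In this toy data the rule bread -> milk has a confidence of 0.75, yet its lift is below 1 and its leverage is negative, which suggests the rule reflects little more than the fact that both items are common.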
Another problem with association rules is that, in Phases 3 and 4 of the Data
Analytics Lifecycle (Chapter 2), the team must specify the minimum support prior
to the model execution, which may lead to too many or too few rules. In related
research, a variant of the algorithm [13] can use a predefined target range for the
number of rules so that the algorithm can adjust the minimum support accordingly.
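The idea of adjusting the minimum support toward a target number of rules can be sketched as follows. This is not the variant described in [13]; it is a simple bisection that relies on the fact that, for a fixed minimum confidence, the number of rules cannot increase as the minimum support rises. The helper names `count_rules` and `tune_min_support`, the brute-force rule counter, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def count_rules(transactions, min_support, min_confidence):
    """Count rules X -> Y meeting both thresholds (brute force; tiny data only)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    support = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(1 for t in transactions if set(combo) <= t) / n
            if s > 0 and s >= min_support:
                support[frozenset(combo)] = s
    rules = 0
    for itemset, s in support.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is guaranteed to be in the dictionary.
                if s / support[frozenset(lhs)] >= min_confidence:
                    rules += 1
    return rules


def tune_min_support(transactions, target, min_confidence=0.5, max_iter=20):
    """Bisect on the minimum support until the rule count falls in `target`.

    `target` is an inclusive (low, high) range. If no threshold hits the
    range within `max_iter` steps, the last threshold tried is returned.
    """
    low_rules, high_rules = target
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        n_rules = count_rules(transactions, mid, min_confidence)
        if n_rules > high_rules:
            lo = mid   # too many rules: raise the minimum support
        elif n_rules < low_rules:
            hi = mid   # too few rules: lower the minimum support
        else:
            break
    return mid, n_rules


# Hypothetical toy dataset: five market baskets.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "eggs"}),
]
min_support, n_rules = tune_min_support(transactions, target=(2, 4))
```

Because the rule count changes in jumps, a narrow target range may be unreachable; in that case the sketch simply returns the last threshold it tried.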
Section 5.2 presented the Apriori algorithm, which is one of the earliest and
most fundamental algorithms for generating association rules. The Apriori
algorithm reduces the computational workload by examining only itemsets that
meet the specified minimum support threshold. However, depending on the size of the
dataset, the Apriori algorithm can be computationally expensive. For each level
(itemset size k), the algorithm requires a scan of the entire database to count support.
Accordingly, as the database grows, each run takes more time to compute. Here
are some approaches to improve Apriori's efficiency:
Partitioning: Any itemset that is potentially frequent in a transaction
database must be frequent in at least one of the partitions of the transaction
database.
Sampling: This extracts a subset of the data and performs association rule
mining on the subset using a lower support threshold, which reduces the
chance of missing itemsets that are frequent in the full database.
Transaction reduction: A transaction that does not contain frequent
k-itemsets is useless in subsequent scans and therefore can be ignored.
Hash-based itemset counting: If the corresponding hashing bucket
count of a k-itemset is below a certain threshold, the k-itemset cannot be
frequent.
Dynamic itemset counting: Only add new candidate itemsets when all
of their subsets are estimated to be frequent.
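To make the partitioning property concrete, here is a minimal two-pass sketch. It mines each partition locally with the same relative minimum support, takes the union of the local results as global candidates, and then verifies the candidates with a single scan of the full database. The function names, the brute-force local miner, and the toy data are hypothetical; a practical implementation would use Apriori or a similar algorithm within each partition, and the sketch assumes a non-empty list of transactions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner: itemsets whose relative support is >= min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count / n >= min_support:
                result.add(frozenset(combo))
    return result


def partitioned_frequent_itemsets(transactions, min_support, n_partitions=2):
    """Two-pass partitioning sketch (assumes `transactions` is non-empty)."""
    n = len(transactions)
    size = -(-n // n_partitions)  # ceiling division: transactions per partition

    # Pass 1: an itemset that is frequent globally must be frequent in at
    # least one partition, so the union of local results is a safe candidate set.
    candidates = set()
    for start in range(0, n, size):
        candidates |= frequent_itemsets(transactions[start:start + size], min_support)

    # Pass 2: one scan of the full database removes candidates that were
    # only locally frequent.
    return {
        c for c in candidates
        if sum(1 for t in transactions if c <= t) / n >= min_support
    }


# Hypothetical toy dataset: five market baskets.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "eggs"}),
]
result = partitioned_frequent_itemsets(transactions, min_support=0.6)
```

With a minimum support of 0.6, the first partition proposes {butter} and {bread, butter} as local candidates, but the second pass discards them because they are not frequent in the full database.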