Advanced Analytical Theory and Methods: Association Rules - Data Science and Big Data Analytics

[1] "whole milk, cereals"

[2] "tropical fruit, other vegetables, white bread, bottled water, chocolate"

[3] "citrus fruit, tropical fruit, whole milk, butter, curd, yogurt, flour, bottled water, dishes"

[4] "beef"

[5] "frankfurter, rolls/buns, soda"

[6] "chicken, tropical fruit"

[7] "butter, sugar, fruit/vegetable juice, newspapers"

[8] "fruit/vegetable juice"

[9] "packaged fruit/vegetables"

[10] "chocolate"

[11] "specialty bar"

The next section shows how to generate frequent itemsets from the Groceries dataset.

5.5.2 Frequent Itemset Generation

The apriori() function from the arules package implements the Apriori algorithm to create frequent itemsets. By default, apriori() executes all the iterations at once. To illustrate how the Apriori algorithm works, however, the code examples in this section manually set the parameters of apriori() to simulate each iteration of the algorithm.
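As a minimal sketch of the difference between the two modes, the code below first calls apriori() with its default length limits, so that all iterations run in one call, and then pins minlen and maxlen to the same value so that only itemsets of one length are returned. It assumes the arules package is installed. The support value of 0.02 is borrowed from the example later in this section, and the names all_sets, k_sets, and k, as well as the control=list(verbose=FALSE) setting that silences progress output, are illustrative choices rather than part of the original example.

```r
library(arules)   # assumes the arules package is installed
data(Groceries)   # grocery transactions bundled with arules

# Default behavior: a single call runs every iteration of Apriori,
# returning frequent itemsets of all lengths between minlen and maxlen.
all_sets <- apriori(Groceries,
                    parameter = list(support = 0.02,
                                     target = "frequent itemsets"),
                    control = list(verbose = FALSE))

# Simulated iteration k: setting minlen and maxlen to the same value
# restricts the result to itemsets that contain exactly k items.
k <- 2   # illustrative choice
k_sets <- apriori(Groceries,
                  parameter = list(minlen = k, maxlen = k,
                                   support = 0.02,
                                   target = "frequent itemsets"),
                  control = list(verbose = FALSE))

# Tabulate itemset sizes; every itemset in k_sets has size k.
table(size(k_sets))
```

Restricting maxlen may cause apriori() to warn that mining stopped because the maximum length was reached. That is expected here, because the restriction is deliberate.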

Assume that the minimum support threshold is set to 0.02 based on management discretion. Because the Groceries dataset contains 9,835 transactions, an itemset must appear at least 197 times (0.02 × 9,835 = 196.7, rounded up) to be considered a frequent itemset. The first iteration of the Apriori algorithm computes the support of each product in the dataset and retains those products that satisfy the minimum support. The following code identifies 59 frequent 1-itemsets that satisfy the minimum support. The parameters of apriori() specify the minimum and maximum lengths of the itemsets, the minimum support threshold, and the target indicating the type of association mined.

itemsets <- apriori(Groceries, parameter=list(minlen=1, maxlen=1,
                    support=0.02, target="frequent itemsets"))

parameter specification:
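To make the first iteration concrete, the following base R sketch reproduces its logic by hand on a small dataset: it converts a relative support threshold into a minimum transaction count, counts how many transactions contain each item, and keeps the items that reach the count. The five transactions and the 0.4 threshold are hypothetical and chosen only to keep the arithmetic easy to follow. They are not taken from Groceries, and this is not how apriori() is implemented internally.

```r
# Hypothetical toy transactions (not taken from the Groceries dataset).
transactions <- list(
  c("whole milk", "cereals"),
  c("whole milk", "butter", "yogurt"),
  c("beef"),
  c("whole milk", "yogurt"),
  c("chocolate")
)

min_support <- 0.4            # assumed threshold for the toy data
n <- length(transactions)     # number of transactions

# Smallest whole number of transactions that reaches the threshold.
min_count <- ceiling(min_support * n)

# Count the transactions that contain each item. unique() ensures an item
# repeated within one transaction is counted only once.
item_counts <- table(unlist(lapply(transactions, unique)))

# Relative support of each item.
item_support <- item_counts / n

# Frequent 1-itemsets: items whose count reaches the minimum count.
frequent_items <- names(item_counts)[item_counts >= min_count]
print(frequent_items)
```

Comparing integer counts against min_count, rather than comparing fractional supports against min_support, avoids floating-point edge cases when an item sits exactly on the threshold.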
