Support(A ➔ B) = P(AB)
Confidence(A ➔ B) = P(AB)/P(A)
Rule Length: number of items in the rule (AB ➔ C has Rule Length = 3)

Transaction ID   Purchased Items
1                {milk, eggs, bread}
2                {milk, cheese}
3                {milk, bread}
4                {eggs, ham, ketchup}

milk ➔ bread: Support = 2/4 = 50%, Confidence = 2/3 = 66%
bread ➔ milk: Support = 2/4 = 50%, Confidence = 2/2 = 100%

Figure 4-4 Computing support and confidence of an association rule.

taking the number of times we saw both milk and bread (the support) and dividing it by the number of transactions that contain milk, which is 3. This gives us a confidence of 66 percent (2/3). Intuitively, we want to know: if we see milk, how likely are we to also see bread? In the dataset provided, that happens 2 out of 3 times.

Let's turn this around for a moment. The association model also contains the rule “bread implies milk.” You could ask, why is this different? The support is the same as in the previous rule; however, the confidence is 100 percent. This is because every time we saw bread (2 times), we also saw milk, as in transactions 1 and 3.
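
The two calculations above can be checked with a few lines of plain Java. This is a minimal sketch that uses only the four transactions from Figure 4-4; the class name RuleMetrics and its methods are hypothetical helpers introduced for illustration and are not part of the JDM API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative helper only; it does not use or represent the JDM API.
class RuleMetrics {

    // The four transactions shown in Figure 4-4.
    static final List<Set<String>> TRANSACTIONS = List.of(
            Set.of("milk", "eggs", "bread"),
            Set.of("milk", "cheese"),
            Set.of("milk", "bread"),
            Set.of("eggs", "ham", "ketchup"));

    // Counts the transactions that contain every item in 'items'.
    static long countContaining(List<Set<String>> transactions, Set<String> items) {
        return transactions.stream().filter(t -> t.containsAll(items)).count();
    }

    // Support(A -> B) = P(AB): share of all transactions containing both A and B.
    static double support(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        return (double) countContaining(transactions, union(a, b)) / transactions.size();
    }

    // Confidence(A -> B) = P(AB) / P(A): share of the transactions containing A
    // that also contain B. Evaluates to NaN when no transaction contains A.
    static double confidence(List<Set<String>> transactions, Set<String> a, Set<String> b) {
        return (double) countContaining(transactions, union(a, b))
                / countContaining(transactions, a);
    }

    private static Set<String> union(Set<String> a, Set<String> b) {
        Set<String> both = new HashSet<>(a);
        both.addAll(b);
        return both;
    }

    public static void main(String[] args) {
        Set<String> milk = Set.of("milk");
        Set<String> bread = Set.of("bread");
        System.out.println("milk -> bread: support=" + support(TRANSACTIONS, milk, bread)
                + ", confidence=" + confidence(TRANSACTIONS, milk, bread));
        System.out.println("bread -> milk: support=" + support(TRANSACTIONS, bread, milk)
                + ", confidence=" + confidence(TRANSACTIONS, bread, milk));
    }
}
```

Both rules share the same numerator, the two transactions that contain milk and bread together, so their supports are equal. Only the denominator of the confidence changes: three transactions contain milk, but only two contain bread.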

Input data for association models typically comes in one of two forms. Figure 4-5 illustrates a standard one-row-per-case format, referred to as single record case, which has also been used for the other mining functions above. In the association data, however, each predictor attribute indicates whether the product was purchased or not. A “1” indicates the item was purchased in the transaction; a “0” indicates it was not.
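
As a rough illustration of the single record case layout, the sketch below turns a list of market baskets into one row of 1/0 flags per transaction. Figure 4-5 is not reproduced here, so the example reuses the transactions from Figure 4-4 as sample data; the class SingleRecordCase and its methods are hypothetical and not part of JDM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: builds a one-row-per-case table of purchase flags.
class SingleRecordCase {

    // Collects every distinct item, sorted alphabetically, to serve as columns.
    static List<String> itemColumns(List<Set<String>> transactions) {
        Set<String> all = new TreeSet<>();
        for (Set<String> t : transactions) {
            all.addAll(t);
        }
        return new ArrayList<>(all);
    }

    // One row per transaction; 1 means the item was purchased, 0 means it was not.
    static int[][] encode(List<Set<String>> transactions, List<String> columns) {
        int[][] rows = new int[transactions.size()][columns.size()];
        for (int r = 0; r < transactions.size(); r++) {
            for (int c = 0; c < columns.size(); c++) {
                rows[r][c] = transactions.get(r).contains(columns.get(c)) ? 1 : 0;
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Sample data taken from Figure 4-4.
        List<Set<String>> transactions = List.of(
                Set.of("milk", "eggs", "bread"),
                Set.of("milk", "cheese"),
                Set.of("milk", "bread"),
                Set.of("eggs", "ham", "ketchup"));
        List<String> columns = itemColumns(transactions);
        System.out.println("columns: " + columns);
        int[][] rows = encode(transactions, columns);
        for (int r = 0; r < rows.length; r++) {
            System.out.println("case " + (r + 1) + ": " + Arrays.toString(rows[r]));
        }
    }
}
```

Every case carries a column for every item, so the table grows wide and mostly zero as the number of distinct items increases.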

Another representation, more common for association data, is often called “transactional format” and referred to as multirecord case in JDM. In this format, we capture only the items purchased, as opposed to those purchased and not purchased. Each item purchased has a row in the table. The items are linked together by their transaction or case identifier. This is illustrated in Figure 4-6.
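
The transactional layout can be sketched in the same way: one (case identifier, item) row per purchased item, with the identifier used to regroup the rows into baskets. Figure 4-6 is not reproduced here, so the transactions from Figure 4-4 again serve as sample data; the class MultiRecordCase and its methods are hypothetical and do not reflect how JDM stores or reads such data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative only: one (caseId, item) row per purchased item.
class MultiRecordCase {

    // Sample data taken from Figure 4-4, keyed by transaction ID.
    static Map<Integer, List<String>> figure44Cases() {
        Map<Integer, List<String>> cases = new LinkedHashMap<>();
        cases.put(1, List.of("milk", "eggs", "bread"));
        cases.put(2, List.of("milk", "cheese"));
        cases.put(3, List.of("milk", "bread"));
        cases.put(4, List.of("eggs", "ham", "ketchup"));
        return cases;
    }

    // Flattens the baskets into rows; items that were not purchased get no row.
    static List<Map.Entry<Integer, String>> toRows(Map<Integer, List<String>> cases) {
        List<Map.Entry<Integer, String>> rows = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> c : cases.entrySet()) {
            for (String item : c.getValue()) {
                rows.add(Map.entry(c.getKey(), item));
            }
        }
        return rows;
    }

    // Links the rows back together by their case identifier.
    static Map<Integer, Set<String>> groupByCase(List<Map.Entry<Integer, String>> rows) {
        Map<Integer, Set<String>> cases = new TreeMap<>();
        for (Map.Entry<Integer, String> row : rows) {
            cases.computeIfAbsent(row.getKey(), k -> new TreeSet<>()).add(row.getValue());
        }
        return cases;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> rows = toRows(figure44Cases());
        for (Map.Entry<Integer, String> row : rows) {
            System.out.println(row.getKey() + "\t" + row.getValue());
        }
        System.out.println(groupByCase(rows));
    }
}
```

For the four sample transactions this produces ten rows, one per purchased item, where the single record case layout would need a flag for each of the six distinct items in all four cases.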
