Transaction ID   Purchased Items
1                {milk, eggs, bread}
2                {milk, cheese}
3                {milk, bread}
4                {eggs, ham, ketchup}

Support (A ⇒ B) = P(AB)
Confidence (A ⇒ B) = P(AB)/P(A)
Rule Length: number of items in the rule (e.g., Rule Length = 3)

milk ⇒ bread:
Support = 2/4 = 50%
Confidence = 2/3 = 66%

bread ⇒ milk:
Support = 2/4 = 50%
Confidence = 2/2 = 100%
Figure 4-4
Computing support and confidence of an association rule.
The confidence of the rule “milk implies bread” is computed by
taking the number of transactions in which we saw both milk and
bread, which is 2, and dividing it by the number of transactions that
contain milk, which is 3. This gives us a confidence of 66 percent
(2/3). Intuitively, we want to know how likely we are to see bread
given that we see milk. In the dataset provided, that happens 2 out
of 3 times.
Let's turn this around for a moment. The association model also
contains the rule “bread implies milk.” You could ask, why is this
different? The support is the same as in the previous rule; however,
the confidence is 100 percent. This is because every time we saw
bread (2 times), we also saw milk, as in transactions 1 and 3.
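
The two calculations above can be reproduced by simple counting. The following is a minimal sketch, not part of the JDM API: the class name AssociationMetrics and its methods are hypothetical, and the four transactions are assumed to be the ones shown in Figure 4-4.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of JDM. */
final class AssociationMetrics {

    /** Support: fraction of transactions containing every item in items. */
    static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream()
                .filter(t -> t.containsAll(items))
                .count();
        return (double) hits / transactions.size();
    }

    /**
     * Confidence of antecedent => consequent, computed as P(AB) / P(A).
     * Returns NaN if the antecedent never occurs (0.0 / 0.0).
     */
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent,
                             Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return support(transactions, both) / support(transactions, antecedent);
    }

    public static void main(String[] args) {
        // Transactions 1 through 4, assumed from Figure 4-4.
        List<Set<String>> tx = List.of(
                Set.of("milk", "eggs", "bread"),
                Set.of("milk", "cheese"),
                Set.of("milk", "bread"),
                Set.of("eggs", "ham", "ketchup"));
        Set<String> milk = Set.of("milk");
        Set<String> bread = Set.of("bread");

        System.out.println("milk => bread confidence: " + confidence(tx, milk, bread));
        System.out.println("bread => milk confidence: " + confidence(tx, bread, milk));
    }
}
```

Both rules share the same numerator, the support of {milk, bread}. Only the denominator changes with the direction of the rule, which is why the confidences differ while the supports do not.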
Input data for association models typically comes in one of two
forms. Figure 4-5 illustrates a standard one-row-per-case format,
referred to as single record case, which has also been used for the
other mining functions above. In the association data, however, each
predictor attribute indicates whether the product was purchased or
not. A “1” indicates the item was purchased in the transaction; a “0”
indicates it was not.
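
As a rough illustration of the single record case layout, the sketch below encodes each transaction as one row of 1s and 0s, with one column per distinct item. The class SingleRecordCase and its methods are hypothetical names introduced here, the sample transactions are assumed from Figure 4-4, and the alphabetical column order is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of JDM. */
final class SingleRecordCase {

    /** All distinct items in alphabetical order; these become the columns. */
    static List<String> columns(List<Set<String>> transactions) {
        TreeSet<String> all = new TreeSet<>();
        for (Set<String> t : transactions) {
            all.addAll(t);
        }
        return new ArrayList<>(all);
    }

    /** One row per transaction: 1 if the item was purchased, 0 if not. */
    static int[][] encode(List<Set<String>> transactions, List<String> columns) {
        int[][] rows = new int[transactions.size()][columns.size()];
        for (int r = 0; r < transactions.size(); r++) {
            for (int c = 0; c < columns.size(); c++) {
                rows[r][c] = transactions.get(r).contains(columns.get(c)) ? 1 : 0;
            }
        }
        return rows;
    }
}
```

Every row has as many columns as there are distinct items, so most entries are 0 when the catalog is large and each purchase is small.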
Another representation, more common for association data, is
often called “transactional format” and referred to as multirecord case
in JDM. In this format, we capture only the items purchased, as
opposed to those purchased and not purchased. Each item purchased
has a row in the table. The items are linked together by their
transaction or case identifier. This is illustrated in Figure 4-6.
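
The transactional layout can be sketched in the same way: one (transaction ID, item) row per purchased item, with the ID used to regroup the rows into transactions. The class TransactionalFormat, its nested Row type, and the 1-based IDs are assumptions made for illustration; they are not JDM types, and the sample data is again assumed from Figure 4-4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of JDM. */
final class TransactionalFormat {

    /** One row of the transactional table: a case identifier and one item. */
    static final class Row {
        final int transactionId;
        final String item;

        Row(int transactionId, String item) {
            this.transactionId = transactionId;
            this.item = item;
        }
    }

    /** Emits one row per purchased item; IDs are assigned 1, 2, 3, ... in list order. */
    static List<Row> toRows(List<Set<String>> transactions) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < transactions.size(); i++) {
            // TreeSet gives a stable, alphabetical item order within a transaction.
            for (String item : new TreeSet<>(transactions.get(i))) {
                rows.add(new Row(i + 1, item));
            }
        }
        return rows;
    }

    /** Links rows back together by their transaction identifier. */
    static Map<Integer, Set<String>> group(List<Row> rows) {
        Map<Integer, Set<String>> byId = new TreeMap<>();
        for (Row r : rows) {
            byId.computeIfAbsent(r.transactionId, k -> new TreeSet<>()).add(r.item);
        }
        return byId;
    }
}
```

For the four sample transactions this yields ten rows in total, one per purchased item, and no rows at all for items that were not purchased.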