The second step of the algorithm computes the candidate set C_1 with all the 1-itemsets in db that are not in L_1. We only have I_4 = 4 in this situation. Since I_4 is in both transactions in db, we have I_4.s_db = 2 > 0.5 × 2, and thus, I_4 will be added to L_1.
Finally, the updated support count is given in the following table, where an asterisk (light gray in the printed table) marks the items I with support less than minsup × 6:

Item   Count
1      4
2      3
3      2 *
4      3
5      1 *
6      1 *
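The threshold check behind this table can be reproduced in a few lines. What follows is a minimal sketch, assuming minsup = 0.5 (the factor used in the comparison 0.5 × 2 earlier) and six transactions in the updated database. The names `counts` and `frequent_items` are illustrative and do not come from the algorithm description.

```python
# Support counts copied from the table above (item -> count).
counts = {1: 4, 2: 3, 3: 2, 4: 3, 5: 1, 6: 1}

MINSUP = 0.5          # assumed from the 0.5 factor used in the text
NUM_TRANSACTIONS = 6  # size of the updated database (minsup x 6 in the text)


def frequent_items(counts, minsup, n):
    """Return the items whose count reaches the threshold minsup * n."""
    threshold = minsup * n
    return sorted(item for item, c in counts.items() if c >= threshold)


frequent = frequent_items(counts, MINSUP, NUM_TRANSACTIONS)
infrequent = sorted(set(counts) - set(frequent))

print(frequent)    # items meeting the threshold
print(infrequent)  # items below the threshold (the marked rows)
```

Under these assumptions the threshold is 3, so items 1, 2, and 4 are kept and items 3, 5, and 6 fall below it.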
The association analysis studied so far operates over the items in a database of transactions. However, we have seen that dimension hierarchies are a way of defining a hierarchy of concepts along which transaction items can be classified. This leads to the notion of hierarchical association rules. For example, in the Northwind data warehouse, products are organized into categories. Assume now that in the original transaction database in our example above, items 1 and 2 belong to category A, items 3 and 4 to category B, and items 5 and 6 to category C. The transaction table with the categories instead of the items is given below:

TransactionId   Items
1000            {A, A, B}
2000            {A, B}
3000            {A, B}
4000            {A, C, C}

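The roll-up from items to categories can be sketched as follows. The item-to-category assignment is the one stated in the text. The item-level transactions are an assumption: they are hypothetical values chosen only so that they aggregate to the category table above, and `roll_up` is an illustrative helper name.

```python
# Item-to-category assignment stated in the text.
category_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C"}

# ASSUMPTION: hypothetical item-level transactions, picked only because
# they roll up to the category table shown above.
item_transactions = {
    1000: [1, 2, 3],
    2000: [1, 3],
    3000: [1, 4],
    4000: [2, 5, 6],
}


def roll_up(transactions, category_of):
    """Replace every item in every transaction by its category."""
    return {
        tid: [category_of[item] for item in items]
        for tid, items in transactions.items()
    }


category_transactions = roll_up(item_transactions, category_of)
for tid, categories in category_transactions.items():
    print(tid, categories)
```

Duplicates such as A, A are kept here to mirror the table. For support counting, each transaction would typically be reduced to the set of categories it contains.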
Suppose now that we require minsup = 75%. Over the items database, we would obtain no rules as a result. However, aggregating items over categories, like in the table above, would result in the rules A ⇒ B and B ⇒ A, since categories A and B have support at least the minimum, namely, 1 and 0.75, respectively. That means we could not say that each time a given item X appears in the database, an item Y will appear, but we could say that each time an item of category A appears, an item of category B will be present too. This is called a hierarchical association rule. Note that combinations of items at different granularities can also appear, for example, rules like “Each time a given item X appears in the database, an item of a category C will also appear.”
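The support figures for the category-level table can be checked with a short script. This is a sketch under stated assumptions: a category counts once per transaction regardless of repetitions, a rule is kept when the support of both categories together reaches minsup, and the confidence value is computed only for illustration. The helper `support` and the variable names are illustrative.

```python
from itertools import permutations

# Category-level transactions copied from the table in the text.
transactions = {
    1000: ["A", "A", "B"],
    2000: ["A", "B"],
    3000: ["A", "B"],
    4000: ["A", "C", "C"],
}
MINSUP = 0.75


def support(itemset, transactions):
    """Fraction of transactions containing every category in itemset."""
    wanted = set(itemset)
    hits = sum(1 for cats in transactions.values() if wanted <= set(cats))
    return hits / len(transactions)


categories = sorted({c for cats in transactions.values() for c in cats})

# Frequent single categories: support at least MINSUP.
frequent = {}
for c in categories:
    s = support([c], transactions)
    if s >= MINSUP:
        frequent[c] = s

# Candidate rules between frequent categories.
rules = []
for lhs, rhs in permutations(frequent, 2):
    pair_support = support([lhs, rhs], transactions)
    if pair_support >= MINSUP:
        confidence = pair_support / frequent[lhs]
        rules.append((lhs, rhs, pair_support, confidence))

print(frequent)
for lhs, rhs, s, conf in rules:
    print(f"{lhs} => {rhs}: support={s}, confidence={conf}")
```

With the table above, only A and B pass the 75% threshold, and the pair {A, B} appears in three of the four transactions, which yields the two rules A ⇒ B and B ⇒ A.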