Database Reference
In-Depth Information
Table 3.3 The transaction
database T
Transaction ID
Items in transaction
in Example 3
10
a , b , c , d , f
20
b , c , d , f , g , h
30
a , c , d , e , f
40
c , e , f , g
Table 3.4 The profit of each
item in Example 3
Item
a
b
c
d
e
f
g
h
20
30
10
Value
40
0
10
30
20
3
Pushing More Constraints in Pattern-Growth Mining
Frequent pattern mining often generates a large number of frequent itemsets and
rules, which reduces not only the efficiency but also the effectiveness of mining
since users have to sift through a large number of mined rules to find useful ones.
Recent work has highlighted the importance of the paradigm of constraint-based
mining: the user is allowed to express his focus in mining, by means of a rich class
of constraints that capture application semantics. Besides allowing user exploration
and control, the paradigm allows many of these constraints to be pushed deep inside
mining, thus pruning the search space and achieving high performance.
Previous studies [ 6 , 15 , 20 , 25 ] have identified three classes of constraints, anti-
monotone, monotone , and succinct , which can be pushed deep in frequent itemset
mining. While these cover a large class of useful constraints, many other useful and
natural constraints remain. For example, consider the constraints avg ( S ) θ v , and
sum ( S ) θ v ( θ
). The first is neither anti-monotone, nor monotone, nor
succinct. The second is anti-monotone when θ is
∈{≥
,
≤}
and all items have non-negative
values . But if S can contain items of arbitrary values, the constraint is rather like the
first one. This means these constraints are hard to optimize.
With the development of frequent pattern growth method, databases can be pro-
jected and partitioned in an organized way, as well as the patterns to be searched for.
Thus some constraints which are hard to optimize under the Apriori mining frame-
work can be optimized with the frequent pattern growth method. Let's examine one
example.
Example 3 Let Table 3.3 be our transaction database
T
, with a set of items I
=
{
2. Also, let
each item have an attribute value (such as profit ), with the concrete value shown in
Table 3.4 .
The constraint C avg
a , b , c , d , e , f , g , h
}
. Let the support threshold be min _ suppor t
=
avg ( S )
25 is not anti-monotone (nor monotone, nor
succinct). For example, avg ( df )
30) / 2 < 25, violates the constraint.
However, upon adding one more item a , avg ( adf )
=
(10
+
=
(40
+
10
+
30) / 3
25, adf
satisfies C avg .
This example shows that a constraint like avg ( S )
v cannot be pushed deep into
the Apriori mining algorithm because the subsets (supersets) of a valid itemset could
 
Search WWH ::




Custom Search