Databases Reference
In-Depth Information
satisfy the constraint. Similarly, if items in a transaction are added to an itemset in
price-descending order, it becomes monotonic, because if the itemset satisfies the con-
straint (i.e., with an average price no greater than $10), then adding cheaper items into
the current itemset will still make the average price no greater than $10. Aside from
avg
/ v ,” given in Table 7.2, there are many other convertible
constraints such as “ variance
.
S
/ v ” and “ avg
.
S
/ v ,” and so on.
Note that the previous discussion does not imply that every constraint is convertible.
For example, “ sum
.
S
/ v ” “ standard deviation
.
S
2f , g and each element in S could be of any
real value, is not convertible. Therefore, there is yet a fifth category of constraints, called
inconvertible constraints . The good news is that although there still exist some tough
constraints that are not convertible, most simple SQL expressions with built-in SQL
aggregates belong to one of the first four categories to which efficient constraint mining
methods can be applied.
.
S
/
v ,” where
Pruning Data Space with Data Pruning Constraints
The second way of search space pruning in constraint-based frequent pattern mining
is pruning data space . This strategy prunes pieces of data if they will not contribute to
the subsequent generation of satisfiable patterns in the mining process. We consider two
properties: data succinctness and data antimonotonicity .
Constraints are data-succinct if they can be used at the beginning of a pattern mining
process to prune the data subsets that cannot satisfy the constraints. For example, if a
mining query requires that the mined pattern must contain digital camera , then any
transaction that does not contain digital camera can be pruned at the beginning of the
mining process, which effectively reduces the data set to be examined.
Interestingly, many constraints are data-antimonotonic in the sense that during the
mining process , if a data entry cannot satisfy a data-antimonotonic constraint based on
the current pattern, then it can be pruned. We prune it because it will not be able to
contribute to the generation of any superpattern of the current pattern in the remaining
mining process.
Example 7.9 Data antimonotonicity. A mining query requires that C 1 : sum
/ $100 , that is,
the sum of the prices of the items in the mined pattern must be no less than $100. Sup-
pose that the current frequent itemset, S , does not satisfy constraint C 1 (say, because the
sum of the prices of the items in S is $50). If the remaining frequent items in a transac-
tion T i are such that, say, f i 2 . price D $ 5, i 5 . price D $ 10, i 8 . price D $ 20g, then T i will not
be able to make S satisfy the constraint. Thus, T i cannot contribute to the patterns to be
mined from S , and thus can be pruned.
Note that such pruning cannot be done at the beginning of the mining because at
that time, we do not know yet if the total sum of the prices of all the items in T i will
be over $100 (e.g., we may have i 3 . price D $ 80). However, during the iterative mining
process, we may find some items (e.g., i 3 ) that are not frequent with S in the transaction
data set, and thus they would be pruned. Therefore, such checking and pruning should
be enforced at each iteration to reduce the data search space.
.
I . price
 
Search WWH ::




Custom Search