Databases Reference
In-Depth Information
Notice that constraint C 1 is a monotonic constraint with respect to pattern space
pruning. As we have seen, this constraint has very limited power for reducing the
search space in pattern pruning. However, the same constraint can be used for effective
reduction of the data search space.
For an antimonotonic constraint, such as C 2 : sum
/ $ 100, we can prune
both pattern and data search spaces at the same time. Based on our study of pattern
pruning, we already know that the current itemset can be pruned if the sum of the prices
in it is over $100 (since its further expansion can never satisfy C 2 ). At the same time, we
can also prune any remaining items in a transaction T i that cannot make the constraint
C 2 valid. For example, if the sum of the prices of items in the current itemset S is $90,
any patterns over $10 in the remaining frequent items in T i can be pruned. If none of
the remaining items in T i can make the constraint valid, the entire transaction T i should
be pruned.
Consider pattern constraints that are neither antimonotonic nor monotonic such
as “ C 3 : avg
.
I . price
/ 10.” These can be data-antimonotonic because if the remaining
items in a transaction T i cannot make the constraint valid, then T i can be pruned as well.
Therefore, data-antimonotonic constraints can be quite useful for constraint-based data
space pruning.
Notice that search space pruning by data antimonotonicity is confined only to a pat-
tern growth-based mining algorithm because the pruning of a data entry is determined
based on whether it can contribute to a specific pattern. Data antimonotonicity cannot
be used for pruning the data space if the Apriori algorithm is used because the data
are associated with all of the currently active patterns. At any iteration, there are usu-
ally many active patterns. A data entry that cannot contribute to the formation of the
superpatterns of a given pattern may still be able to contribute to the superpattern of
other active patterns. Thus, the power of data space pruning can be very limited for
nonpattern growth-based algorithms.
.
I . price
7.4 Mining High-Dimensional Data and Colossal Patterns
The frequent pattern mining methods presented so far handle large data sets having
a small number of dimensions. However, some applications may need to mine high-
dimensional data (i.e., data with hundreds or thousands of dimensions). Can we use
the methods studied so far to mine high-dimensional data? The answer is unfortunately
negative because the search spaces of such typical methods grow exponentially with the
number of dimensions.
Researchers have overcome this difficulty in two directions. One direction extends a
pattern growth approach by further exploring the vertical data format to handle data
sets with a large number of dimensions (also called features or items , e.g., genes) but
a small number of rows (also called transactions or tuples , e.g., samples). This is use-
ful in applications like the analysis of gene expressions in bioinformatics, for example,
where we often need to analyze microarray data that contain a large number of genes
 
Search WWH ::




Custom Search