Advanced Pattern Mining - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

Notice that constraint C 1 is a monotonic constraint with respect to pattern space

pruning. As we have seen, this constraint has very limited power for reducing the

search space in pattern pruning. However, the same constraint can be used for effective

reduction of the data search space.

For an antimonotonic constraint, such as C 2 : sum

/ $ 100, we can prune

both pattern and data search spaces at the same time. Based on our study of pattern

pruning, we already know that the current itemset can be pruned if the sum of the prices

in it is over $100 (since its further expansion can never satisfy C 2 ). At the same time, we

can also prune any remaining items in a transaction T i that cannot make the constraint

C 2 valid. For example, if the sum of the prices of items in the current itemset S is $90,

any patterns over $10 in the remaining frequent items in T i can be pruned. If none of

the remaining items in T i can make the constraint valid, the entire transaction T i should

be pruned.

Consider pattern constraints that are neither antimonotonic nor monotonic such

as “ C 3 : avg

.

I . price

/ 10.” These can be data-antimonotonic because if the remaining

items in a transaction T i cannot make the constraint valid, then T i can be pruned as well.

Therefore, data-antimonotonic constraints can be quite useful for constraint-based data

space pruning.

Notice that search space pruning by data antimonotonicity is confined only to a pat-

tern growth-based mining algorithm because the pruning of a data entry is determined

based on whether it can contribute to a specific pattern. Data antimonotonicity cannot

be used for pruning the data space if the Apriori algorithm is used because the data

are associated with all of the currently active patterns. At any iteration, there are usu-

ally many active patterns. A data entry that cannot contribute to the formation of the

superpatterns of a given pattern may still be able to contribute to the superpattern of

other active patterns. Thus, the power of data space pruning can be very limited for

nonpattern growth-based algorithms.

.

I . price

7.4 Mining High-Dimensional Data and Colossal Patterns

The frequent pattern mining methods presented so far handle large data sets having

a small number of dimensions. However, some applications may need to mine high-

dimensional data (i.e., data with hundreds or thousands of dimensions). Can we use

the methods studied so far to mine high-dimensional data? The answer is unfortunately

negative because the search spaces of such typical methods grow exponentially with the

number of dimensions.

Researchers have overcome this difficulty in two directions. One direction extends a

pattern growth approach by further exploring the vertical data format to handle data

sets with a large number of dimensions (also called features or items , e.g., genes) but

a small number of rows (also called transactions or tuples , e.g., samples). This is use-

ful in applications like the analysis of gene expressions in bioinformatics, for example,

where we often need to analyze microarray data that contain a large number of genes

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home