Information Technology Reference
In-Depth Information
Input: i=1, j=1, sequence x, pattern t
Output: sum, number of t included in x
number(String[] x, int i, String[] t, int j) {
if (x[i]=(“ ”)) i++; // If we have a star, skip it, it was already used
// If the star was the last character, found another match.
if (i = m AND x[i] = (“ ”)) return ++sum;
if (j = n)
}
if (i = 0 AND j = 0) sum = 0;
// The “ i > 0 ” test simulates a starting star.
if (i
{
return sum;
{
if (x[i] = (t[j]) AND i = (m - 1)) { sum++; }
else if (x[i] = (t[j]))
>
0 AND x[i - 1]
=(“
”))
{
number(x, i + 1, t, j + 1);
}}
else {
for (int p = j; p < n; p++) {
if (x[i] = (t[p]) AND i = (m - 1))
}
else if (x[i] = (t[p])) { number(x, i + 1, t, p + 1); }
}
{
sum++;
}
return sum;
}
Fig. 4. An algorithm to find out how frequent is each pattern within a certain number
of records
first ( k -2) items are similar. This set of candidate is denoted C ( k ,0). Second,
the prune step: C ( k ,0) is a superset of L ( k ,0), that is, its elements may or may
not be frequent, but all of the frequent k -itemsets are included in C ( k ,0), even
if C ( k ,0) is very large. In fact any ( k -1)-itemset that is not frequent cannot
be a subset of a frequent k -itemset. Hence, if any ( k -1)-subset of a candidate
k -itemset is not in L ( k -1,0), then the candidate cannot be frequent either, and
so, can be removed from C ( k ,0). Suppose there are n records in the original
data set, to find all n large sequences, the number of connection will be 2 n .To
build the signature of an attack with around 100 records, this structure is not
suitable.
In contrast, when we search for 1-itemsets candidate, C (1,0), with our pro-
posed algorithm, we need to scan the original records once and count all items,
the same as the Apriori algorithm. When searching for frequent 1-itemsets,
L (1,0), instead of scanning original records, we only need to scan C (1,0) which
is composed of original records and much less than the original data. After gen-
erating all L ( k ,0), we scan the original records once, and every C ( k ,0) is also
scanned once. In total, k times of scanning are performed. Since any L ( k,l )
is extracted from the corresponding L ( k ,0), we only need to scan the data
stored in the temporary database instead of the corresponding C ( k ,0) or origi-
nal records. The data quantity is reduced evidently. And the most important, by
taking out C (2,0), and only scan the corresponding L (1,0) which may compose
the C (2,0) in the temporary database. Then, the other C (2,0) is taken out in
turn.
Search WWH ::




Custom Search