Database Reference
In-Depth Information
Preventing false frequent patterns: A basic gapped sequence min-
ing algorithm considers all possible combinations of frequent sub-
sequences of the original sequence with no regard to the amount
of separation between individual events. As a result, it may gen-
erate subsequences combining events that are “too far apart” to
be causally related. To alleviate this problem, the algorithm was
extended to identify loops in the underlying program. Sequences
were not allowed to span different loop iterations, localizing the
search to the span of a single iteration of a program loop. A dy-
namic search window scheme was developed, where the first item
of any candidate sequence was used to determine the search win-
dow. The window extended only until the next occurrence of this
same item, and sequences were searched for only within individual
windows. Patterns that cross window boundaries were not consid-
ered.
Suppressing redundant subsequences: When frequent patterns were
found, where one is a subsequence of the other, only the longest
pattern was kept. This had impact on the ability to come up
with discriminative patterns. The rule makes sense in debugging
because not all subsets of “good” sequences are good. Forgetting
a step in a multi-step procedure may well cause a failure. Hence,
subsequences of good sequences could be bad. If the sequence ab
cd occurs frequently during normal operation, and sequence ac
d occurs frequently when a bug manifests, one must make sure
that discriminative mining identifies the latter pattern as indeed
discriminative despite the fact that it also occurs (as a subset of
the frequent pattern abcd )inthe“good”case. Otherwise,
failures that arise due to omission of some step (e.g., omission of
b above) would not be found. Closed item set mining [89, 99] is
commonly used in data mining to eliminate frequent subsets of a
super set. Using this approach, sequences that are a subsequence
of a longer pattern with similar support are eliminated.
Two-stage mining for infrequent events: In debugging, sometimes
less frequent patterns could be more indicative of the cause of fail-
ure than the more frequent patterns. A single mistake can cause
many instances of damage. For example, a single node reboot
event can cause a large number of message losses. In such cases,
the most frequent patterns may not include the real cause of the
problem. Fortunately, in the case of sensor network debugging,
a solution may be inspired by the nature of the problem domain.
The fundamental issue to observe is that much computation in sen-
Search WWH ::




Custom Search