A SURVEY OF DATAMINING METHODS FOR SENSOR NETWORK BUG DIAGNOSIS - Managing and Mining Sensor Data

Database Reference

In-Depth Information

Preventing false frequent patterns: A basic gapped sequence min-

ing algorithm considers all possible combinations of frequent sub-

sequences of the original sequence with no regard to the amount

of separation between individual events. As a result, it may gen-

erate subsequences combining events that are “too far apart” to

be causally related. To alleviate this problem, the algorithm was

extended to identify loops in the underlying program. Sequences

were not allowed to span different loop iterations, localizing the

search to the span of a single iteration of a program loop. A dy-

namic search window scheme was developed, where the first item

of any candidate sequence was used to determine the search win-

dow. The window extended only until the next occurrence of this

same item, and sequences were searched for only within individual

windows. Patterns that cross window boundaries were not consid-

ered.

Suppressing redundant subsequences: When frequent patterns were

found, where one is a subsequence of the other, only the longest

pattern was kept. This had impact on the ability to come up

with discriminative patterns. The rule makes sense in debugging

because not all subsets of “good” sequences are good. Forgetting

a step in a multi-step procedure may well cause a failure. Hence,

subsequences of good sequences could be bad. If the sequence ab

cd occurs frequently during normal operation, and sequence ac

d occurs frequently when a bug manifests, one must make sure

that discriminative mining identifies the latter pattern as indeed

discriminative despite the fact that it also occurs (as a subset of

the frequent pattern abcd )inthe“good”case. Otherwise,

failures that arise due to omission of some step (e.g., omission of

b above) would not be found. Closed item set mining [89, 99] is

commonly used in data mining to eliminate frequent subsets of a

super set. Using this approach, sequences that are a subsequence

of a longer pattern with similar support are eliminated.

Two-stage mining for infrequent events: In debugging, sometimes

less frequent patterns could be more indicative of the cause of fail-

ure than the more frequent patterns. A single mistake can cause

many instances of damage. For example, a single node reboot

event can cause a large number of message losses. In such cases,

the most frequent patterns may not include the real cause of the

problem. Fortunately, in the case of sensor network debugging,

a solution may be inspired by the nature of the problem domain.

The fundamental issue to observe is that much computation in sen-

Search WWH ::

Custom Search

Home