when unknown bugs manifest themselves after deployment, thus moti-
vating the work surveyed in this chapter.
In view of the above, a number of automated techniques were recently
developed for troubleshooting sensor networks after deployment in or-
der to identify causes of anomalous behavior, recover from problems,
and reduce ownership costs. An important category of these techniques
leverages the data mining literature on identification, classification, and
understanding of complex patterns in large, highly coupled systems [92,
30, 97, 43, 39, 41, 21, 84], with applications ranging from biological
processes [77] to commercial databases [68]. Data mining techniques help
discover hidden patterns that may be responsible for software
malfunction.
While the use of data mining in network troubleshooting is promising,
it is by no means a straightforward application of existing techniques to
a new problem. Networked software execution patterns are not governed
by “laws of nature”, DNA, business transactions, or social norms. They
are limited only by programmers' imagination. The increased diversity
and richness of software interaction patterns make it harder to zoom in
on potential causes of problems without embedding some knowledge of
networking, programming, and debugging into the data mining engine.
This chapter describes cross-cutting solutions that leverage the power of
data mining to uncover hard-to-find bugs in distributed sensing systems.
There are two fundamentally different schools of thought when it
comes to using data mining tools for designing debugging solutions. The
first one adopts the belief that problems in large systems are inherently
localized. It is uncommon for independent failures to coincide. Hence,
when trouble occurs, while symptoms may be many, the challenge is to
find the single root cause (or the smallest set of independent causes)
that can trigger the observed avalanche of problems. Finding this single
root cause can be cast as a classification problem, in which the leaves of the
classifier are the different diagnostic answers. A challenge is to determine
the rules or features that can reliably discriminate between the different
root cause failure scenarios. These approaches are covered in Section 2.
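
As a rough illustration of this classification view, the sketch below hand-codes a small decision tree whose leaves are diagnostic answers and whose internal nodes are discriminating rules over observed symptoms. The feature names (battery_voltage, neighbor_count, packets_forwarded), the thresholds, and the diagnosis labels are all hypothetical and are not taken from the surveyed work; in practice such rules would be learned from labeled failure data rather than written by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

Symptoms = Dict[str, float]  # hypothetical per-node measurements


@dataclass
class Leaf:
    diagnosis: str  # each leaf is one diagnostic answer (a root cause)


@dataclass
class Node:
    test: Callable[[Symptoms], bool]  # rule that discriminates between causes
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def diagnose(tree: Union[Node, Leaf], symptoms: Symptoms) -> str:
    """Walk from the root to a leaf, following the outcome of each rule."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(symptoms) else tree.if_false
    return tree.diagnosis


# Hypothetical rules and thresholds, for illustration only.
TREE = Node(
    test=lambda s: s["battery_voltage"] < 2.7,
    if_true=Leaf("low battery"),
    if_false=Node(
        test=lambda s: s["neighbor_count"] == 0,
        if_true=Leaf("node isolated"),
        if_false=Node(
            test=lambda s: s["packets_forwarded"] == 0,
            if_true=Leaf("routing failure"),
            if_false=Leaf("no fault detected"),
        ),
    ),
)

if __name__ == "__main__":
    observed = {"battery_voltage": 3.0, "neighbor_count": 0, "packets_forwarded": 0}
    print(diagnose(TREE, observed))
```

Note that the isolated node in the usage example also forwards no packets, yet the tree reports only the isolation: ordering the rules so that the upstream cause is tested first is one simple way to map many symptoms to a single root cause.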
The second school of thought argues that in professional production
systems (that are well-designed and well-maintained) most failures arise
due to unexpected interactions between components. Individual components
are designed to high standards and seldom misbehave on their
own. It is the large combination of such components that can result
in subtle problems because of interactions that may not have been
envisioned at design time. Hence, the debugging tool should be looking
for an interaction pattern, such as a particular sequence or a particular