Adapting Grammar Inference Techniques to Mine State Machines - Mining Software Specifications: Methodologies and Applications

Databases Reference

In-Depth Information

formed the basis for most state machine inference techniques in the domain

of software engineering [1, 11, 21{23, 31].

However, since then a host of advances have been made in the field of

grammar inference that have not been exploited in the software engineering

community [36]. The rest of this section will show how these can be applied to

solve three of the key problems with the k-tails technique that were outlined

in the previous section.

3.3.1 Negative Samples

As stated in Section 3.2.3 problem 1, the k-tails technique assumes that all

traces are valid. This means that it is never possible to obtain a characteristic

sample of traces (which requires negative samples { see the discussion of prob-

lem 3), making it impossible to ensure an accurate state machine. It is only

by providing a large and diverse set of positive as well as negative examples,

that the inference of accurate (or at least reasonably accurate) state machines

becomes a possibility. So what is meant by a \negative" program trace? If

the state machine inference process is akin to that shown in Figure 3.2, where

should these negative traces come from? Some answers to these questions are

provided below.

The usual assumption is that a trace must be by denition positive { it

represents an observed, valid program execution. This assumption is however

not necessarily correct. There are two circumstances that can give rise to

\negative" program traces:

1. Program executions that fail: Depending on the nature of the soft-

ware (and the choice of semantics attributed to program traces), execu-

tions that end by throwing an exception, or terminate with an outright

failure can be deemed to produce negative traces.

2. Infeasible executions: For most software systems, the vast majority

of theoretically conceivable program executions are in fact infeasible.

For example, with respect to the text editor in Figure 3.1, to begin an

execution with a \close" action is not even an option, because no le

has been opened in the first place.

These two notions enable us to distinguish between valid and invalid traces;

those that belong to the prescribed and proscribed language of the underlying

software system. This in turn enables us to employ more sophisticated infer-

ence algorithms than the k-tails algorithm, as will be discussed in following

subsections.

3.3.2 Improved State-Merging Heuristics

As detailed previously, the k-tails strategy is prone to merging states that

are not equivalent, and not merging states when they should be merged. The

Search WWH ::

Custom Search

Home