Adapting Grammar Inference Techniques to Mine State Machines - Mining Software Specifications: Methodologies and Applications

Databases Reference

In-Depth Information

more in-depth discussion the reader is referred to the original paper by Lang

et al. [19] or the overview by Cicchello and Kremer [9].

3.3.3 Active Approaches

Any analysis approach that attempts to derive information about a soft-

ware system from traces alone is vulnerable to problem 3 in Section 3.2.3.

The selection of traces is critical to be able to draw valid conclusions about

the underlying software system [14]. This presents us with an important prob-

lem. In a setting where the developer is wanting to mine the specification for

the software system, it is unreasonable to expect very much (if any) prior

knowledge about how the system behaves. Consequently, it is unrealistic to

expect the ability to derive a collection of traces that is suciently diverse to

adequately exercise every aspect of software behavior.

There is an analogous problem in the field of grammar inference: How

does one obtain the set of sentences that are necessary to accurately infer a

grammar? One solution was proposed by Angluin in 1987 [2] in the form of

the L learning algorithm. This algorithm does not require any initial samples

at all; instead it proposes its own traces (called \membership queries"), and

assumes that there is an oracle (called a \minimally adequate teacher") that

can provide feedback to the algorithm. This feedback is provided in two forms:

(1) being able to state whether individual traces do or do not belong to the

target machine, and (2) being able to state whether a hypothesis machine is

equivalent to the target, and if not why not. Given that these answers can

be provided, Angluin established that it is possible to infer the exact state

machine in polynomial time.

The L algorithm works by constructing an \observation table," which

records whether certain sequences over are accepted or not. These se-

quences are asked in the form of membership queries, and the querying pro-

cess stops once the table has been fully populated (or in more formal terms {

it is closed and consistent ). Once this is the case, it is possible to conjecture

a hypothesis LTS. This solution is suitable for situations where the target

machine is relatively simple, and we have an oracle that can be asked as many

questions as are necessary. However, if the state machines are large and com-

plex, or there is a limit on the number of queries that can be answered by the

oracle, the algorithm becomes impractical.

Nonetheless, the L approach is appealing from the point of view that it

is active. The provision of an initial set of traces is no longer critical to its

ability to produce an accurate result. This idea of active inference has subse-

quently motivated work by Dupont et al., who developed an active variant of

the EDSM-Blue Fringe algorithm called the Question State Merging (QSM)

algorithm [12, 13]. The inference process is the same as with EDSM, but this

time the algorithm is equipped with the ability to ask questions after every

state-merge, to ensure that the merge is correct. In comparison to the L algo-

rithm, no guarantees can be made about the absolute correctness of the final

Mining Software Specifications: Methodologies and Applications

Search WWH ::

Custom Search

Home