Adapting Grammar Inference Techniques to Mine State Machines - Mining Software Specifications: Methodologies and Applications

Databases Reference

In-Depth Information

latter mistake tends to be amplified when the set of traces is sparse. Even

if the given sample of traces incorporated negative samples, the heuristic is

still error-prone. Since the k-tails was developed in the early 1970s, there have

however been notable advances in the grammar inference community, leading

to much more sophisticated merging heuristics. Two key advances came about

as a result of the \Abbadingo-One" competition [19]; (1) Rodney Price's Ev-

idence Driven State Merging heuristic takes advantage of the characteristics

of the given set of traces to make improved state-merging decisions. (2) The

associated \Blue-Fringe" search strategy substantially reduces the number of

pairs of candidate states that have to be compared with each other to assess

whether or not they should be merged. These two strategies are presented in

more detail below.

3.3.2.1

Evidence-Driven State Merging

The central idea behind Price's Evidence Driven State Merging (EDSM)

algorithm [19] is that states are merged based on the likelihood that they are

equivalent, based on the similarity of their outgoing paths. 2 This is in contrast

to the k-tails strategy that will simply merge the first pair of states to share

a tail of length k. The EDSM algorithm computes what is called a similarity

score for all pairs of states, and uses this scoring to select those state-pairs

that are deemed to be most similar to each other. The rationale is that this

reduces the number of merging errors, thus increasing the likelihood that the

final model is accurate.

Input: (Pos;Neg)

PLTS generateAPTA(Pos;Neg);

1

while (q;q 0 ) selectStatePair(PLTS) do

2

PLTS 0 merge(PLTS; (q;q 0 ));

3

if compatible(PLTS 0 ;Neg) then

4

PLTS PLTS 0 ;

5

end

6

end

7

return PLTS

8

Algorithm 2: EDSM algorithm

The EDSM algorithm is presented in algorithm 2. There are several impor-

tant differences between the EDSM and k-tails algorithms. The involvement

of negative traces means that the prefix-tree acceptor has to be augmented, to

distinguish between valid and invalid traces. This is referred to as the \Aug-

mented Prex-Tree Acceptor" [9], and is generated by the generateAPTA

function. Figure 3.4 shows the APTA for the set of traces in Figure 3.3 that

has been augmented with a negative trace (indicating that it is impossible to

2 This notion of equivalence is referred to as theirNerodeequivalencebased on Nerode's

early work [25].

Mining Software Specifications: Methodologies and Applications

Search WWH ::

Custom Search

Home