Databases Reference
In-Depth Information
latter mistake tends to be amplified when the set of traces is sparse. Even
if the given sample of traces incorporated negative samples, the heuristic is
still error-prone. Since the k-tails was developed in the early 1970s, there have
however been notable advances in the grammar inference community, leading
to much more sophisticated merging heuristics. Two key advances came about
as a result of the \Abbadingo-One" competition [19]; (1) Rodney Price's Ev-
idence Driven State Merging heuristic takes advantage of the characteristics
of the given set of traces to make improved state-merging decisions. (2) The
associated \Blue-Fringe" search strategy substantially reduces the number of
pairs of candidate states that have to be compared with each other to assess
whether or not they should be merged. These two strategies are presented in
more detail below.
3.3.2.1
Evidence-Driven State Merging
The central idea behind Price's Evidence Driven State Merging (EDSM)
algorithm [19] is that states are merged based on the likelihood that they are
equivalent, based on the similarity of their outgoing paths. 2 This is in contrast
to the k-tails strategy that will simply merge the first pair of states to share
a tail of length k. The EDSM algorithm computes what is called a similarity
score for all pairs of states, and uses this scoring to select those state-pairs
that are deemed to be most similar to each other. The rationale is that this
reduces the number of merging errors, thus increasing the likelihood that the
final model is accurate.
Input: (Pos;Neg)
PLTS generateAPTA(Pos;Neg);
1
while (q;q 0 ) selectStatePair(PLTS) do
2
PLTS 0 merge(PLTS; (q;q 0 ));
3
if compatible(PLTS 0 ;Neg) then
4
PLTS PLTS 0 ;
5
end
6
end
7
return PLTS
8
Algorithm 2: EDSM algorithm
The EDSM algorithm is presented in algorithm 2. There are several impor-
tant differences between the EDSM and k-tails algorithms. The involvement
of negative traces means that the prefix-tree acceptor has to be augmented, to
distinguish between valid and invalid traces. This is referred to as the \Aug-
mented Prex-Tree Acceptor" [9], and is generated by the generateAPTA
function. Figure 3.4 shows the APTA for the set of traces in Figure 3.3 that
has been augmented with a negative trace (indicating that it is impossible to
2 This notion of equivalence is referred to as theirNerodeequivalencebased on Nerode's
early work [25].
 
Search WWH ::




Custom Search