Databases Reference
In-Depth Information
Fig. 2.
Part of a search tree. Each node is a template with the number of true
positives (
x
+
) and false positives (
y
N
) shown, with a “?” for unknown values. Each
child node can be created by superset generalisation from any of its parents
a high precision, though possibly with a low recall. Which of these options is
more appropriate depends on the task at hand.
Every template has at least one parent (except for the single root tem-
plate), because they are created using superset generalisation; but most tem-
plates can be created in more than one way, from several parent templates.
Therefore, many templates will have more than one parent. The whole space
may be considered as a lattice. Consider the partial search graph shown in
Fig. 2. Here, templates
τ
0
-
τ
4
have been evaluated, and the number of true
positives and false positives are shown for each. For example,
τ
1
matches 20
fragments from
D
+
and 50 from
D
N
, and is therefore assumed to match 20
true positives and 50 false positives. At this stage of the search, the decision
to be made is which node to evaluate next:
τ
5
or
τ
6
? We can feed forward the
facts that the two parent templates of
τ
5
,
τ
1
and
τ
2
have 20 and 15 true pos-
itives respectively, and that therefore
τ
5
must have at least 20 true positives.
Similarly, it must have at least 60 false positives. On the other hand,
τ
6
must
have at least 40 true positives, and at least 55 false positives. We would there-
fore decide to evaluate
τ
6
in preference to
τ
5
at this point, because it has a
higher lower-bound on the number of true positives, and a lower lower-bound
on the number of false positives.
6.4 A Best-First Algorithm
We now present a formal definition of a best-first algorithm suitable for iden-
tifying good templates, followed by an example. We start by defining two
sets of templates. Set
O
(“open”) contains templates that have been created
(via initialise or generalise), but not yet evaluated. Set
C
(“closed”) contains
templates that have been evaluated, i.e. had values of TP and FP calculated.
Note that
O
∩
C
≡∅
.