Databases Reference
In-Depth Information
Fig. 2. Part of a search tree. Each node is a template with the number of true
positives ( x + ) and false positives ( y N ) shown, with a “?” for unknown values. Each
child node can be created by superset generalisation from any of its parents
a high precision, though possibly with a low recall. Which of these options is
more appropriate depends on the task at hand.
Every template has at least one parent (except for the single root tem-
plate), because they are created using superset generalisation; but most tem-
plates can be created in more than one way, from several parent templates.
Therefore, many templates will have more than one parent. The whole space
may be considered as a lattice. Consider the partial search graph shown in
Fig. 2. Here, templates τ 0 - τ 4 have been evaluated, and the number of true
positives and false positives are shown for each. For example, τ 1 matches 20
fragments from D + and 50 from D N , and is therefore assumed to match 20
true positives and 50 false positives. At this stage of the search, the decision
to be made is which node to evaluate next: τ 5 or τ 6 ? We can feed forward the
facts that the two parent templates of τ 5 , τ 1 and τ 2 have 20 and 15 true pos-
itives respectively, and that therefore τ 5 must have at least 20 true positives.
Similarly, it must have at least 60 false positives. On the other hand, τ 6 must
have at least 40 true positives, and at least 55 false positives. We would there-
fore decide to evaluate τ 6 in preference to τ 5 at this point, because it has a
higher lower-bound on the number of true positives, and a lower lower-bound
on the number of false positives.
6.4 A Best-First Algorithm
We now present a formal definition of a best-first algorithm suitable for iden-
tifying good templates, followed by an example. We start by defining two
sets of templates. Set O (“open”) contains templates that have been created
(via initialise or generalise), but not yet evaluated. Set C (“closed”) contains
templates that have been evaluated, i.e. had values of TP and FP calculated.
Note that O
C
≡∅
.
Search WWH ::




Custom Search