Databases Reference
In-Depth Information
1. begin
2. C ←∅ , O ←∅
3. initialise( f )= τ root → O
4. while not finished do
a) find estimated best template τ in O
b) evaluate τ
c) delete τ from O
d) add τ to C
e) expand τ by adding to O all templates that can be created by superset
generalisation of τ
f) update lower bounds on TP and FP for all templates in O
5. return best template from C .
6. end
Fig. 3. A best-first algorithm. C is the “closed” set of evaluated templates and O
is the “open” set of unevaluated templates. See Sect. 6.4 for further details
Figure 3 gives the algorithm. After initialisation, we take the best template
that has not yet been evaluated, and evaluate it, i.e. calculate its true posi-
tive and false positive scores. We then generalise it in every way possible to
create child templates, and update the lower bounds on the true and false
positive scores for these children. Because there are several ways that each
template could be created, some children will have more than one parent. In
these cases, the lower bounds are the maximum of the lower bounds of all
the parents.
By “best template” (Fig. 3, Steps 4a and 5), we mean select the template
with the highest lower-bound on the number of true positives, as inherited
from each template's ancestors. If more than one template has the same max-
imum true positive lower-bound, then we choose between them by selecting
the template with the smallest lower-bound on false positives. If this still se-
lects more than one template, we can either pick one randomly; use them all
successively; or use all the selected templates together in the subsequent steps
(i.e. evaluate and generalise more than one template in one pass through the
main loop).
In Step 4f we could choose to either update the bounds of just the descen-
dents of τ in O , or else update the bounds of the descendents of every template
in C . The latter would be more computationally expensive, but would lead to
better estimates of the lower bounds. For each new template created, it would
require a search through C to find all the other possible ancestors, besides τ ,
in order to calculate the new lower bounds.
Finally, we terminate the search when some pre-specified criterion is sat-
isfied. Possible criteria include terminating: when O is empty (in which case
every possible template has been evaluated); or when a limit on the number of
evaluations is reached (e.g. stop after 1,000 templates have been evaluated);
or when a certain number of true positives (or false positives) are matched by
the current best template. It would be quite possible for the user to stop the
search and consider the current best template, before re-starting the search if
necessary.
Search WWH ::




Custom Search