7 Discussion and Extensions
We now briefly discuss a few of the possible extensions to the framework and
its implementation.
We could introduce other wildcards, such as a wildcard which matches an
entire phrase, which could itself be defined as a series of terms, much like a
template. This would allow optional subclauses, such as subordinate clauses,
to be matched. Let ?τ1 designate an optional wildcard that matches a sequence
of literals defined by template τ1 or matches nothing at all. Then if τ1 =
<{which}, {*}, {*}>, the template τ2 = <{DT}, {ANIMAL}, {?τ1}, {sat}>
would match the fragment <the, cat, which, was, black, sat> as well as <the,
cat, sat>.
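The behaviour of the optional wildcard can be illustrated with a minimal sketch. The names below (LEXICON, OptionalSub, element_matches, matches) and the small category lexicon are assumptions made for illustration only, not part of the framework as described; the sketch simply treats ?τ1 as "either expand τ1 in place or skip it".

```python
from dataclasses import dataclass

WILDCARD = "*"

# Hypothetical lexicon mapping category names to member words (assumed for this sketch).
LEXICON = {
    "DT": {"the", "a"},
    "ANIMAL": {"cat", "dog"},
}


@dataclass
class OptionalSub:
    """Optional wildcard ?tau: matches the sub-template or nothing at all."""
    sub: tuple


def element_matches(element, word):
    """An element (a set) matches a word if it contains '*', the word itself,
    or the name of a category whose lexicon entry contains the word."""
    for item in element:
        if item == WILDCARD or item == word or word in LEXICON.get(item, ()):
            return True
    return False


def matches(template, fragment):
    """Return True if the template matches the whole fragment."""
    if not template:
        return not fragment
    head, rest = template[0], tuple(template[1:])
    if isinstance(head, OptionalSub):
        # Either expand the sub-template in place, or skip it entirely.
        return matches(tuple(head.sub) + rest, fragment) or matches(rest, fragment)
    return (
        bool(fragment)
        and element_matches(head, fragment[0])
        and matches(rest, fragment[1:])
    )


tau1 = ({"which"}, {WILDCARD}, {WILDCARD})
tau2 = ({"DT"}, {"ANIMAL"}, OptionalSub(tau1), {"sat"})

with_clause = ("the", "cat", "which", "was", "black", "sat")
without_clause = ("the", "cat", "sat")
print(matches(tau2, with_clause), matches(tau2, without_clause))
```

Under these assumptions, both example fragments are accepted, while a fragment containing only part of the subordinate clause (for instance <the, cat, which, sat>) is rejected.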
An additional approach to finding good templates is to repeatedly merge
useful templates to produce more general templates [18]. Our framework could
easily be extended to allow this, by ensuring that the product of merging two
templates matches every fragment that either template matches. This could
be achieved by considering each pair of template elements in turn, and either
performing a simple set union if they belong to the same category, or else
generalising them both to the same category before such a union. In either
case, the new template would match at least the union of the true positives
matched by the two parents, and at least the union of the false positives,
allowing lower bounds on each to be calculated.
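The element-wise merge described above might be sketched as follows. This is only an illustration under several assumptions: templates have equal length, each element is a (kind, values) pair, the generalisation order is literal, then category, then wildcard, and CATEGORY_OF is a hypothetical word-to-category mapping. None of these names come from the framework itself.

```python
# Assumed generalisation order, from most specific to most general.
LEVELS = ["literal", "category", "wildcard"]

# Hypothetical mapping from words to categories (assumed for this sketch).
CATEGORY_OF = {"the": "DT", "a": "DT", "cat": "ANIMAL", "dog": "ANIMAL"}

ANY = ("wildcard", frozenset({"*"}))


def generalise(element, target):
    """Lift an element to the target level, falling back to the wildcard
    when a literal has no known category."""
    kind, values = element
    if kind == target:
        return element
    if target == "wildcard":
        return ANY
    # The only remaining case is literal -> category.
    if all(v in CATEGORY_OF for v in values):
        return ("category", frozenset(CATEGORY_OF[v] for v in values))
    return ANY


def merge_elements(a, b):
    """Bring both elements to the same level, then take the set union."""
    target = max(a[0], b[0], key=LEVELS.index)
    a, b = generalise(a, target), generalise(b, target)
    if a[0] != b[0]:
        # One side could only be generalised to the wildcard.
        a, b = generalise(a, "wildcard"), generalise(b, "wildcard")
    return (a[0], a[1] | b[1])


def merge_templates(t1, t2):
    """Merge two equal-length templates element by element."""
    if len(t1) != len(t2):
        raise ValueError("this sketch only merges templates of equal length")
    return [merge_elements(a, b) for a, b in zip(t1, t2)]


t1 = [("literal", frozenset({"the"})),
      ("literal", frozenset({"cat"})),
      ("literal", frozenset({"sat"}))]
t2 = [("literal", frozenset({"a"})),
      ("category", frozenset({"ANIMAL"})),
      ("literal", frozenset({"slept"}))]
merged = merge_templates(t1, t2)
print(merged)
```

Because each merged element accepts everything either parent element accepted, the merged template matches every fragment matched by either parent, and possibly more.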
So far, we have considered templates that exist in isolation, whereas in
practical systems, it is more common to apply a set of templates together.
Our framework can be extended to include this by using a sequential covering
algorithm. Suppose we have a template τ that matches some true positives
and some false positives. We could reduce the number of false positives by
creating a second template τ′ that is optimised to match just the false positive
fragments matched by τ. This could be achieved by defining two new versions
of D+ and DN based on the fragments matched by τ, and using these to
guide the search for τ′. We could then apply τ and τ′ together, predicting
interesting fragments as µ(τ, D) \ µ(τ′, D) (i.e. fragments matched by τ but
not by τ′). In many practical applications, more than one template will be
applied to a set of documents, each designed to match a different piece of
information, or a different way of expressing that information.
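The combined prediction, fragments matched by the first template but not by the second, reduces to a set difference. The sketch below assumes a deliberately simplified matcher (fixed-length templates whose elements are sets of literals or the wildcard "*"); the function names mu and predict and the toy fragment set are illustrative assumptions, and the search for the second template is not shown.

```python
def matches(template, fragment):
    """Simplified matcher: same length, and each word is in its element
    or the element contains the wildcard '*'."""
    return len(template) == len(fragment) and all(
        "*" in element or word in element
        for element, word in zip(template, fragment)
    )


def mu(template, fragments):
    """Set of fragments matched by the template."""
    return {f for f in fragments if matches(template, f)}


def predict(tau, tau_prime, fragments):
    """Fragments matched by tau but not by tau_prime."""
    return mu(tau, fragments) - mu(tau_prime, fragments)


# Toy fragment collection D (tuples, so that fragments are hashable).
D = [
    ("the", "cat", "sat"),
    ("the", "dog", "sat"),
    ("the", "mat", "sat"),
    ("a", "cat", "slept"),
]

tau = ({"the"}, {"*"}, {"sat"})          # broad template, some false positives
tau_prime = ({"the"}, {"mat"}, {"sat"})  # targets a false positive of tau

predicted = predict(tau, tau_prime, D)
print(sorted(predicted))
```

Here the second template removes the fragment <the, mat, sat> from the predictions while leaving the other fragments matched by the first template untouched.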
We have assumed that we do not have a set of annotated examples, i.e.
fragments known in advance to be positive or negative. Creating and annotating
large sets of examples is extremely time-consuming for a user, although
giving a yes/no response to automatic annotations is simpler [20]. One
enhancement to our system therefore would be to start with the estimates of
true positives and false positives as outlined above, search for a good
template, and then use this template to annotate a number of fragments and
present these to the user. The user then marks each fragment as interesting
or not interesting, and this could then be used to improve the quality of