Other categories may be introduced to capture other attributes, such as orthography (e.g. upper case, lower case or mixed case), word length, language and so on. A parser could be used to label words as belonging to different types of phrases, such as verb phrases and noun phrases. We could also treat punctuation symbols as literals if required, or as a separate category. However, the categories described above are sufficient to allow us to develop and demonstrate the framework.
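As an illustration of one such additional category, a hypothetical orthography attribute could map each word to a case label. The labels UPPER, LOWER and MIXED follow the examples in the text; the function name `orthography` and the extra NO_CASE label for tokens without letters (such as punctuation) are assumptions of this sketch, not part of the framework.

```python
def orthography(word: str) -> str:
    """Map a word to a hypothetical orthography attribute (a case label)."""
    letters = [c for c in word if c.isalpha()]
    if not letters:
        # Assumption: tokens with no letters (punctuation, digits) get their own label.
        return "NO_CASE"
    if all(c.isupper() for c in letters):
        return "UPPER"
    if all(c.islower() for c in letters):
        return "LOWER"
    return "MIXED"


# The attribute values for a small sample sentence.
sentence = ["The", "CAT", "sat", "on", "the", "mat", "."]
print([orthography(w) for w in sentence])
# ['MIXED', 'UPPER', 'LOWER', 'LOWER', 'LOWER', 'LOWER', 'NO_CASE']
```

Note that a title-case word such as "The" falls under MIXED in this three-way scheme; a practical implementation might add a separate title-case label.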
Definition 5. A category $\kappa$ is a set of attributes of words of the same type. Common categories include “parts of speech” and “stems”.
For convenience, we will label certain categories in these and subsequent examples. This is not part of the framework but reflects categories likely to be used in a practical implementation. In particular, we use $\Lambda$ to label the category “literals”; $\Pi$ for “parts of speech”; $\Gamma$ for “gazetteers”; $\Sigma$ for “stems”; and for “wildcards”.
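Example:
$\kappa_\Lambda = \{\mathrm{the}, \mathrm{cat}, \mathrm{sat}, \mathrm{on}, \mathrm{mat}, \mathrm{mouse}, \ldots\}$
$\kappa_\Sigma = \{\mathrm{the\_stem}, \mathrm{cat\_stem}, \mathrm{sit\_stem}, \mathrm{on\_stem}, \mathrm{mat\_stem}, \mathrm{mouse\_stem}, \ldots\}$
$\kappa_\Pi = \{\mathrm{DT}, \mathrm{NN}, \mathrm{VBD}, \mathrm{IN}, \ldots\}$
$\kappa_\Gamma = \{\mathrm{FELINE}, \mathrm{RODENT}, \mathrm{ANIMAL}, \mathrm{FLOOR\ COVERING}, \ldots\}$
$\kappa = \{*, ?\}$
We use the suffix “_stem” in stem labels to avoid confusing them with the corresponding literal.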
Definition 6. Let K be a set of categories of attributes. Each element κ of
K is a single category of word attributes.
Example: $K_1 = \{\kappa_\Lambda, \kappa_\Sigma, \kappa_\Pi, \kappa_\Gamma\}$.
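The example categories and the set $K_1$ can be sketched in Python as named sets of strings. This is a minimal sketch under stated assumptions: the Python names (`KAPPA_LITERAL`, `K1`, `overlapping_categories` and so on) are hypothetical, and each set holds only the members listed in the example, since the real categories are open-ended. The helper checks the point made about the “_stem” suffix: with it, no stem label collides with a literal.

```python
# Hypothetical, truncated versions of the example categories; the real sets
# are open-ended, so only the members listed in the text are included.
KAPPA_LITERAL = frozenset({"the", "cat", "sat", "on", "mat", "mouse"})
KAPPA_STEM = frozenset(
    {"the_stem", "cat_stem", "sit_stem", "on_stem", "mat_stem", "mouse_stem"}
)
KAPPA_POS = frozenset({"DT", "NN", "VBD", "IN"})
KAPPA_GAZETTEER = frozenset({"FELINE", "RODENT", "ANIMAL", "FLOOR COVERING"})

# K1 from the example: a set of categories, keyed here by a descriptive name.
K1 = {
    "literal": KAPPA_LITERAL,
    "stem": KAPPA_STEM,
    "pos": KAPPA_POS,
    "gazetteer": KAPPA_GAZETTEER,
}


def overlapping_categories(categories):
    """Return the pairs of category names whose term sets share a member."""
    names = sorted(categories)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if categories[a] & categories[b]
    ]


print(overlapping_categories(K1))  # []

# Without the "_stem" suffix, stems such as "the" and "cat" collide with literals.
unsuffixed = {"literal": KAPPA_LITERAL, "stem": frozenset({"the", "cat", "sit"})}
print(overlapping_categories(unsuffixed))  # [('literal', 'stem')]
```

Keying each category by a name in a dictionary, rather than storing a bare set of sets, is a design choice of this sketch so that categories can be reported by name.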
Definition 7. A term t is a value that an attribute may take, i.e. an element
of a category of word attributes.
Examples: $t_1 = \mathrm{cat}$, $t_2 = \mathrm{NN}$, $t_3 = \mathrm{FELINE}$, where $t_1 \in \kappa_\Lambda$, $t_2 \in \kappa_\Pi$, $t_3 \in \kappa_\Gamma$.
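Because a term is simply an element of some category, finding the categories a term belongs to is a membership test over $K$. A minimal sketch, reusing hypothetical category names and the example sets truncated to their listed members; `categories_of` is an illustrative helper, not part of the framework.

```python
# Hypothetical, truncated example categories keyed by descriptive names.
K1 = {
    "literal": frozenset({"the", "cat", "sat", "on", "mat", "mouse"}),
    "stem": frozenset(
        {"the_stem", "cat_stem", "sit_stem", "on_stem", "mat_stem", "mouse_stem"}
    ),
    "pos": frozenset({"DT", "NN", "VBD", "IN"}),
    "gazetteer": frozenset({"FELINE", "RODENT", "ANIMAL", "FLOOR COVERING"}),
}


def categories_of(term, categories):
    """Return the names of every category that contains the term."""
    return {name for name, kappa in categories.items() if term in kappa}


# The three example terms t1, t2 and t3.
for term in ("cat", "NN", "FELINE"):
    print(term, categories_of(term, K1))
# cat {'literal'}
# NN {'pos'}
# FELINE {'gazetteer'}
```

A string outside every category, such as "dog" with this truncated $K_1$, yields an empty set and so is not a term in the sense of Definition 7 with respect to $K_1$.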
Definition 8. We define a template element $T$ to be a set of terms belonging to a single category. Let $T = \{t_1, t_2, \ldots, t_n\}$, and let $t_i \in T$. Then
$$t_i \in \kappa \iff t_j \in \kappa \quad \forall\, t_j \in T.$$
Examples:
$T_1 = \{\mathrm{NN}, \mathrm{VBD}\}$
$T_2 = \{\mathrm{FELINE}, \mathrm{RODENT}, \mathrm{FLOOR\ COVERING}\}$
The set $\{\mathrm{NN}, \mathrm{FELINE}\}$ is not a template element because “NN” and “FELINE” belong to different categories, namely $\kappa_\Pi$ and $\kappa_\Gamma$ respectively.
The name “template element” refers to templates as defined in Definition 13 below.
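The condition in Definition 8 can be checked directly: for every category, either all terms of $T$ belong to it or none does. The sketch below implements that biconditional literally, with hypothetical names and the truncated example categories. Treating the empty set as not a template element is an assumption, since the definition writes $T$ with at least one term $t_1$.

```python
# Hypothetical, truncated example categories keyed by descriptive names.
K1 = {
    "literal": frozenset({"the", "cat", "sat", "on", "mat", "mouse"}),
    "stem": frozenset(
        {"the_stem", "cat_stem", "sit_stem", "on_stem", "mat_stem", "mouse_stem"}
    ),
    "pos": frozenset({"DT", "NN", "VBD", "IN"}),
    "gazetteer": frozenset({"FELINE", "RODENT", "ANIMAL", "FLOOR COVERING"}),
}


def is_template_element(terms, categories):
    """Check Definition 8: all terms of T fall in exactly the same categories."""
    T = set(terms)
    if not T:
        # Assumption: a template element contains at least one term.
        return False
    # Definition 7: each term must be an element of some category.
    if not all(any(t in kappa for kappa in categories.values()) for t in T):
        return False
    # Definition 8: t_i in kappa <=> t_j in kappa, for every pair of terms and every kappa.
    for kappa in categories.values():
        inside = [t in kappa for t in T]
        if any(inside) and not all(inside):
            return False
    return True


print(is_template_element({"NN", "VBD"}, K1))  # True
print(is_template_element({"FELINE", "RODENT", "FLOOR COVERING"}, K1))  # True
print(is_template_element({"NN", "FELINE"}, K1))  # False
```

Checking uniform membership per category, rather than searching for one category that contains all of $T$, mirrors the biconditional as written; the two checks agree whenever categories are disjoint, as they are in this truncated example.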
 