We model our storage structures after the general notion of subject-action-object
triplets, as shown in Figure 5.2. Interlinked subject-action-object triplets and their
respective modifiers can express most types of syntactic relations between various
entities within a sentence.
The index abstraction is presented in Table 5.2, where the additional column
“Dist” denotes the degrees of separation (or distance) between the primary Subject,
Verb, or Object and each Modifier, and “Neg” keeps track of negated actions.
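Table 5.2. The index abstraction of a sentence.

Subject    | Subject-Modifier | Object       | Object-Modifier | Verb    | Verb-Modifier | Prep | Pcomp         | Dist | Neg
-----------+------------------+--------------+-----------------+---------+---------------+------+---------------+------+----
           |                  | Washington   | George          | appoint |               |      |               | 1    | F
           |                  | commander    |                 | appoint |               |      |               | 1    | F
           |                  | Army         | Continental     | appoint |               |      |               | 3    | F
           |                  |              |                 | appoint |               | in   | 1775          | 2    | F
Washington | George           | force        | fighting        | mold    |               |      |               | 2    | F
force      | fighting         | independence |                 | win     |               |      |               | 1    | F
           |                  |              |                 | win     |               | from | Great Britain | 3    | F
           |                  |              |                 | win     | eventually    |      |               | 1    | F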
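As a rough illustration of the index abstraction, each row of Table 5.2 can be modeled as a flat record with one optional field per column. The sketch below is in Python and is not InFact's actual storage schema: the class name IndexRow, its field names, and the helper rows_for_verb are hypothetical. Only the rows whose column placement is unambiguous in the table (those with a Prep/Pcomp pair, a verb modifier, or both a subject and an object) are reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexRow:
    """One row of the index abstraction; all field names are hypothetical."""
    verb: str
    dist: int                               # "Dist" column
    neg: bool = False                       # "Neg" column (F in every row shown)
    subject: Optional[str] = None
    subject_modifier: Optional[str] = None
    obj: Optional[str] = None
    obj_modifier: Optional[str] = None
    verb_modifier: Optional[str] = None
    prep: Optional[str] = None
    pcomp: Optional[str] = None


# A subset of the rows of Table 5.2, transcribed as records.
ROWS: List[IndexRow] = [
    IndexRow(verb="appoint", prep="in", pcomp="1775", dist=2),
    IndexRow(subject="Washington", subject_modifier="George",
             obj="force", obj_modifier="fighting", verb="mold", dist=2),
    IndexRow(subject="force", subject_modifier="fighting",
             obj="independence", verb="win", dist=1),
    IndexRow(verb="win", prep="from", pcomp="Great Britain", dist=3),
    IndexRow(verb="win", verb_modifier="eventually", dist=1),
]


def rows_for_verb(rows: List[IndexRow], verb: str) -> List[IndexRow]:
    """Return every row attached to the given (normalized) verb."""
    return [row for row in rows if row.verb == verb]
```

Grouping the rows by verb in this way recovers the primary triplet together with its prepositional and adverbial modifiers, for example the three rows attached to "win".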
InFact stores the normalized triplets in dedicated index structures that
- are optimized for efficient keyword search;
- are optimized for efficient cross-document retrieval of arbitrary classes of
  relationships or events (see examples in the next section);
- store document metadata and additional ancillary linguistic variables for
  filtering of search results by metadata constraints (e.g., author, date range), or
  by linguistic attributes (e.g., retrieve negated actions, search subject modifier
  field in addition to primary subject in a relationship search);
- (optionally) superimpose annotations and taxonomical dependencies from a
  custom ontology or knowledge base.
With regard to the last feature, for instance, we may superimpose a [Country] entity
label on a noun phrase, which is the subject of the verb “to attack.” The index
supports multiple ontologies and entangled multiparent taxonomies.
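The filtering options described above can be sketched as a simple scan over stored relations. This is a minimal Python illustration under stated assumptions, not InFact's query engine: the Relation record, the search function, and all of its parameters are hypothetical, and the three records (with fictional country names) are invented purely for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Relation:
    """A stored relationship with ancillary attributes (hypothetical fields)."""
    subject: str
    verb: str
    obj: str
    subject_modifier: Optional[str] = None
    neg: bool = False                             # negated action
    author: Optional[str] = None                  # document metadata
    subject_labels: FrozenSet[str] = frozenset()  # superimposed ontology labels


def search(relations: List[Relation], verb: str,
           subject_label: Optional[str] = None,
           subject_term: Optional[str] = None,
           include_modifier: bool = False,
           negated: Optional[bool] = None,
           author: Optional[str] = None) -> List[Relation]:
    """Filter relations by verb, entity label, linguistic attributes, metadata."""
    hits = []
    for rel in relations:
        if rel.verb != verb:
            continue
        if subject_label is not None and subject_label not in rel.subject_labels:
            continue
        if subject_term is not None:
            # Optionally search the subject modifier in addition to the subject.
            fields = [rel.subject]
            if include_modifier and rel.subject_modifier is not None:
                fields.append(rel.subject_modifier)
            if subject_term not in fields:
                continue
        if negated is not None and rel.neg != negated:
            continue
        if author is not None and rel.author != author:
            continue
        hits.append(rel)
    return hits


# Toy records invented for illustration only.
RELATIONS = [
    Relation("Ruritania", "attack", "Freedonia", author="analyst-a",
             subject_labels=frozenset({"Country"})),
    Relation("rebels", "attack", "convoy", neg=True, author="analyst-b"),
    Relation("forces", "attack", "outpost", subject_modifier="Ruritanian",
             author="analyst-a"),
]

country_attacks = search(RELATIONS, "attack", subject_label="Country")
negated_attacks = search(RELATIONS, "attack", negated=True)
```

Storing labels as a set per entity is one simple way to let several ontologies, or several parents within one taxonomy, annotate the same noun phrase.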
InFact stores “soft events” instead of fitting textual information into a rigid
relational schema that may result in information loss. “Soft events” are data
structures that can be recombined to form events and relationships. They are
pre-indexed to facilitate thematic retrieval by action, subject, and object type. For
instance, a sentence like “The president of France visited the capital of Tunisia”
contains evidence of 1) a presidential visit to a country's capital and 2) diplomatic
relationships between two countries. Our storage strategy maintains both
interpretations. In other words, we allow more than one subject or object to be
associated with the governing verb of a sentence. The tuples stored in the database
are therefore “soft events,” as they may encode alternative patterns and relationships
found in each sentence. Typically, only one pattern is chosen at search time, in
response to a specific user request (e.g., request #1: gather all instances of a
president visiting a country; request #2: gather all instances of interactions
between any two countries).
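To make the idea concrete, the example sentence can be held as a single record with several candidate subjects and objects, from which a pattern is selected at search time. The following Python sketch is only an assumption about how such a structure might look: the Entity and SoftEvent classes, the match function, and the type labels "Role", "Place", and "Country" are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # hypothetical type label


@dataclass(frozen=True)
class SoftEvent:
    """A governing verb with every candidate subject and object kept."""
    verb: str
    subjects: Tuple[Entity, ...]
    objects: Tuple[Entity, ...]


def match(event: SoftEvent, subject_label: str, object_label: str,
          verb: Optional[str] = None) -> List[Tuple[str, str, str]]:
    """Recombine subjects and objects into triplets that fit one pattern."""
    if verb is not None and event.verb != verb:
        return []
    return [
        (subj.text, event.verb, obj.text)
        for subj, obj in product(event.subjects, event.objects)
        if subj.label == subject_label and obj.label == object_label
    ]


# "The president of France visited the capital of Tunisia"
EVENT = SoftEvent(
    verb="visit",
    subjects=(Entity("president", "Role"), Entity("France", "Country")),
    objects=(Entity("capital", "Place"), Entity("Tunisia", "Country")),
)

# Request #1: a president visiting a country.
request_1 = match(EVENT, "Role", "Country", verb="visit")
# Request #2: any interaction between two countries (no verb constraint).
request_2 = match(EVENT, "Country", "Country")
```

Both interpretations come from the same stored record. Nothing is discarded at indexing time, and the choice between them is deferred until a request supplies the pattern.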