10.1.2 Scoring Snippets
The second challenge is to use the atype to define a scoring strategy.
In traditional Information Retrieval (IR), documents and queries are
represented as vectors, and cosine similarity (or a variant of it) defines
the ranking. Most later IR systems reward a document with a better score if
the query words appear close to each other. We continue to model the corpus
as a linear sequence of tokens, but some tokens are now attached to nodes in
our atype DAG (see Figure 10.1). Apart from general concepts, there may be
surface patterns (such as a token having exactly four digits, or beginning
with an uppercase letter) that are strong indicators of the type of the
entity mentioned in a token.
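The surface patterns mentioned above can be sketched as simple token predicates. This is a minimal illustration, not the authors' implementation: the names isDDDD and hasDigit are taken from the labels in Figure 10.1, while isCapitalized and the surface_patterns helper are hypothetical names introduced here.

```python
import re
from typing import Set


def is_dddd(token: str) -> bool:
    """True if the token consists of exactly four ASCII digits, e.g. '1934'."""
    return re.fullmatch(r"[0-9]{4}", token) is not None


def has_digit(token: str) -> bool:
    """True if the token contains at least one digit."""
    return any(ch.isdigit() for ch in token)


def is_capitalized(token: str) -> bool:
    """True if the token begins with an uppercase letter (hypothetical name)."""
    return token[:1].isupper()


def surface_patterns(token: str) -> Set[str]:
    """Return the names of all surface patterns that the token matches."""
    checks = {
        "isDDDD": is_dddd,
        "hasDigit": has_digit,
        "isCapitalized": is_capitalized,
    }
    return {name for name, check in checks.items() if check(token)}


if __name__ == "__main__":
    for tok in ["1934", "Sagan", "cosmos"]:
        print(tok, sorted(surface_patterns(tok)))
```

A token such as "1934" matches both isDDDD and hasDigit, which is why a query of the form "When was Sagan born?" can ask for the pattern isDDDD in addition to the atype time.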
[Figure 10.1: a sample passage shown as a token sequence, with some tokens
linked by is-a edges to nodes of a type hierarchy (entity, abstraction,
region, city, district, state, person, scientist, physicist, astronomer,
time, year) and to surface patterns (hasDigit, isDDDD).
Sample passage: "Born in New York in 1934, Sagan was a noted astronomer
whose lifelong passion was searching for intelligent life in the cosmos."
Sample queries and their approximate translations:
"Name a physicist who searched for intelligent life in the cosmos":
type=physicist NEAR "cosmos"…
"Where was Sagan born?": type=region NEAR "Sagan"
"When was Sagan born?": type=time pattern=isDDDD NEAR "Sagan" "born"]
FIGURE 10.1 (See color insert following page 130.): Document as a linear
sequence of tokens, some connected to a type hierarchy. Some sample queries
and their approximate translation to a semi-structured form are shown.
In Figure 10.1, one or more nodes a in the atype DAG have been designated
as desired atypes for the given query. Some candidate tokens in the corpus
are descendants of a. We have to score and rank these candidates.
The merit of a candidate is decided by its proximity (defined as the number
of intervening tokens) to other tokens that match the non-atype part of the
query. In Section 10.3 we present a machine learning approach to design a
proximity scoring function of this form. We show that this has higher accuracy
than using a standard IR system to score fixed text windows against the query.
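To make the idea of proximity scoring concrete, here is a minimal sketch under stated assumptions. The toy IS_A edges, the token annotations, and the fixed inverse_decay function are all illustrative choices made here; Section 10.3 learns the proximity function from data rather than fixing it by hand. The gap between a candidate and a matching token is the number of intervening tokens, as defined above.

```python
from typing import Callable, Dict, List, Set, Tuple

# Toy child -> parents edges of an atype DAG (assumed for illustration only).
IS_A: Dict[str, Set[str]] = {
    "year": {"time"},
    "city": {"region"},
    "astronomer": {"scientist"},
    "physicist": {"scientist"},
    "scientist": {"person"},
}


def is_a(atype: str, target: str) -> bool:
    """True if `atype` equals `target` or is a descendant of it in the DAG."""
    stack, seen = [atype], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(IS_A.get(node, ()))
    return False


def inverse_decay(gap: int) -> float:
    """Hand-picked decay (an assumption): closer matches contribute more."""
    return 1.0 / (1.0 + gap)


def score_candidates(
    tokens: List[str],
    token_types: Dict[int, Set[str]],
    target_atype: str,
    query_words: Set[str],
    decay: Callable[[int], float] = inverse_decay,
) -> List[Tuple[int, str, float]]:
    """Rank candidate tokens whose type falls under `target_atype`.

    `token_types` maps a token position to the atypes attached to it, and
    `query_words` holds the lowercased non-atype words of the query.
    """
    matchers = [j for j, tok in enumerate(tokens) if tok.lower() in query_words]
    ranked = []
    for i, types in token_types.items():
        if not any(is_a(t, target_atype) for t in types):
            continue  # not a descendant of the desired atype
        # gap = number of tokens strictly between the candidate and the match
        score = sum(decay(abs(i - j) - 1) for j in matchers if j != i)
        ranked.append((i, tokens[i], score))
    ranked.sort(key=lambda item: item[2], reverse=True)
    return ranked


if __name__ == "__main__":
    tokens = ("Born in New York in 1934 , Sagan was a noted astronomer whose "
              "lifelong passion was searching for intelligent life in the "
              "cosmos .").split()
    token_types = {5: {"year"}, 11: {"astronomer"}}
    # "When was Sagan born?" -> type=time NEAR "Sagan" "born"
    print(score_candidates(tokens, token_types, "time", {"sagan", "born"}))
```

In this toy run the only candidate under the atype time is the token "1934", which has one intervening token between it and "Sagan" and four between it and "Born". Swapping in a different decay function changes the ranking behavior without touching the rest of the code, which is the part a learned scoring function would replace.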