10.1.2 Scoring Snippets
The second challenge is to use the atype to define a scoring
strategy. In traditional Information Retrieval (IR), documents and queries are
represented as vectors, and cosine similarity (or variants of it) defines the ranking.
Most later IR systems also give a document a better score if the query
words appear close to each other. We continue to model the corpus as a
linear sequence of tokens, but some tokens are now attached to nodes in our
atype DAG (see Figure 10.1). Apart from general concepts, the DAG may contain
surface patterns (such as a token consisting of exactly four digits, or beginning
with an uppercase letter) that are strong indicators of the type of the entity
mentioned in a token.
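The surface patterns mentioned above can be computed directly from a token's characters. The following is a minimal sketch, assuming simple regular expressions: the pattern names hasDigit and isDDDD are taken from Figure 10.1, while the name isCapitalized, the `surface_patterns` function, and the exact tests are hypothetical choices for illustration.

```python
import re

# Hypothetical surface-pattern detectors. hasDigit and isDDDD are named in
# Figure 10.1; isCapitalized is an assumed name for "begins with an uppercase letter".
PATTERNS = {
    "hasDigit": lambda tok: re.search(r"[0-9]", tok) is not None,
    "isDDDD": lambda tok: re.fullmatch(r"[0-9]{4}", tok) is not None,
    "isCapitalized": lambda tok: tok[:1].isupper(),
}

def surface_patterns(token: str) -> set[str]:
    """Return the names of all surface patterns that the token matches."""
    return {name for name, test in PATTERNS.items() if test(token)}

if __name__ == "__main__":
    for tok in ["1934", "Sagan", "cosmos", "B52"]:
        print(tok, sorted(surface_patterns(tok)))
```

A token such as "1934" matches both hasDigit and isDDDD, which is why a query asking for a year can use isDDDD as a strong hint even when no general concept node is attached to the token.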
[Figure 10.1: the sentence "Born in New York in 1934, Sagan was a noted astronomer whose lifelong passion was searching for intelligent life in the cosmos." with some tokens linked by is-a edges to a type hierarchy (nodes include entity, abstraction, person, scientist, physicist, astronomer, region, city, district, state, time, year) and to the surface patterns hasDigit and isDDDD. Sample queries: "Name a physicist who searched for intelligent life in the cosmos" → type=physicist NEAR "cosmos"…; "Where was Sagan born?" → type=region NEAR "Sagan"; "When was Sagan born?" → type=time, pattern=isDDDD NEAR "Sagan" "born".]
FIGURE 10.1 (SEE COLOR INSERT FOLLOWING PAGE 130.):
Document as a linear sequence of tokens, some connected to a type hierarchy.
Some sample queries and their approximate translation to a semi-structured
form are shown.
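The caption's model, a linear token sequence in which some positions are attached to nodes of a type DAG, can be sketched as a small data structure. This is an illustration under stated assumptions: the is-a edges below are a plausible reading of Figure 10.1, not its definitive hierarchy, and the token attachments and the names `IS_A`, `ATTACHED`, `ancestors`, and `candidates` are hypothetical. A token counts as a candidate when one of its attached nodes is the desired atype or one of its descendants.

```python
from collections import deque

# Assumed is-a edges (child -> parents), loosely following Figure 10.1.
IS_A = {
    "physicist": ["scientist"],
    "astronomer": ["scientist"],
    "scientist": ["person"],
    "person": ["entity"],
    "city": ["region"],
    "district": ["region"],
    "state": ["region"],
    "region": ["entity"],
    "year": ["time"],
    "time": ["abstraction"],
}

def ancestors(node: str) -> set[str]:
    """All nodes reachable from `node` via is-a edges, including `node` itself."""
    seen = {node}
    queue = deque([node])
    while queue:
        for parent in IS_A.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# The sample sentence as a token sequence ("New York" kept as one token).
TOKENS = ["Born", "in", "New York", "in", "1934", ",",
          "Sagan", "was", "a", "noted", "astronomer"]
# Hypothetical attachments of token positions to DAG nodes.
ATTACHED = {2: ["city", "state"], 4: ["year"], 6: ["person"], 10: ["astronomer"]}

def candidates(desired_atype: str) -> list[int]:
    """Positions whose attached node is the atype or one of its descendants."""
    return [pos for pos, nodes in sorted(ATTACHED.items())
            if any(desired_atype in ancestors(n) for n in nodes)]

if __name__ == "__main__":
    print(candidates("region"), candidates("time"), candidates("person"))
```

Because the DAG lookup walks upward from each attached node, a query for the general atype person also picks up the token "astronomer" through the chain astronomer → scientist → person.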
In Figure 10.1, one or more nodes a in the atype DAG have been
designated as desired atypes for the given query. Some candidate tokens in
the corpus are descendants of a. We have to score and rank these candidates.
The merit of a candidate is decided by its proximity (defined as the number
of intervening tokens) to other tokens that match the non-atype part of the
query. In Section 10.3 we present a machine learning approach to designing a
proximity scoring function of this form. We show that it achieves higher accuracy
than using a standard IR system to score fixed text windows against the query.
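To make proximity-based ranking concrete, the following sketch scores each candidate by how few tokens separate it from the nearest occurrence of each non-atype query word. It is an assumption-laden illustration, not the learned function of Section 10.3: the decay 1/(1 + gap), the sum over query words, the function names, and the toy passage are all hypothetical. Candidates are found with the isDDDD pattern, as in the query "When was Sagan born?".

```python
import re
from typing import Callable

def gap(i: int, j: int) -> int:
    """Proximity as in the text: the number of tokens strictly between i and j."""
    return abs(i - j) - 1

def proximity_score(tokens: list[str], candidate: int, selectors: list[str],
                    decay: Callable[[int], float] = lambda g: 1.0 / (1 + g)) -> float:
    """Sum over selector words of decay(gap to that word's nearest occurrence).

    The default decay is a hand-picked placeholder; Section 10.3 learns
    the scoring function instead of fixing it by hand.
    """
    score = 0.0
    for word in selectors:
        hits = [k for k, tok in enumerate(tokens)
                if k != candidate and tok.lower() == word.lower()]
        if hits:  # selector words absent from the passage contribute nothing
            score += decay(min(gap(candidate, k) for k in hits))
    return score

def rank_candidates(tokens: list[str], candidates: list[int],
                    selectors: list[str]) -> list[int]:
    """Order candidate positions by decreasing proximity score."""
    return sorted(candidates,
                  key=lambda c: proximity_score(tokens, c, selectors),
                  reverse=True)

# Hypothetical toy passage for: type=time, pattern=isDDDD NEAR "Sagan" "born".
tokens = "Sagan was born in 1934 ; the lab opened in 1977".split()
candidates = [k for k, tok in enumerate(tokens) if re.fullmatch(r"[0-9]{4}", tok)]
ranking = rank_candidates(tokens, candidates, ["Sagan", "born"])
print(ranking)  # [4, 10]: "1934" is far closer to "Sagan" and "born"
```

Here "1934" scores 1/4 + 1/2 = 0.75 (gaps of 3 and 1 tokens), while "1977" scores 1/10 + 1/8 = 0.225, so the nearer year wins. Replacing the fixed `decay` with a trained model is the kind of change the learned approach makes.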