Chapter 10
Text Search-Enhanced with Types and Entities
Soumen Chakrabarti, Sujatha Das, Vijay Krishnan, and Kriti Puniyani
10.1 Entity-Aware Search Architecture
10.2 Understanding the Question
10.3 Scoring Potential Answer Snippets
10.4 Indexing and Query Processing
10.5 Conclusion
10.1 Entity-Aware Search Architecture
Until recently, large-scale text and Web search systems regarded a document
as a sequence of string tokens. Queries likewise consisted of string tokens,
and the search engine's job was to assign a score to each document based on
the extent of matches between query and document tokens, the rarity of the
query tokens in the corpus, and, more recently, the “prestige” of the Web
document in the social network of hyperlinks.
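The token-based scoring described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, a term-frequency count for the extent of match, a logarithmic inverse document frequency for rarity, and an optional per-document prestige weight applied as a multiplier. The function names, the exact IDF formula, and the way prestige is combined are illustrative assumptions, not the scoring function of any particular search engine.

```python
import math
from collections import Counter


def tokenize(text):
    """Split text into lowercase string tokens (whitespace only, an assumption)."""
    return text.lower().split()


def score_documents(query, docs, prestige=None):
    """Score each document against the query.

    Combines the extent of token matches (term frequency) with the rarity
    of each query token in the corpus (inverse document frequency).
    `prestige` is a hypothetical list of per-document weights standing in
    for a link-based prestige measure; it is applied as a simple multiplier.
    """
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    query_terms = set(tokenize(query))

    # Document frequency: in how many documents each query term occurs.
    doc_freq = {
        term: sum(1 for counts in doc_counts if term in counts)
        for term in query_terms
    }

    scores = []
    for i, counts in enumerate(doc_counts):
        score = 0.0
        for term in query_terms:
            if doc_freq[term] == 0:
                continue  # term is absent from the corpus
            idf = math.log(1 + n_docs / doc_freq[term])  # rarer terms weigh more
            score += counts[term] * idf
        if prestige is not None:
            score *= prestige[i]
        scores.append(score)
    return scores


docs = ["the cat sat", "the dog sat on the cat", "birds fly"]
scores = score_documents("cat dog", docs)
boosted = score_documents("cat dog", docs, prestige=[2.0, 0.1, 1.0])
```

In this toy corpus the second document matches both query tokens and so scores highest without a prestige weight, while a sufficiently large prestige weight on the first document reverses the order. Nothing in the sketch knows that a token denotes a person, place, or number, which is the limitation the rest of the section addresses.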
Several parallel and interrelated developments have changed this state
of affairs in the last few years. Some smaller-scale search applications
were already more heavily invested in computational linguistics and natural
language processing (NLP), and those technologies are being imported into
and scaled up to benefit large-scale search. Machine learning techniques
for tagging entities mentioned in unstructured text have become quite
sophisticated, scalable and robust. XML is often used to represent typed
entity-relationship graphs, and query engines for XML already support graph
idioms that are common in entity extraction and NLP.
Gradually, Web search engines have begun to interpret string tokens
against the backdrop of the physical world. A five-digit number is
interpreted as a ZIP code in some contexts. Many named entities
are recognized and exploited:
Recognizing that a query is a person name triggers a “diversity”
 
 
 
 
 
 