5 A Case Study in Natural Language Based Web Search
Giovanni Marchisio, Navdeep Dhillon, Jisheng Liang, Carsten Tusk, Krzysztof
Koperski, Thien Nguyen, Dan White, and Lubos Pochman
5.1 Introduction
Is there an audience for natural language based search? This study, based on our
experience with a Web portal, attempts to address criticisms of the lack of scalability and
usability of natural language approaches to search. Our solution is based on InFact®,
a natural language search engine that combines the speed of keyword search with
the power of natural language processing. InFact performs clause-level indexing and
offers a full spectrum of functionality, ranging from Boolean keyword operators
to real-time linguistic pattern matching, which includes recognition of syntactic
roles, such as subject and object, and semantic categories, such as people and places. A
user of our search can navigate and retrieve information based on an understanding
of actions, roles and relationships. In developing InFact, we ported the functionality
of a deep text analysis platform to a modern search engine architecture. Our
distributed indexing and search services are designed to scale to large document
collections and large numbers of users. We tested the operational viability of InFact
as a search platform by powering a live search on the Web. Site statistics and user
logs demonstrate that a statistically significant segment of the user population is
relying on natural language search functionality. Going forward, we will focus on
promoting this functionality to an even greater percentage of users through a series
of creative interfaces.
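To make the idea of clause-level indexing concrete, the following is a minimal sketch, not InFact's actual implementation. It assumes that an upstream parser has already reduced each clause to a subject, a verb and an object; the names Clause, ClauseIndex, add and search, and the "keyword" role, are hypothetical and chosen only for illustration. Each term is posted once under its syntactic role and once under a role-agnostic key, so the same index can answer both plain keyword queries and role-constrained queries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One parsed clause: who did what to whom (hypothetical schema)."""
    doc_id: int
    subject: str
    verb: str
    obj: str


class ClauseIndex:
    """Toy inverted index keyed by (role, term); an illustration only."""

    def __init__(self):
        self._postings = defaultdict(set)  # (role, term) -> clause ids
        self._clauses = []

    def add(self, clause):
        cid = len(self._clauses)
        self._clauses.append(clause)
        for role, term in (("subject", clause.subject),
                           ("verb", clause.verb),
                           ("object", clause.obj)):
            key = term.lower()
            self._postings[(role, key)].add(cid)
            # Role-agnostic posting, so plain keyword search still works.
            self._postings[("keyword", key)].add(cid)

    def search(self, **constraints):
        """Return clauses that satisfy every role=term constraint (AND)."""
        ids = None
        for role, term in constraints.items():
            hits = self._postings.get((role, term.lower()), set())
            ids = hits if ids is None else ids & hits
        return [self._clauses[i] for i in sorted(ids or ())]


index = ClauseIndex()
index.add(Clause(1, "CompanyA", "acquire", "CompanyB"))
index.add(Clause(2, "CompanyB", "acquire", "CompanyC"))

# Keyword search matches CompanyB in any role (both clauses).
print([c.doc_id for c in index.search(keyword="companyb")])
# Role-constrained search matches only clauses where CompanyB is the subject.
print([c.doc_id for c in index.search(subject="companyb", verb="acquire")])
```

A production system would add semantic categories (such as person or place) as further posting keys and distribute the postings across servers; this sketch covers only the role-aware lookup.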
Information retrieval on the Web today makes little use of Natural Language
Processing (NLP) techniques [1, 3, 11, 15, 18]. The perceived value of improved
understanding is greatly outweighed by the practical difficulty of storing complex
linguistic annotations in a scalable indexing and search framework. In addition, any
champion of natural language techniques must overcome significant hurdles in user
interface design, as greater search power often comes at the price of more work in
formulating a query and navigating the results. All of these obstacles are compounded
by the expected resistance to any technological innovation that has the potential to
change or erode established models for advertising and search optimization, which
are based on pricing of individual keywords or noun phrases, rather than relationships
or more complex linguistic constructs.
Nevertheless, with the increasing amount of high value content made available on
the Web and increased user sophistication, we have reasons to believe that a segment