contain columns referring to table names and column names. Such a logical
organization is referred to as a catalog, and in SQL systems it is stored in a
database called the Information Schema (IS for brevity). A sample fragment is
shown in Figure 1. IS can be inspected like a normal database, by posing SQL
queries that return the fields needed to build a new SQL query.
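As a minimal sketch of inspecting the catalog with ordinary SQL, the Python snippet below queries a toy stand-in for the IS column listing. The table is_columns, the schema name geoquery, and the underscore spellings of the column names are assumptions made for illustration; a real DBMS exposes its own catalog views.

```python
import sqlite3

# Toy stand-in for the catalog: one row per (schema, table, column).
# All names below are illustrative assumptions, not the real IS layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE is_columns (table_schema TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO is_columns VALUES (?, ?, ?)",
    [
        ("geoquery", "river", "river_name"),
        ("geoquery", "river", "traverse"),
        ("geoquery", "state", "state_name"),
        ("geoquery", "city", "city_name"),
    ],
)


def columns_of(schema, table):
    """Return the column names the catalog lists for one table, sorted."""
    rows = conn.execute(
        "SELECT column_name FROM is_columns "
        "WHERE table_schema = ? AND table_name = ? ORDER BY column_name",
        (schema, table),
    )
    return [row[0] for row in rows]


river_columns = columns_of("geoquery", "river")
print(river_columns)
```

The returned column names are the kind of fields that can then be used when assembling a new SQL query.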
Instead of using tailored dictionaries, we can enrich our knowledge based on
the metadata added by the domain expert when designing the database. For
example, an answer to the question “Which rivers run through New York” can
be found in the GeoQuery corpus (whose structure is stored in IS as shown in
Figure 1).
While there is a simple match for the word rivers with table river and
column river name, there is no direct mapping between the word run in the
question and any of the columns in the metadata. However, the term run can be
easily disambiguated by looking for the least semantically distant
metadata entry, i.e., traverse. This match is confirmed when investigating
all possible interpretations of New York in this database (i.e., city name,
state name, etc.), by the existing reference between column traverse in table
river and column state name in table state.
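The two matching steps described here can be sketched as follows. This is only an illustration: the plural-stripping stemmer is deliberately naive, and TOY_DISTANCE is a hand-written, hypothetical table standing in for whatever semantic-distance measure is actually used; the underscore spellings of the column names are likewise assumed.

```python
# Metadata entries as (table, column) pairs; spellings are assumed.
METADATA = [
    ("river", "river_name"),
    ("river", "traverse"),
    ("state", "state_name"),
    ("city", "city_name"),
]

# Hypothetical distances between a question word and a column name.
# A real system would obtain these from a lexical resource.
TOY_DISTANCE = {
    ("run", "traverse"): 0.2,
    ("run", "river_name"): 0.8,
    ("run", "state_name"): 0.9,
    ("run", "city_name"): 0.9,
}


def naive_stem(word):
    """Lower-case the word and strip a trailing plural 's' (very rough)."""
    w = word.lower()
    return w[:-1] if w.endswith("s") and len(w) > 3 else w


def direct_matches(word):
    """Return (tables, columns) whose names contain the stem of the word."""
    stem = naive_stem(word)
    tables = sorted({t for t, _ in METADATA if t == stem})
    columns = [c for _, c in METADATA if stem in c.split("_")]
    return tables, columns


def closest_entry(word, default=1.0):
    """Pick the metadata entry with the smallest (toy) semantic distance."""
    return min(METADATA, key=lambda tc: TOY_DISTANCE.get((word, tc[1]), default))


print(direct_matches("rivers"))
print(direct_matches("run"))
print(closest_entry("run"))
```

With these toy values, rivers matches table river and column river_name directly, run has no direct match, and the distance lookup falls back to traverse.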
However, linking the two words New and York is not so easy, since there
is no evidence of relatedness between them in the metadata: this means
that the whole database must be searched for their stems. Words can
match many values (e.g., "New York" both as city and as state name,
but also "New Jersey"), as shown in Figure 2. We can generate all possible
(even ambiguous) queries by exploiting related metadata information (i.e.,
primary and foreign keys, constraints, datatypes, etc.) and select the most
plausible one using a re-ranker.
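A rough sketch of this generate-then-select idea is given below, using an in-memory SQLite database. The toy rows, the JOIN_ON table (assumed to be derived from foreign keys), exact-value lookup instead of stem lookup, and the "fewest joins" scoring that stands in for the re-ranker are all simplifying assumptions, not the actual method.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (state_name TEXT PRIMARY KEY);
CREATE TABLE city (city_name TEXT, state_name TEXT REFERENCES state(state_name));
CREATE TABLE river (river_name TEXT, traverse TEXT REFERENCES state(state_name));
INSERT INTO state VALUES ('new york'), ('new jersey');
INSERT INTO city VALUES ('new york', 'new york'), ('newark', 'new jersey');
INSERT INTO river VALUES ('hudson', 'new york'), ('hudson', 'new jersey'),
                         ('delaware', 'new jersey');
""")

# Text columns to search, and the column of each table that is comparable
# with river.traverse (assumed to come from the foreign keys in the catalog).
TEXT_COLUMNS = [("state", "state_name"), ("city", "city_name"),
                ("city", "state_name"), ("river", "river_name"),
                ("river", "traverse")]
JOIN_ON = {"state": "state_name", "city": "state_name"}


def columns_containing(phrase):
    """Return (table, column) pairs holding at least one value equal to phrase."""
    hits = []
    for table, column in TEXT_COLUMNS:
        # Identifiers come from the fixed list above; the value is a parameter.
        sql = f"SELECT 1 FROM {table} WHERE {column} = ? LIMIT 1"
        if conn.execute(sql, (phrase,)).fetchone():
            hits.append((table, column))
    return hits


def candidate_queries(phrase):
    """Build one candidate query per interpretation, as (join_count, sql)."""
    candidates = []
    for table, column in columns_containing(phrase):
        if table == "river":
            sql = f"SELECT DISTINCT river_name FROM river WHERE {column} = ?"
            candidates.append((0, sql))
        else:
            sql = (f"SELECT DISTINCT r.river_name FROM river r JOIN {table} t "
                   f"ON r.traverse = t.{JOIN_ON[table]} WHERE t.{column} = ?")
            candidates.append((1, sql))
    return candidates


def best_query(phrase):
    """Toy re-ranker: prefer the candidate with the fewest joins."""
    return sorted(candidate_queries(phrase), key=lambda c: c[0])[0][1]


rivers = [r[0] for r in conn.execute(best_query("new york"), ("new york",))]
print(columns_containing("new york"))
print(rivers)
```

The lookup finds the phrase in several columns, so several ambiguous candidates are produced; only the final selection step decides which one is executed.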
Last but not least, we deal with complex natural language (NL) questions
containing subordinate clauses, conjunctions and negations, as well as with
nested SQL queries. In particular, we designed a mapping algorithm that matches
dependencies between NL components and SQL structure, which allows us to build
a set of possible queries that answer a given question.
Fig. 1. A DBMS catalog containing GeoQuery and Sakila
 