HeadquarteredIn
Company            City      P
microsoft          redmond   1.00
ibm                san_jose  0.99   Y1
emirates_airlines  dubai     0.93
honda              torrance  0.93
horizon            seattle   0.93
egyptair           cairo     0.93
adobe              san_jose  0.93   Y2

ProducesProduct
Company    Product            P
sony       walkman            0.96
microsoft  mac_os_x           0.96
ibm        personal_computer  0.96   X1
adobe      adobe_illustrator  0.96
microsoft  mac_os             0.9
adobe      adobe_indesign     0.9
adobe      adobe_dreamweaver  0.87   X2

Figure 1.2: NELL data stored in a relational database.
Thus, we can use an off-the-shelf relational database system to represent probabilistic data by
simply adding a probability attribute, then use regular SQL to compute the output probabilities and
to rank the answers by those probabilities.
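
As a minimal sketch of this idea, the following Python snippet loads a few rows from Figure 1.2 into an in-memory SQLite database and ranks the answers of a simple join query. The query itself and the column alias prob are illustrative assumptions, not taken from the text. Multiplying the two probabilities is only valid here because the input tuples are assumed independent and each answer row is derived from exactly one pair of input tuples (there is no duplicate elimination).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HeadquarteredIn (Company TEXT, City TEXT, P REAL);
    CREATE TABLE ProducesProduct (Company TEXT, Product TEXT, P REAL);
""")

# A subset of the rows shown in Figure 1.2.
conn.executemany(
    "INSERT INTO HeadquarteredIn VALUES (?, ?, ?)",
    [("microsoft", "redmond", 1.00),
     ("ibm", "san_jose", 0.99),
     ("adobe", "san_jose", 0.93)],
)
conn.executemany(
    "INSERT INTO ProducesProduct VALUES (?, ?, ?)",
    [("sony", "walkman", 0.96),
     ("ibm", "personal_computer", 0.96),
     ("adobe", "adobe_illustrator", 0.96),
     ("adobe", "adobe_indesign", 0.90),
     ("adobe", "adobe_dreamweaver", 0.87)],
)

# Products of companies headquartered in san_jose, ranked by probability.
# Each answer joins exactly one pair of independent tuples, so its
# probability is the product of the two input probabilities.
rows = conn.execute("""
    SELECT x.Product, x.P * y.P AS prob
    FROM ProducesProduct AS x
    JOIN HeadquarteredIn AS y ON x.Company = y.Company
    WHERE y.City = 'san_jose'
    ORDER BY prob DESC
""").fetchall()

for product, prob in rows:
    print(f"{product:20s} {prob:.4f}")
```

Queries that project away attributes or eliminate duplicates do not fit this simple pattern, since one answer may then depend on several input tuples at once.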
The goal of a probabilistic database system is to be a general platform for managing data
with probabilistic semantics. Such a system needs to scale to large database instances and needs to
support complex SQL queries, evaluated using probabilistic semantics. The system needs to perform
probabilistic inference in order to compute query answers. The probabilistic inference component
represents a major challenge. As we will see in subsequent chapters, most SQL queries require quite
complex probabilistic reasoning, even if all input tuples are independent, because the SQL query
itself introduces correlations between the intermediate results, and this makes probabilistic reasoning
difficult.
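
To make the point about correlations concrete, here is a small sketch that enumerates all possible worlds over four independent tuples from Figure 1.2 (the adobe rows only, for brevity). The Boolean query, "adobe is headquartered in san_jose and produces at least one product", and the function names are illustrative assumptions. The three join results all share the same HeadquarteredIn tuple, so treating them as independent gives a wrong probability.

```python
from itertools import product

# Marginal probabilities of four independent input tuples (from Figure 1.2).
tuples = {
    "hq(adobe, san_jose)": 0.93,
    "prod(adobe, adobe_illustrator)": 0.96,
    "prod(adobe, adobe_indesign)": 0.90,
    "prod(adobe, adobe_dreamweaver)": 0.87,
}
names = list(tuples)


def query_holds(world):
    """True if the headquarters tuple and at least one product tuple are present."""
    return world[names[0]] and any(world[n] for n in names[1:])


def exact_probability():
    """Sum the weights of all possible worlds in which the query holds."""
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        weight = 1.0
        for n in names:
            weight *= tuples[n] if world[n] else 1.0 - tuples[n]
        if query_holds(world):
            total += weight
    return total


def naive_probability():
    """Incorrect shortcut: pretend the three join results are independent."""
    hq = tuples[names[0]]
    p_none = 1.0
    for n in names[1:]:
        p_none *= 1.0 - hq * tuples[n]
    return 1.0 - p_none


print("exact:", exact_probability())
print("naive:", naive_probability())
```

The exact value can never exceed 0.93, the probability of the shared headquarters tuple, whereas the naive computation overshoots it. Enumeration is exponential in the number of tuples, so it serves only as a reference for tiny examples.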
The type of data uncertainty that we have seen in NELL is called tuple-level uncertainty and is
defined by the fact that for each tuple the system has a degree of confidence in the correctness of that
tuple. Thus, each tuple is a random variable. In other settings, one finds attribute-level uncertainty,
where the value of an attribute is a random variable: it can take one of several candidate values, and
each candidate has an associated probability.
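
The two kinds of uncertainty can be sketched as data structures. The class names, the placeholder values director_a and director_b, and the requirement that the candidate probabilities sum to at most 1 are all assumptions made for illustration; only the ibm row comes from Figure 1.2.

```python
from dataclasses import dataclass


@dataclass
class UncertainTuple:
    """Tuple-level uncertainty: the whole row is present with probability p."""
    values: tuple
    p: float


@dataclass
class UncertainAttribute:
    """Attribute-level uncertainty: the attribute takes one of several values."""
    choices: dict  # candidate value -> probability

    def __post_init__(self):
        total = sum(self.choices.values())
        # Assumption: candidates are mutually exclusive, so their
        # probabilities must not sum to more than 1.
        if total > 1.0 + 1e-9:
            raise ValueError(f"probabilities sum to {total}, expected at most 1")

    def most_likely(self):
        """Return the candidate value with the highest probability."""
        return max(self.choices, key=self.choices.get)


# Tuple-level: one row of HeadquarteredIn from Figure 1.2.
row = UncertainTuple(values=("ibm", "san_jose"), p=0.99)

# Attribute-level: a hypothetical attribute with two made-up candidates.
director = UncertainAttribute({"director_a": 0.7, "director_b": 0.3})
print(row, director.most_likely())
```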
We illustrate attribute-level uncertainty with a second example. Google Squared [2] is an online
service that presents tabular views over unstructured data that are collected and aggregated from
public Web pages. It organizes the data into tables, where rows correspond to tuples and columns
correspond to attributes, but each value has a number of possible choices. For example, the square in
Figure 1.3 is computed by Google Squared in response to the keyword query “comedy movies”. The
default answer has 20 rows and 8 columns: each row represents a movie (“The Mask”, “Scary Movie”,
“Superbad”, etc.) and each column represents an attribute (“Item Name”, “Language”, “Director”,
[2] http://www.google.com/squared
 