Advanced Techniques - Probabilistic Databases

sampling; thus, Normal samples a value and returns its result. In general, a VG function in MCDB
may return relations, rather than values. The function Normal(mean,stdev) returns a relation with
a single row and a single attribute, value.

Instead of writing the parameters of the normal distribution in the view definition, we can
store them in a separate table, call it NormParam(mean,stdev), which consists of a single row. This
allows users to change the statistical model easily by updating the NormParam table:

CREATE TABLE CustIncome(cid, name, region, gender, age, income)
FOR EACH d IN Customer
WITH Income AS Normal
    (SELECT p.mean, p.stdev
     FROM NormParam p)
SELECT d.cid, d.name, d.region, d.gender, d.age, x.value
FROM Income x
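To make the semantics of this query concrete, here is a minimal Python sketch of how one instance (one possible world) of CustIncome could be generated. It is not MCDB code: the in-memory lists `customer` and `norm_param`, the helpers `normal_vg` and `cust_income`, and all sample values are hypothetical stand-ins for the Customer and NormParam tables and the Normal VG function.

```python
import random

# Hypothetical stand-ins for the Customer and NormParam tables.
customer = [
    {"cid": 1, "name": "Ann", "region": "West", "gender": "F", "age": 34},
    {"cid": 2, "name": "Bob", "region": "East", "gender": "M", "age": 51},
]
norm_param = [{"mean": 50000.0, "stdev": 8000.0}]  # single-row table


def normal_vg(param_rows, rng):
    """Mimic the Normal VG function: return a one-row relation with attribute 'value'."""
    p = param_rows[0]
    return [{"value": rng.gauss(p["mean"], p["stdev"])}]


def cust_income(customer_rows, param_rows, rng):
    """Build one sampled instance of CustIncome."""
    result = []
    for d in customer_rows:                  # FOR EACH d IN Customer
        income = normal_vg(param_rows, rng)  # WITH Income AS Normal(...)
        for x in income:                     # SELECT ... FROM Income x
            result.append({**d, "income": x["value"]})
    return result


world = cust_income(customer, norm_param, random.Random(0))
```

Each call with a fresh random state yields a different instance, and updating `norm_param` changes the statistical model without touching the query logic, which mirrors the role of the NormParam table.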

Next, we will refine this example in two ways. First, we replace the Normal function with
the Gamma function, which always returns a positive value and has three parameters: shift, scale,
and shape (see footnote 3). Second, we store the three parameters using different approaches: all
customers have the same shift parameter, which is stored in a single-row table CustShift(shift); the
scale parameter depends on the customer's region, and it is stored in a table
CustRegionScale(region, scale); and the shape parameter is known for each customer individually, and
stored in a table CustShape(cid, shape). The new query becomes:

CREATE TABLE CustIncome(cid, name, region, gender, age, income)
FOR EACH d IN Customer
WITH Income AS Gamma
    ((SELECT s.shift FROM CustShift s),
     (SELECT s.scale FROM CustRegionScale s WHERE d.region=s.region),
     (SELECT s.shape FROM CustShape s WHERE d.cid=s.cid))
SELECT d.cid, d.name, d.region, d.gender, d.age, x.value
FROM Income x
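The three parameter lookups in this query can be illustrated with a small Python sketch. It is only an approximation of the idea, not MCDB's implementation: the table contents and the helpers `gamma_vg` and `cust_income_gamma` are hypothetical, the CustShape stand-in is assumed to carry a `cid` column (as the WHERE clause requires), and the shifted gamma sample is assumed to be the shift plus a standard two-parameter gamma draw.

```python
import random

# Hypothetical stand-ins for the tables used by the query.
customer = [
    {"cid": 1, "name": "Ann", "region": "West", "gender": "F", "age": 34},
    {"cid": 2, "name": "Bob", "region": "East", "gender": "M", "age": 51},
]
cust_shift = [{"shift": 10000.0}]                 # one row, shared by everyone
cust_region_scale = [
    {"region": "West", "scale": 9000.0},
    {"region": "East", "scale": 7000.0},
]
cust_shape = [{"cid": 1, "shape": 4.0}, {"cid": 2, "shape": 5.5}]


def gamma_vg(shift, scale, shape, rng):
    """Mimic the Gamma VG function: a one-row relation with attribute 'value'.

    random.gammavariate(alpha, beta) takes the shape first, then the scale.
    """
    return [{"value": shift + rng.gammavariate(shape, scale)}]


def cust_income_gamma(customer_rows, rng):
    """Build one sampled instance of CustIncome under the gamma model."""
    shift = cust_shift[0]["shift"]
    result = []
    for d in customer_rows:
        # Correlated lookups: scale by region, shape by customer id.
        scale = next(s["scale"] for s in cust_region_scale
                     if s["region"] == d["region"])
        shape = next(s["shape"] for s in cust_shape if s["cid"] == d["cid"])
        for x in gamma_vg(shift, scale, shape, rng):
            result.append({**d, "income": x["value"]})
    return result


world = cust_income_gamma(customer, random.Random(1))
```

Because the scale and shape subqueries refer to the current customer d, the parameters are looked up once per customer, whereas the shift is the same for every row.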

6.3.2 QUERY EVALUATION IN MCDB

Given the complexity of the probabilistic space, query evaluation in an MCDB is approached
differently from Chapters 4 and 5. Instead of searching for tractable cases, MCDBs run
Monte Carlo simulations. This has the major advantage that it is a uniform approach applicable to

3 The gamma distribution with shift (or threshold) θ, scale σ > 0, and shape α > 0 has the density given by:

p(x) = \frac{1}{\sigma^{\alpha} \Gamma(\alpha)} (x - \theta)^{\alpha - 1} \exp\left(-\frac{x - \theta}{\sigma}\right), \quad x > \theta
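As a sanity check on the footnote, the following Python sketch evaluates the density of the gamma distribution and verifies numerically that it integrates to 1. It assumes the standard three-parameter form with support x > θ; the function names `shifted_gamma_pdf` and `integrate`, and the parameter values, are illustrative choices only.

```python
import math


def shifted_gamma_pdf(x, shift, scale, shape):
    """Density of the gamma distribution with shift (threshold), scale and shape."""
    if x <= shift:
        return 0.0  # no probability mass at or below the threshold
    z = (x - shift) / scale
    return z ** (shape - 1) * math.exp(-z) / (scale * math.gamma(shape))


def integrate(f, lo, hi, steps=100_000):
    """Plain trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


# Illustrative parameters: shift 2.0, scale 1.5, shape 3.0.
# The upper limit 80.0 is far enough out that the remaining tail is negligible.
area = integrate(lambda x: shifted_gamma_pdf(x, 2.0, 1.5, 3.0), 2.0, 80.0)
```

The value of `area` should be very close to 1, which confirms that the normalising factor matches the shifted form of the density.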
