We Need More Science - Database Design and Relational Theory

a. The result of the query is kept in the database under the specified name (TOTALS in the example) as a read-only relvar (read-only, that is, apart from the periodic refresh; see point b. immediately following).

b. Periodically (ON EVERY UPDATE in the example) the snapshot is refreshed: that is, its current value is discarded, the query is executed again, and the result of that new execution becomes the new snapshot value.

The general form of the REFRESH clause is

REFRESH EVERY <now and then>

where <now and then> might be, for example, MONTH or WEEK or DAY or HOUR or n MINUTES or MONDAY or WEEKDAY (and so on). In particular, the specification REFRESH [ON] EVERY UPDATE means the snapshot is kept permanently in synch with the relvar(s) from which it is derived, which is presumably just what we want in the case of Example 12.
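The refresh behavior described in points a. and b. can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the syntax or mechanism of any particular product: the classes Snapshot and Relvar, the method refresh_on_every_update, the function totals_query, and the sample payment tuples with their (CUSTNO, DATE, AMOUNT) layout are all hypothetical names and data introduced here. Only the snapshot name TOTALS comes from the text.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Row = Tuple  # one tuple of attribute values


class Snapshot:
    """A stored query result: read-only except for refresh()."""

    def __init__(self, query: Callable[[], FrozenSet[Row]]) -> None:
        self._query = query
        self._value = query()  # evaluated once when the snapshot is defined

    @property
    def value(self) -> FrozenSet[Row]:
        return self._value  # a frozenset, so readers cannot modify it

    def refresh(self) -> None:
        # Discard the current value and execute the defining query again.
        self._value = self._query()


class Relvar:
    """A base relvar that refreshes attached snapshots on every update."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self._rows = set(rows)
        self._on_update = []

    def rows(self) -> FrozenSet[Row]:
        return frozenset(self._rows)

    def refresh_on_every_update(self, snapshot: Snapshot) -> None:
        # Hypothetical stand-in for REFRESH ON EVERY UPDATE.
        self._on_update.append(snapshot)

    def insert(self, row: Row) -> None:
        self._rows.add(row)
        for snapshot in self._on_update:
            snapshot.refresh()


def totals_query(payments: Relvar) -> FrozenSet[Row]:
    """Sum AMOUNT per CUSTNO over (CUSTNO, DATE, AMOUNT) tuples."""
    sums = {}
    for custno, _date, amount in payments.rows():
        sums[custno] = sums.get(custno, 0) + amount
    return frozenset(sums.items())


# Assumed sample data, for illustration only.
payments = Relvar({("C1", "2024-01-05", 10), ("C1", "2024-02-05", 20)})
totals = Snapshot(lambda: totals_query(payments))  # plays the role of TOTALS
payments.refresh_on_every_update(totals)

payments.insert(("C2", "2024-02-07", 5))
print(sorted(totals.value))  # [('C1', 30), ('C2', 5)]
```

A snapshot that is not registered with refresh_on_every_update keeps its old value after an insert until refresh() is called explicitly, which corresponds to the coarser REFRESH EVERY intervals such as DAY or WEEK.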

Now, in this section so far I've concentrated on Example 12 and “derived data.” However, the fact is that all forms of redundancy can be thought of as derived data: If x is redundant, then by definition x can be derived from something else in the database. (Limiting use of the term derived data to the kind of situation illustrated by Example 12 is thus misleading, and not recommended.) It follows that the foregoing analysis (in particular, the four different approaches to dealing with derived data) can be generalized to apply to all kinds of redundancy, at least in principle. Note in particular that the third and fourth of those approaches, using views and snapshots respectively, both constitute examples of what's sometimes called controlled redundancy. Redundancy is said to be controlled if it does exist (and the user is aware of it), but the task of “propagating updates” to ensure that it never leads to any inconsistencies is managed by the system, not the user. Uncontrolled redundancy can be a problem, but controlled redundancy shouldn't be. In fact, I want to go further: I want to say that while it's probably impossible, and maybe not even desirable, to eliminate redundancy one hundred percent, any redundancy that isn't eliminated ought at least to be controlled. In particular, we need support for snapshots. (Fortunately, many commercial products do now support snapshots, albeit under the deprecated name materialized views.)

REFINING THE DEFINITION

I've deliberately left this section to the very end of the chapter (almost). Consider the shipments relvar SP, with its predicate Supplier SNO supplies part PNO in quantity QTY. Consider also the relation shown as the value of that relvar in Fig. 1.1 in Chapter 1. Observe that:

a. Two of the tuples in that relation are (S1,P5,100) and (S1,P6,100).

b. Both of those tuples include (S1,100) as a subtuple.

What do those two appearances of that subtuple mean? Well, the appearance in (S1,P5,100) means:

1. Supplier S1 supplies some part in quantity 100.

(I've numbered this proposition, and note that it is indeed a proposition, for purposes of future reference.) And the appearance in (S1,P6,100) means exactly the same thing! So don't we have here a situation in which the database contains two distinct appearances of some tuple that represent the very same proposition? In other words, in accordance with the definition I gave in the introduction to this chapter, doesn't the database contain some redundancy?
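The observation in points a. and b. can be checked mechanically. The following Python sketch uses only the two SP tuples named above; the Shipment type and the subtuple helper are hypothetical names introduced for illustration. It shows that restricting each tuple to the attributes SNO and QTY yields the same subtuple twice, while the projection, being a set, contains it once.

```python
from collections import namedtuple

# Attribute names follow the SP predicate: supplier SNO, part PNO, quantity QTY.
Shipment = namedtuple("Shipment", ["SNO", "PNO", "QTY"])

# Only the two tuples cited in the text; the rest of Fig. 1.1 is omitted.
sp = {Shipment("S1", "P5", 100), Shipment("S1", "P6", 100)}


def subtuple(t, attrs):
    """Return the values of the named attributes of tuple t, in order."""
    return tuple(getattr(t, a) for a in attrs)


# One entry per SP tuple: the (SNO, QTY) subtuple appears in each of them.
appearances = [subtuple(t, ("SNO", "QTY")) for t in sorted(sp)]

# The projection of SP on {SNO, QTY} is a set, so duplicates collapse.
projection = set(appearances)

print(appearances)  # [('S1', 100), ('S1', 100)]
print(projection)   # {('S1', 100)}
```

The code only makes the two appearances visible; whether they count as redundancy in the sense of the chapter's definition is the question the text itself raises.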
