RE-PRESENTING INTERNALIZED PIPELINE DATASETS - Managing Time in Relational Databases

and backups, and their associated transaction logs, will usually enable us to recreate any state that the database has been in. They will allow us to re-present six of the nine temporal categories we have identified. 3

The three categories that cannot be re-presented from backups and logfiles are the three categories of future claims: things we are going to make our databases say (unless we change our minds) about what things once were like, or are like now, or may be like in the future. Future claims often start out as scribbled notes on someone's desk. But once inside the machine, they exist in transaction datasets, that is, in collections of data that are intended, at some time or other, to be applied to the database and become currently asserted data.

In the previous chapter, we gave the name pipeline datasets to the eight categories of data which are not current claims about the present. These are collections of data that exist at various points along the pipelines leading into production tables or leading out from them. Because they are physically separate from those production tables, these collections of data are generally not immediately available for business use. Usually, IT technical personnel must do some work on these physical files or tables before a business user can query them for information.
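The counts used so far (nine temporal categories, six that backups and logs can re-present, three kinds of future claims, and eight kinds of pipeline dataset) can be checked with a small sketch. It assumes the nine categories form a three-by-three grid of when a claim is made against what time the claim is about, which is how the description of future claims above reads. The label strings and variable names are illustrative only and are not taken from the book.

```python
from itertools import product

# Assumed (hypothetical) labels: when the claim is made, and what time it is about.
CLAIM_TIMES = ("past", "current", "future")
SUBJECT_TIMES = ("past", "present", "future")

# All nine temporal categories as (claim_time, subject_time) pairs.
categories = list(product(CLAIM_TIMES, SUBJECT_TIMES))

# Future claims have not yet been applied to the database, so backups
# and transaction logs cannot contain them.
future_claims = [c for c in categories if c[0] == "future"]
recoverable = [c for c in categories if c[0] != "future"]

# Pipeline datasets: every category except current claims about the present.
pipeline = [c for c in categories if c != ("current", "present")]

print(len(categories), len(recoverable), len(future_claims), len(pipeline))
# prints: 9 6 3 8
```

Note that the two groupings overlap: all three future-claim categories are pipeline datasets, and so are five of the six categories that backups and logs can re-present.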

This takes time, and until the work is complete, the information is not available. By the time the work is complete, the business value of the information may be much reduced. This work also has its costs in terms of how much time those technicians must spend to prepare that data to be queried. In addition, even without special requests for information in them, these physical datasets, taken together, constitute a significant management cost for IT.

With multiple points of rest in the pipelines leading into and out of production database tables, there are multiple points at which data can be lost. For example, data can be accidentally deleted before any copies are made. For datasets in the inflow pipelines which have not yet made it into the database itself, the only recourse for lost data is to reacquire or recreate the data. If prior datasets in the pipeline have already been

3 That's the idea, anyway. In reality, this “data of last resort” isn't always there when we go looking for it. Backups and logfiles are rarely kept forever, so the data we need may have been purged or written over. There will inevitably be occasional intervals during which the system hiccupped, and simply failed to capture the data in the first place. If the data is still available, it might not be in a readily accessible format because of schema changes made after it was captured.
