A Common Data Warehousing Technique That Fails
A common data warehousing technique I've seen people employ goes like this:
1. They use a trigger to maintain a LAST_UPDATED column in the source table, much like the method described in Chapter 2 in the “Optimistic Locking” section.
2. To initially populate a data warehouse table, they remember what time it is right now by selecting out SYSDATE on the source system. For example, suppose it is exactly 9:00 a.m. right now.
3. They then pull all of the rows from the transactional system—a full SELECT * FROM TABLE—to get the data warehouse initially populated.
4. To refresh the data warehouse, they remember what time it is right now again. For example, suppose an hour has gone by—it is now 10:00 a.m. on the source system. They will remember that fact. They then pull all changed records since 9:00 a.m. (the moment before they started the first pull) and merge them in.
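As a sketch, the moving parts of this technique might look like the following. The table names T and DW_T, the key column ID, and the bind variable are my own illustrative assumptions, not names from the original:

```sql
-- Hypothetical trigger keeping LAST_UPDATED current on source table T
create or replace trigger t_touch
before insert or update on t
for each row
begin
    :new.last_updated := sysdate;
end;
/

-- Initial population: remember the time, then pull everything
select sysdate from dual;   -- e.g., 9:00 a.m.; save this value
insert into dw_t select * from t;

-- Hourly refresh: pull rows changed since the previously saved time
select * from t where last_updated >= :last_pull_time;
```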
Note This technique may pull the same record twice in two consecutive refreshes. This is unavoidable due to the granularity of the clock. A MERGE operation will not be affected by this (i.e., it will either update the existing record in the data warehouse or insert a new record).
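That idempotency comes from a standard MERGE. A minimal sketch, assuming a warehouse table DW_T and a staging pull STAGE_T, both keyed on an ID column (these names are assumptions for illustration):

```sql
-- Re-pulling the same row is harmless: the MATCHED branch simply
-- overwrites the warehouse row with the same (or newer) values
merge into dw_t d
using stage_t s
   on (d.id = s.id)
 when matched then
      update set d.payload = s.payload, d.last_updated = s.last_updated
 when not matched then
      insert (id, payload, last_updated)
      values (s.id, s.payload, s.last_updated);
```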
They believe that they now have all of the records in the data warehouse that were modified since they did the
initial pull. They may actually have all of the records, but just as likely they may not. This technique does work on
some other databases—ones that employ a locking system whereby reads are blocked by writes and vice versa. But in
a system where you have nonblocking reads, the logic is flawed.
To see the flaw in this example, all we need to do is assume that at 9:00 a.m. there was at least one open,
uncommitted transaction. At 8:59:30 a.m., it had updated a row in the table we were to copy. At 9:00 a.m., when we
started pulling the data and thus reading the data in this table, we would not see the modifications to that row; we
would see the last committed version of it. If it was locked when we got to it in our query, we would read around the
lock. If it was committed by the time we got to it, we would still read around it since read consistency permits us to
read only data that was committed in the database when our statement began. We would not read that new version
of the row during the 9:00 a.m. initial pull, nor would we read the modified row during the 10:00 a.m. refresh. The
reason? The 10:00 a.m. refresh would only pull records modified since 9:00 a.m. that morning, but this record was
modified at 8:59:30 a.m. We would never pull this changed record.
In many other databases where reads are blocked by writes and a committed but inconsistent read is
implemented, this refresh process would work perfectly. If at 9:00 a.m. when we did the initial pull of data, we hit that
row and it was locked, we would have blocked and waited for it, and read the committed version. If it were not locked,
we would just read whatever was there, committed.
So, does this mean the preceding logic just cannot be used? No, it means that we need to get the “right now” time a little differently. We need to query V$TRANSACTION and find the earlier of the current time and the earliest time recorded in the START_TIME column of that view. That is, we need to pull all records changed since the start time of the oldest active transaction (or since the current SYSDATE value if there are no active transactions):
select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')),sysdate)
from v$transaction;
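Putting it together, a refresh driven by that low-water mark might be sketched as follows. This is my own PL/SQL outline under the same assumed names (T, DW_T, ID, and a bookkeeping table REFRESH_LOG), not code from the original:

```sql
-- Hypothetical refresh: compute the safe low-water mark BEFORE pulling,
-- merge everything changed since the mark saved by the prior run,
-- then save the new mark for next time
declare
    l_safe_time date;
begin
    select nvl( min(to_date(start_time,'mm/dd/rr hh24:mi:ss')), sysdate )
      into l_safe_time
      from v$transaction;

    merge into dw_t d
    using (select *
             from t
            where last_updated >=
                  (select last_refresh from refresh_log)) s
       on (d.id = s.id)
     when matched then
          update set d.payload = s.payload
     when not matched then
          insert (id, payload) values (s.id, s.payload);

    update refresh_log set last_refresh = l_safe_time;
    commit;
end;
/
```

Because the mark is taken from the oldest open transaction rather than the wall clock, the 8:59:30 a.m. change in the example above would fall on or after the saved mark and be picked up by a later refresh, at the cost of occasionally re-pulling rows the MERGE already handled.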