Database Reference
In-Depth Information
On lines 3 and 4, I query the df external table. I purposely use a materialize hint to force the optimizer to load the df data into the equivalent of a global temporary table, because the query would otherwise tend to read and reread the external table over and over, and the results of the df table could change as the query runs. This provides the logical equivalent of a consistent read on the df data. Also, if the query plan did involve rereading the external table, I would receive an error message at runtime:
KUP-04108 unable to reread file string
The documentation explains the error:
• Cause: The query that is executing requires that a datasource for an external table be read multiple times. However, the datasource is a sequential device that cannot be reread. Examples of this type of datasource are a tape or a pipe.
• Action: There are a few ways around this problem. One is to rewrite the query so that the external table is referenced only once. Another option is to move the datasource to a rereadable device such as a disk file. A third option is to load the data for the external table into a temporary table and change the query so that it references the temporary table.
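The materialize approach amounts to the third option the documentation suggests. A minimal sketch of the idea follows; the table name df and its columns are assumptions for illustration, not the book's actual definitions:

```sql
-- Hypothetical sketch: read the df external table exactly once into a
-- materialized subquery result, then have the rest of the query
-- reference that result instead of rereading the pipe.
with df_data as (
    select /*+ materialize */ mount, avail_kbytes
      from df          -- external table over the df command's output
)
select mount, avail_kbytes
  from df_data;        -- later references hit the temp result, not the pipe
```

Without the hint, the optimizer is free to merge the subquery back into the main query and touch the external table more than once, which is exactly what raises KUP-04108 against a sequential source.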
On lines 27-30, I join DBA_DATA_FILES to the df data with a WHERE clause, using LIKE. This will join every file in DBA_DATA_FILES to every possible mount point in the df output. I know that the goal, however, is to find the "longest" matching mount point, so to accomplish that I assign (on lines 24-26) a ROW_NUMBER to each row. This ROW_NUMBER will be sequentially assigned to each duplicated row in DBA_DATA_FILES, so if the FILE_NAME matches more than one MOUNT, each FILE_NAME occurrence will be assigned a unique, increasing ROW_NUMBER value. This ROW_NUMBER is assigned after the data is sorted by the length of the MOUNT, from longest to shortest.
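The longest-match ranking described above can be sketched like this; the column and subquery names are assumptions for illustration:

```sql
-- Hypothetical sketch: join every data file to every candidate mount
-- point, then rank the matches per file with the longest mount first,
-- so rn = 1 identifies the most specific (longest) matching mount.
select d.file_name,
       m.mount,
       row_number() over
         (partition by d.file_name
          order by length(m.mount) desc) rn
  from dba_data_files d,
       df_data        m
 where d.file_name like m.mount || '%'
```

A file under /u01/app/oradata would match both / and /u01 in the df output; ordering by length(mount) descending ensures /u01 ranks first.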
Once I have that data, I apply a WHERE clause to save only the first entry for each FILE_NAME value; that predicate is WHERE rn = 1, on line 32. At the same time, I've added another column, TOT_MBYTES, on lines 18-19. This will enable me to verify the 20 percent threshold.
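A sketch of that filtering step (the inline-view and column names are assumptions for illustration):

```sql
-- Hypothetical sketch: keep only the longest matching mount per file,
-- carrying a per-mount total of megabytes allocated to data files.
select file_name, mount, tot_mbytes, avail_mbytes
  from (
        select file_name, mount, avail_mbytes,
               sum(mbytes) over (partition by mount) tot_mbytes,
               row_number() over
                 (partition by file_name
                  order by length(mount) desc) rn
          from matched_files   -- assumed name for the LIKE-join result
       )
 where rn = 1
```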
The last step is to format and output the data. I print the columns I'm interested in and add a CASE statement on lines 11-15 to verify that 20 percent of the total bytes of storage allocated on a given mount point does not exceed the remaining available bytes of free storage.
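That threshold check might look something like the following; the source name and messages are assumptions, not the book's exact query:

```sql
-- Hypothetical sketch of the final check: flag any mount point where
-- growing the existing data files by 20 percent would exceed the free
-- space remaining on that mount.
select mount,
       tot_mbytes,
       avail_mbytes,
       case
         when tot_mbytes * 0.20 > avail_mbytes
           then 'Insufficient free space for 20% growth'
         else 'OK'
       end status
  from mount_summary   -- assumed name for the rn = 1 result set
```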
So, now you see how to use external tables to query operating system output such as df, ps, find, and ls. Additionally, you can use them to query anything that writes to standard out, including gunzip, sed, and so on.
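As a rough sketch, an external table over df output might be declared along these lines; the directory object, script name, and column list are assumptions for illustration (the book's actual definition appears earlier in the discussion):

```sql
-- Hypothetical sketch: an external table whose PREPROCESSOR directive
-- runs a script that executes df, so selecting from the table returns
-- the command's current output as rows.
create table df
(
  fsname   varchar2(100),
  blocks   number,
  used     number,
  avail    number,
  capacity varchar2(10),
  mount    varchar2(100)
)
organization external
(
  type oracle_loader
  default directory exec_dir
  access parameters
  (
    records delimited by newline
    preprocessor exec_dir:'run_df.sh'
    fields terminated by whitespace
  )
  location ('dummy.txt')   -- placeholder file; the script supplies the data
)
reject limit unlimited;
```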
Reading and Filtering Compressed Files in a Directory Tree
When working with large data loads, it's common to use compressed files. Normally when working with compressed
files as part of the data load processing, you would have an initial step that uncompresses the file and then another
step to load it into staging tables. With external tables, you can streamline this by instructing the external table to
uncompress the data as it is read from the compressed file.
An example will illustrate this. Suppose you have multiple zipped files that you want to read and process. Furthermore, assume that the files are in different directories and that you need the process to search through the various levels of directories, find the zipped files, and make the data available by selecting from an external table.
To set this up, we'll first create three directories:
$ mkdir /tmp/base
$ mkdir /tmp/base/base2a
$ mkdir /tmp/base/base2b
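The command pipeline such a setup relies on can be tried directly at the shell before wiring it into an external table. This is a hedged sketch: the sample file names are assumptions, but the find/gunzip combination is the core technique of streaming uncompressed data to standard out from anywhere in the directory tree.

```shell
# Hypothetical demo: place a compressed file in a nested directory,
# then locate every .gz file under the tree and stream its
# uncompressed contents to standard out.
mkdir -p /tmp/base/base2a
printf 'line one\n' > /tmp/base/base2a/data1.txt
gzip -f /tmp/base/base2a/data1.txt

# find walks the whole tree, so the .gz files can live at any depth;
# gunzip -c writes the uncompressed data to stdout without altering
# the source files.
find /tmp/base -name '*.gz' -exec gunzip -c {} \;
```

A preprocessor script containing this pipeline is what lets the external table present the compressed files' contents as ordinary rows.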