In-Depth Information
On lines 3 and 4, I query the df external table. I purposely use a materialize hint to force the optimizer to load the
df data into the equivalent of a global temporary table, because the query would tend to read and reread the external
table over and over, and the results of the df table could change as the query runs. This provides the logical equivalent
of a consistent read on the df data. Also, if the query plan did involve rereading the external table, I would receive an
error message at runtime:
KUP-04108: unable to reread file string
The documentation explains the error:
Cause: The query that is executing requires that a datasource for an external table be read
multiple times. However, the datasource is a sequential device that cannot be reread.
Examples of this type of datasource are a tape or a pipe.
Action: There are a few ways around this problem. One is to rewrite the query so that
the external table is referenced only once. Another option is to move the datasource to a
rereadable device such as a disk file. A third option is to load the data for the external table
into a temporary table and change the query so that it references the temporary table.
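To make the effect of the hint concrete, here is a minimal sketch of the pattern, assuming a hypothetical external table named DF_EXT with MOUNT and AVAIL_KBYTES columns (the actual listing's names may differ):

with df as
( select /*+ materialize */ mount, avail_kbytes
    from df_ext        -- hypothetical external table over the df output
)
select *
  from df;             -- the pipe is read once; df now comes from the
                       -- materialized temporary result

With the hint in place, the WITH subquery is resolved into a temporary result set up front, so the rest of the query can reference df as many times as it needs to without touching the pipe again.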
On lines 27-30, I join DBA_DATA_FILES to the df data with a WHERE clause, using LIKE. This will join every file in
DBA_DATA_FILES to every possible mount point in the df output. The goal, however, is to find the “longest”
matching mount point, so to accomplish that I assign a ROW_NUMBER to each row on lines 24-26. This ROW_NUMBER
is assigned sequentially to each duplicated row in DBA_DATA_FILES, so if a FILE_NAME matches more than one
MOUNT, each occurrence of that FILE_NAME is assigned a unique, increasing ROW_NUMBER value. The ROW_NUMBER is
assigned after the data is sorted by the length of the MOUNT, from longest to shortest.
Once I have that data, I apply a WHERE clause to keep only the first entry for each FILE_NAME value; that predicate
is WHERE rn = 1, on line 32. At the same time, I add another column, TOT_MBYTES, on lines 18-19, which lets me
verify the 20 percent threshold.
The last step is to format and output the data. I print the columns I'm interested in and add a CASE statement on
lines 11-15 to check that 20 percent of the total storage allocated on a given mount point does not exceed the
free storage remaining there.
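Putting those pieces together, the overall shape of the query is roughly as follows. This is only a sketch: the df column names (MOUNT, AVAIL_KBYTES), the DF_EXT external table, and the flag text are assumptions, and the line numbers mentioned above refer to the original listing, not to this sketch.

with df as
( select /*+ materialize */ mount, avail_kbytes
    from df_ext                                -- hypothetical external table
)
select mount, file_name, mbytes, tot_mbytes, avail_mbytes,
       case when 0.20 * tot_mbytes > avail_mbytes
            then '*** less than 20% free ***'
       end flag
  from (
        select mount, file_name, mbytes, avail_mbytes,
               sum(mbytes) over (partition by mount) tot_mbytes
          from (
                select df.mount,
                       ddf.file_name,
                       ddf.bytes/1024/1024   mbytes,
                       df.avail_kbytes/1024  avail_mbytes,
                       row_number() over
                         (partition by ddf.file_name
                          order by length(df.mount) desc) rn
                  from dba_data_files ddf, df
                 where ddf.file_name like df.mount || '%'
               )
         where rn = 1        -- keep only the longest matching mount point
       )
 order by mount, file_name;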
So, now you see how to use external tables to query operating system output such as df, ps, find, and ls.
Additionally, you can use them to query anything that writes to standard out, including gunzip, sed, and so on.
Reading and Filtering Compressed Files in a Directory Tree
When working with large data loads, it's common to use compressed files. Normally, the load processing has an
initial step that uncompresses each file and then another step that loads it into staging tables. With external
tables, you can streamline this by instructing the external table to uncompress the data as it is read from the
compressed file.
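This is typically done with the PREPROCESSOR clause of the ORACLE_LOADER access driver, which runs a program against each data file and loads whatever that program writes to standard output. The following is only a generic sketch, assuming hypothetical directory objects DATA_DIR and EXEC_DIR, a made-up two-column file layout, and a small wrapper script that simply runs gunzip -c on its argument; the actual example for this section follows.

create or replace directory data_dir as '/tmp/base';
create or replace directory exec_dir as '/tmp/base';   -- location of the wrapper script

create table emp_gz_ext
( emp_id   number,
  emp_name varchar2(30)
)
organization external
( type oracle_loader
  default directory data_dir
  access parameters
  ( records delimited by newline
    preprocessor exec_dir:'gunzip_wrapper.sh'   -- hypothetical script running: /bin/gunzip -c "$1"
    fields terminated by ','
  )
  location ('emp.csv.gz')
)
reject limit unlimited;

Because the access driver parses the preprocessor's output rather than the file itself, the external table returns plain rows even though only the compressed file exists on disk.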
An example will illustrate this. Suppose you have multiple zipped files that you want to read and process.
Furthermore, assume that the files are in different directories and that you need the process to search
through the various levels of directories, find the zipped files, and make the data available by selecting from an
external table.
To set this up, we'll first create three directories:
$ mkdir /tmp/base
$ mkdir /tmp/base/base2a
$ mkdir /tmp/base/base2b
 