Database Reference
In-Depth Information
On lines 3 and 4, I query the df external table. I purposely use a materialize hint to force the optimizer to load the df data into the equivalent of a global temporary table, because the query would otherwise tend to read and reread the external table over and over, and the results of the df table could change as the query runs. This provides the logical equivalent of a consistent read on the df data. Also, if the query plan did involve rereading the external table, I would receive an error message at runtime:
KUP-04108 unable to reread file string
The documentation explains the error:
• Cause: The query that is executing requires that a datasource for an external table be read multiple times. However, the datasource is a sequential device that cannot be reread. Examples of this type of datasource are a tape or a pipe.
• Action: There are a few ways around this problem. One is to rewrite the query so that the external table is referenced only once. Another option is to move the datasource to a rereadable device such as a disk file. A third option is to load the data for the external table into a temporary table and change the query so that it references the temporary table.
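The materialize approach amounts to the third option the documentation suggests. A minimal sketch of the idea follows; the table name df and its columns are assumptions for illustration, not the book's actual definitions:

```sql
-- Hypothetical sketch: read the df external table exactly once into a
-- materialized subquery result, then have the rest of the query
-- reference that result instead of rereading the pipe.
with df_data as (
    select /*+ materialize */ mount, avail_kbytes
      from df          -- external table over the df command's output
)
select mount, avail_kbytes
  from df_data;        -- later references hit the temp result, not the pipe
```

Without the hint, the optimizer is free to merge the subquery back into the main query and touch the external table more than once, which is exactly what raises KUP-04108 against a sequential source.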
On lines 27-30, I join DBA_DATA_FILES to the df data with a WHERE clause, using LIKE. This will join every file in DBA_DATA_FILES to every possible mount point in the df output. I know that the goal, however, is to find the "longest" matching mount point, so to accomplish that I assign (on lines 24-26) a ROW_NUMBER to each row. This ROW_NUMBER will be sequentially assigned to each duplicated row in DBA_DATA_FILES, so if the FILE_NAME matches more than one MOUNT, each FILE_NAME occurrence will be assigned a unique, increasing ROW_NUMBER value. This ROW_NUMBER is assigned after the data is sorted by the length of the MOUNT, from longest to shortest.
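The longest-match ranking described above can be sketched like this; the column and subquery names are assumptions for illustration:

```sql
-- Hypothetical sketch: join every data file to every candidate mount
-- point, then rank the matches per file with the longest mount first,
-- so rn = 1 identifies the most specific (longest) matching mount.
select d.file_name,
       m.mount,
       row_number() over
         (partition by d.file_name
          order by length(m.mount) desc) rn
  from dba_data_files d,
       df_data        m
 where d.file_name like m.mount || '%'
```

A file under /u01/app/oradata would match both / and /u01 in the df output; ordering by length(mount) descending ensures /u01 ranks first.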
Once I have that data, I apply a WHERE clause to save only the first entry for each FILE_NAME value; that predicate is WHERE rn = 1, on line 32. At the same time, I've added another column, TOT_MBYTES, on lines 18-19. This will enable me to verify the 20 percent threshold.
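A sketch of that filtering step (the inline-view and column names are assumptions for illustration):

```sql
-- Hypothetical sketch: keep only the longest matching mount per file,
-- carrying a per-mount total of megabytes allocated to data files.
select file_name, mount, tot_mbytes, avail_mbytes
  from (
        select file_name, mount, avail_mbytes,
               sum(mbytes) over (partition by mount) tot_mbytes,
               row_number() over
                 (partition by file_name
                  order by length(mount) desc) rn
          from matched_files   -- assumed name for the LIKE-join result
       )
 where rn = 1
```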
The last step is to format and output the data. I print the columns I'm interested in and add a CASE statement on lines 11-15 to verify that 20 percent of the total bytes of storage allocated on a given mount point does not exceed the remaining available bytes of free storage.
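That threshold check might look something like the following; the source name and messages are assumptions, not the book's exact query:

```sql
-- Hypothetical sketch of the final check: flag any mount point where
-- growing the existing data files by 20 percent would exceed the free
-- space remaining on that mount.
select mount,
       tot_mbytes,
       avail_mbytes,
       case
         when tot_mbytes * 0.20 > avail_mbytes
           then 'Insufficient free space for 20% growth'
         else 'OK'
       end status
  from mount_summary   -- assumed name for the rn = 1 result set
```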
So, now you see how to use external tables to query operating system output such as df, ps, find, and ls. Additionally, you can use them to query anything that writes to standard out, including gunzip, sed, and so on.
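As a rough sketch, an external table over df output might be declared along these lines; the directory object, script name, and column list are assumptions for illustration (the book's actual definition appears earlier in the discussion):

```sql
-- Hypothetical sketch: an external table whose PREPROCESSOR directive
-- runs a script that executes df, so selecting from the table returns
-- the command's current output as rows.
create table df
(
  fsname   varchar2(100),
  blocks   number,
  used     number,
  avail    number,
  capacity varchar2(10),
  mount    varchar2(100)
)
organization external
(
  type oracle_loader
  default directory exec_dir
  access parameters
  (
    records delimited by newline
    preprocessor exec_dir:'run_df.sh'
    fields terminated by whitespace
  )
  location ('dummy.txt')   -- placeholder file; the script supplies the data
)
reject limit unlimited;
```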
Reading and Filtering Compressed Files in a Directory Tree
When working with large data loads, it's common to use compressed files. Normally when working with compressed
files as part of the data load processing, you would have an initial step that uncompresses the file and then another
step to load it into staging tables. With external tables, you can streamline this by instructing the external table to
uncompress the data as it is read from the compressed file.
An example will illustrate this. Suppose you have multiple zipped files that you want to read and process. Furthermore, assume that the files are in different directories and that you need the process to search through the various levels of directories, find the zipped files, and make the data available by selecting from an external table.
To set this up, we'll first create three directories:
$ mkdir /tmp/base
$ mkdir /tmp/base/base2a
$ mkdir /tmp/base/base2b
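The command pipeline such a setup relies on can be tried directly at the shell before wiring it into an external table. This is a hedged sketch: the sample file names are assumptions, but the find/gunzip combination is the core technique of streaming uncompressed data to standard out from anywhere in the directory tree.

```shell
# Hypothetical demo: place a compressed file in a nested directory,
# then locate every .gz file under the tree and stream its
# uncompressed contents to standard out.
mkdir -p /tmp/base/base2a
printf 'line one\n' > /tmp/base/base2a/data1.txt
gzip -f /tmp/base/base2a/data1.txt

# find walks the whole tree, so the .gz files can live at any depth;
# gunzip -c writes the uncompressed data to stdout without altering
# the source files.
find /tmp/base -name '*.gz' -exec gunzip -c {} \;
```

A preprocessor script containing this pipeline is what lets the external table present the compressed files' contents as ordinary rows.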