Databases Reference
In-Depth Information
5.
On the new line, type the following code:
Where Exists([%Aircraft Type ID], AC_TYPEID);
6.
Now, locate the LOAD statement in which the
Aircraft_2010_Update.csv
table file is being loaded, and add the following Where clause at the end
in a similar manner (the final semicolon gets replaced):
Where Exists([%Aircraft Type ID], AC_TYPEID);
7.
Save
and
Reload
the script.
The non-matching aircrafts are no longer in the data model after the reload. The
code that we added to the script uses a
WHERE
clause combined with the
Exists()
function. We are essentially filtering out any records in which the
AC_TYPEID
field.
from the dimension table does not have a corresponding value in
the
%Aircraft Type ID
field already loaded in the
Main Data
table.
The
Exists()
function takes two parameters:
WHERE Exists([%Aircraft Type ID], AC_TYPEID);
The first parameter specifies the field on which we need to check to see if there are
any occurrences of the values contained in the second field, the one specified in the
second parameter.
In some cases, the two fields being compared have the same name in both the input
dimension table and the fact table already loaded. If that's the case, we could use a
simplified, one-parameter, syntax as follows:
Where Exists([%Aircraft Type ID]);
Depending on how the field names from the input table are defined, we should use
the appropriate syntax from the two presented above. The main advantage of the
second scenario (one-parameter syntax) is that, when loading from a QVD, it will
still perform as an optimized load, while the first scenario will not.
An alternative to using the
Exists()
function is the use of the
KEEP
prefix, which
will be added before the
LOAD
keyword. As shown in the previous chapter, by using
LEFT KEEP
or
RIGHT KEEP
, we can limit the records being loaded to those that have a
matching key in the already loaded fact table. A benefit of using this prefix is that the
result set can be limited on multiple fields, while the
Exists()
function can only use
a single field. However, script processing of the
KEEP
prefix can be a lot slower on
larger data sets, so the
Exists()
function is the preferred method whenever possible.
Let's take another approach to dealing with this problem this time using the
KEEP
prefix.