Databases Reference
In-Depth Information
The recursive join juxtaposes all the objects in an Itemset against all the
other objects in an Itemset. This is the crux of the Market Basket Analysis
application as it provides the first answer to the question in the Market
Basket Scope Statement. The recursive join juxtaposes the Driver and
Correlation Objects in a single row of data for the first time.
The inequality at the bottom of Figure  7.2 (A.OBJECT_KEY <>
B.OBJECT_KEY) prevents an object in the Itemset from joining to itself.
Without that inequality, all objects in an Itemset would have one row
wherein that object is both the Driver Object and the Correlation Object.
However, with that inequality in place, an object can be only the Driver
Object or the Correlation Object, never both.
The other inequality at the bottom of Figure 7.2 (A.OBJECT_KEY <>
0) prevents the zero-filled row that was manufactured for the case of the
Single Object Itemset from appearing as a Driver Object. Table 7.3 shows
the effect of this inequality. The first two rows of Table 7.3 show the Single
Object Itemset as it would be without the inequality. In one row, the Object_
Key 15 is the Driver Object and Object_Key 0 is the Correlation Object. In
the other row, the Object_Key 0 is the Driver Object and Object_Key 15 is
the Correlation Object. The meaning of the zero-filled row for the Single
Object Itemset is to indicate that nothing happened. When a customer
purchases one item, what else did that customer purchase? Nothing. So,
rather than allow a zero-filled row to be a Driver Object, the inequality
excludes rows wherein the Driver_Object_Key = 0.
The Market Basket Table was a staging table. Reading the data from
the Market Basket Table, the SQL in Figure  7.2 performs the recur-
sive join, prevents objects from being both the Driver Object and the
Correlation Object, and prevents the zero-filled object from being a
Driver Object as it then loads the data into the Market Basket BI Table.
The output of this SQL and the rows in the Market Basket BI Table are
shown in Table 7.4.
The DRIVER_COUNT and CORR_COUNT output fields facilitate a
count of the distinct objects in an Itemset. These two fields are manu-
factured in the SQL by the statements “1 AS DRIVER_COUNT” and
“1 AS CORR_COUNT.” The manufacture of these fields could be delayed
until the next step. However, understanding and validating the data in
the Market Basket tables is confusing enough on its own. The analysts in
a Market Basket Analysis effort are often better served by the presence of
these COUNT fields in the Market Basket BI Table than by assuming their
presence in the Market Basket BI View.
Search WWH ::




Custom Search