Databases Reference
In-Depth Information
multiplied by a factor of n *( n - 1), where n = the number of rows. That is
by definition a product join, otherwise known as a Cartesian join. While
the optimization suggestions mentioned above will avoid significant I/O,
the product join will incur significant CPU cycles. For that reason, data is
loaded into the Market Basket Table in manageable batches, which will be
discussed in Chapter 7. This will prevent the recursive join of the Market
Basket Table from overwhelming the RDBMS.
construction of the market Basket BI table
The Market Basket BI Table is also a class of Market Basket tables. A
Market Basket application may include multiple Market Basket BI Tables.
A Market Basket BI Table involved in the same analysis effort should be
based on the same DDL so the result set of the Market Basket Analysis BI
View will return comparable data. Multiple Market Basket BI Tables allow
an analyst to engage in iterative analysis without re-creating a Ma rket
Basket BI Table between every iteration of analysis.
The purpose of the Market Basket BI Table is to facilitate the SUM
and GROUP BY operations in the query in Figure  5.7. Unfortunately,
the query in Figure 5.7 uses more than one GROUP BY statement. The
Market Basket Analysis BI View has two GROUP BY statements that
apply directly to the Market Basket BI Table. The first GROUP BY, shown
in Figure 6.3, is used to build the Driver half of the Itemsets. The second
GROUP BY, shown in Figure 6.4, is used to build the Correlation half of
the Itemsets.
The approach to optimize the GROUP BY statements in Figure 6.3 and
Figure  6.4 is a combination of a primary index and a secondary index,
which are available in all RDBMS platforms. One approach is to apply
a primary index to the Driver Group By in Figure 6.3, and a secondary
index to the Correlation Group By in Figure 6.4. The other approach is to
apply a primary index to the Correlation Group By in Figure 6.4, and a
secondary index to the Driver Group By in Figure 6.3. The index method
chosen for a Market Basket BI Table can be based on either approach—
primary Driver/secondary Correlation or primary Correlation/secondary
driver. The decision to use one approach rather than the other is based
only on the performance delivered by the index method. The SUM and
GROUP BY operations have always been the bread and butter of relational
tables. The optimization of a SUM and GROUP BY operation, therefore,
should be rather simple.
Search WWH ::




Custom Search