Flexibility of the Itemset
An Itemset can hold almost any number and combination of objects.
The minimum number of objects in an Itemset is one. Without at least that
first object, the Itemset does not exist. The maximum number of objects
in an Itemset is limited only by the application that created the data and
the application that stores the data. If the application that created the data
has a limit of two thousand items in a transaction, then the maximum
number of objects in an Itemset is that limit—two thousand. As such, an
Itemset is an array. The procedural logic of a stored procedure or a COBOL
OCCURS clause works well with an array. Procedural logic is able to read
through the array, each time cataloging each object in the array until it
reads the end of the array. While reading through the array, procedural
logic can keep a log of the other objects in the array and a tally of the
number of times those other objects occurred in the array. Then, procedural
logic can repeat that process beginning with the second object in the array.
The third time through the array can begin with the third object in the
array. The algorithm for reading through an array is to let the nth object
be the Driver Object in the nth iteration through the array and let all other
objects be Correlation Objects. In this way, procedural logic can read
through the objects in an Itemset by treating the Itemset as an array.
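
The iteration described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function name `tally_correlations` and the sample basket are assumptions made for the example.

```python
from collections import Counter


def tally_correlations(itemset):
    """Hypothetical helper: on the nth pass, itemset[n] is the Driver Object
    and every other object in the array is tallied as a Correlation Object."""
    tallies = {}
    for n, driver in enumerate(itemset):
        counts = tallies.setdefault(driver, Counter())
        for m, other in enumerate(itemset):
            if m != n:  # skip the Driver Object's own position
                counts[other] += 1
    return tallies


# Illustrative data only.
basket = ["bread", "milk", "eggs"]
result = tally_correlations(basket)
```

The loop never needs to know the length of the array in advance, which is the flexibility the procedural approach provides.
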
Set logic, the basis of relational SQL, lacks the flexibility of procedural
logic because set logic is based on a recurring set of rows of data. This works
well if the rows of data fit in the same set definition. The flexibility of an
Itemset, however, causes the data of an Itemset to not fit in a single set
definition, as one row will have data in only one column while another row will
have data in two hundred columns. Therefore, the set logic inherent in SQL
does not work well due to the flexible number of objects in each Itemset.
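
A rough sketch of the mismatch, assuming each Itemset were forced into one fixed-width row: the row must be as wide as the largest Itemset, and every shorter Itemset is padded with empty cells. The sample data and variable names are illustrative assumptions.

```python
# Illustrative Itemsets of different lengths.
itemsets = [
    ["bread"],
    ["bread", "milk", "eggs", "tea"],
]

# A single set definition needs as many columns as the largest Itemset.
width = max(len(s) for s in itemsets)

# Shorter Itemsets are padded with None to fit the fixed row shape.
rows = [s + [None] * (width - len(s)) for s in itemsets]

# Count how many cells carry no data at all.
empty_cells = sum(cell is None for row in rows for cell in row)
```
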
The Market Basket Analysis solution design in Chapter 5 uses relational
tables and SQL. However, the solution design in Chapter 5 breaks the array
of an Itemset into a defined pattern of a Driver Object and a Correlation
Object. That way, the solution design does not need to know how many
objects are in the array. In the solution design the array is broken into pairs
of objects and then the array completely disappears. In this way, the
flexibility of the array of an Itemset is replaced by a predetermined definition
of pairs of objects. Once the array of an Itemset is converted into a set of
rows of data, each containing a pair of objects, the advantages of set logic
and relational SQL are leveraged by the most basic of SQL constructs: the
SUM aggregate function and the GROUP BY clause.
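
The pair-based approach can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the idea: the table name `item_pair`, its columns, and the sample Itemsets are assumptions, and the actual Chapter 5 design may differ.

```python
import sqlite3
from itertools import permutations

# Illustrative Itemsets of varying length.
itemsets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk"],
]

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per (Driver Object, Correlation Object) pair.
conn.execute(
    "CREATE TABLE item_pair (driver TEXT, correlation TEXT, pair_count INTEGER)"
)

for itemset in itemsets:
    # Break the array into ordered pairs; the array itself is not stored.
    conn.executemany(
        "INSERT INTO item_pair VALUES (?, ?, 1)", permutations(itemset, 2)
    )

# Every row now has the same shape, so plain set logic applies.
rows = conn.execute(
    "SELECT driver, correlation, SUM(pair_count) "
    "FROM item_pair GROUP BY driver, correlation "
    "ORDER BY driver, correlation"
).fetchall()
conn.close()
```

An Itemset with a single object produces no pairs, so it contributes no rows. Every other Itemset contributes rows of the same fixed shape regardless of its length.
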