Database Reference
In-Depth Information
The structure of the rest of this paper is as follows: In Section 2, we define relevant
terms used in transaction data, frequent item sets and association rule mining. In
Section 3, we consider a general version of the optimal item packages problem and
present an integer linear programming formulation for the same. In Section 4, we
illustrate the model by a sample profit optimal item packages problem, provide its ILP
formulation and the result of processing it using the commercial mathematical
programming software (CPLEX). Finally, we conclude our paper in Section 5
providing pointers for further work.
2 Transaction Data - Notations and Definitions
Transaction data refers to information about transactions such as the purchases in a
store, each purchase described by a transaction ID, customer ID, date of purchase, and
a list of items and their prices. A web transaction log is another example in which
each transaction may denote a user id, web page and time of access.
Let T denote the total number of transactions. Let I = {1, 2, …, N} denote the set
of all potential items that may be included in any transaction and more precisely the
items included in the t th transaction may be denoted by I t , a subset of I, where t ranges
from 1 to T.
The support s of a subset X of the set I of items, is the percentage of transactions in
which X occurs. A set of items X is a frequent item set if its support s is greater than
or equal to a minimum support threshold specified by the user. An association rule is
of the form X Y , where X and Y are frequent item sets that do not have any item in
common. We say that X Y has support s if s% of transactions includes all the
items in X and Y, and confidence c if c% of transactions containing the items of X
also contains the items of Y. A valid association rule is one where the support s and
the confidence c are above user-defined thresholds for support and confidence
respectively. Association rules [10] [11] [12] identify the presence of any significant
correlation in a given data set.
3 Optimal Item Packages Problem
Brijs et al [2] considered a market basket analysis problem for finding an optimal set
of frequent item sets that returns the maximum profit and proposed a mixed integer
linear programming (MILP) formulation of their problem. Their model proposed
maximizing the profit function of frequent item set X, i.e.
L
-
N
max
M(X)
*
P
Cost
*
Q
,
i
i
X
i
where N is the set of all items and L is the set of all frequent item sets X; M(X) is
gross sales margin generated by X; and P X , Q i
{0,1} are decision variables; subject
to
Q
=
ItemMax
, where i is a basic item and ItemMax is the maximum threshold
i
i
N
set for the number of items in X.
Search WWH ::




Custom Search