Database Reference
In-Depth Information
The structure of the rest of this paper is as follows: In Section 2, we define relevant
terms used in transaction data, frequent item sets and association rule mining. In
Section 3, we consider a general version of the optimal item packages problem and
present an integer linear programming formulation for the same. In Section 4, we
illustrate the model by a sample profit optimal item packages problem, provide its ILP
formulation and the result of processing it using the commercial mathematical
programming software (CPLEX). Finally, we conclude our paper in Section 5
providing pointers for further work.
2 Transaction Data - Notations and Definitions
Transaction data
refers to information about transactions such as the purchases in a
store, each purchase described by a transaction ID, customer ID, date of purchase, and
a list of items and their prices. A web transaction log is another example in which
each transaction may denote a user id, web page and time of access.
Let T denote the total number of transactions. Let I = {1, 2, …, N} denote the set
of all potential items that may be included in any transaction and more precisely the
items included in the t
th
transaction may be denoted by I
t
, a subset of I, where t ranges
from 1 to T.
The
support
s of a subset X of the set I of items, is the percentage of transactions in
which X occurs. A set of items X is a
frequent item set
if
its support
s is greater than
or equal to a minimum support threshold specified by the user. An
association rule
is
of the form
X
⇒
Y
, where X and Y are frequent item sets that do not have any item in
common. We say that
X
⇒
Y
has
support
s if s% of transactions includes all the
items in X and Y, and
confidence
c if c% of transactions containing the items of X
also contains the items of Y. A valid association rule is one where the support s and
the confidence c are above user-defined thresholds for support and confidence
respectively. Association rules [10] [11] [12] identify the presence of any significant
correlation in a given data set.
3 Optimal Item Packages Problem
Brijs et al [2] considered a market basket analysis problem for finding an optimal set
of frequent item sets that returns the maximum profit and proposed a mixed integer
linear programming (MILP) formulation of their problem. Their model proposed
maximizing the profit function of frequent item set X, i.e.
∑
∈
L
-
∑
∈
N
max
M(X)
*
P
Cost
*
Q
,
i
i
X
i
where N is the set of all items and L is the set of all frequent item sets X; M(X) is
gross sales margin generated by X; and P
X
,
Q
i
∈
{0,1} are decision variables; subject
to
∑
∈
Q
=
ItemMax
, where i is a basic item and
ItemMax
is the maximum threshold
i
i
N
set for the number of items in X.
Search WWH ::
Custom Search