Perhaps the biggest problem in performing data mining is in the
availability of useful data. Even if a company is willing to pursue a
data mining project, that project can quickly come to a standstill due
to either a total lack of data to mine, or poor quality data. Data with
many missing values or inaccurate entries can make extracting
meaningful information nearly impossible until the fundamental data
quality or availability issues are resolved. Resolving them can involve
establishing a data warehouse, instituting business processes to
collect certain information, or going back to the data sources to
clean up “dirty” data. However, the problem of data availability and
quality is beyond the scope of this discussion. We refer interested
readers to [Kimball1 2004, Phonniah 2004] on data warehousing, and
[Pyle 1999] on data preparation.
Getting customers to buy more of a company's products is a key goal
for many marketing managers. Cross-selling is quite prevalent among
online retailers, or e-businesses, where the purchase of one product
gives the company an opportunity to sell other products to that same
customer. Cross-selling can involve identifying complementary or
related products, or even premium products, which is called
up-selling. For
some products, suggestions for cross-selling may be obvious, for
example, staples with staplers and mouse pads with computer mice.
However, others are not so obvious, sometimes involving multiple
products being purchased together.
The term market basket analysis is often applied to this problem.
The data typically used is transaction data, a collection of records,
each identified by a transaction identifier and containing the set of
products or items that were purchased in that transaction. Consider a
supermarket visit at customer checkout. Each customer's shopping
cart or “market basket” contains some set of items. The data mining
technique of association rules leverages customers' buying patterns
to provide insight into which products are commonly purchased
together, referred to as product relationships. Once these
relationships are known, they can be used to position products on the
same Web page, suggest other products at checkout time, or prompt a
separate solicitation via e-mail, mail, or phone.
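To make the idea of transaction data and product relationships concrete, the following is a minimal Python sketch, not a full association rules algorithm. It only counts pairs of items that appear in the same transaction. The transactions, item names, the function name pair_rules, and the min_support threshold are all hypothetical, and the support and confidence measures are the ones conventionally used with association rules rather than anything defined in the text above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data: transaction identifier -> set of items purchased.
transactions = {
    "T1": {"stapler", "staples", "paper"},
    "T2": {"stapler", "staples"},
    "T3": {"mouse", "mouse pad"},
    "T4": {"stapler", "paper"},
    "T5": {"mouse", "mouse pad", "paper"},
}


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs.

    support    = fraction of transactions containing both items
    confidence = fraction of transactions with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions.values():
        item_counts.update(items)
        # sorted() gives every pair one canonical ordering before counting.
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to suggest a product relationship
        # Each frequent pair yields a rule in both directions.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


for antecedent, consequent, support, confidence in sorted(pair_rules(transactions)):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence in one direction (for example, staples to stapler in this toy data) is the kind of relationship that could drive a checkout-time suggestion. Relationships involving three or more products would require counting larger item sets, which this sketch does not attempt.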
Another use of identifying product relationships involves under-
standing product demand given promotions on certain products. For
example, consider a store that decides to put a certain computer game