On-Line versus Brick-and-Mortar Retailing
We suggested in Section 3.1.3 that an on-line retailer would use similarity measures for items to find pairs of items
that, while they might not be bought by many customers, had a significant fraction of their customers in common. An
on-line retailer could then advertise one item of the pair to the few customers who had bought the other item of the
pair. This methodology makes no sense for a brick-and-mortar retailer, because unless lots of people buy an item,
it cannot be cost-effective to advertise a sale on the item. Thus, the techniques of Chapter 3 are not often useful for
brick-and-mortar retailers.
Conversely, the on-line retailer has little need for the analysis we discuss in this chapter, since that analysis is designed to
search for itemsets that appear frequently. If the on-line retailer were limited to frequent itemsets, it would miss all
the opportunities that are present in the “long tail” to select advertisements for each customer individually.
We shall discuss this aspect of the problem in Section 6.1.3, but for the moment let us
simply consider the search for frequent itemsets. We will discover by this analysis that
many people buy bread and milk together, but that is of little interest, since we already
knew that these were popular items individually. We might discover that many people buy
hot dogs and mustard together. That, again, should be no surprise to people who like hot
dogs, but it offers the supermarket an opportunity to do some clever marketing. They can
advertise a sale on hot dogs and raise the price of mustard. When people come to the store
for the cheap hot dogs, they often will remember that they need mustard, and buy that too.
Either they will not notice that the price is high, or they will reason that it is not worth the
trouble to go somewhere else for cheaper mustard.
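The search for frequent pairs described above can be illustrated with a small counting sketch. It assumes the baskets fit in memory and that the analyst picks a support threshold `support` (the minimum number of baskets a pair must appear in); the function name `frequent_pairs` and the sample baskets are hypothetical and invented for illustration. This is plain brute-force counting of every pair, not an optimized algorithm such as A-Priori.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, support):
    """Return {pair: count} for every pair of items that appears in
    at least `support` baskets. Pairs are stored as sorted tuples."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair has a single canonical form.
        items = sorted(set(basket))
        for pair in combinations(items, 2):
            counts[pair] += 1
    # Keep only the pairs that meet the support threshold.
    return {pair: c for pair, c in counts.items() if c >= support}

# Hypothetical baskets, invented for this example.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"hot dogs", "mustard", "bread"},
    {"hot dogs", "mustard", "milk"},
    {"bread", "milk", "hot dogs", "mustard"},
]

print(frequent_pairs(baskets, support=3))
# → {('bread', 'milk'): 3, ('hot dogs', 'mustard'): 3}
```

With a threshold of 3, only {bread, milk} and {hot dogs, mustard} survive; every other pair occurs in at most two baskets. Counting all pairs this way needs memory proportional to the number of distinct pairs, which is exactly why larger datasets call for more careful algorithms.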
The famous example of this type is “diapers and beer.” One would hardly expect these
two items to be related, but through data analysis one chain store discovered that people
who buy diapers are unusually likely to buy beer. The theory is that if you buy diapers, you
probably have a baby at home, and if you have a baby, then you are unlikely to be drinking
at a bar; hence you are more likely to bring beer home. The same sort of marketing ploy
that we suggested for hot dogs and mustard could be used for diapers and beer.
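One common way to make “unusually likely to buy beer” precise is to compare the fraction of diaper-buying baskets that also contain beer (the confidence of the rule {diapers} → beer) with the fraction of all baskets that contain beer. The sketch below computes both and reports their difference, here called `interest`; the function names and the sample baskets are hypothetical, chosen only for illustration. A value well above zero suggests the items are bought together more often than chance would predict.

```python
def confidence(baskets, antecedent, item):
    """Fraction of baskets containing every item of `antecedent`
    (a set) that also contain `item`. Returns 0.0 if none qualify."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    return sum(item in b for b in with_antecedent) / len(with_antecedent)

def interest(baskets, antecedent, item):
    """Confidence minus the overall fraction of baskets containing `item`.
    Positive values mean `item` is more likely than usual given `antecedent`."""
    base_rate = sum(item in b for b in baskets) / len(baskets)
    return confidence(baskets, antecedent, item) - base_rate

# Hypothetical baskets, invented for this example.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "wipes"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"beer", "chips"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

print(confidence(baskets, {"diapers"}, "beer"))       # 2/3 of diaper baskets have beer
print(round(interest(baskets, {"diapers"}, "beer"), 3))  # 2/3 - 3/8 = 7/24 ≈ 0.292
```

In this toy data beer appears in 3 of 8 baskets overall but in 2 of the 3 baskets that contain diapers, so the difference is positive.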
However, applications of frequent-itemset analysis are not limited to market baskets. The
same model can be used to mine many other kinds of data. Some examples are:
(1) Related concepts: Let items be words, and let baskets be documents (e.g., Web pages,
blogs, tweets). A basket/document contains those items/words that are present in the
document. If we look for sets of words that appear together in many documents, the
sets will be dominated by the most common words (stop words), as we saw in Example
6.1. There, even though the intent was to find snippets that talked about cats and dogs,
the stop words “and” and “a” were prominent among the frequent itemsets. However,
if we ignore all the most common words, then we would hope to find among the frequent
pairs some pairs of words that represent a joint concept. For example, we would
expect a pair like {Brad, Angelina} to appear with surprising frequency.
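The documents-as-baskets idea can be sketched by turning each document into the set of its distinct words, removing stop words, and counting co-occurring pairs. The stop-word list `STOP_WORDS`, the function `word_pairs`, the simple lowercase tokenizer, and the sample documents are all hypothetical assumptions for illustration; a real application would use a much longer stop-word list and better tokenization.

```python
import re
from collections import Counter
from itertools import combinations

# A tiny, hypothetical stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "with"}

def word_pairs(documents, support, stop_words=STOP_WORDS):
    """Treat each document as a basket of its distinct non-stop words and
    return word pairs occurring together in at least `support` documents."""
    counts = Counter()
    for doc in documents:
        # Lowercase, split into alphabetic words, drop duplicates and stop words.
        words = set(re.findall(r"[a-z]+", doc.lower())) - stop_words
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= support}

# Hypothetical documents, invented for this example.
docs = [
    "Brad and Angelina attend the premiere",
    "Photos of Brad and Angelina on vacation",
    "A cat and a dog share a sofa",
    "Angelina and Brad announce a new film",
]

print(word_pairs(docs, support=3))
# → {('angelina', 'brad'): 3}
```

If `stop_words=set()` is passed instead, pairs such as ("a", "and") start to appear, which mirrors the way stop words crowd out meaningful pairs when they are not filtered.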