6.1 The Market-Basket Model
The market-basket model of data is used to describe a common form of many-
many relationship between two kinds of objects. On the one hand, we have
items, and on the other we have baskets, sometimes called “transactions.”
Each basket consists of a set of items (an itemset), and usually we assume that
the number of items in a basket is small, much smaller than the total number
of items. The number of baskets is usually assumed to be very large, bigger
than what can fit in main memory. The data is assumed to be represented in a
file consisting of a sequence of baskets. In terms of the distributed file system
described in Section 2.1, the baskets are the objects of the file, and each basket
is of type “set of items.”
6.1.1 Definition of Frequent Itemsets
Intuitively, a set of items that appears in many baskets is said to be “frequent.”
To be formal, we assume there is a number s, called the support threshold. If
I is a set of items, the support for I is the number of baskets for which I is a
subset. We say I is frequent if its support is s or more.
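The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `support` and `is_frequent` and the toy baskets are hypothetical, and the baskets are assumed to fit in memory as a list of sets, which the text notes is usually not the case in practice.

```python
def support(itemset, baskets):
    """Return the number of baskets of which `itemset` is a subset."""
    target = frozenset(itemset)
    return sum(1 for basket in baskets if target <= basket)


def is_frequent(itemset, baskets, s):
    """An itemset is frequent if its support is s or more."""
    return support(itemset, baskets) >= s


# Hypothetical toy data, not taken from the text.
toy_baskets = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "beer", "bread"}),
    frozenset({"beer"}),
    frozenset({"milk"}),
]

print(support({"milk", "bread"}, toy_baskets))  # 2
print(is_frequent({"milk"}, toy_baskets, s=3))  # True
```

Representing each basket as a `frozenset` makes the subset test `target <= basket` a direct transcription of "the baskets for which I is a subset."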
Example 6.1 : In Fig. 6.1 are sets of words. Each set is a basket, and the
words are items. We took these sets by Googling “cat dog” and taking snippets
from the highest-ranked pages. Do not be concerned if a word appears twice
in a basket, as baskets are sets, and in principle items can appear only once.
Also, ignore capitalization.
1. {Cat, and, dog, bites}
2. {Yahoo, news, claims, a, cat, mated, with, a, dog, and, produced, viable,
offspring}
3. {Cat, killer, likely, is, a, big, dog}
4. {Professional, free, advice, on, dog, training, puppy, training}
5. {Cat, and, kitten, training, and, behavior}
6. {Dog, &, Cat, provides, dog, training, in, Eugene, Oregon}
7. {“Dog, and, cat”, is, a, slang, term, used, by, police, officers, for, a,
male-female, relationship}
8. {Shop, for, your, show, dog, grooming, and, pet, supplies}
Figure 6.1: Here are eight baskets, each consisting of items that are words
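As a rough check, the baskets of Fig. 6.1 can be loaded into Python and the support of a few itemsets counted. This is a sketch under assumptions: the words are typed by hand from the figure, the quotation marks in basket 7 are dropped, "officers" is spelled out, and the helper name `count_containing` is hypothetical. Lowercasing each word and storing it in a set implements the two conventions of Example 6.1 (ignore capitalization, and an item appears at most once per basket).

```python
raw_baskets = [
    ["Cat", "and", "dog", "bites"],
    ["Yahoo", "news", "claims", "a", "cat", "mated", "with", "a", "dog",
     "and", "produced", "viable", "offspring"],
    ["Cat", "killer", "likely", "is", "a", "big", "dog"],
    ["Professional", "free", "advice", "on", "dog", "training", "puppy",
     "training"],
    ["Cat", "and", "kitten", "training", "and", "behavior"],
    ["Dog", "&", "Cat", "provides", "dog", "training", "in", "Eugene",
     "Oregon"],
    ["Dog", "and", "cat", "is", "a", "slang", "term", "used", "by", "police",
     "officers", "for", "a", "male-female", "relationship"],
    ["Shop", "for", "your", "show", "dog", "grooming", "and", "pet",
     "supplies"],
]

# Ignore capitalization and collapse repeated words within a basket.
baskets = [frozenset(word.lower() for word in raw) for raw in raw_baskets]


def count_containing(itemset, baskets):
    """Support of `itemset`: the number of baskets that contain all of it."""
    target = frozenset(itemset)
    return sum(1 for basket in baskets if target <= basket)


print(count_containing({"cat"}, baskets))         # 6
print(count_containing({"dog"}, baskets))         # 7
print(count_containing({"cat", "dog"}, baskets))  # 5
```

Under these assumptions, "dog" appears in every basket except basket 5, "cat" in every basket except baskets 4 and 8, and the pair in the five baskets that contain both.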
Since the empty set is a subset of any set, the support for ∅ is 8. However,
we shall not generally concern ourselves with the empty set, since it tells us