Object that also include a second Object (i.e., an oil filter, hot dog buns, or
a spoiler). The final calculation is the quotient of the number of Itemsets
including both the first and second Objects divided by the number of Itemsets
that include the first Object. If 50% of the Itemsets that include hot
dogs also include hot dog buns, then the probability of hot dogs coincid-
ing with hot dog buns is 50%. If 25% of the Itemsets that include a quart
of motor oil also include an oil filter, then the probability of motor oil
coinciding with an oil filter is 25%. Knowing that an oil change includes
multiple quarts of oil (in my car, four quarts of oil), you can alter the cal-
culation slightly to look for “oil change” Itemsets—in other words, to look
for the portion of Itemsets that include four quarts of oil and that portion
of Itemsets that include four quarts of oil and an oil filter.
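The calculation described above can be sketched in a few lines of Python. This is only an illustration, not code from the text: the function name `affinity`, the `first_qty` parameter, the choice to represent an Itemset as a mapping from Object name to quantity, and the sample data are all assumptions made for the example.

```python
def affinity(itemsets, first, second, first_qty=1):
    """Share of Itemsets holding at least `first_qty` of `first`
    that also hold at least one `second`.

    Returns None when no Itemset contains enough of `first`.
    """
    with_first = [s for s in itemsets if s.get(first, 0) >= first_qty]
    if not with_first:
        return None
    with_both = [s for s in with_first if s.get(second, 0) >= 1]
    return len(with_both) / len(with_first)


# Hypothetical Itemsets: Object name -> quantity purchased.
itemsets = [
    {"hot dogs": 1, "hot dog buns": 1},
    {"hot dogs": 1},
    {"motor oil": 4, "oil filter": 1},
    {"motor oil": 4},
    {"motor oil": 1},
    {"motor oil": 1},
]

# Half of the hot dog Itemsets also include buns.
hot_dog_affinity = affinity(itemsets, "hot dogs", "hot dog buns")

# A quarter of the Itemsets with any motor oil also include an oil filter.
oil_affinity = affinity(itemsets, "motor oil", "oil filter")

# Restricting the base to "oil change" Itemsets (four quarts) changes the result.
oil_change_affinity = affinity(itemsets, "motor oil", "oil filter", first_qty=4)
```

In this made-up data, narrowing the base from "any motor oil" to "four quarts of motor oil" raises the result from 25% to 50%, which is the kind of shift the "oil change" refinement is meant to expose.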
Affinity is calculated as a percentage or probability. When calculated as
a percentage, that percentage is the proportion of past occurrences of two
Objects occurring simultaneously in an Itemset. The probability that those
two Objects will coincide in the future varies directly with the percentage
of past Itemsets wherein the two Objects coincided. No set of two Objects
will always occur together in an Itemset. Regardless of the strength of correlation
between two Objects in an Itemset, they will not occur simultaneously
every time. Therefore, the Affinity between two Objects is expressed as a
probability. As that probability approaches 100% (i.e., the “always” condi-
tion), the correlation becomes stronger. A set of two Objects will very rarely
reach 100% correlation. Also, as that probability approaches 0% (i.e., the
“never” condition), the correlation becomes weaker and possibly nonexis-
tent (i.e., the two Objects never coincide). The “never” condition (i.e., two
Objects coincide in 0% of the Itemsets) is not so rare. Therefore, the pro-
portion of a set of two Objects coinciding relative to the total number of
Itemsets containing one of the two Objects can be expressed as a percent-
age value between 0% (i.e., never) and 100% (i.e., always). In addition, the
probability of future occurrences of the two Objects in the same Itemset is
based on past occurrences of the two Objects in the same Itemset.
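As a rough illustration of the range described above, the following sketch computes the proportion for every ordered pair of Objects in a small set of past Itemsets and picks out the pairs in the "never" condition. The function name `pairwise_affinity` and the sample Itemsets are hypothetical, and each Itemset is assumed to be a plain set of Object names. The first Object of each pair is taken as the base of the proportion.

```python
from itertools import permutations


def pairwise_affinity(itemsets):
    """Map each ordered pair (first, second) to the share of Itemsets
    containing `first` that also contain `second`."""
    objects = set().union(*itemsets)
    result = {}
    for first, second in permutations(sorted(objects), 2):
        with_first = [s for s in itemsets if first in s]
        with_both = [s for s in with_first if second in s]
        # with_first is never empty: every Object comes from some Itemset.
        result[(first, second)] = len(with_both) / len(with_first)
    return result


# Hypothetical past Itemsets.
past_itemsets = [
    {"hot dogs", "hot dog buns"},
    {"hot dogs", "hot dog buns", "motor oil"},
    {"hot dogs"},
    {"motor oil", "oil filter"},
]

scores = pairwise_affinity(past_itemsets)

# Pairs in the "never" condition: the two Objects coincide in 0% of Itemsets.
never_pairs = [pair for pair, value in scores.items() if value == 0.0]
```

Every value falls between 0 (never) and 1 (always). The result depends on which Object serves as the base, so `scores[("hot dogs", "hot dog buns")]` and `scores[("hot dog buns", "hot dogs")]` need not be equal. A value of exactly 1 in a sample this small says little about future Itemsets.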
Statistics in Market Basket Analysis
Clearly, Market Basket Analysis draws heavily from statistics. Itemsets
correspond to Sample Sets. Affinity, Correlation, and Probability are
almost synonymous. So, it stands to reason that a strong background in
statistics would serve a Market Basket Analyst well. An important dis-
tinction between Itemsets of Market Basket Analysis and Sample Sets