Referring to Figure 8-2, the recommender retrieves items and users from the data
model. The data model provides methods that count the total number of users, the total
number of items, the number of users that prefer a certain item, and so on. Similarity
functions use these counts to compute a similarity value for pairs of items or users. We
discussed several algorithms you can choose from to build a recommender. Of these,
Mahout's Tanimoto Coefficient Similarity is a relatively straightforward similarity
algorithm that is widely used in recommendation systems. Let's illustrate the algorithm
in the context of a webshop. Suppose there are three customers, A, B, and C, and five
products, numbered one through five. Say each customer has bought a few products. For
this algorithm it does not matter how many of each product are purchased, only which
products are purchased by which customer.
Table 8-2. Customer-Product Matrix
Columns: Customer A, Customer B, Customer C
Product 1: ✓ ✓
Product 2: ✓ ✓
Product 3: ✓
Product 4: ✓
Product 5: ✓ ✓ ✓
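The counting methods that the data model offers to the similarity functions can be sketched in a few lines of Python. This is a minimal illustration, not Mahout's API: the class name `PurchaseDataModel`, its method names, and the purchase list are all hypothetical and made up for this example.

```python
from collections import defaultdict


class PurchaseDataModel:
    """Toy in-memory data model (hypothetical; not Mahout's DataModel)."""

    def __init__(self, purchases):
        # purchases: iterable of (customer, product) pairs.
        # Only *which* products a customer bought is stored, not how many.
        self._customers_by_product = defaultdict(set)
        self._products_by_customer = defaultdict(set)
        for customer, product in purchases:
            self._customers_by_product[product].add(customer)
            self._products_by_customer[customer].add(product)

    def num_users(self):
        """Total number of distinct customers."""
        return len(self._products_by_customer)

    def num_items(self):
        """Total number of distinct products."""
        return len(self._customers_by_product)

    def num_users_with_preference_for(self, *products):
        """Number of customers who bought every one of the given products."""
        if not products:
            return 0
        buyer_sets = [self._customers_by_product.get(p, set()) for p in products]
        return len(set.intersection(*buyer_sets))


# Made-up purchases for three customers and five products (illustrative only).
purchases = [
    ("A", 1), ("A", 2), ("A", 5),
    ("B", 1), ("B", 3), ("B", 5),
    ("C", 2), ("C", 4), ("C", 5),
]

model = PurchaseDataModel(purchases)
print(model.num_users())                          # 3
print(model.num_items())                          # 5
print(model.num_users_with_preference_for(1))     # 2
print(model.num_users_with_preference_for(1, 2))  # 1
```

Storing a set of customers per product keeps the two-product count a simple set intersection, which is the quantity a similarity function needs.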
Intuitively, the similarity between two products can be expressed as a ratio of
customer purchases. Simply put, the Tanimoto coefficient uses the ratio of the size of
the intersection to the size of the union of the two sets of buyers as the measure of
similarity. Represented as a mathematical equation:
$$T(a, b) = \frac{N_c}{N_a + N_b - N_c}$$
where a and b denote the two products being compared (p1 and p2 below):
Nc = Number of customers that purchased both p1 and p2,
Na = Number of customers that purchased p1, and
Nb = Number of customers that purchased p2
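As a rough sketch, the formula translates directly into code. The function names below are hypothetical, and returning 0.0 when neither product has any buyers is an assumption of this example rather than something the text specifies.

```python
def tanimoto(n_a, n_b, n_c):
    """Tanimoto coefficient from the three counts Na, Nb and Nc."""
    union = n_a + n_b - n_c  # customers who bought at least one of the two
    if union == 0:
        # Assumption: with no buyers at all, treat the products as dissimilar.
        return 0.0
    return n_c / union


def tanimoto_from_sets(buyers_a, buyers_b):
    """Same coefficient, computed from the two sets of buyers."""
    return tanimoto(len(buyers_a), len(buyers_b), len(buyers_a & buyers_b))


# Hypothetical buyers: two customers per product, one customer in common.
buyers_p1 = {"A", "B"}
buyers_p2 = {"A", "C"}
print(tanimoto_from_sets(buyers_p1, buyers_p2))  # 1 / (2 + 2 - 1) = 1/3
```

Subtracting Nc in the denominator avoids counting the shared customers twice, so the denominator is the size of the union.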
This means that if many customers have bought both products, the numerator
will be higher and so will the similarity value. Conversely, if many people have
bought p1 and many have bought p2, but very few people bought both, then p1 and p2
are probably dissimilar. Table 8-3 shows the calculated Tanimoto coefficients for each
product pair.
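The two situations described above can be checked with a small experiment. The buyer sets here are invented for illustration (customer IDs are plain integers) and are unrelated to the values in Table 8-3; the helper name `tanimoto_similarity` is likewise hypothetical.

```python
def tanimoto_similarity(buyers_a, buyers_b):
    """Size of the intersection divided by size of the union of two buyer sets."""
    union = buyers_a | buyers_b
    if not union:
        return 0.0  # assumption: no buyers at all means no similarity
    return len(buyers_a & buyers_b) / len(union)


# Case 1: 100 buyers each, 90 of them bought both products.
p1_buyers = set(range(0, 100))
p2_buyers = set(range(10, 110))
often_together = tanimoto_similarity(p1_buyers, p2_buyers)

# Case 2: 100 buyers each, but only 2 bought both products.
p3_buyers = set(range(0, 100))
p4_buyers = set(range(98, 198))
rarely_together = tanimoto_similarity(p3_buyers, p4_buyers)

print(f"{often_together:.3f}")   # 0.818  (90 / 110)
print(f"{rarely_together:.3f}")  # 0.010  (2 / 198)
```

Both pairs are equally popular, yet only the pair with a large shared customer base scores high, which matches the intuition that popularity alone does not make two products similar.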
 