Extracting Value From Big Data: In-Memory Solutions, Real Time Analytics, And Recommendation Systems - Big Data Imperatives


Referring to Figure 8-2, the recommender retrieves items and users from the data model. The data model provides methods that count the total number of users, the total number of items, the number of users who prefer a certain item, and so on. Similarity functions use these counts to compute a similarity value for pairs of items or users. We discussed several algorithms you can choose from to build a recommender. Mahout's Tanimoto Coefficient Similarity, however, is a relatively straightforward similarity algorithm that is widely used in recommendation systems for discovering similarities. Let's illustrate the algorithm in the context of a web shop. Suppose there are three customers, A, B, and C, and five products, numbered 1 through 5. Say each customer has bought a few products. For this algorithm it does not matter how many units of a product a customer buys, only which products were purchased by which customer.

Table 8-2. Customer-Product Matrix

Customer A, Customer B, Customer C

Product 1 ✓
Product 2 ✓
Product 3 ✓
Product 4 ✓
Product 5 ✓
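Because only membership matters, a boolean matrix like Table 8-2 can be stored as one set of customers per product. The sketch below is a minimal, hypothetical in-memory data model in Python that exposes the kinds of counts described above (total users, total items, users who prefer an item, users who prefer both of two items). The class name `BooleanDataModel`, its method names, and the purchase data are illustrative assumptions; they are not Mahout's API and not the contents of Table 8-2.

```python
from collections import defaultdict


class BooleanDataModel:
    """Hypothetical boolean-preference data model: it records only
    which customer bought which product, never how many units."""

    def __init__(self, purchases):
        # purchases: iterable of (customer, product) pairs; repeats collapse
        self._buyers = defaultdict(set)   # product -> set of customers
        self._customers = set()
        for customer, product in purchases:
            self._buyers[product].add(customer)
            self._customers.add(customer)

    def num_users(self):
        return len(self._customers)

    def num_items(self):
        return len(self._buyers)

    def num_users_with_preference_for(self, item):
        # .get() avoids inserting unknown items into the defaultdict
        return len(self._buyers.get(item, set()))

    def num_users_with_preference_for_both(self, item_a, item_b):
        return len(self._buyers.get(item_a, set()) & self._buyers.get(item_b, set()))


# Hypothetical purchases (not taken from Table 8-2). ("A", 1) appears twice
# on purpose: buying a product again does not change the model.
purchases = [
    ("A", 1), ("A", 2), ("A", 4), ("A", 1),
    ("B", 1), ("B", 2), ("B", 3),
    ("C", 3), ("C", 5),
]
model = BooleanDataModel(purchases)
```

For example, `model.num_users_with_preference_for_both(1, 3)` returns 1, since only customer B bought both products 1 and 3.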

Intuitively, the similarity between two products can be expressed as a ratio based on which customers purchased them. Simply put, the Tanimoto coefficient measures similarity as the ratio of the size of the intersection (customers who bought both products) to the size of the union (customers who bought either product). Expressed as a mathematical equation:

$$T(a, b) = \frac{N_c}{N_a + N_b - N_c}$$

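where a and b denote the two products being compared, p1 and p2:

Nc = number of customers who purchased both p1 and p2,
Na = number of customers who purchased p1, and
Nb = number of customers who purchased p2.

The denominator Na + Nb − Nc is the size of the union: customers who bought both products are counted in both Na and Nb, so Nc is subtracted once to avoid counting them twice.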
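The formula translates directly into code. The following Python sketch is an illustration, not Mahout's `TanimotoCoefficientSimilarity` implementation: it takes the set of customers who bought each product and returns Nc / (Na + Nb − Nc). The function name `tanimoto` and the choice to return 0.0 when neither product has any buyers are assumptions.

```python
def tanimoto(buyers_a, buyers_b):
    """Tanimoto coefficient of two products, given the set of
    customers who bought each one: N_c / (N_a + N_b - N_c)."""
    n_a = len(buyers_a)                 # customers who bought product a
    n_b = len(buyers_b)                 # customers who bought product b
    n_c = len(buyers_a & buyers_b)      # customers who bought both
    denominator = n_a + n_b - n_c       # size of the union of the two sets
    if denominator == 0:
        # Assumption: two products nobody bought get similarity 0.0
        return 0.0
    return n_c / denominator


print(tanimoto({"A", "B"}, {"B", "C"}))  # 1 / (2 + 2 - 1) = 0.333...
```

The result is the same as `len(buyers_a & buyers_b) / len(buyers_a | buyers_b)`; spelling out Na, Nb, and Nc simply mirrors the counts a data model provides.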

This means that if many customers have bought both products, the numerator will be higher, and so will the similarity value. Conversely, if many people have bought p1 and many have bought p2, but very few bought both, the denominator is large relative to the numerator, and p1 and p2 are probably dissimilar. Table 8-3 shows the calculated Tanimoto coefficients for each product pair.
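A table like Table 8-3 can be produced by applying the formula to every unordered pair of products. The sketch below does this in Python with `itertools.combinations`; the purchase data and the helper names `buyers_by_product` and `tanimoto_table` are hypothetical (the data is not the data behind Tables 8-2 and 8-3), chosen so that one pair shares all its buyers and several pairs share none.

```python
from itertools import combinations

# Hypothetical purchases: customer -> set of products bought (not Table 8-2).
purchases = {
    "A": {1, 2, 4},
    "B": {1, 2, 3},
    "C": {3, 5},
}


def buyers_by_product(purchases):
    """Invert customer -> products into product -> set of customers."""
    buyers = {}
    for customer, products in purchases.items():
        for product in products:
            buyers.setdefault(product, set()).add(customer)
    return buyers


def tanimoto_table(purchases):
    """Tanimoto coefficient N_c / (N_a + N_b - N_c) for every product pair."""
    buyers = buyers_by_product(purchases)
    table = {}
    for p1, p2 in combinations(sorted(buyers), 2):
        n_c = len(buyers[p1] & buyers[p2])
        # Every product here has at least one buyer, so the union is never 0.
        union = len(buyers[p1]) + len(buyers[p2]) - n_c
        table[(p1, p2)] = n_c / union
    return table


for (p1, p2), value in tanimoto_table(purchases).items():
    print(f"Product {p1} / Product {p2}: {value:.2f}")
```

With this data, products 1 and 2 were bought by exactly the same customers and score 1.00, products 1 and 3 each have two buyers but share only one and score 0.33, and product 5, bought only by customer C, scores 0.00 against every product except product 3.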
