Recommendation Systems - Mining of Massive Datasets

Databases Reference

In-Depth Information

be little reason to try to cluster into a small number of clusters immediately.

Rather, a hierarchical approach, where we leave many clusters unmerged may

su ce as a first step.

For example, we might leave half as many clusters as

there are items.

HP

TW

SW

A

4

5

1

B

4.67

C

2

4.5

D

3

Figure 9.7: Utility matrix for users and clusters of items

Example 9.10 : Figure 9.7 shows what happens to the utility matrix of Fig. 9.4

if we manage to cluster the three Harry-Potter movies into one cluster, denoted

HP, and also cluster the three Star-Wars movies into one cluster SW.

2

Having clustered items to an extent, we can revise the utility matrix so the

columns represent clusters of items, and the entry for user U and cluster C is

the average rating that U gave to those members of cluster C that U did rate.

Note that U may have rated none of the cluster members, in which case the

entry for U and C is still blank.

We can use this revised utility matrix to cluster users, again using the dis-

tance measure we consider most appropriate. Use a clustering algorithm that

again leaves many clusters, e.g., half as many clusters as there are users. Re-

vise the utility matrix, so the rows correspond to clusters of users, just as the

columns correspond to clusters of items. As for item-clusters, compute the

entry for a user cluster by averaging the ratings of the users in the cluster.

Now, this process can be repeated several times if we like. That is, we can

cluster the item clusters and again merge the columns of the utility matrix that

belong to one cluster. We can then turn to the users again, and cluster the

user clusters. The process can repeat until we have an intuitively reasonable

number of clusters of each kind.

Once we have clustered the users and/or items to the desired extent and

computed the cluster-cluster utility matrix, we can estimate entries in the orig-

inal utility matrix as follows. Suppose we want to predict the entry for user U

and item I:

(a) Find the clusters to which U and I belong, say clusters C and D, respec-

tively.

(b) If the entry in the cluster-cluster utility matrix for C and D is something

other than blank, use this value as the estimated value for the U -I entry

in the original utility matrix.

Mining of Massive Datasets

Search WWH ::

Custom Search

Home