Big Data Analytics - Microsoft Big Data Solutions

NOTE

One of the most difficult parts of building a recommendation engine is

determining how to quantify and weight non-numeric explicit data.

Determining the right balance and scale is as much art as science and

requires some experimentation.

Implicit data, when a single type is used on its own (that is, only purchase

history or only click history), is effectively a Boolean value. It is not

necessary to represent the negative cases in these models because

missing data translates to false. When multiple implicit data points are

combined, it is necessary to scale them appropriately.
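
The paragraph above can be sketched in a few lines of Python. This is only an illustration, not how Mahout stores preferences: the users, items, and the weights in `WEIGHTS` are hypothetical values chosen for the example. A single implicit signal is kept as a set of (user, item) pairs, so an absent pair simply reads as false, and the combined score is rescaled to the range 0 to 1.

```python
from collections import defaultdict

# Hypothetical implicit events. A (user, item) pair is stored only when the
# event happened; negative cases are never stored.
purchases = {("alice", "book"), ("bob", "lamp")}
clicks = {("alice", "book"), ("alice", "lamp"), ("bob", "lamp")}

# Assumed weights, for illustration only: a purchase counts more than a click.
WEIGHTS = {"purchase": 1.0, "click": 0.25}


def has_purchased(user, item):
    """Single implicit signal: Boolean, with missing data meaning False."""
    return (user, item) in purchases


def combined_scores(signals, weights):
    """Combine several implicit signals into one score scaled to [0, 1]."""
    scores = defaultdict(float)
    for name, pairs in signals.items():
        for pair in pairs:
            scores[pair] += weights[name]
    total = sum(weights.values())
    return {pair: score / total for pair, score in scores.items()}


scores = combined_scores({"purchase": purchases, "click": clicks}, WEIGHTS)
```

Choosing the weights is the part that needs experimentation; the values here only show where such a choice would plug in.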

Table 12.2 Examples of Explicit and Implicit Data

Explicit                                          Implicit
Ratings                                           Purchase history
Feedback                                          Clicks
Demographics                                      Browse history
Psychographics (personality/lifestyle/attitude)
Ephemeral need (need for a moment)

Clustering, unlike collaborative filtering, focuses instead on an item's

taxonomies, attributes, description, or properties. It does not need

behavioral or interaction data, and is often a good choice when the data

required for collaborative filtering is not available.
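
As a rough illustration of working from item attributes alone, the sketch below ranks items by the Jaccard similarity of their attribute sets. It is not a clustering algorithm and not Mahout's API; it only shows the kind of attribute-based similarity that such approaches can build on, with no behavioral data involved. The catalog and the function names are hypothetical.

```python
# Hypothetical catalog: each item is described only by its attributes.
items = {
    "trail_shoe": {"footwear", "outdoor", "running"},
    "road_shoe": {"footwear", "running"},
    "tent": {"outdoor", "camping"},
}


def jaccard(a, b):
    """Shared attributes divided by all attributes across both items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(target, catalog):
    """Rank every other item by attribute similarity to the target."""
    scored = [
        (other, jaccard(catalog[target], attrs))
        for other, attrs in catalog.items()
        if other != target
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranked = similar_items("trail_shoe", items)
```

Here `ranked` lists `road_shoe` ahead of `tent`, because the two shoes share more of their attributes.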

To generate recommendations, Mahout supports collaborative filtering and

clustering. To understand what each of these is, we can look at the three

common recommendation engine implementations:

1. User-to-user collaborative filtering: In a user-to-user

recommendation implementation, clusters or neighborhoods of similar

users are formed based on some user behavior (for instance, purchasing

an item or attending a movie). Because similar users are clustered

together, these clusters are then used to generate recommendations.

2. Item-to-item collaborative filtering: The item-to-item

recommendation implementation works in a similar manner to that of
