Data Mining Trends and Research Frontiers - Data Mining: Concepts and Techniques

“How is the utility of an item estimated for a user?” In content-based methods, it is estimated based on the utilities assigned by the same user to other items that are similar. Many such systems focus on recommending items containing textual information, such as web sites, articles, and news messages. They look for commonalities among items. For movies, they may look for similar genres, directors, or actors. For articles, they may look for similar terms. Content-based methods are rooted in information theory. They make use of keywords (describing the items) and user profiles that contain information about users' tastes and needs. Such profiles may be obtained explicitly (e.g., through questionnaires) or learned from users' transactional behavior over time.
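The keyword-and-profile idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular system: the item names, keywords, and ratings are hypothetical, the profile is simply a rating-weighted sum of keyword counts, and cosine similarity is used to compare the profile with each unseen item.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight vectors (dicts)."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(rated_items, item_keywords):
    """Build a user profile by adding each rated item's keywords,
    weighted by the rating the user gave that item (an assumed scheme)."""
    profile = Counter()
    for item, rating in rated_items.items():
        for keyword in item_keywords[item]:
            profile[keyword] += rating
    return profile


def recommend(profile, item_keywords, seen, top_n=2):
    """Rank unseen items by similarity between their keywords and the profile."""
    scores = {
        item: cosine(profile, Counter(keywords))
        for item, keywords in item_keywords.items()
        if item not in seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical catalog: each item is described by a few keywords.
item_keywords = {
    "movie_a": ["sci-fi", "space", "director_x"],
    "movie_b": ["sci-fi", "robots", "director_x"],
    "movie_c": ["romance", "comedy"],
    "movie_d": ["space", "documentary"],
}
# Hypothetical ratings on a 1-5 scale from one user.
ratings = {"movie_a": 5, "movie_c": 1}

profile = build_profile(ratings, item_keywords)
print(recommend(profile, item_keywords, seen=ratings))
```

Here the user rated a science-fiction film highly, so the unseen item sharing the most keywords with it ranks first. A low rating still adds a small positive weight in this simplified scheme; a fuller version might center ratings around the user's mean so that disliked keywords count against an item.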

A collaborative recommender system tries to predict the utility of items for a user, u, based on items previously rated by other users who are similar to u. For example, when recommending books, a collaborative recommender system tries to find other users who have a history of agreeing with u (e.g., they tend to buy similar books, or give similar ratings for books). Collaborative recommender systems can be either memory (or heuristic) based or model based.

Memory-based methods essentially use heuristics to make rating predictions based on the entire collection of items previously rated by users. That is, the unknown rating of an item-user combination can be estimated as an aggregate of ratings of the most similar users for the same item. Typically, a k-nearest-neighbor approach is used, that is, we find the k other users (or neighbors) that are most similar to our target user, u. Various approaches can be used to compute the similarity between users. The most popular approaches use either Pearson's correlation coefficient (Section 3.3.2) or cosine similarity (Section 2.4.7). A weighted aggregate can be used, which adjusts for the fact that different users may use the rating scale differently. Model-based collaborative recommender systems use a collection of ratings to learn a model, which is then used to make rating predictions. For example, probabilistic models, clustering (which finds clusters of like-minded customers), Bayesian networks, and other machine learning techniques have been used.
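The memory-based approach described above can be sketched as follows. This is one common formulation, offered as an assumption rather than as the text's exact method: similarity is Pearson's correlation over co-rated items, only the k most similar users with positive correlation are kept, and the weighted aggregate is mean-centered so that a habitually generous rater and a habitually strict one are put on the same footing. The user names, item names, and ratings are hypothetical.

```python
import math


def pearson(ra, rb):
    """Pearson correlation between two users over the items both have rated."""
    common = [i for i in ra if i in rb]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ra[i] for i in common) / len(common)
    mean_b = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - mean_a) * (rb[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((ra[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((rb[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)


def predict(ratings, user, item, k=2):
    """Predict user's rating of item from the k most similar users who rated it."""
    mean_u = sum(ratings[user].values()) / len(ratings[user])
    sims = [
        (pearson(ratings[user], other_ratings), other)
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    # Keep the k neighbors with the highest positive similarity.
    neighbors = sorted((s for s in sims if s[0] > 0), reverse=True)[:k]
    if not neighbors:
        return mean_u  # fall back to the user's own average rating
    num = 0.0
    den = 0.0
    for sim, other in neighbors:
        mean_o = sum(ratings[other].values()) / len(ratings[other])
        # Use the neighbor's deviation from their own mean, not the raw rating.
        num += sim * (ratings[other][item] - mean_o)
        den += abs(sim)
    return mean_u + num / den


# Hypothetical ratings on a 1-5 scale; user "u" has not rated item "i4".
ratings = {
    "u": {"i1": 4, "i2": 2, "i3": 5},
    "v": {"i1": 5, "i2": 3, "i3": 5, "i4": 5},
    "w": {"i1": 2, "i2": 5, "i3": 1, "i4": 2},
    "x": {"i1": 3, "i2": 1, "i3": 4, "i4": 4},
}

print(round(predict(ratings, "u", "i4"), 2))
```

In this toy data, user x rates every shared item exactly one point lower than u, so the two are perfectly correlated even though their raw ratings differ. User w disagrees with u and is excluded from the neighborhood. Computing similarities against every other user at prediction time is what makes this approach expensive at scale.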

Recommender systems face major challenges such as scalability and ensuring quality recommendations to the consumer. For example, regarding scalability, collaborative recommender systems must be able to search through millions of potential neighbors in real time. If the site is using browsing patterns as indications of product preference, it may have thousands of data points for some of its customers. Ensuring quality recommendations is essential to gain consumers' trust. If consumers follow a system recommendation but then do not end up liking the product, they are less likely to use the recommender system again.

As with classification systems, recommender systems can make two types of errors: false negatives and false positives. Here, false negatives are products that the system fails to recommend, although the consumer would like them. False positives are products that are recommended, but which the consumer does not like. False positives are less desirable because they can annoy or anger consumers. Content-based recommender systems are limited by the features used to describe the items they recommend.
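The two error types defined above can be counted directly once we know which products were recommended and which the consumer actually likes. The sketch below is illustrative only: the function name and product identifiers are hypothetical, and precision and recall are added as standard summary measures that the passage itself does not name.

```python
def recommendation_errors(recommended, liked):
    """Compare a recommendation list with the set of products the consumer likes."""
    recommended, liked = set(recommended), set(liked)
    true_positives = recommended & liked
    false_positives = recommended - liked   # recommended, but not liked
    false_negatives = liked - recommended   # liked, but never recommended
    precision = len(true_positives) / len(recommended) if recommended else 0.0
    recall = len(true_positives) / len(liked) if liked else 0.0
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical example: four products recommended, three actually liked.
result = recommendation_errors(
    recommended=["p1", "p2", "p3", "p4"],
    liked=["p2", "p4", "p5"],
)
print(result)
```

False positives lower precision, and false negatives lower recall. Because false positives are the errors that annoy consumers, a system might reasonably favor precision, for example by recommending only items whose predicted rating clears a high threshold.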
