9.2.1 Item Profiles
In a content-based system, we must construct for each item a profile, which is a record or
collection of records representing important characteristics of that item. In simple cases, the
profile consists of some characteristics of the item that are easily discovered. For example,
consider the features of a movie that might be relevant to a recommendation system.
(1) The set of actors of the movie. Some viewers prefer movies with their favorite actors.
(2) The director. Some viewers have a preference for the work of certain directors.
(3) The year in which the movie was made. Some viewers prefer old movies; others watch
only the latest releases.
(4) The genre or general type of movie. Some viewers like only comedies, others dramas
or romances.
There are many other features of movies that could be used as well. Except for the last,
genre, the information is readily available from descriptions of movies. Genre is a vaguer
concept. However, movie reviews generally assign a genre from a set of commonly used
terms. For example, the Internet Movie Database (IMDB) assigns a genre or genres to every
movie. We shall discuss mechanical construction of genres in Section 9.3.3.
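As an illustration only, a movie profile with the four features listed above could be held in a small record type. The following is a minimal Python sketch, not a prescribed representation: the class name MovieProfile, its field names, and the features() helper are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MovieProfile:
    """Hypothetical record holding the four movie features discussed above."""

    title: str
    actors: frozenset[str]          # (1) the set of actors
    director: str                   # (2) the director
    year: int                       # (3) the year the movie was made
    genres: frozenset[str] = field(default_factory=frozenset)  # (4) genre(s)

    def features(self) -> set[tuple[str, object]]:
        """Flatten the profile into a set of (feature name, value) pairs."""
        pairs: set[tuple[str, object]] = {
            ("director", self.director),
            ("year", self.year),
        }
        pairs |= {("actor", actor) for actor in self.actors}
        pairs |= {("genre", genre) for genre in self.genres}
        return pairs


# Invented placeholder data, not taken from any real movie database.
example = MovieProfile(
    title="Example Movie",
    actors=frozenset({"Actor A", "Actor B"}),
    director="Director X",
    year=1999,
    genres=frozenset({"comedy"}),
)
```

Flattening the record into (feature name, value) pairs is one possible design choice; it keeps set-valued features such as actors and single-valued features such as the director in a uniform form.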
Many other classes of items also allow us to obtain features from available data, even if
that data must at some point be entered by hand. For instance, products often have descriptions
written by the manufacturer, giving features relevant to that class of product (e.g.,
the screen size and cabinet color for a TV). Books have descriptions similar to those for
movies, so we can obtain features such as author, year of publication, and genre. Music
products such as CDs and MP3 downloads have available features such as artist, composer,
and genre.
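The idea that each class of item has its own relevant features can be sketched as a lookup from item class to the description fields worth keeping. This is a hedged Python illustration: the table RELEVANT_FIELDS, the function build_profile, and all field names are hypothetical, chosen only to mirror the examples in the text.

```python
# Hypothetical mapping from item class to the description fields kept as features.
RELEVANT_FIELDS = {
    "tv": ("screen_size_in", "cabinet_color"),
    "book": ("author", "year", "genre"),
    "music": ("artist", "composer", "genre"),
}


def build_profile(item_class: str, description: dict) -> dict:
    """Keep only the fields relevant to this item class.

    Fields absent from the description are skipped rather than guessed.
    """
    wanted = RELEVANT_FIELDS[item_class]
    return {name: description[name] for name in wanted if name in description}


# Invented manufacturer description; "sku" is dropped because it is not
# listed as a relevant feature for the "tv" class.
tv_profile = build_profile(
    "tv",
    {"screen_size_in": 55, "cabinet_color": "black", "sku": "X1"},
)
```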
9.2.2 Discovering Features of Documents
There are other classes of items where it is not immediately apparent what the values of
features should be. We shall consider two of them: document collections and images. Documents
present special problems, and we shall discuss the technology for extracting features
from documents in this section. Images will be discussed in Section 9.2.3 as an important
example where user-supplied features have some hope of success.
There are many kinds of documents for which a recommendation system can be useful.
For example, there are many news articles published each day, and we cannot read all of
them. A recommendation system can suggest articles on topics a user is interested in, but
how can we distinguish among topics? Web pages are also a collection of documents. Can