does not tell us anything useful about their features. We can calculate simple properties of
pixels, such as the average amount of red in the picture, but few users are looking for red
pictures or especially like red pictures.
There have been a number of attempts to obtain information about features of items by
inviting users to tag the items by entering words or phrases that describe the item. Thus,
one picture with a lot of red might be tagged “Tiananmen Square,” while another is tagged
“sunset at Malibu.” The distinction is not something that could be discovered by existing
image-analysis programs.
Two Kinds of Document Similarity
Recall that in Section 3.4 we gave a method for finding documents that were “similar,” using shingling, minhashing,
and LSH. There, the notion of similarity was lexical: documents are similar if they contain large, identical sequences
of characters. For recommendation systems, the notion of similarity is different. We are interested only in the
occurrences of many important words in both documents, even if there is little lexical similarity between the documents.
However, the methodology for finding similar documents remains almost the same. Once we have a distance measure,
either Jaccard or cosine, we can use minhashing (for Jaccard distance) or random hyperplanes (for cosine distance; see
Section 3.7.2), feeding data to an LSH algorithm to find the pairs of documents that are similar in the sense of sharing many
common keywords.
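To make the two distance measures concrete, here is a minimal Python sketch that treats each document as a set of its important keywords. The function names and the sample keyword sets are illustrative assumptions, not part of the original text. Cosine distance is computed here as the angle between the 0/1 keyword vectors, and the sketch compares documents directly, without the minhashing or LSH steps that make the comparison scale.

```python
import math

def jaccard_distance(a: set, b: set) -> float:
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def cosine_distance(a: set, b: set) -> float:
    """Return the angle in degrees between the 0/1 keyword vectors of a and b."""
    if not a or not b:
        raise ValueError("cosine distance is undefined for an empty keyword set")
    # For 0/1 vectors the dot product is the number of shared keywords,
    # and each vector's length is the square root of its keyword count.
    cos = len(a & b) / math.sqrt(len(a) * len(b))
    # Clamp to 1.0 to guard against floating-point overshoot before acos.
    return math.degrees(math.acos(min(1.0, cos)))

# Hypothetical keyword sets for two documents.
doc1 = {"recommendation", "utility", "matrix", "rating"}
doc2 = {"rating", "matrix", "user", "item"}
print(jaccard_distance(doc1, doc2))
print(cosine_distance(doc1, doc2))
```

Two documents that share many keywords score low on both measures even if they have no long character sequences in common, which is the sense of similarity needed here.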
Tags from Computer Games
An interesting direction for encouraging tagging is the “games” approach pioneered by Luis von Ahn. He enabled two
players to collaborate on the tag for an image. In each round, both players would suggest a tag, and the tags would be exchanged.
If they agreed, then they “won,” and if not, they would play another round with the same image, trying to agree
simultaneously on a tag. While this is an innovative direction to try, it is questionable whether sufficient public interest can be
generated to produce enough free work to satisfy the needs for tagged data.
Almost any kind of data can have its features described by tags. One of the earliest
attempts to tag massive amounts of data was the site del.icio.us, later bought by Yahoo!,
which invited users to tag Web pages. The goal of this tagging was to make a new method
of search available, where users entered a set of tags as their search query, and the system
retrieved the Web pages that had been tagged that way. However, it is also possible to use
the tags as a recommendation system. If it is observed that a user retrieves or bookmarks
many pages with a certain set of tags, then we can recommend other pages with the same
tags.
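The recommendation idea in this paragraph can be sketched in a few lines of Python. This is only an illustration under assumed names: `recommend_by_tags`, the scoring rule (summing how often each of a page's tags appears on the user's bookmarked pages), and the sample pages are hypothetical, not a description of how del.icio.us worked.

```python
from collections import Counter

def recommend_by_tags(page_tags, bookmarked, top_n=3):
    """Rank unbookmarked pages by how often their tags occur on bookmarked pages."""
    # Build a crude user profile: for each tag, the number of bookmarked pages carrying it.
    profile = Counter(tag for page in bookmarked for tag in page_tags.get(page, ()))
    scores = {}
    for page, tags in page_tags.items():
        if page in bookmarked:
            continue
        # Counter returns 0 for tags the user has never bookmarked.
        score = sum(profile[tag] for tag in tags)
        if score > 0:
            scores[page] = score
    # Highest score first; ties are broken by page name for a deterministic order.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_n]

# Hypothetical tagged pages and one user's bookmarks.
page_tags = {
    "a.example/python-intro": {"python", "tutorial"},
    "b.example/pandas": {"python", "data"},
    "c.example/numpy": {"python", "data", "tutorial"},
    "d.example/gardening": {"garden"},
    "e.example/sql": {"data"},
}
bookmarked = {"a.example/python-intro", "b.example/pandas"}
print(recommend_by_tags(page_tags, bookmarked))
```

Pages that share no tags with the user's bookmarks are never recommended, so the quality of the result depends entirely on how thoroughly and accurately pages have been tagged.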
The problem with tagging as an approach to feature discovery is that the process only
works if users are willing to take the trouble to create the tags, and if there are enough tags
that occasional erroneous ones will not bias the system too much.