user activity). To this end, Facebook's public APIs were exploited to build a social
network application, Tag!t. The application makes it possible to share and interact
with video content in a richer way than Facebook's native features allow. Besides
sharing videos and aggregating them into collections, users can tag specific time-
stamps in their friends' videoclips and link additional material (e.g., videoclips,
images, Web pages) to them. The application also keeps track of user interaction
with the media itself (e.g., play/pause/seek events). Experiments with the Tag!t appli-
cation showed that users within a selected group tend to tag in a similar manner. In
addition, the semantics suggested by a user were found to be only slightly biased by
other users' tagging of the same content, thus indicating that collaborative tagging
leads to coherent semantics.
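The model just described amounts to time-anchored annotations attached to a video, together with linked resources and a log of playback events. The following is a minimal sketch of such a data model; all class and field names are hypothetical, since Tag!t's internals are not published here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model for time-anchored video tags (names are
# illustrative, not taken from the Tag!t implementation).

@dataclass
class LinkedResource:
    kind: str          # e.g., "video", "image", "web_page"
    url: str

@dataclass
class TimestampTag:
    video_id: str      # the friend's videoclip being annotated
    author_id: str     # the user who created the tag
    timestamp_s: float # position in the clip, in seconds
    label: str         # free-text tag describing that moment
    resources: List[LinkedResource] = field(default_factory=list)

@dataclass
class InteractionEvent:
    video_id: str
    user_id: str
    event: str         # "play", "pause", or "seek"
    position_s: float  # playback position when the event occurred

# Example: tag a moment in a friend's clip and attach a related web page.
tag = TimestampTag("vid42", "alice", 73.5, "goal celebration",
                   [LinkedResource("web_page", "http://example.org/report")])
log = [InteractionEvent("vid42", "bob", "play", 0.0),
       InteractionEvent("vid42", "bob", "seek", 70.0)]
```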
2.5.3 Document Annotation
Wikipedia articles can also be used to analyze words in order to improve keyword
extraction from documents and disambiguation algorithms, as shown in the taxonomy
depicted in Fig. 2.4. For example, Semantic MediaWiki [35] is an extension of
the MediaWiki software that allows the wiki content in the articles to be annotated.
The aim of this tool is to improve consistency in Wikipedia articles by reusing the
information stored in the encyclopedia.
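Semantic MediaWiki marks up facts inline with a [[property::value]] syntax (for instance, [[Is capital of::Germany]] inside the article on Berlin). As a rough illustration, assuming that syntax, a few lines of Python suffice to pull such property/value pairs out of wiki markup; the regex is a simplification that ignores nested links and display text.

```python
import re

# Extract Semantic MediaWiki-style annotations of the form [[property::value]].
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")

def extract_annotations(wikitext: str):
    """Return (property, value) pairs found in a page's wiki markup."""
    return [(p.strip(), v.strip()) for p, v in ANNOTATION.findall(wikitext)]

page = ("Berlin is the capital of [[Is capital of::Germany]] and has "
        "[[Population::3,500,000]] inhabitants.")
print(extract_annotations(page))
# [('Is capital of', 'Germany'), ('Population', '3,500,000')]
```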
Some approaches have attempted to detect the semantic relatedness between
terms in documents to identify possible document topics. In [36] the authors intro-
duced "Explicit Semantic Analysis" (ESA), which computes the semantic related-
ness between fragments of natural language text using a concept space. The method
employs machine learning techniques to build a semantic interpreter that maps
fragments of natural language to a weighted vector of Wikipedia concepts, named
an "interpretation vector", in which concepts are ordered by their relevance to the
fragment. The relatedness between different interpretation vectors is evaluated by
means of cosine similarity.
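A minimal sketch of the ESA pipeline: each Wikipedia concept is represented by the tf-idf vector of its article, a text fragment is mapped to its interpretation vector of concept weights, and two fragments are compared via the cosine similarity of those vectors. The toy three-article "Wikipedia" below stands in for the real concept space.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy concept space: each "article" stands for one Wikipedia concept.
concepts = ["Guitar", "Piano", "Stock market"]
articles = [
    "guitar string instrument music chord fret rock",
    "piano keyboard instrument music key concert",
    "stock market trading shares exchange investor price",
]

# Build tf-idf representations of the concept articles.
vectorizer = TfidfVectorizer()
concept_matrix = vectorizer.fit_transform(articles)   # concepts x terms

def interpretation_vector(text: str) -> np.ndarray:
    """Map a text fragment to weights over the Wikipedia concepts."""
    frag = vectorizer.transform([text])               # 1 x terms
    return (frag @ concept_matrix.T).toarray()[0]     # weights per concept

def relatedness(a: str, b: str) -> float:
    """ESA relatedness: cosine similarity of interpretation vectors."""
    va, vb = interpretation_vector(a), interpretation_vector(b)
    return float(cosine_similarity([va], [vb])[0, 0])

print(relatedness("rock music chord", "concert keyboard"))  # related via music
print(relatedness("rock music chord", "shares investor"))   # unrelated
```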
In [37], the Wikipedia Link-Based Measure is described. The approach identifies
a set of candidate articles which represent the analyzed concepts and measures the
relatedness between these articles using a similarity measure, which can be a tf-idf-
based measure, the Normalized Google Distance, or a combination of both. Experi-
mental results show that the ESA approach is effective in identifying the relatedness
between terms.
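In the link-based setting the Normalized Google Distance formula is applied to Wikipedia hyperlinks rather than search hits: each article is described by the set of articles that link to it. A sketch of that computation, assuming the in-link sets are already available:

```python
import math

# Relatedness of two articles from their sets of incoming links, using the
# Normalized Google Distance adapted to Wikipedia links.
def link_relatedness(in_a: set, in_b: set, n_articles: int) -> float:
    """Return a relatedness score in [0, 1]; higher means more related."""
    overlap = len(in_a & in_b)
    if overlap == 0:
        return 0.0
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    distance = ((math.log(big) - math.log(overlap)) /
                (math.log(n_articles) - math.log(small)))
    return max(0.0, 1.0 - distance)

# Toy example: articles identified by the pages that link to them.
links_to_cat = {"Pet", "Mammal", "Carnivore", "Meow"}
links_to_dog = {"Pet", "Mammal", "Carnivore", "Bark"}
links_to_bond = {"Finance", "Debt"}
print(link_relatedness(links_to_cat, links_to_dog, n_articles=1_000_000))
print(link_relatedness(links_to_cat, links_to_bond, n_articles=1_000_000))
```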
The Wikify! system [38] combines keyword extraction from documents with word
sense disambiguation in order to assign to each extracted keyword a link to the
correct Wikipedia article. The keyword extraction algorithm is based on two steps:
(a) candidate extraction, which extracts all possible n-grams that are also present in
a controlled dictionary, and (b) keyword ranking, which is based on tf-idf statistics,
the χ² independence test, or keyphraseness (i.e., the probability that a term is
selected as a keyword for a document); a sketch of keyphraseness-based ranking is
given at the end of this section. Three different disambiguation algorithms
are integrated in the system. The first one is based on the overlap between the terms
in the document and a set of ambiguous terms stored in a dictionary. The second one
is a data-driven approach that casts disambiguation as a classification problem,
trained on the sense-annotated examples provided by Wikipedia's own internal
links; the third combines the knowledge-based and data-driven methods through a
voting scheme.
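As a rough sketch of the extraction step, keyphraseness can be estimated for each candidate n-gram as the number of documents in which it was chosen as a keyword divided by the number of documents in which it appears at all, and candidates ranked by that score. The corpus counts below are made up for illustration.

```python
# Keyphraseness-based keyword ranking, sketched with hypothetical counts.
# keyphraseness(t) = count(t chosen as keyword) / count(t appearing at all)

# Counts as they might be collected from a training corpus (e.g., Wikipedia,
# where link anchors act as "keyword" annotations).
keyword_count = {"machine learning": 950, "learning": 400, "bank": 300}
document_count = {"machine learning": 1000, "learning": 9000, "bank": 7000}
controlled_dictionary = set(keyword_count)

def candidate_ngrams(tokens, max_n=3):
    """Step (a): all n-grams of the document found in the dictionary."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in controlled_dictionary:
                yield gram

def keyphraseness(term):
    """Step (b): probability that the term is selected as a keyword."""
    return keyword_count[term] / document_count[term]

doc = "research on machine learning at the bank".split()
ranked = sorted(set(candidate_ngrams(doc)), key=keyphraseness, reverse=True)
print(ranked)  # ['machine learning', 'learning', 'bank']
```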