user activity). To this end, Facebook's public APIs were exploited to build a social
network application, Tag!t. The application makes it possible to share and interact
with video content in a richer way than Facebook's native features allow. Besides
sharing videos and aggregating them into collections, users can tag specific time-
stamps in their friends' videoclips and link additional material (e.g., videoclips,
images, Web pages) to them. The application also keeps track of user interaction
with the media itself (e.g., play/pause/seek events). Experiments with the Tag!t appli-
cation showed that users within a selected group tend to tag in a similar manner. In
addition, the semantics suggested by a user were found to be only slightly biased by
other users' tagging of the same content, thus indicating that collaborative tagging
leads to coherent semantics.
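The model just described amounts to time-anchored annotations attached to a video, together with linked resources and a log of playback events. The following is a minimal sketch of such a data model; all class and field names are hypothetical, since Tag!t's internals are not published here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data model for time-anchored video tags (names are
# illustrative, not taken from the Tag!t implementation).

@dataclass
class LinkedResource:
    kind: str          # e.g., "video", "image", "web_page"
    url: str

@dataclass
class TimestampTag:
    video_id: str      # the friend's videoclip being annotated
    author_id: str     # the user who created the tag
    timestamp_s: float # position in the clip, in seconds
    label: str         # free-text tag describing that moment
    resources: List[LinkedResource] = field(default_factory=list)

@dataclass
class InteractionEvent:
    video_id: str
    user_id: str
    event: str         # "play", "pause", or "seek"
    position_s: float  # playback position when the event occurred

# Example: tag a moment in a friend's clip and attach a related web page.
tag = TimestampTag("vid42", "alice", 73.5, "goal celebration",
                   [LinkedResource("web_page", "http://example.org/report")])
log = [InteractionEvent("vid42", "bob", "play", 0.0),
       InteractionEvent("vid42", "bob", "seek", 70.0)]
```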
2.5.3 Document Annotation
Wikipedia articles can also be used to analyze words in order to improve keyword
extraction from documents and disambiguation algorithms, as shown in the taxonomy
depicted in Fig. 2.4. For example, Semantic MediaWiki [35] is an extension of
the MediaWiki software that allows the wiki content in the articles to be annotated.
The aim of this tool is to improve consistency in Wikipedia articles by reusing the
information stored in the encyclopedia.
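Semantic MediaWiki marks up facts inline with a [[property::value]] syntax (for instance, [[Is capital of::Germany]] inside the article on Berlin). As a rough illustration, assuming that syntax, a few lines of Python suffice to pull such property/value pairs out of wiki markup; the regex is a simplification that ignores nested links and display text.

```python
import re

# Extract Semantic MediaWiki-style annotations of the form [[property::value]].
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")

def extract_annotations(wikitext: str):
    """Return (property, value) pairs found in a page's wiki markup."""
    return [(p.strip(), v.strip()) for p, v in ANNOTATION.findall(wikitext)]

page = ("Berlin is the capital of [[Is capital of::Germany]] and has "
        "[[Population::3,500,000]] inhabitants.")
print(extract_annotations(page))
# [('Is capital of', 'Germany'), ('Population', '3,500,000')]
```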
Some approaches have attempted to detect the semantic relatedness between
terms in documents to identify possible document topics. In [36] the authors intro-
duced "Explicit Semantic Analysis" (ESA), which computes the semantic related-
ness between fragments of natural language text using a concept space. The method
employs machine learning techniques to build a semantic interpreter that maps
fragments of natural language to a weighted vector of Wikipedia concepts, named
an "interpretation vector", in which concepts are ordered by their relevance to the
fragment. The relatedness between different interpretation vectors is evaluated by
means of cosine similarity.
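A minimal sketch of the ESA pipeline: each Wikipedia concept is represented by the tf-idf vector of its article, a text fragment is mapped to its interpretation vector of concept weights, and two fragments are compared via the cosine similarity of those vectors. The toy three-article "Wikipedia" below stands in for the real concept space.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy concept space: each "article" stands for one Wikipedia concept.
concepts = ["Guitar", "Piano", "Stock market"]
articles = [
    "guitar string instrument music chord fret rock",
    "piano keyboard instrument music key concert",
    "stock market trading shares exchange investor price",
]

# Build tf-idf representations of the concept articles.
vectorizer = TfidfVectorizer()
concept_matrix = vectorizer.fit_transform(articles)   # concepts x terms

def interpretation_vector(text: str) -> np.ndarray:
    """Map a text fragment to weights over the Wikipedia concepts."""
    frag = vectorizer.transform([text])               # 1 x terms
    return (frag @ concept_matrix.T).toarray()[0]     # weights per concept

def relatedness(a: str, b: str) -> float:
    """ESA relatedness: cosine similarity of interpretation vectors."""
    va, vb = interpretation_vector(a), interpretation_vector(b)
    return float(cosine_similarity([va], [vb])[0, 0])

print(relatedness("rock music chord", "concert keyboard"))  # related via music
print(relatedness("rock music chord", "shares investor"))   # unrelated
```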
In [37], the Wikipedia Link-Based Measure is described. The approach identifies
a set of candidate articles which represent the analyzed concepts and measures the
relatedness between these articles using a similarity measure, which can be a tf-idf-
based measure, the Normalized Google Distance, or a combination of both. Experi-
mental results show that the ESA approach is effective in identifying the relatedness
between terms.
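In the link-based setting the Normalized Google Distance formula is applied to Wikipedia hyperlinks rather than search hits: each article is described by the set of articles that link to it. A sketch of that computation, assuming the in-link sets are already available:

```python
import math

# Relatedness of two articles from their sets of incoming links, using the
# Normalized Google Distance adapted to Wikipedia links.
def link_relatedness(in_a: set, in_b: set, n_articles: int) -> float:
    """Return a relatedness score in [0, 1]; higher means more related."""
    overlap = len(in_a & in_b)
    if overlap == 0:
        return 0.0
    big, small = max(len(in_a), len(in_b)), min(len(in_a), len(in_b))
    distance = ((math.log(big) - math.log(overlap)) /
                (math.log(n_articles) - math.log(small)))
    return max(0.0, 1.0 - distance)

# Toy example: articles identified by the pages that link to them.
links_to_cat = {"Pet", "Mammal", "Carnivore", "Meow"}
links_to_dog = {"Pet", "Mammal", "Carnivore", "Bark"}
links_to_bond = {"Finance", "Debt"}
print(link_relatedness(links_to_cat, links_to_dog, n_articles=1_000_000))
print(link_relatedness(links_to_cat, links_to_bond, n_articles=1_000_000))
```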
The Wikify! system [38] combines keyword extraction from documents with word
sense disambiguation in order to assign to each extracted keyword a link to the
correct Wikipedia article. The keyword extraction algorithm is based on two steps:
(a) candidate extraction, which extracts all possible n-grams that are also present in
a controlled dictionary, and (b) keyword ranking, which is based on tf-idf statistics,
the χ² independence test, or keyphraseness (i.e., the probability that a term is
selected as a keyword for a document); a sketch of keyphraseness-based ranking is
given at the end of this section. Three different disambiguation algorithms
are integrated in the system. The first one is based on the overlap between the terms
in the document and a set of ambiguous terms stored in a dictionary. The second one
is a data-driven approach that casts disambiguation as a classification problem,
trained on the sense-annotated examples provided by Wikipedia's own internal
links; the third combines the knowledge-based and data-driven methods through a
voting scheme.
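As a rough sketch of the extraction step, keyphraseness can be estimated for each candidate n-gram as the number of documents in which it was chosen as a keyword divided by the number of documents in which it appears at all, and candidates ranked by that score. The corpus counts below are made up for illustration.

```python
# Keyphraseness-based keyword ranking, sketched with hypothetical counts.
# keyphraseness(t) = count(t chosen as keyword) / count(t appearing at all)

# Counts as they might be collected from a training corpus (e.g., Wikipedia,
# where link anchors act as "keyword" annotations).
keyword_count = {"machine learning": 950, "learning": 400, "bank": 300}
document_count = {"machine learning": 1000, "learning": 9000, "bank": 7000}
controlled_dictionary = set(keyword_count)

def candidate_ngrams(tokens, max_n=3):
    """Step (a): all n-grams of the document found in the dictionary."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in controlled_dictionary:
                yield gram

def keyphraseness(term):
    """Step (b): probability that the term is selected as a keyword."""
    return keyword_count[term] / document_count[term]

doc = "research on machine learning at the bank".split()
ranked = sorted(set(candidate_ngrams(doc)), key=keyphraseness, reverse=True)
print(ranked)  # ['machine learning', 'learning', 'bank']
```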