representational schemes provide the information in a suitable and efficient way. In
semantic networks, words or concepts are represented as nodes in a graph, and relations are represented by named links [83]. Another type of online knowledge source in the linguistic domain is the annotated dictionary, in which the properties of a term are stored as tags. However, dictionaries usually do not contain relations between terms.
In the following, some well-known examples of such open-domain linguistic information sources are introduced, and an approach to using them for content and sentiment analysis based on linguistic cues is described.
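To make the contrast between the two kinds of source concrete, here is a minimal Python sketch that is not tied to any particular resource: a semantic network stores named, directed links between nodes, whereas an annotated dictionary only attaches tags to each term. The class and method names and the tag values are hypothetical illustrations.

```python
from collections import defaultdict
from typing import List, Optional


class SemanticNetwork:
    """Concepts are nodes; relations are named, directed links between them."""

    def __init__(self) -> None:
        # source node -> list of (link name, target node)
        self._links = defaultdict(list)

    def link(self, source: str, name: str, target: str) -> None:
        self._links[source].append((name, target))

    def neighbours(self, source: str, name: Optional[str] = None) -> List[str]:
        """Targets reachable from `source`, optionally only via links called `name`."""
        return [t for n, t in self._links[source] if name is None or n == name]


# Annotated dictionary: each term only carries property tags (hypothetical
# tags), with no links to other terms.
annotated_dictionary = {
    "movie": {"noun"},
    "watch": {"verb", "noun"},
}

net = SemanticNetwork()
net.link("actor", "PartOf", "movie")
net.link("movie", "HasProperty", "fun")
print(net.neighbours("actor"))                 # ['movie']
print(net.neighbours("movie", "HasProperty"))  # ['fun']
```

The dictionary can answer "what properties does this term have?", but only the network can answer "what is this term connected to, and how?".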
6.3.4.1 ConceptNet
ConceptNet is a semantic network of concepts, such as “actor” or “to watch a movie”. It is freely available for download³ and provides commonsense knowledge in a machine-readable format. Knowledge is added through crowdsourcing by non-specialist users. The interface through which users edit the knowledge⁴ is able, to a certain extent, to prevent false claims and other mistakes [84]. ConceptNet's storage format does not contain syntactic category information, so it offers no support for word sense disambiguation. This can, however, be overcome by formulating sufficiently specific concepts, since a concept can consist of an arbitrary number of words. Concepts are stored in a normalised format, which is intended to ignore minor syntactic variations that do not affect the meaning of a concept. A concept is normalised by [84]: removing punctuation and stop words, running each remaining word through Porter's stemmer, and sorting the stems alphabetically, so that the order of the words does not matter. Figure 6.7 shows the histogram of concept size (number of words per concept) in ConceptNet. As can be seen, multi-word concepts make up the largest part of the database.
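The three normalisation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not ConceptNet's own code: the stop-word list is a small hypothetical sample, the input is lowercased (a step the text does not mention), and `toy_stem` is a crude suffix stripper standing in for Porter's stemmer so that the example needs no extra packages.

```python
import re
from typing import Callable, Iterable

# Hypothetical, deliberately tiny stop-word list; a real system would use a full one.
STOP_WORDS = frozenset({"a", "an", "the", "to", "of", "in", "on", "and", "is"})


def toy_stem(word: str) -> str:
    """Crude stand-in for Porter's stemmer (NOT the Porter algorithm).

    Strips one of a few common English suffixes if at least three
    characters remain.
    """
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(concept: str,
              stem: Callable[[str], str] = toy_stem,
              stop_words: Iterable[str] = STOP_WORDS) -> str:
    """Normalise a concept string following the three steps in the text."""
    stop = set(stop_words)
    # Step 1: lowercase (assumption), replace punctuation with spaces, drop stop words.
    cleaned = re.sub(r"[^\w\s]", " ", concept.lower())
    words = [w for w in cleaned.split() if w not in stop]
    # Step 2: stem each remaining word.
    stems = [stem(w) for w in words]
    # Step 3: sort the stems alphabetically so that word order no longer matters.
    return " ".join(sorted(stems))


print(normalise("Watching a movie!"))  # movie watch
print(normalise("a movie, watching"))  # movie watch
```

Both phrasings collapse to the same key, which is the point of the normalised format. If NLTK is installed, `PorterStemmer().stem` from `nltk.stem` can be passed as the `stem` argument to use the actual Porter algorithm; the resulting stems may then differ from those of the toy stemmer.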
Concepts are interlinked by twenty-one relations, which encode the meaning of the connection between two concepts. Relation names are intended to be intuitive, e.g., IsA or PartOf. The unit of meaning representation is the predicate. Figure 6.8 shows an example of how predicates are stored in ConceptNet.
Each predicate consists of two concepts and a relation, e.g., “actor” PartOf “movie” (“An actor is part of a movie”). Further, a concept can take part in many relations: in the example in Fig. 6.8, “movie” is also connected to “fun” by a HasProperty relation. Relations are always unidirectional, since in most cases a predicate does not hold when the order of its concepts is reversed (cf. the nonsensical inversion “A movie is part of an actor”). Predicates may be negated, as in “A car cannot travel at the speed of light”. Furthermore, each predicate has a reliability score that is initialised at one and can then be increased or decreased by users; a score of zero or below marks the predicate as unreliable [84]. The current version, ConceptNet 3, contains 250 556 concepts and 390 885 predicates for the English language.
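The predicate model described above (two concepts joined by a directed relation, an optional negation, and a user-adjustable reliability score starting at one) can be sketched as a small Python data structure. This is a hypothetical illustration, not ConceptNet's actual storage format or API; the class names, the `vote` method, and the use of CapableOf for the negated example are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass
class Predicate:
    """A directed assertion: head --relation--> tail."""
    head: str
    relation: str
    tail: str
    negated: bool = False  # e.g. "A car cannot travel at the speed of light"
    score: int = 1         # reliability score, initialised at one

    def vote(self, delta: int) -> None:
        """Apply a user vote: a positive delta raises, a negative one lowers the score."""
        self.score += delta

    @property
    def reliable(self) -> bool:
        """Predicates scoring zero or below are treated as unreliable."""
        return self.score > 0


class PredicateStore:
    """Indexes predicates by head and by tail; direction is kept, never symmetrised."""

    def __init__(self) -> None:
        self._by_head = defaultdict(list)
        self._by_tail = defaultdict(list)

    def add(self, p: Predicate) -> Predicate:
        self._by_head[p.head].append(p)
        self._by_tail[p.tail].append(p)
        return p

    def involving(self, concept: str, only_reliable: bool = True) -> List[Predicate]:
        """All predicates in which `concept` is the head or the tail."""
        preds = self._by_head[concept] + self._by_tail[concept]
        return [p for p in preds if p.reliable or not only_reliable]


store = PredicateStore()
part_of = store.add(Predicate("actor", "PartOf", "movie"))
store.add(Predicate("movie", "HasProperty", "fun"))
store.add(Predicate("car", "CapableOf", "travel at speed of light", negated=True))

print(len(store.involving("movie")))  # 2 (one incoming, one outgoing)
part_of.vote(-1)                       # score drops from 1 to 0
print(part_of.reliable)                # False
print(len(store.involving("movie")))  # 1
```

Keeping separate head and tail indexes preserves the direction of each relation while still letting one concept, such as “movie”, be found in every predicate it takes part in.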
³ http://conceptnet.media.mit.edu/
⁴ http://commons.media.mit.edu/en/