mantics of the document d. In the term frequency-inverse document frequency (TF-IDF) model, the weight of a term is typically proportional to its frequency within a document and inversely related to the frequency and length of the documents containing it. Some studies [22, 23] proposed assigning different weights to words based on further features, such as word length and location.
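The weighting just described can be sketched in a few lines of standard-library Python. The tokenized documents and the logarithmic IDF form used here are illustrative assumptions, not the exact schemes of the studies cited:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for each term of each tokenized document.

    docs: list of token lists. Returns one {term: weight} dict per document.
    Term frequency is normalized by document length, and the inverse document
    frequency log(N / df) shrinks the weight of terms that occur in many
    documents, matching the behaviour described in the text.
    """
    n = len(docs)
    df = Counter()                       # documents containing each term
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights
```

A term such as "model" occurring in only one of several documents thus receives a higher weight than "vector" occurring in two, even when their within-document frequencies are equal.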
2.3 Lexical Chain
An LC is a sequence of words that stand in lexical cohesion relations with one another; the chained words tend to mark portions of the context that form semantic units, independent of the grammatical structure of the text. LCs can further serve as a basis for segmentation [24], and the method is often applied in summarization [25]. For instance, the Chinese string "向量空间模型" (vector space model) may be parsed as "向量" (vector), "空间" (space), and "模型" (model) if no further merging process is applied, so the most informative compound would be left out. Usually LCs are constructed in a bottom-up manner by taking each candidate word of a text and finding an appropriate semantic relation offered by a thesaurus. Instead, the paper [26] proposes a top-down approach to linear text segmentation based on the lexical cohesion of a text. Some scholars suggested using machine learning approaches, including the maximum entropy model and the hidden Markov model (HMM), to create rules that estimate the likelihood of characters forming a new word for entity recognition, and claimed this method achieves reasonable performance with minimal training data [27, 28].
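The bottom-up construction described above can be sketched as follows. The `related` mapping is a toy stand-in for a real thesaurus such as WordNet, and the greedy chaining rule is an illustrative assumption, not the exact procedure of the cited works:

```python
def build_chains(words, related):
    """Greedy bottom-up lexical chaining.

    words:   candidate words in text order.
    related: toy thesaurus mapping a word to the set of words it is
             lexically cohesive with (stands in for WordNet/HowNet).
    Each word joins the first existing chain that contains a related
    word; otherwise it starts a new chain.
    """
    chains = []
    for w in words:
        for chain in chains:
            if any(c == w
                   or c in related.get(w, set())
                   or w in related.get(c, set()) for c in chain):
                chain.append(w)
                break
        else:
            chains.append([w])
    return chains
```

With `related = {"car": {"vehicle", "wheel"}, "vehicle": {"truck"}, "forest": {"tree"}}`, the word list `["car", "tree", "vehicle", "truck", "forest"]` yields the two chains `["car", "vehicle", "truck"]` and `["tree", "forest"]`, each marking a semantic unit of the text.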
3 Research Design
3.1 Research Model Overview
This study covers three tasks: text processing, anecdote extraction, and annotation evaluation. Since image captions are written by journalists, it is assumed that a man-made caption is faithful to the image scenario. We assigned the extracted anecdote with the highest weight as the primary annotation, and those ranked second and third as secondary annotations. The framework of our research is depicted in Figure 1. We conducted CLCP and term weighting on the input data to extract anecdotes, and evaluated the annotations by image-caption mapping and human judgment, respectively. The following sections introduce anecdote extraction and annotation evaluation in turn.
FIGURE 1 Research model.
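The assignment of primary and secondary annotations described above can be sketched as follows; the `assign_annotations` helper and the example weights are illustrative assumptions, not the paper's actual extraction pipeline:

```python
def assign_annotations(candidates):
    """Split weighted anecdote candidates into annotations.

    candidates: dict mapping an anecdote string to its weight.
    The highest-weighted anecdote becomes the primary annotation;
    the second- and third-ranked ones become secondary annotations.
    """
    ranked = sorted(candidates, key=candidates.get, reverse=True)
    return {"primary": ranked[0] if ranked else None,
            "secondary": ranked[1:3]}
```

For example, given weights `{"boy wins race": 0.9, "crowd cheers": 0.5, "dog plays": 0.2, "rain falls": 0.1}`, "boy wins race" becomes the primary annotation and "crowd cheers" and "dog plays" the secondary ones.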