Anecdotes extraction from webpage context as image annotation - Emerging Trends in Image Processing, Computer Vision, and Pattern Recognition


carry more anecdotes than a short LC possibly could. For example, it is obvious that “[ ]” is a better anecdote than its substrings “[ ]”, “[ ]”, or “[ ].” To enhance the semantics of candidate anecdotes, a further concatenation process is performed as the last step to finalize the CLCP.

3.2.4 Step 4: Execute postprocessing

In the final step, significant words are determined by observing the information mutually shared by two overlapping LCs, using the significance estimation (SE) function in (3):

$$SE_c = \frac{f_c}{f_a + f_b - f_c} \qquad (3)$$

where $c$ denotes the LC to be estimated, that is, $c = c_1 \cdot c_2, \ldots, c_n$; $a$ and $b$ represent the two longest compound substrings of $c$ with length $n - 1$, that is, $a = c_1 \cdot c_2, \ldots, c_{n-1}$ and $b = c_2 \cdot c_3, \ldots, c_n$; and $f_a$, $f_b$, and $f_c$ are the frequencies of $a$, $b$, and $c$, respectively.

In the above example, the term $c$, “[ ]” (Yu Chang case), gains an SE value of $5 / (6 + 5 - 5) \approx 0.83$, based on its frequency of 5, the frequency of 6 of its substring $a$, “[ ]” (Yu Chang), and the frequency of 5 of the other substring $b$, “[ ]” (Chang case). In this case, we retain the term $c$ “[ ]” and its substring $a$ “[ ]”: the frequency of $c$ is less than that of $a$, indicating that $a$ also occurs outside $c$ and carries useful meaning of its own. Conversely, we discard the substring $b$ “[ ]” because $b$ and $c$ have the same frequency, indicating that the longer term $c$ can replace its substring $b$. In short, since $f_c < f_a$, we retain both $c$ and $a$, and we discard $b$ because $f_c = f_b$.
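The worked example above can be checked with a short Python sketch. It assumes that the SE function takes the form term frequency divided by (frequency of a + frequency of b − term frequency), which reproduces the value 0.83 quoted in the text. The function names `significance` and `prune_substrings` are hypothetical and are not part of the original method description.

```python
def significance(f_term: int, f_a: int, f_b: int) -> float:
    """SE value of a term, given its own frequency and the frequencies of
    its two longest substrings a and b.

    Assumed form: f_term / (f_a + f_b - f_term). Because a and b occur at
    least as often as the term that contains them, the denominator is
    never smaller than f_term.
    """
    return f_term / (f_a + f_b - f_term)


def prune_substrings(f_term: int, f_a: int, f_b: int) -> dict:
    """Apply the retain/discard rule from the example.

    A substring is retained only if it is more frequent than the full
    term (it also occurs on its own); with equal frequency the longer
    term can stand in for it, so the substring is discarded.
    """
    return {"a": f_a > f_term, "b": f_b > f_term}


# Frequencies from the example: term = 5, substring a = 6, substring b = 5.
se_value = significance(5, 6, 5)
retained = prune_substrings(5, 6, 5)
print(round(se_value, 2), retained)
```

With the frequencies from the example, substring a is retained and substring b is discarded, matching the outcome described in the text.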

3.2.5 Term weighting

It is suggested that the most significant content description often appears in the title and the first paragraph. In addition, word frequency and word length are also accepted as indicators of term discrimination value in a document. Given a word LCi, the term weighting algorithm may be defined as (4).
