Image Processing Reference
carry more anecdotes than a short LC possibly could. For example, it is obvious that “[ ]” is a better anecdote than its substrings “[ ]”, “[ ]”, or “[ ].” To enhance the semantics of candidate anecdotes, we apply a further concatenation process as the last step to finalize the CLCP.
3.2.4 Step 4: Execute postprocessing
In the final step, significant words are determined by observing the information mutually shared by two overlapping LCs, using the significance estimation (SE) function in (3):

$$SE_i = \frac{f_i}{f_a + f_b - f_i} \qquad (3)$$
where $i$ denotes the LC $LC_i$ to be estimated, that is, $i = i_1 \cdot i_2 \cdots i_n$; $a$ and $b$ represent the two longest compound substrings of $LC_i$, each of length $n - 1$, that is, $a = i_1 \cdot i_2 \cdots i_{n-1}$ and $b = i_2 \cdot i_3 \cdots i_n$. Here $f_a$, $f_b$, and $f_i$ are the frequencies of $a$, $b$, and $i$, respectively. In the above example, the term $i$, “[ ]” (Yu Chang case), gains an SE value of 0.83 based on its frequency of 5, the frequency of 6 of its substring $a$, “[ ]” (Yu Chang), and the frequency of 5 of its other substring $b$, “[ ]” (Chang case). In this case, we retain the term $i$ “[ ]” (Yu Chang case) and its substring $a$ “[ ]” (Yu Chang), because the frequency of “[ ]” (Yu Chang case) is less than that of “[ ]” (Yu Chang), indicating that “[ ]” (Yu Chang) carries useful meaning on its own. Likewise, we discard the substring $b$ “[ ]” (Chang case) because both terms have the same frequency, indicating that the longer term “[ ]” (Yu Chang case) can replace its substring “[ ]” (Chang case). In short, since $f_i < f_a$, we retain both $i$ and $a$, and we discard “[ ]” (Chang case) because $f_i = f_b$.
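The retention rule in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the SE form used here, $f_i / (f_a + f_b - f_i)$, is assumed because it reproduces the 0.83 value quoted above, and the function names `significance` and `retained_substrings` are hypothetical, not taken from the source.

```python
def significance(f_term: int, f_a: int, f_b: int) -> float:
    """SE value of a term, assuming the form f_term / (f_a + f_b - f_term).

    f_term: frequency of the full lexical chain (LC)
    f_a:    frequency of its leading substring of length n - 1
    f_b:    frequency of its trailing substring of length n - 1
    """
    denominator = f_a + f_b - f_term
    if denominator <= 0:
        raise ValueError("substring frequencies must not be lower than the term frequency")
    return f_term / denominator


def retained_substrings(f_term: int, f_a: int, f_b: int) -> dict:
    """Decide which substrings survive postprocessing.

    A substring is kept only if it occurs more often than the full term,
    i.e. it also appears on its own; if the frequencies are equal, the
    longer term can stand in for it and the substring is discarded.
    """
    return {"a": f_a > f_term, "b": f_b > f_term}


# Frequencies from the worked example: term 5, substring a 6, substring b 5.
se_value = significance(5, 6, 5)
decision = retained_substrings(5, 6, 5)
print(round(se_value, 2))  # 0.83
print(decision)            # {'a': True, 'b': False}
```

With the example frequencies, substring a is kept alongside the full term because it occurs once more than the term does, while substring b is dropped because it never occurs outside the term.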
3.2.5 Term weighting
It is suggested that the most significant content description often appears in the title and the first paragraph. In addition, word frequency and word length are also accepted as indicators of term discrimination value in a document. Given a word $LC_i$, $i$, the term weighting algorithm may be defined as (4).
 