Affix Features. Because of the characteristics of Chinese adjective morphology,
the affix feature is important for opinion word extraction. For example,
難忘 (unforgettable), 難看 (unsightly), and 難吃 (unpalatable) all share the prefix
character “難”, which means “bad” here. It follows that new words with the
“難” prefix may also be opinion words. We use two prefix features: “first
character” and “first character to middle character”, and two suffix features:
“last character” and “middle character to last character”.
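The four affix features can be sketched in a few lines of Python. The text does not say how the "middle character" is chosen for even-length words, so the index (n - 1) // 2 used below is an assumption, as are the function name and the feature keys.

```python
def affix_features(word: str) -> dict[str, str]:
    """Return two prefix and two suffix features for a non-empty word."""
    if not word:
        raise ValueError("word must be non-empty")
    # Assumption: the "middle character" sits at index (n - 1) // 2,
    # i.e. left of center when the word has an even number of characters.
    mid = (len(word) - 1) // 2
    return {
        "prefix_first": word[0],                    # first character
        "prefix_first_to_middle": word[: mid + 1],  # first through middle
        "suffix_last": word[-1],                    # last character
        "suffix_middle_to_last": word[mid:],        # middle through last
    }


features = affix_features("難吃")
```

For a two-character word such as 難吃, this choice of middle index makes "first character to middle character" coincide with the first character alone, while "middle character to last character" covers the whole word.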
Word Features. Given a target word w, the words in the context window, that
is, w itself and the words preceding or following w, may be useful for determining if
w is an opinion word. In our experience, a suitable window size is seven, i.e., the
three preceding words, the current word, and the three following words.
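As a rough illustration, the context window can be turned into position-tagged word features as follows. The padding token, the feature key format, and the sample sentence are hypothetical; the numbers of preceding and following words are left as parameters so they can be set to match the chosen window.

```python
PAD = "<PAD>"  # hypothetical placeholder for positions outside the sentence


def context_features(tokens: list[str], i: int, before: int, after: int) -> dict[str, str]:
    """Map each relative offset in the window around tokens[i] to its word."""
    if not 0 <= i < len(tokens):
        raise IndexError("i must index a token in the sentence")
    features = {}
    for offset in range(-before, after + 1):
        j = i + offset
        # Positions that fall outside the sentence are filled with PAD.
        inside = 0 <= j < len(tokens)
        features[f"word[{offset:+d}]"] = tokens[j] if inside else PAD
    return features


# Illustrative sentence, already segmented into words.
tokens = ["這", "部", "電影", "很", "難看"]
window = context_features(tokens, 4, before=1, after=1)
```

Keeping the offset in the feature key lets a classifier distinguish a word that precedes w from the same word following it.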
Length Feature. In Chinese, if a word w's length (|w|) is greater than four
characters, w tends to be a named entity, not an opinion word. Therefore,
the length feature for w is defined as follows:
\[
\begin{cases}
|w| & \text{if } |w| \le 4 \\
5 & \text{otherwise}
\end{cases}
\]
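A minimal sketch of the length feature, assuming word length is counted in characters (Unicode code points in Python); the function name is illustrative.

```python
def length_feature(word: str) -> int:
    """Return |w| for words of up to four characters, and 5 otherwise."""
    n = len(word)  # number of characters (code points) in the word
    return n if n <= 4 else 5
```

All words longer than four characters share the single value 5, so the feature has only a handful of distinct values.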
Near-Synonym-Cluster Feature. For this feature, we collect similar words
from the “Revised Ministry of Education Dictionary” 2 and group them into clusters.
Some example clusters are shown in Table 1. If a given word w appears
in a cluster c_i, the value of w's near-synonym feature is c_i. Otherwise, the value
of w's near-synonym feature is NULL.
Table 1. Cluster examples
Cluster   Words in Cluster
C1        逐一, 一一
C2        別名, 又名, 別號, 別稱
C3        一瞬間, 一剎那
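The cluster lookup can be sketched as a reverse index from words to cluster labels. The snippet below uses only clusters C1 and C2 from Table 1 and assumes each word belongs to at most one cluster; the dictionary names and the string "NULL" as the fallback value are illustrative choices.

```python
# Two of the example clusters from Table 1.
CLUSTERS = {
    "C1": ["逐一", "一一"],
    "C2": ["別名", "又名", "別號", "別稱"],
}

# Reverse index: word -> cluster label.
# Assumption: a word appears in at most one cluster.
WORD_TO_CLUSTER = {
    word: cluster_id
    for cluster_id, words in CLUSTERS.items()
    for word in words
}


def near_synonym_feature(word: str) -> str:
    """Return the label of the cluster containing word, or "NULL"."""
    return WORD_TO_CLUSTER.get(word, "NULL")
```

With this index, 又名 and 別稱 receive the same feature value, which is how the feature lets evidence be shared among near-synonyms.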
Conjunction Features. To distinguish feature instances from the source domain
and the target domain, we generate conjunction features by combining
each aforementioned feature with a domain tag. For example, if a feature
“word=華麗” (gorgeous) is found in the source domain, its corresponding
conjunction feature is “word=華麗&source domain”.
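Generating conjunction features amounts to simple string concatenation, sketched below. The "name=value" feature strings and the "&source domain" suffix follow the example in the text; the function name and the domain labels "source" and "target" are assumptions, and the text does not state whether the untagged features are also kept.

```python
def conjunction_features(features: list[str], domain: str) -> list[str]:
    """Pair every feature string with a domain tag."""
    if domain not in ("source", "target"):
        raise ValueError("domain must be 'source' or 'target'")
    return [f"{feature}&{domain} domain" for feature in features]


tagged = conjunction_features(["word=華麗"], "source")
```

Because the tag is part of the feature string, the same word observed in the two domains yields two distinct features.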
2 http://dict.revised.moe.edu.tw
 