Affix Features. Because of the characteristics of Chinese adjective morphology, the affix feature is important for opinion word extraction. For example, 難忘 (unforgettable), 難看 (unsightly), and 難吃 (unpalatable) all share the prefix character “難”, which means “bad” here. It follows that new words with the “難” prefix may also be opinion words. We use two prefix features, “first character” and “first character to middle character”, and two suffix features, “last character” and “middle character to last character”.
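The four affix features can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `affix_features`, the feature keys, and the choice of `len(word) // 2` as the position of the "middle character" are assumptions, since the text does not say how the middle is chosen for even-length words.

```python
def affix_features(word):
    """Return two prefix and two suffix features for a word.

    Assumption: the "middle character" sits at index len(word) // 2.
    The source text does not define this, so treat it as illustrative.
    """
    if not word:
        return {}
    mid = len(word) // 2
    return {
        "prefix_first": word[0],                # first character
        "prefix_first_to_mid": word[: mid + 1], # first through middle character
        "suffix_last": word[-1],                # last character
        "suffix_mid_to_last": word[mid:],       # middle through last character
    }


feats = affix_features("難看")
```

Under this sketch, every word that starts with 難 receives the same `prefix_first` value, which is what would let a learner generalize from 難看 to an unseen word with the same prefix.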
Word Features. Given a target word w, the words in the context window, that is, w itself and the words preceding or following w, may be useful for determining whether w is an opinion word. In our experience, a suitable window size is seven, i.e., the three preceding words, the current word, and the three following words.
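A context window of this kind might be extracted as shown below. The sketch assumes a symmetric window of seven words (`size=3` on each side of the target); the function name `window_features`, the key format, and the `<PAD>` token used at sentence boundaries are hypothetical and not taken from the text.

```python
def window_features(tokens, i, size=3, pad="<PAD>"):
    """Collect the words at relative offsets -size..+size around tokens[i].

    Assumptions: a symmetric window and a padding token for positions
    that fall outside the sentence.
    """
    feats = {}
    for offset in range(-size, size + 1):
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else pad
        feats[f"w[{offset:+d}]"] = word
    return feats


tokens = ["這", "部", "電影", "非常", "難忘"]
feats = window_features(tokens, 2)
```

Keying each feature by its relative offset keeps "the word before w" distinct from "the word after w", rather than treating the window as an unordered bag of words.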
Length Feature. In Chinese, if the length |w| of a word w is greater than four characters, w tends to be a named entity, not an opinion word. Therefore, the length feature for w is designed as follows:

\[
f_{\mathrm{len}}(w) =
\begin{cases}
|w| & \text{if } |w| \le 4 \\
5 & \text{otherwise}
\end{cases}
\]
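The length feature amounts to capping the character count: lengths one through four are kept as they are, and anything longer falls into a single bucket with value 5. A minimal sketch follows; the function name `length_feature` and the `cap` parameter are illustrative names, not from the text.

```python
def length_feature(word, cap=4):
    """Return |w| if |w| <= cap, otherwise cap + 1 (i.e. 5 by default).

    len() counts Unicode code points, which matches the character count
    for the common Chinese characters used in the examples here.
    """
    n = len(word)
    return n if n <= cap else cap + 1


values = [length_feature(w) for w in ["難吃", "一二三四", "一二三四五六"]]
```

With this bucketing, all words longer than four characters share one feature value, so a learner can attach a single weight to "probably a named entity" instead of one weight per length.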
Near-Synonym-Cluster Feature. For this feature, we collect similar words from the “Revised Ministry of Education Dictionary”² and group them into clusters. Some example clusters are shown in Table 1. If a given word w appears in a cluster c_i, the value of w's near-synonym feature is c_i. Otherwise, the value of w's near-synonym feature is NULL.
Table 1. Cluster examples

Cluster   Words in Cluster
C1        逐一, 一一
C2        別名, 又名, 別號, 別稱
C3        一瞬間, 一剎那
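The near-synonym feature is essentially a dictionary lookup from a word to its cluster identifier. The sketch below uses only the first two clusters of Table 1 as sample data. The names `CLUSTERS`, `build_index`, and `near_synonym_feature` are hypothetical, and the sketch assumes each word belongs to at most one cluster (if a word were listed twice, the first cluster would win), which the text does not state.

```python
# Sample data: the first two clusters from Table 1.
CLUSTERS = {
    "C1": ["逐一", "一一"],
    "C2": ["別名", "又名", "別號", "別稱"],
}


def build_index(clusters):
    """Invert cluster -> words into word -> cluster id."""
    index = {}
    for cluster_id, words in clusters.items():
        for word in words:
            # Assumption: keep the first cluster if a word repeats.
            index.setdefault(word, cluster_id)
    return index


def near_synonym_feature(word, index):
    """Return the cluster id of the word, or "NULL" if it is in no cluster."""
    return index.get(word, "NULL")


index = build_index(CLUSTERS)
feature = near_synonym_feature("又名", index)
```

Building the inverted index once makes each lookup a constant-time dictionary access, which matters when the feature is computed for every token in a corpus.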
Conjunction Features. To distinguish feature instances from the source domain and the target domain, we generate conjunction features by combining each aforementioned feature with a domain tag. For example, supposing a feature “word=華麗” (gorgeous) is found in the source domain, its corresponding conjunction feature is “word=華麗&source domain”.
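Generating conjunction features is a simple string combination of each base feature with a domain tag. The sketch below follows the “word=華麗&source domain” format from the example. The function name `conjunction_features` and the representation of features as `name=value` strings are assumptions, and the text does not say whether the untagged base features are kept alongside the conjunctions, so this sketch returns only the conjunctions.

```python
def conjunction_features(features, domain):
    """Combine each 'name=value' feature string with a domain tag.

    domain is expected to be "source" or "target".
    """
    if domain not in ("source", "target"):
        raise ValueError("domain must be 'source' or 'target'")
    return [f"{feature}&{domain} domain" for feature in features]


tagged = conjunction_features(["word=華麗", "length=2"], "source")
```

Because the same base feature yields different strings in the two domains, a learner can assign it a separate weight per domain.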