Our framework applies NLTL as shown in Fig. 4. We first produce high-quality
and low-quality labels using three methods, and then generate the features as
described before. Finally, we learn a classifier using both the high-quality and the
low-quality domain data, and show that the low-quality domain data helps improve
performance.
5.1 Labeling
We provide high-quality labels (manual labeling) as well as low-quality labels (emoticon
labeling and sentiment dictionary labeling). The low-quality labeling methods are
automatic and low-cost, but their results may be noisy.
Emoticon Labeling. We first manually identify the emoticons that are clearly
positive or negative. We then use these emoticons to decide the label (positive or
negative) of each diffusion.
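The emoticon rule can be sketched as follows. This is a minimal illustration only: the emoticon sets are hypothetical placeholders (the manually curated lists are not given here), whitespace tokenization is assumed, and leaving a response unlabeled when it contains both positive and negative emoticons is an assumption rather than a stated rule.

```python
# Hypothetical emoticon sets; the manually curated lists are not given in the text.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", "(:"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def label_by_emoticon(text):
    """Return 'positive', 'negative', or None if no clear emoticon signal exists."""
    tokens = text.split()  # assumption: emoticons are separated by whitespace
    has_positive = any(token in POSITIVE_EMOTICONS for token in tokens)
    has_negative = any(token in NEGATIVE_EMOTICONS for token in tokens)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    # Assumption: absent or conflicting emoticons leave the response unlabeled.
    return None
```

Responses for which the function returns None would simply not receive an emoticon label.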
Manual Labeling. Human annotators are asked to label whether the content is
positive, negative, or unknown.
Sentiment Dictionary Labeling. We construct a sentiment dictionary and label
the diffusions based on the voting of the words in the sentiment dictionary.
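A rough sketch of dictionary voting is given below. The dictionary entries, the whitespace tokenizer, and the treatment of ties as unlabeled are all assumptions made for illustration; the actual dictionary and tie-breaking rule are not specified in the text.

```python
# Hypothetical toy dictionary: +1 for a positive word, -1 for a negative word.
SENTIMENT_DICTIONARY = {
    "good": 1, "great": 1, "love": 1,
    "bad": -1, "awful": -1, "hate": -1,
}


def label_by_dictionary(text):
    """Label a response by majority vote of the dictionary words it contains."""
    words = text.lower().split()  # assumption: simple whitespace tokenization
    votes = sum(SENTIMENT_DICTIONARY.get(word, 0) for word in words)
    if votes > 0:
        return "positive"
    if votes < 0:
        return "negative"
    # Assumption: a tie, or no dictionary word at all, leaves the response unlabeled.
    return None
```

Each dictionary word found in the response casts one vote, and the sign of the vote total decides the label.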
5.2 Dataset
We first identify the 100 top discussion topics on the Plurk micro-blogging site [8]. We
collect the messages and responses from users who discussed those topics in the period
from 01/2011 to 05/2011. A diffusion of sentiment is denoted as a four-tuple
(poster, responder, topic, sentiment), which means that the poster posts a message on
the topic and the responder responds with the given sentiment (positive or negative,
labeled by the different methods introduced in 5.1). The dataset contains 699,985
objects, so it is not practical to label them all manually. We choose 17% of the objects
to be labeled manually, while the other objects are labeled using emoticons and the
sentiment dictionary. Finally, we obtain 82,277 diffusions from manual labeling,
117,876 diffusions from emoticon labeling, and 396,370 diffusions from sentiment
dictionary labeling.
5.3 Feature Generation
To perform sentiment prediction, we design features of the following four types.
Link Sentiment Information. The link sentiment information describes the tendency
of each link in the network to be positive or negative for a given topic. For a
link, the link sentiment score (LS) is calculated by comparing the number of times that
positive and negative content is diffused. That is, we increase LS by one for each
positive diffusion and decrease LS by one for each negative diffusion.
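The LS update rule can be sketched as a simple counter. In this illustration a diffusion is assumed to be a tuple of poster, responder, topic, and sentiment as in Section 5.2, and a link is assumed to be the directed (poster, responder) pair scored separately per topic; the function name and the data layout are hypothetical.

```python
from collections import defaultdict


def link_sentiment_scores(diffusions):
    """Compute LS for every (poster, responder, topic) key.

    `diffusions` is an iterable of (poster, responder, topic, sentiment)
    tuples, where sentiment is 'positive' or 'negative'.
    """
    scores = defaultdict(int)
    for poster, responder, topic, sentiment in diffusions:
        key = (poster, responder, topic)  # assumption: directed link, scored per topic
        if sentiment == "positive":
            scores[key] += 1  # one more positive diffusion on this link
        elif sentiment == "negative":
            scores[key] -= 1  # one more negative diffusion on this link
    return dict(scores)


# Hypothetical toy data for illustration only.
example = [
    ("alice", "bob", "movies", "positive"),
    ("alice", "bob", "movies", "positive"),
    ("alice", "bob", "movies", "negative"),
    ("carol", "bob", "sports", "negative"),
]
scores = link_sentiment_scores(example)
```

Under these assumptions a positive LS means the link has carried more positive than negative diffusions for that topic, and a negative LS means the opposite.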