be categorized into two groups: young and elder. Hence, we divide users into two major demographic pools: users below 30 years old (young) and users above 30 years old (elder). This binary categorization is simple, yet useful and reasonable for user modeling. The same setting is also used in [28].
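The binary age attribute can be derived from a numeric age with a single threshold. The sketch below is a minimal illustration. `age_group` and `AGE_THRESHOLD` are hypothetical names. The text says "below 30" is young and "above 30" is elder but does not place age exactly 30, so treating 30 as elder is an assumption.

```python
AGE_THRESHOLD = 30  # threshold stated in the text


def age_group(age: int) -> str:
    """Map a numeric age to the binary age attribute.

    Assumption: age exactly 30 is treated as 'elder'; the text leaves it open.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    return "young" if age < AGE_THRESHOLD else "elder"
```

For example, `[age_group(a) for a in (19, 29, 30, 52)]` yields `['young', 'young', 'elder', 'elder']`.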
Relationship. Relationship status in Google+ has multiple categories, such as single, married, and in a relationship. For simplicity, we classify users into two groups: unmarried and married.
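Collapsing the raw relationship status into the two groups amounts to a small lookup. In this sketch, `relationship_group` and `MARRIED_STATUSES` are hypothetical names. The text only names single, married, and in a relationship, so the rule that every status other than "married" maps to unmarried is an assumption.

```python
# Assumption: only "married" counts as married; all other statuses are unmarried.
MARRIED_STATUSES = {"married"}


def relationship_group(status: str) -> str:
    """Collapse a raw relationship status into 'married' or 'unmarried'."""
    normalized = status.strip().lower()  # tolerate case and stray whitespace
    return "married" if normalized in MARRIED_STATUSES else "unmarried"
```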
Occupation. Based on a study of the occupation field on Google+ user pages, and following the work in [20], occupation is described with 15 values, such as IT professional, entertainer, and photographer.
Interest. Interest refers to the favorite topics reflected in a user's posts. Based on an analysis of our collected Google+ data, we define 12 topics of interest that together cover a broad range of interest categories. Since each user may have several interests, the interest value is a vector, and we treat the inference of each interest as a binary classification problem.
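Treating a vector-valued interest as one binary problem per topic is commonly called binary relevance. The sketch below encodes each user's interest set as a 0/1 indicator vector and trains one independent classifier per topic. The text does not say which binary classifier is used. A plain perceptron is swapped in here purely for illustration, and `encode_interests`, `train_perceptron`, and `train_binary_relevance` are hypothetical names.

```python
from typing import Callable, List, Sequence, Set

Vector = List[float]
BinaryModel = Callable[[Vector], int]


def encode_interests(interests: Set[str], topics: Sequence[str]) -> List[int]:
    """Turn a user's set of interests into a 0/1 indicator vector over all topics."""
    unknown = interests - set(topics)
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    return [1 if t in interests else 0 for t in topics]


def train_perceptron(xs: Sequence[Vector], ys: Sequence[int],
                     epochs: int = 20) -> BinaryModel:
    """Train a plain perceptron for one binary problem (illustrative choice)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            target = 1 if y == 1 else -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if target * score <= 0:  # misclassified or on the boundary
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_binary_relevance(
    xs: Sequence[Vector],
    label_vectors: Sequence[List[int]],
    train_binary: Callable[[Sequence[Vector], Sequence[int]], BinaryModel] = train_perceptron,
) -> Callable[[Vector], List[int]]:
    """Fit one independent binary classifier per topic and predict all topics at once."""
    n_topics = len(label_vectors[0])
    models = [train_binary(xs, [lv[j] for lv in label_vectors]) for j in range(n_topics)]
    return lambda x: [m(x) for m in models]
```

Any binary learner with the same call shape can be passed as `train_binary`. The only structural commitment is that topics are predicted independently.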
Sentiment Orientation. A user's posts can reflect his/her emotional state. For example, a user with many interesting and happy posts is more likely to be a positive person, while many posts with negative content suggest a negative tendency. Sentiment orientation describes the emotional polarity of a user based on his/her posts. We define three sentiment orientation values: positive, negative, and neutral.
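The text does not say how post-level sentiment is turned into a user-level value. One simple possibility, shown below as a sketch under stated assumptions, is to average per-post polarity scores and apply a neutral band. Both the per-post scores in [-1, 1] (from some upstream sentiment classifier) and the `margin` parameter are assumptions, and `sentiment_orientation` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def sentiment_orientation(post_scores: Sequence[float], margin: float = 0.1) -> str:
    """Aggregate per-post polarity scores into a user-level orientation.

    Assumptions: each post already has a polarity score in [-1, 1], and the
    user's orientation is the sign of the mean score, with |mean| <= margin
    treated as neutral.
    """
    if not post_scores:
        return "neutral"  # no posts, so no evidence either way
    avg = mean(post_scores)
    if avg > margin:
        return "positive"
    if avg < -margin:
        return "negative"
    return "neutral"
```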
In our work, user attribute inference is divided into two phases. First, coarse user attributes are derived by training independent classifiers on features extracted from user profiles and posts. Second, we exploit the dependency relations between user attributes to improve inference performance. Specifically, we select one type of user attribute (e.g., occupation) as the target attribute for which we want to learn a predictive model, and the remaining attributes (e.g., age, gender, and relationship), called auxiliary attributes, are used to help learn the model.
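The two-phase flow can be sketched as plain function composition: phase one runs each independently trained classifier on its own features, and phase two re-predicts the target given the coarse labels of the auxiliary attributes. This is only a structural sketch. The helpers `coarse_inference` and `refine_target` are hypothetical, and how the auxiliary labels are actually used is left to a caller-supplied `refiner`, since that is the part the model itself must learn.

```python
from typing import Callable, Dict, List

Features = List[float]
# A classifier maps one attribute's feature vector to a label.
Classifier = Callable[[Features], str]
# A refiner maps (target features, auxiliary labels) to a target label.
Refiner = Callable[[Features, Dict[str, str]], str]


def coarse_inference(features: Dict[str, Features],
                     classifiers: Dict[str, Classifier]) -> Dict[str, str]:
    """Phase 1: run each independently trained classifier on its own features."""
    return {attr: clf(features[attr]) for attr, clf in classifiers.items()}


def refine_target(features: Dict[str, Features], coarse: Dict[str, str],
                  target: str, refiner: Refiner) -> str:
    """Phase 2: re-predict the target using coarse auxiliary labels as extra evidence."""
    auxiliary = {attr: label for attr, label in coarse.items() if attr != target}
    return refiner(features[target], auxiliary)
```

Keeping the refiner as a parameter makes it easy to swap in different ways of combining the target's own features with the auxiliary labels.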
Given a collection of Google+ users $\mathcal{U}$, each user $u \in \mathcal{U}$ corresponds to a two-dimensional tuple $[X_u, A_u]$. $X_u = [x_1, \ldots, x_K]$, where $K$ is the number of attribute types and $x_k$ is the user feature of the $k$th attribute. $A_u = [a_1, \ldots, a_K]$ denotes the user attribute set. Denote the target attribute as $T$ and the auxiliary attributes as $S$. The whole attribute set is denoted as $A = [S, T]$. Thus, the problem is formally defined as:
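As a data structure, each user is a pair of aligned length-$K$ lists, and choosing a target attribute splits $A_u$ into $S$ and $T$. The sketch below is illustrative only. `User` and `split_target` are hypothetical names, and representing attribute values as strings is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class User:
    """A user u as the tuple [X_u, A_u] over K attribute types."""
    features: List[List[float]]  # X_u = [x_1, ..., x_K]; x_k is the feature of attribute k
    attributes: List[str]        # A_u = [a_1, ..., a_K]

    def __post_init__(self) -> None:
        if len(self.features) != len(self.attributes):
            raise ValueError("X_u and A_u must both have K entries")


def split_target(user: User, target_k: int) -> Tuple[List[str], str]:
    """Split A_u into auxiliary attributes S and target attribute T, i.e. A = [S, T]."""
    if not 0 <= target_k < len(user.attributes):
        raise IndexError("target_k out of range")
    s = user.attributes[:target_k] + user.attributes[target_k + 1:]
    return s, user.attributes[target_k]
```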
Relational User Attribute Inference. Given a collection of Google+ users $\mathcal{U}$ and attribute set $A = [S, T]$, the goal of relational user attribute inference is to learn (1) a predictive function $f: (X_u, S) \rightarrow T_u$ to infer the target attribute label of a user; and (2) the attribute relation compatibility $\Psi(a_i, a_k) \in \mathbb{R}^{|A| \times |A|}$, where $\Psi$ indicates the compatibility strength of attribute relations.
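To make the role of $\Psi$ concrete, the sketch below fills a compatibility table from labelled users and uses it to re-rank candidate target values. This is not the learning procedure described here. It swaps in a simple, clearly different estimator, smoothed pointwise mutual information $\Psi(a, b) = \log \frac{(c_{ab} + \alpha)\, n}{c_a c_b}$ over attribute values, and assumes an additive score of the form unary score from $f$ plus a weighted sum of $\Psi(s, t)$ over auxiliary values $s$. The function names, `alpha`, `weight`, and the assumption that value strings are unique across attributes are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence, Tuple

Pair = Tuple[str, str]


def estimate_compatibility(labelled: Sequence[Sequence[str]],
                           alpha: float = 1.0) -> Dict[Pair, float]:
    """Smoothed PMI between every pair of attribute values (a stand-in estimator)."""
    n = len(labelled)
    single: Counter = Counter()
    pair: Counter = Counter()
    for labels in labelled:
        values = sorted(set(labels))
        single.update(values)
        pair.update(combinations(values, 2))  # sorted pairs as keys
    psi: Dict[Pair, float] = {}
    for a, b in combinations(sorted(single), 2):
        psi[(a, b)] = math.log((pair[(a, b)] + alpha) * n / (single[a] * single[b]))
    return psi


def compatibility(psi: Dict[Pair, float], a: str, b: str) -> float:
    """Symmetric lookup; unseen pairs get a neutral score of 0."""
    return psi.get(tuple(sorted((a, b))), 0.0)


def predict_target(unary: Dict[str, float], auxiliary: Sequence[str],
                   psi: Dict[Pair, float], weight: float = 1.0) -> str:
    """Pick the target value maximising unary score + weight * sum of compatibilities."""
    def score(t: str) -> float:
        return unary[t] + weight * sum(compatibility(psi, s, t) for s in auxiliary)
    return max(unary, key=score)
```

On toy data where "young" and "unmarried" always co-occur with "IT professional", the pairwise term can overturn a weak unary preference for "photographer"; with `weight=0.0` the unary scores alone decide.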
In this study, we collect our experimental dataset from Google+ via its publicly accessible API. As our goal is to predict user attributes but most real user attributes are missing, we build an evaluation dataset by manually labeling the attributes of each user. To ease the annotation task, we only collect the popular users