2 System Overview
Our study explores how to integrate the mining of relations (and structures) among
entities with learning to rank those entities. To that end, we first extract relations
and then determine a model based on those relations. Our reasoning is that important
relations can be recognized only with respect to a defined task. Such tasks involve
rankings or scores for entities, i.e., a target ranking, such as a ranking of companies,
CD sales, popular blogs, or product sales. In short, our approach consists of two
steps:
Step 1: Constructing Social Networks. Given a list of entities with a target
ranking, we extract a set of social networks among these entities from the web.
Step 2: Ranking learning. Learn a ranking model based on relations and structural
features generated from the networks.
Once we obtain a ranking model, we can use it to predict rankings for unknown
entities. Additionally, we obtain weights for each relation type as well as for the
relation structure; these weights indicate how important each is for the target ranking.
Once the important relations are identified, the social network can be visualized
with a focus on those relations. Alternatively, social network analysis can be
carried out based on them.
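
The two steps above can be sketched in code. The following is a minimal illustration under stated assumptions, not the learner used in this study: it assumes that each network contributes a single structural feature (an entity's degree), and it swaps in a simple pairwise perceptron as the ranking learner. The function names, the toy entities, and the two relation names (`alliance`, `rival`) are hypothetical.

```python
from itertools import combinations


def degree_features(entities, networks):
    """One feature per network: the degree of each entity in that network."""
    names = sorted(networks)
    feats = {v: [0.0] * len(names) for v in entities}
    for j, name in enumerate(names):
        for a, b in networks[name]:
            feats[a][j] += 1.0
            feats[b][j] += 1.0
    return names, feats


def learn_weights(entities, feats, target_rank, epochs=50, lr=0.1):
    """Pairwise perceptron (an assumed learner): for every pair, push the
    score of the better-ranked entity above the score of the other one."""
    dim = len(next(iter(feats.values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for a, b in combinations(entities, 2):
            if target_rank[a] == target_rank[b]:
                continue
            # Rank 1 is best, so the smaller rank number is the better entity.
            hi, lo = (a, b) if target_rank[a] < target_rank[b] else (b, a)
            diff = [x - y for x, y in zip(feats[hi], feats[lo])]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank(entities, feats, w):
    """Order entities by descending linear score."""
    def score(v):
        return sum(wi * fi for wi, fi in zip(w, feats[v]))
    return sorted(entities, key=score, reverse=True)


# Step 1 (assumed already done): hypothetical networks over four entities.
entities = ["a", "b", "c", "d"]
networks = {
    "alliance": {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")},
    "rival": {("c", "d")},
}
target_rank = {"a": 1, "b": 2, "c": 3, "d": 4}

# Step 2: learn one weight per relation type from the target ranking.
names, feats = degree_features(entities, networks)
w = learn_weights(entities, feats, target_rank)
predicted = rank(entities, feats, w)
```

In this toy run the learned weight is positive for `alliance` and negative for `rival`, which is the kind of per-relation weight the text refers to. Real data would call for richer structural features and a properly validated learner.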
3 Constructing Social Networks
In this step, our task is, given a list of entities $V = \{v_1, \ldots, v_n\}$, to construct a set of
social networks $G_i(V, E_i)$, $i \in \{1, \ldots, m\}$, where $m$ signifies the number of relations,
and $E_i = \{e_i(v_x, v_y) \mid v_x \in V,\ v_y \in V,\ v_x \neq v_y\}$ denotes a set of edges with respect to the
$i$-th relation.
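
As a minimal sketch of this definition, the code below builds one edge set per relation over a shared entity list, never pairing an entity with itself. Thresholding a pairwise index is one way to decide which pairs enter an edge set; the two indices used here, the matching and overlap coefficients over search-engine hit counts, are the ones this chapter describes for its co-occurrence-based networks. The function names, the hit counts, and the threshold values are hypothetical, and no real search engine is queried.

```python
from itertools import combinations


def matching(n_x, n_y, n_xy):
    """Matching coefficient: the raw co-occurrence hit count."""
    return float(n_xy)


def overlap(n_x, n_y, n_xy):
    """Overlap coefficient: n_xy / min(n_x, n_y); 0 if either count is 0."""
    smaller = min(n_x, n_y)
    return n_xy / smaller if smaller > 0 else 0.0


def build_networks(entities, hits, pair_hits, indices, thresholds):
    """Return {relation name: set of (v_x, v_y) edges with v_x != v_y}."""
    networks = {name: set() for name in indices}
    # combinations() over distinct entities never yields a self-pair.
    for x, y in combinations(sorted(set(entities)), 2):
        n_xy = pair_hits.get((x, y), 0)
        for name, index in indices.items():
            # An edge is created only if the index exceeds the threshold.
            if index(hits[x], hits[y], n_xy) > thresholds[name]:
                networks[name].add((x, y))
    return networks


# Hypothetical hit counts standing in for search-engine results.
hits = {"a": 1000, "b": 50, "c": 400}
pair_hits = {("a", "b"): 40, ("a", "c"): 60, ("b", "c"): 5}

networks = build_networks(
    entities=["a", "b", "c"],
    hits=hits,
    pair_hits=pair_hits,
    indices={"cooc": matching, "overlap": overlap},
    thresholds={"cooc": 30, "overlap": 0.5},  # assumed values
)
```

With these numbers the pair (a, c) passes the raw-count threshold but not the overlap threshold, because 60 co-occurrences are few relative to the 400 hits of the less frequent name. This shows how the two indices can yield different networks over the same entities.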
A social network is obtainable through various approaches [4, 8, 9]. In this
chapter, we detail two web mining approaches, the co-occurrence-based approach and
the classification-based approach, as a basis of our study. For the co-occurrence-based
approach [8, 9], given a list of person names, the strength of relevance of two persons,
x and y, is estimated by issuing the query "x AND y" to a search engine. An edge is
created when the relation strength given by the co-occurrence measure is higher than
a predefined threshold. Subsequently, we extract co-occurrence-based networks of
two kinds: a cooc network ($G_{\text{cooc}}$) and an overlap network ($G_{\text{overlap}}$). The relational
indices are calculated respectively using the matching coefficient $n_{x \wedge y}$ and the
overlap coefficient $n_{x \wedge y} / \min(n_x, n_y)$, where $n_k$ means the number of hits obtained after
issuing query $k$ to a search engine. For the classification-based approach [8], which
builds on web co-occurrence networks, edges are classified into those representing one of
several relations using C4.5 as a classifier. In our experiments, we first extract the
overlap network among researchers, then classify the edges into relational networks
of two kinds: a co-affiliation network ($G_{\text{affiliation}}$) and a co-project network
($G_{\text{project}}$). Because of space limitations, we omit the details of the construction
algorithms; they are provided in an earlier report [8]. Extracted networks
for 253 researchers are portrayed in Fig. 1. It is apparent that social networks vary
 