Ranking Learning Entities on the Web by Integrating Network-Based Features - Mining and Analyzing Social Networks


2 System Overview

Our study explores the integration of mining relations (and structures) among
entities with learning to rank those entities. To that end, we first extract
relations and then determine a model based on those relations. Our reasoning is
that important relations can be recognized only when a task is defined. Such a
task is a ranking or scoring of entities, i.e. a target ranking, such as a
ranking of companies, CD sales, popular blogs, or product sales. In short, our
approach consists of two steps:

Step 1: Constructing social networks. Given a list of entities with a target
ranking, we extract a set of social networks among these entities from the web.

Step 2: Ranking learning. We learn a ranking model based on relations and
structural features generated from the networks.

Once we obtain a ranking model, we can use it to predict the ranking of unknown
entities. Additionally, we obtain weights for each relation type and for each
relation structure, which indicate how important they are for the target
ranking. Once the important relations are identified, the social network can be
visualized with a focus on those relations. Alternatively, social network
analysis can be carried out based on them.
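The two steps can be illustrated with a minimal sketch. It is not the ranking method of this chapter: it assumes that the only structural feature is an entity's degree in each network, that a larger target score means a better rank, and that a simple pairwise perceptron stands in for the ranking learner. All names and the toy data are hypothetical.

```python
from itertools import combinations

def degree_features(entities, networks):
    """Step 1 output -> features: one degree value per relation type."""
    names = sorted(networks)
    feats = {
        v: [sum(1 for edge in networks[r] if v in edge) for r in names]
        for v in entities
    }
    return names, feats

def learn_ranking(feats, target_scores, epochs=50, lr=0.1):
    """Step 2 (assumed learner): pairwise perceptron over entity pairs.

    For each pair whose target scores differ, the weights are updated
    whenever the better entity does not yet receive a higher model score.
    """
    dim = len(next(iter(feats.values())))
    w = [0.0] * dim
    pairs = [(a, b) for a, b in combinations(feats, 2)
             if target_scores[a] != target_scores[b]]
    for _ in range(epochs):
        for a, b in pairs:
            hi, lo = (a, b) if target_scores[a] > target_scores[b] else (b, a)
            diff = [x - y for x, y in zip(feats[hi], feats[lo])]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def score(w, x):
    """Linear model score; higher means ranked earlier."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical toy data: four entities, two relation types.
entities = ["A", "B", "C", "D"]
networks = {
    "cooc": {frozenset(p) for p in [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]},
    "noise": {frozenset(("C", "D"))},
}
target_scores = {"A": 4, "B": 3, "C": 2, "D": 1}  # assumed: larger is better

relation_names, feats = degree_features(entities, networks)
weights = learn_ranking(feats, target_scores)
predicted = sorted(entities, key=lambda v: score(weights, feats[v]), reverse=True)
```

In this toy setting the learned weight for each relation type gives a rough indication of how useful that network is for reproducing the target ranking, which mirrors the role of relation weights described above.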

Constructing Social Networks

In this step, our task is, given a list of entities $V = \{v_1, \ldots, v_n\}$,
to construct a set of social networks $G_i(V, E_i)$, $i \in \{1, \ldots, m\}$,
where $m$ signifies the number of relations, and
$E_i = \{e_i(v_x, v_y) \mid v_x \in V, v_y \in V, v_x \neq v_y\}$ denotes a set
of edges with respect to the $i$-th relation.
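One way to hold such a set of networks in memory is a shared entity list plus one edge set per relation. The sketch below is only an illustration: it assumes relations are symmetric (edges are unordered pairs), and the attribute tables and predicate functions are hypothetical stand-ins for whatever relation extraction is actually used.

```python
from itertools import combinations

def make_edge_set(entities, related):
    """Collect unordered pairs {v_x, v_y} with v_x != v_y for which related() holds.

    combinations() never pairs an entity with itself, so the v_x != v_y
    condition holds as long as the entity list has no duplicates.
    """
    return {frozenset((x, y)) for x, y in combinations(entities, 2) if related(x, y)}

# Hypothetical toy data: three entities and two relation types (m = 2).
V = ["v1", "v2", "v3"]
affiliation = {"v1": "lab_a", "v2": "lab_a", "v3": "lab_b"}
projects = {"v1": {"p1"}, "v2": {"p2"}, "v3": {"p1", "p2"}}

# All networks share the vertex set V; only the edge sets differ.
networks = {
    "affiliation": make_edge_set(V, lambda x, y: affiliation[x] == affiliation[y]),
    "project": make_edge_set(V, lambda x, y: bool(projects[x] & projects[y])),
}
m = len(networks)
```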

A social network is obtainable through various approaches [4, 8, 9]. In this
chapter, we use two web mining approaches, the co-occurrence-based approach and
the classification-based approach, as the basis of our study. For the
co-occurrence-based approach [8, 9], given a list of person names, the strength
of relevance of two persons, x and y, is estimated by issuing the query
"x AND y" to a search engine. An edge is created when the relation strength
given by the co-occurrence measure is higher than a predefined threshold. We
extract co-occurrence-based networks of two kinds: a cooc network
($G_{\mathrm{cooc}}$) and an overlap network ($G_{\mathrm{overlap}}$). The
relational indices are calculated respectively using the matching coefficient
$n_{x \wedge y}$ and the overlap coefficient $n_{x \wedge y} / \min(n_x, n_y)$,
where $n_k$ means the number of hits obtained after issuing query $k$ to a
search engine. For the classification-based approach [8], which builds on web
co-occurrence networks, edges are classified into those representing one of
several relations using C4.5 as a classifier. In our experiments, we first
extract an overlap network among researchers, then classify the edges into
relational networks of two kinds: a co-affiliation network
($G_{\mathrm{affiliation}}$) and a co-project network ($G_{\mathrm{project}}$).
Because of space limitations, we omit the details of the construction
algorithms; they are provided in an earlier report [8]. Extracted networks for
253 researchers are portrayed in Fig. 1. It is apparent that social networks vary
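The two co-occurrence measures described above can be sketched as follows. The hit counts are hypothetical numbers supplied in dictionaries rather than results from a real search engine, and the threshold values and function names are assumptions chosen only for illustration.

```python
from itertools import combinations

def matching_coefficient(n_xy):
    """Matching coefficient: the raw co-occurrence hit count n_{x AND y}."""
    return n_xy

def overlap_coefficient(n_xy, n_x, n_y):
    """Overlap coefficient: n_{x AND y} / min(n_x, n_y); 0.0 if a name has no hits."""
    denom = min(n_x, n_y)
    return n_xy / denom if denom > 0 else 0.0

def build_cooc_networks(names, hits, pair_hits, cooc_threshold, overlap_threshold):
    """Create an edge wherever the relation strength exceeds its threshold."""
    g_cooc, g_overlap = set(), set()
    for x, y in combinations(names, 2):
        pair = frozenset((x, y))
        n_xy = pair_hits.get(pair, 0)
        if matching_coefficient(n_xy) > cooc_threshold:
            g_cooc.add(pair)
        if overlap_coefficient(n_xy, hits[x], hits[y]) > overlap_threshold:
            g_overlap.add(pair)
    return g_cooc, g_overlap

# Hypothetical hit counts for three names and their pairwise "x AND y" queries.
names = ["a", "b", "c"]
hits = {"a": 1000, "b": 50, "c": 400}
pair_hits = {
    frozenset(("a", "b")): 40,
    frozenset(("a", "c")): 60,
    frozenset(("b", "c")): 2,
}

g_cooc, g_overlap = build_cooc_networks(
    names, hits, pair_hits, cooc_threshold=50, overlap_threshold=0.5
)
```

With these toy numbers the two networks contain different edges: the matching coefficient favors pairs with many joint hits in absolute terms, whereas the overlap coefficient normalizes by the less frequent name and therefore keeps the pair involving the rarely mentioned name "b".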
