Ranking Learning Entities on the Web by Integrating Network-Based Features - Mining and Analyzing Social Networks

This model can be augmented easily with other traditional attributes of entities as features. Any learning technique, such as SVM, boosting, or neural networks, can be used to solve the optimization problem. For multi-relational networks, we can generate features for each single-relational network separately. We can then compare performance among them to elucidate which relational network produces more reasonable features, and thereby determine which relations are important for the target ranking.

5 Experimental Results

In this section, we describe results to clarify the effectiveness of ranking learning on extracted social networks. We use data on 253 researchers from The University of Tokyo to predict a ranking of researchers. In our experiments, we conducted three-fold cross-validation: in each trial, two folds of actors are used for training and one fold for prediction. The results reported in this section are averaged over the three trials. We use Spearman's rank correlation coefficient to measure the pairwise ranking correlation between the predicted rankings and the target ranking.
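The evaluation protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names (`average_ranks`, `spearman`, `three_fold_splits`, `cross_validated_rho`), the shuffling seed, and the `fit`/`predict` callables are all assumptions introduced here, and any ranking learner could be plugged in. Ties are handled with average ranks, which is one common convention; the source does not say how ties were treated.

```python
import random
from statistics import mean


def average_ranks(values):
    """Rank values (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def three_fold_splits(n_items, seed=0):
    """Yield (train_indices, test_indices) for each of three folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)  # seed is an illustrative choice
    folds = [idx[k::3] for k in range(3)]
    for k in range(3):
        train = [i for j in range(3) if j != k for i in folds[j]]
        yield train, folds[k]


def cross_validated_rho(features, target, fit, predict):
    """Average Spearman's rho over the three held-out folds.

    fit(X, y) -> model and predict(model, X) -> scores are hypothetical
    placeholders for whatever ranking learner is used.
    """
    rhos = []
    for train, test in three_fold_splits(len(target)):
        model = fit([features[i] for i in train], [target[i] for i in train])
        scores = predict(model, [features[i] for i in test])
        rhos.append(spearman(scores, [target[i] for i in test]))
    return mean(rhos)
```

With 253 actors, the three folds in this sketch contain 85, 84, and 84 actors, so each trial trains on roughly two thirds of the researchers and predicts the ranking of the remaining third.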

5.1 Datasets

We extract social networks for researchers (253 professors of The University of Tokyo) to learn and predict the ranking of researchers. We use the ranking by the number of publications (designated as Paper) as the target ranking, as presented in Table 2. Academic papers are often the product of several researchers' collaboration. Therefore, a good position in a social network is derived through good performance. Is there any relation that is important for predicting productivity?

We construct social networks among researchers from the web using a general search engine. We use the co-occurrence-based approach (Section 6.3.1) to extract co-occurrence-based networks of two kinds from English-language web sites and Japanese web sites, respectively: a cooc network (G_Ecooc, G_Jcooc) and an overlap network (G_Eoverlap, G_Joverlap). Specifically, we used English/romanized names of researchers as queries to obtain co-occurrence information for G_Ecooc and G_Eoverlap, and Japanese names of researchers as queries to obtain co-occurrence information for G_Jcooc and G_Joverlap. Then, based on the web co-occurrence networks (using Japanese web sites), we use the context of web pages retrieved using the names of two persons to classify the relations, with C4.5 as the classifier (details are presented in [8]). We use a Jaccard network constructed using the approach described above; then we classify the edges into relational networks of two kinds: a co-affiliation network (G_affiliation) and a co-project network (G_project). The extracted networks for the 253 researchers are portrayed in Fig. 1.
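To make the three co-occurrence-based network types concrete, the sketch below builds edge sets from search-engine hit counts. It is an illustration under stated assumptions, not the procedure of Section 6.3.1: the formulas are the commonly used definitions (raw co-occurrence count, overlap coefficient, Jaccard coefficient), and the function names, the thresholds, and the hit counts for researchers "A", "B", and "C" are all hypothetical.

```python
from itertools import combinations


def cooc(n_x, n_y, n_xy):
    """Raw co-occurrence: number of pages mentioning both names."""
    return float(n_xy)


def overlap(n_x, n_y, n_xy):
    """Overlap coefficient: joint hits over the smaller single-name count."""
    smaller = min(n_x, n_y)
    return n_xy / smaller if smaller > 0 else 0.0


def jaccard(n_x, n_y, n_xy):
    """Jaccard coefficient: joint hits over pages mentioning either name."""
    union = n_x + n_y - n_xy
    return n_xy / union if union > 0 else 0.0


def build_network(single_hits, pair_hits, measure, threshold):
    """Return {(x, y): weight} for name pairs whose weight exceeds threshold.

    single_hits maps a name to its hit count; pair_hits maps a pair
    (x, y) with x < y to the hit count of the joint query. Pairs missing
    from pair_hits are treated as having zero joint hits.
    """
    edges = {}
    for x, y in combinations(sorted(single_hits), 2):
        n_xy = pair_hits.get((x, y), 0)
        weight = measure(single_hits[x], single_hits[y], n_xy)
        if weight > threshold:
            edges[(x, y)] = weight
    return edges


# Hypothetical hit counts for three researchers, for illustration only.
single_hits = {"A": 100, "B": 40, "C": 10}
pair_hits = {("A", "B"): 20, ("A", "C"): 1, ("B", "C"): 5}

cooc_net = build_network(single_hits, pair_hits, cooc, threshold=4)
overlap_net = build_network(single_hits, pair_hits, overlap, threshold=0.3)
jaccard_net = build_network(single_hits, pair_hits, jaccard, threshold=0.1)
```

Running the same construction once with hit counts from English/romanized-name queries and once with Japanese-name queries would yield the English and Japanese variants of each network.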

For this experiment, we also use researcher attributes of two types: the number of hits on Japanese web sites, JhitNum (using Japanese names as a query), and the number of hits on English-language web sites, EhitNum (using English/romanized names as a query).
