Information Technology Reference
In-Depth Information
Ground Truth
Quadrilateral
Simmelian
Jaccard
Density
Random
PPM (50 samples)
Caltech36
Rice31
Smith60
0.9
1.0
0.9
0.7
0.8
0.8
0.8
0.6
0.7
0.6
0.7
0.5
0.6
0.4
0.6
0.4
0.5
100%
75%
50%
25%
100%
75%
50%
25%
100%
75%
50%
25%
100%
75%
50%
25%
remaining edges
Fig. 3. Homophily ( y -axis) is plotted against the number of remaining edges ( x -axis) for
the synthetic model (PPM) and three of the Facebook networks. Overall Quadrilateral
performs better than the others. For the synthetic networks it comes very close to the
ground truth.
4.1 Dataset and Model
As real world samples, we use the Facebook100 dataset [26], which contains so-
cial relations of 100 higher educational institutes in the US. The network size
varies from 762 to 41K vertices and from 16K to 1.6M edges. The dataset is di-
rectly from Facebook, not sampled, and thus very complete in terms of capturing
the social relations according to a widely used service at that time. Additional
attributes obtained from the Facebook profiles are gender, year of graduation,
dormitory, etc. Due to incomplete profiles, a number of attribute values are
missing. We will use the dormitory attribute for our evaluation, because it has
been argued to be important for the creation of social relations in many of the
networks [26].
Note that, in spite of a strong empirical association with homophilous
attribute values, no ground-truth group structure is available for Facebook net-
works. Therefore, we also generated artificial networks from a model that repre-
sents the idealized version of the networks we are considering in this application.
A simple model generating random graphs with cohesive groups that are
connected into a small world is the planted partition model (PPM) [15]. Let
C
=
{
C 1 ,...,C k }
be a partition of V for a graph G =( V,E ). Then
C
is called
a clustering of G with class c ( v )
∈C
for a vertex v
V . The probability of an
edge ( u,v )is p in if c ( u )= c ( v )and p out if c ( u )
= c ( v ). We generated 50 graphs
from a PPM with 500 vertices, k =9, p in =0 . 3, and p out =0 . 01. On top of that,
we ran a random noise model with p in = p out =0 . 1 to obfuscate the underlying
group structure. The resulting graphs are very dense, have a low diameter, and
are real hairballs without any visible structure when laid out using force-directed
methods. The presented results of our model are averaged over 50 samples.
 
Search WWH ::




Custom Search