2-stars (subgraphs consisting of a node with two spokes, so a node with degree 3 has three 2-stars associated with it) given the number of nodes, and have these act as variables $z_i$ of your model, and then tweak the associated coefficients $\theta_i$ to get them tuned to a certain type of behavior you observe or wish to simulate. If $z_1$ refers to the number of triangles, then a positive value for $\theta_1$ would indicate a tendency toward a larger number of triangles, for example.
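To make the statistics concrete, here is a minimal sketch that counts 2-stars and triangles on a small made-up graph (the edge list is hypothetical, chosen only for illustration). It uses the fact that a node of degree $d$ contributes $\binom{d}{2}$ 2-stars:

```python
from itertools import combinations

# Hypothetical toy graph, stored as a set of undirected edges.
edges = {(0, 1), (0, 2), (1, 2), (1, 3)}
nodes = {u for e in edges for u in e}
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

# A node of degree d contributes C(d, 2) 2-stars.
two_stars = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())

# Count triangles by checking every triple of nodes for mutual adjacency.
triangles = sum(
    1 for a, b, c in combinations(sorted(nodes), 3)
    if b in adj[a] and c in adj[a] and c in adj[b]
)
print(two_stars, triangles)  # prints: 5 1
```

Note that node 1 has degree 3 and alone contributes three of the five 2-stars, matching the parenthetical remark above.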
Additional graph statistics that have been introduced include $k$-stars (subgraphs consisting of a node with $k$ spokes, so a node with degree $k+1$ has $k+1$ $k$-stars associated with it), degree, or alternating $k$-stars, an aggregate statistic on the number of $k$-stars for various $k$.
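Since a node of degree $d$ contributes $\binom{d}{k}$ $k$-stars, the $k$-star count depends only on the degree sequence. A small sketch, using a made-up degree sequence (the alternating $k$-star statistic then combines these counts across $k$ with alternating signs and a damping weight, a detail omitted here):

```python
from math import comb

# Hypothetical degree sequence of a small graph, one entry per node.
degrees = [3, 2, 2, 1]

def k_stars(degrees, k):
    # S_k: a node of degree d contributes C(d, k) k-stars.
    return sum(comb(d, k) for d in degrees)

print([k_stars(degrees, k) for k in (1, 2, 3)])  # prints: [8, 5, 1]
```

The degree-(k+1) node contributes $k+1$ of the $k$-stars, as described above: here the degree-3 node contributes three of the five 2-stars.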
Let's give you an idea of what an ERGM might look like formula-wise:
$$\Pr(Y = y) = \frac{1}{\kappa} \exp\bigl(\theta_1 z_1(y) + \theta_2 z_2(y) + \theta_3 z_3(y)\bigr)$$
Here we're saying that the probability of observing one particular realization of a random graph or network, $Y$, is a function of the graph statistics or properties, which we just described as denoted by $z_i$.
In this framework, a Bernoulli network is a special case of an ERGM, where we have only one variable, corresponding to the number of edges.
Inference for ERGMs
Ideally, though in some cases unrealistic in practice, one could observe a sample of several networks, $Y_1, \dots, Y_n$, each represented by its adjacency matrix, say for a fixed number $N$ of nodes.
Given those networks, we could model them as independent and
identically distributed observations from the same probability model.
We could then make inferences about the parameters of that model.
As a first example, if we fix a Bernoulli network, which is specified by the probability $p$ of the existence of any given edge, we can calculate the likelihood of any of our sample networks having come from that Bernoulli network as

$$L = \prod_{i=1}^{n} p^{d_i} \bigl(1 - p\bigr)^{D - d_i}$$
where $d_i$ is the number of observed edges in the $i$th network and $D$ is the total number of dyads in the network, as earlier. Then we can back out an estimator for $p$ as follows:
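Setting the derivative of $\log L$ to zero gives the maximum likelihood estimate $\hat{p} = \sum_i d_i / (nD)$, the fraction of all observed dyads that are edges. A sketch with made-up edge counts:

```python
# Maximizing log L = sum_i [d_i log p + (D - d_i) log(1 - p)]
# gives the MLE p_hat = (sum_i d_i) / (n * D).

# Made-up sample: edge counts of n = 3 observed networks
# on N = 5 nodes, so D = N * (N - 1) / 2 = 10 dyads each.
N = 5
D = N * (N - 1) // 2
d = [4, 6, 5]
n = len(d)

p_hat = sum(d) / (n * D)
print(p_hat)  # prints: 0.5
```

This is just the familiar binomial MLE: pool every dyad across the sampled networks and take the observed edge frequency.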