what-when-how
In Depth Tutorials and Information
12.4 TheLinkPredictionProblem
forSocialNetworks[3]
12.4.1 Model of the Link Prediction
Let G = <V,E> represent a graph of a social network model. V denotes the all the nodes
in the social network and E denotes all the edges between each nodes. For the e
∈
E, it is
one edge between two nodes at a specific time t. In different time periods, we can record
all the interactions among edges. For a given time period t < t', G[t, t'] represents a subset
or a subgraph of G in which all the communication between nodes must happen in a
given time period [t, t']. he training interval [t
0
, t
0
'] is a time period [t
1
, t
1
'] in which we
know all the edges and t
1
> t
0
. Correspondingly, the test interval is a time period in which
the edges are unknown. For the G[t
0
, t
0
'], it must generate some possible edges that are
not included in the G[t0, t0']. hey are the outcome of the network prediction.
In the experiments explained in [3], we may select some authors, and each of
them has a number of papers. A given set Core represents Jtrain edges in the train-
ing interval and Jtest edges in the test interval. Enews is the set of edges <a,b> in
which authors a and b collaborate during the test interval instead of the set Eold,
which is the set of edges in which the two authors have no collaboration in the
training interval. Gcollab: = <V,Eold>.
12.4.2 Methods for Link Prediction
Score(x,y) denotes the connection weight score. We can calculate the proximity or
similarity between nodes x and y, which are related to the geographical position.
he basic method to find the shortest path between nodes is to use the small-world
network theory. Small-world network can be explained as a type of social network
in which most nodes are not directly connected to each other. However, they can
reach other nodes by a several hops or steps.
Let
φ
(x) denote the set of neighbors of a given node x. We can infer that if the
sets
φ
(x) and
φ
(y) have a large number of intersection sets, the two nodes x and y
have more possibility to contact with each other.
12.4.3 Common Neighbors
Following this idea, we can measure
score
as follows: score(x,y) =
φ
(x)
∩
φ
(y), which
is the number of the common neighbors of x and y. It has been proved that the
number of common neighbors of nodes
a
and
b
at time t has a relationship with the
possibility of their collaborations in the future.
Jaccard'scoeicient:
he Jaccard coefficient was widely applied in the message
retrieval. he weighted score can be measured as the number of the intersection
between x and y, which was compared to the union set of x and y. he formulation
was given as: score(x, y) = |
φ
(x)
∩
φ
(y)|.