what-when-how
In Depth Tutorials and Information
with probability
k
u,v
r
u,v
on any given day, but once the
r
u,v
coin comes up heads, no
further trials are made.
hus, given the transmission graph (and, in particular, each edge's reading fre-
quency
r
and copy probability
k
), the distribution of propagation patterns is now
fully established. Given a community and a timeout interval, the goal is therefore
to learn the arcs and associated probabilities from a set of episodes. Using these
probabilities, given the initial fragment of a new episode, it is able to predict the
propagation pattern of the episode.
In the following text, a closed-world assumption is made that all occurrences
of a topic except the first are the result of communication via edges in the network.
A topic in the following is a URL, phrase, name, or any other representation of a
meme that can be tracked from page to page. All blog entries can be gathered that
contain a particular topic into a list
[(
u t
,
),(
u t
,
),...,(
u t
,
)]
1 1 2 2
sorted by the publi-
cation date of the blog, where
u
i
is the universal identifier for blog user
i
, and
t
i
is the
first time at which user
u
i
contained a reference to the topic. his list can be referred
to as the
traversalsequence
for the topic. he following observation is critically used:
the fact that user
a
appears in a traversal sequence, and user
b
does not appear later
in the same sequence, gives us evidence about the (
a
,
b
) edge—that is, if
b
were a
regular reader of
a
's blog with a reasonable copy probability, then sometimes memes
discussed by
a
should appear in
b
's blog.
An EM-like algorithm is presented to induce the parameters of the transmis-
sion graph [51], in which the model first computes a “soft assignment” of each new
infection to the edges that may have caused it, and then updates the edge param-
eters to increase the likelihood of the assigned infections. Take an initial guess at
the value of
r
and
k
for each edge and improve the estimate of these values. A two-
stage process is adopted:
Soft-Assignment Step
: Using the current version of the transmission graph,
compute for each topic and each pair (
u
,
v
) the probability that the topic traversed
the (
u
,
v
) edge. Given the traversal sequence and the delay between
u
and
v
for a
particular topic
j
as input, for each
v
in the sequence, consider all previous vertices
u
in the sequence and compute the probability
P
u,v
that topic
j
would have been
copied from
u
to
v
. hen normalize by the sum of these probabilities to compute
the posteriors of the probability that each node
u
was
v
's source of inspiration.
hat is, setting
r
=
r
u,v
,
k
=
k
u,v
, and
δ
is to be the delay in days between
u
and
v
in
topic
j
:
k
k
−
−
r
(
1
1
r k
)
δ
=
P
:
∑
(3.14)
u v
,
δ
r
(
r
)
k
w v
,
w v
,
w v
,
w v
,
<
w v
Parameter-UpdateStep
: For fixed
u
and
v
, recompute
r
u,v
and
k
u,v
based on the pos-
terior probabilities just computed. Perform the following operation for each fixed
u
and
v
. Let
S
1
denote the set of topics
j
such that topic
j
appeared first at node
u
and