the sets U_0 and S_0 of users and stories that were gathered in the first data collection phase as the core dataset, while the rest of the dataset will be referred to as the extended dataset. Occasionally, stories or users were removed from the system (probably due to spamming behavior), in which case they were also removed from the local dataset.
Statistical Analysis of Digg Usage

The first step of our analysis involved studying the heavy-tailed nature of several variables of interest that arise from the mass usage of Digg. The following distributions were examined:

Diggs collected by stories.
Comments collected by stories.
Diggs given by users.
Friends of users in the Digg social network.

Figure 3 provides logarithmic plots of the aforementioned distributions. By overlaying on each observed distribution its power-law fit, obtained with the fitting method presented by Clauset et al. (2007), the figure makes the heavy-tailed nature of these distributions clear. The proposed method employs an approximation to the Maximum Likelihood Estimator (MLE) for the scaling parameter α of the power law:

$$\hat{\alpha} \simeq 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min} - \tfrac{1}{2}} \right]^{-1} \qquad (6)$$

where x_1, ..., x_n are the n observed values with x_i ≥ x_min.

This estimator assumes that the value x_min above which the power law holds is known. To estimate this value, the authors recommend using the Kolmogorov-Smirnov (KS) statistic as a measure of the goodness of fit between the observed data and the model with parameters (α, x_min). The KS statistic is defined as the maximum distance between the CDF of the data, S(x), and the CDF of the fitted model, P(x):
Figure 3. Four heavy-tailed Digg Cumulative Distribution Functions (CDFs) and their power-law approximations: (a) Diggs per story, (b) comments per story, (c) Diggs per user, and (d) Digg friends per user.
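The MLE approximation of Eq. (6) and the KS-based choice of x_min described above fit together into a single procedure. The sketch below shows one way to combine them, written in Python because the text itself gives no code. It is not the authors' code. It assumes the observations are positive integer counts, such as Diggs per story. The helper names (alpha_mle, ks_distance, fit_power_law) and the min_tail cutoff are hypothetical. For the model CDF P(x), the sketch assumes a continuous power law above x_min − 1/2 whose values are rounded to the nearest integer, giving P(x) = 1 − ((x + 1/2)/(x_min − 1/2))^(1 − α). This is in the spirit of the continuous approximation behind Eq. (6). An exact discrete fit would use the Hurwitz zeta function instead.

```python
import math
import random
from bisect import bisect_left, bisect_right


def alpha_mle(data, xmin):
    """Approximate MLE of the scaling exponent, as in Eq. (6):
    alpha = 1 + n / sum(ln(x_i / (xmin - 1/2))) over the n values x_i >= xmin."""
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


def ks_distance(data, xmin, alpha):
    """KS statistic D = max |S(x) - P(x)| over integers x >= xmin.

    S is the empirical CDF of the tail. P is the ASSUMED rounded-continuous
    model CDF 1 - ((x + 1/2) / (xmin - 1/2)) ** (1 - alpha)."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    # S is a step function and P is increasing, so on each run of integers
    # between observed values |S - P| peaks at an end of the run: checking
    # every observed value and the integer just below it is sufficient.
    points = set(tail)
    points.update(v - 1 for v in tail if v - 1 >= xmin)
    d = 0.0
    for x in points:
        s = bisect_right(tail, x) / n
        p = 1.0 - ((x + 0.5) / (xmin - 0.5)) ** (1.0 - alpha)
        d = max(d, abs(s - p))
    return d


def fit_power_law(data, min_tail=10):
    """Scan candidate xmin values and keep the fit with the smallest KS distance.

    Returns (alpha, xmin, D), or None if no candidate leaves min_tail points.
    min_tail is a hypothetical guard against fitting a handful of extreme values."""
    ordered = sorted(data)
    best = None
    for xmin in sorted(set(ordered)):
        if xmin < 1:
            continue  # the approximation needs xmin - 1/2 > 0
        if len(ordered) - bisect_left(ordered, xmin) < min_tail:
            break  # the tail only shrinks as xmin grows
        alpha = alpha_mle(ordered, xmin)
        d = ks_distance(ordered, xmin, alpha)
        if best is None or d < best[2]:
            best = (alpha, xmin, d)
    return best


if __name__ == "__main__":
    # Synthetic counts: a continuous power law (alpha = 2.5) above 0.5,
    # rounded to the nearest integer, so every value is >= 1.
    random.seed(1)
    counts = [int(0.5 * (1.0 - random.random()) ** (-1.0 / 1.5) + 0.5)
              for _ in range(5000)]
    print(fit_power_law(counts))
```

Running the module prints the (alpha, xmin, D) tuple selected for a synthetic sample. To fit real data, you would pass a list such as the Diggs-per-story counts in place of counts. The sketch treats every distinct observed value as a candidate x_min, which is the simplest reading of the KS recommendation above. It makes no attempt at efficiency or at the goodness-of-fit p-values that a full analysis would also report.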