from del.icio.us are situated in the interval [1.732391 < α < 2.249359]. Figure 5.8
shows that both experimental conditions and the aggregated data from del.icio.us
have similar exponents. Our results show that a similar α holds for both the 'tag
suggestion' and 'no tag suggestion' conditions.
5.3.3.2 Kolmogorov-Smirnov Complexity
Determining whether a particular distribution is a 'good fit' for a power-law
is difficult, as most goodness-of-fit tests assume a normal (Gaussian) distribution,
an assumption that is inappropriate for non-normal power-law distributions. However,
the Kolmogorov-Smirnov Test (abbreviated as the 'KS Test') can be employed as
a 'goodness-of-fit' test for any distribution without implicit parametric assumptions
and is thus well suited to measuring the goodness-of-fit of a given finite distribution
to a power-law function. Intuitively, given a reference distribution P (perhaps produced
by some well-known function like a power-law) and a sample distribution Q of
size n, where one is testing the null hypothesis that Q is drawn from P, one
simply compares the cumulative frequencies of P and Q; the greatest
discrepancy between the two distributions (the D-statistic) is then tested against the
critical value for n, which varies per function.
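The comparison of cumulative frequencies can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the results reported here: it assumes a continuous power-law as the reference distribution P (tag frequencies are in fact discrete), and the function names power_law_cdf and ks_d_statistic are hypothetical.

```python
def power_law_cdf(x, alpha, x_min):
    """CDF of a continuous power-law p(x) ~ x^(-alpha) for x >= x_min (alpha > 1).

    Assumption: a continuous power-law stands in for the reference distribution P.
    """
    if x < x_min:
        return 0.0
    return 1.0 - (x / x_min) ** (1.0 - alpha)


def ks_d_statistic(sample, cdf):
    """Greatest gap between the empirical CDF of `sample` (Q) and a reference CDF (P)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        ref = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, abs((i + 1) / n - ref), abs(ref - i / n))
    return d


# Hypothetical sample Q, compared against a power-law with alpha = 2 and x_min = 1.
sample = [1.0, 2.0, 4.0]
d = ks_d_statistic(sample, lambda x: power_law_cdf(x, alpha=2.0, x_min=1.0))
print(d)
```

The smaller the returned D-statistic, the closer the sample's cumulative frequencies track those of the reference distribution.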
For a power-law distribution generating function, we can get a critical p-value
by generating artificial data using the scaling exponent α and lower bound equal
to those found in the supposed fitted power-law distribution. A power-law is fit to
each artificially generated data set, and the KS test is then done for each such
data set, comparing it to its own fitted power-law. The p-value is then
just the fraction of times the D-statistic is larger for the artificially
generated distribution than the D-statistic of the empirically found distribution.
Therefore, the larger the p-value, the more likely a genuine power-law has been
found in the empirical data. According to Clauset, “once we have calculated our
p-value, we need to make a decision about whether it is small enough to rule out
the power-law hypothesis” (emphasis added) (2007). The power-law hypothesis
is simply that the distribution was generated by a power-law generating function.
The null hypothesis is that by chance a function would generate the power-law
distribution observed in the empirical data. We shall also use p ≤ 0.1.
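The resampling procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the code behind the reported results: it assumes a continuous power-law, a known lower bound x_min, and the standard maximum-likelihood estimate of the exponent. All function names are hypothetical, and the number of trials is an arbitrary choice.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power-law by inverse-transform sampling."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(sample, x_min):
    """Maximum-likelihood exponent for a continuous power-law with known x_min.

    Assumption: at least one value is strictly greater than x_min.
    """
    return 1.0 + len(sample) / sum(math.log(x / x_min) for x in sample)


def ks_distance(sample, alpha, x_min):
    """D-statistic between the sample and a power-law with the given parameters."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        ref = 1.0 - (x / x_min) ** (1.0 - alpha)
        d = max(d, abs((i + 1) / n - ref), abs(ref - i / n))
    return d


def bootstrap_p_value(empirical, x_min, trials=200, seed=0):
    """Fraction of synthetic data sets whose D-statistic exceeds the empirical one."""
    rng = random.Random(seed)
    alpha_hat = fit_alpha(empirical, x_min)
    d_emp = ks_distance(empirical, alpha_hat, x_min)
    exceed = 0
    for _ in range(trials):
        synthetic = sample_power_law(len(empirical), alpha_hat, x_min, rng)
        # Each synthetic data set is compared against its own fitted power-law.
        alpha_syn = fit_alpha(synthetic, x_min)
        if ks_distance(synthetic, alpha_syn, x_min) > d_emp:
            exceed += 1
    return exceed / trials


# Hypothetical data: evenly spread values on (1, 2), which are far from a power-law.
flat = [1.0 + (i + 0.5) / 500 for i in range(500)]
print(bootstrap_p_value(flat, x_min=1.0, trials=100))
```

The returned fraction lies between 0 and 1. For data like the evenly spread values above, almost no synthetic data set fits worse than the empirical one, so the fraction is close to zero.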
The KS test for all 11 tagged web-pages, testing both the 'tag suggestion' and 'no
tag suggestion' conditions, is given in Fig. 5.9. The average D-statistic for the 'no
tag suggestion' condition is 0.0313 (S.D. 0.0118) with p = 0.08 (p ≤ 0.1, power-law
found). For the 'tag suggestion' condition the average D-statistic is 0.0724
(S.D. 0.0256) with p = 0.48 (p > 0.1, no power-law found). These results show
that a significant power-law is exhibited only in the 'no tag suggestion' condition,
and that the fit is closer for the 'no tag suggestion' condition than for the 'tag
suggestion' condition. The D-statistic ranged from 0.0170 to 0.0552 for the
'no tag suggestion' condition, yet from 0.0428 to 0.1318 for the 'tag suggestion'
condition. Thus, the power-law only significantly appears without tag suggestions, and
with tag suggestions a power-law cannot be reliably found. This is surprising, as tag