from del.icio.us are situated in the interval $1.732391 < \alpha < 2.249359$. Figure 5.8
shows that both experimental conditions and the aggregated data from del.icio.us
have similar exponents. Our results show that a similar $\alpha$ holds for both the 'tag
suggestion' and 'no tag suggestion' conditions.
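The section does not say how the exponents were estimated. One possibility is the maximum-likelihood estimator for a continuous power law described by Clauset (2007): $\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln (x_i / x_{\min}) \right]^{-1}$, taken over the $n$ values at or above a lower bound $x_{\min}$. The sketch below assumes that tag frequencies can be treated as continuous and that $x_{\min}$ is already known. The function name `estimate_alpha` and the sample values are hypothetical.

```python
import math
from typing import Sequence


def estimate_alpha(data: Sequence[float], x_min: float) -> float:
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Assumption (not from the source): the data are treated as continuous
    and x_min is fixed in advance rather than estimated.
    """
    # Only the tail at or above x_min enters the estimate.
    tail = [x for x in data if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail values equal x_min; alpha is undefined")
    return 1.0 + len(tail) / log_sum


# Hypothetical tag frequencies, for illustration only.
sample = [1.0, 1.0, 1.5, 2.0, 3.0, 5.0, 8.0, 20.0]
print(round(estimate_alpha(sample, x_min=1.0), 3))
```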
5.3.3.2 Kolmogorov-Smirnov Complexity
Determining whether a particular distribution is a 'good fit' for a power law
is difficult, as most goodness-of-fit tests rely on some form of Gaussian
assumption that is inappropriate for non-normal power-law distributions. However,
the Kolmogorov-Smirnov test (abbreviated as the 'KS test') can be employed as
a goodness-of-fit test for any distribution without implicit parametric assumptions,
and is thus well suited to measuring how closely a given finite distribution fits a
power-law function. Intuitively, suppose we have a reference distribution P (perhaps
produced by a well-known function such as a power law) and a sample distribution Q
of size n, and we are testing the null hypothesis that Q is drawn from P. One
simply compares the cumulative frequencies of P and Q; the greatest discrepancy
between the two distributions (the D-statistic) is then tested against the
critical value for n, which varies per function.
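The comparison of cumulative frequencies amounts to $D = \max_x |F_Q(x) - F_P(x)|$, where $F_Q$ is the empirical cumulative distribution of the sample and $F_P$ that of the reference. The sketch below assumes, as the reference, a continuous power law with exponent $\alpha$ and lower bound $x_{\min}$, so that $F_P(x) = 1 - (x/x_{\min})^{1-\alpha}$. Tag counts are discrete, so this is an approximation, and the helper names are hypothetical.

```python
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Largest gap between the empirical CDF of `sample` and a reference CDF.

    The empirical CDF is a step function, so at each sorted point we check
    the value just before the step ((i - 1) / n) and just after it (i / n).
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def power_law_cdf(alpha: float, x_min: float) -> Callable[[float], float]:
    """CDF of a continuous power law p(x) ~ x^(-alpha) for x >= x_min.

    Assumption: a continuous approximation to discrete tag frequencies.
    """
    def cdf(x: float) -> float:
        if x < x_min:
            return 0.0
        return 1.0 - (x / x_min) ** (1.0 - alpha)
    return cdf
```

For example, `ks_statistic(data, power_law_cdf(alpha, x_min))` would return the D-statistic for a data set against a power law with the given parameters.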
For a power-law generating function, we can obtain a p-value by generating
artificial data using a scaling exponent α and lower bound equal to those found
in the supposed fitted power-law distribution. A power law is fit to each
artificial data set, and the KS test is then done for each artificially generated
distribution, comparing it to its own fitted power law. The p-value is then the
fraction of artificially generated distributions whose D-statistic is larger than
the D-statistic of the empirically found distribution.
Therefore, the larger the p-value, the more plausible it is that a genuine power law
has been found in the empirical data. According to Clauset, “once we have calculated our
p-value, we need to make a decision about whether it is small enough to rule out
the power-law hypothesis” (emphasis added) (2007). The power-law hypothesis
is simply that the distribution was generated by a power-law generating function.
The null hypothesis is that by chance a function would generate the power-law
distribution observed in the empirical data. We shall also use 0.1 as the
significance threshold for p.
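The simulation procedure described above can be sketched in Python. This is a simplified, hypothetical version: it treats the data as continuous, fits α with the continuous maximum-likelihood estimator, and holds the lower bound $x_{\min}$ fixed rather than re-estimating it for each synthetic data set as Clauset (2007) recommends. All function names are illustrative, and the random generator is seeded so that results are reproducible.

```python
import math
import random
from typing import List, Sequence


def fit_alpha(xs: Sequence[float], x_min: float) -> float:
    """Continuous maximum-likelihood exponent for values >= x_min."""
    tail = [x for x in xs if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum <= 0:
        raise ValueError("need tail values above x_min to fit alpha")
    return 1.0 + len(tail) / log_sum


def ks_distance(xs: Sequence[float], alpha: float, x_min: float) -> float:
    """KS D-statistic between the tail of xs and a power-law CDF."""
    tail = sorted(x for x in xs if x >= x_min)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail, start=1):
        f = 1.0 - (x / x_min) ** (1.0 - alpha)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def sample_power_law(n: int, alpha: float, x_min: float,
                     rng: random.Random) -> List[float]:
    """Inverse-transform sampling: x = x_min * (1 - u) ** (-1 / (alpha - 1))."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def bootstrap_p_value(data: Sequence[float], x_min: float,
                      n_synthetic: int = 1000, seed: int = 0) -> float:
    """Fraction of synthetic data sets whose D exceeds the empirical D."""
    rng = random.Random(seed)
    alpha = fit_alpha(data, x_min)
    d_empirical = ks_distance(data, alpha, x_min)
    n_tail = sum(1 for x in data if x >= x_min)
    exceed = 0
    for _ in range(n_synthetic):
        # Simplification (assumption): x_min is held fixed for every set.
        synthetic = sample_power_law(n_tail, alpha, x_min, rng)
        # Each synthetic set is compared against its own fitted power law.
        d_synthetic = ks_distance(synthetic, fit_alpha(synthetic, x_min), x_min)
        if d_synthetic > d_empirical:
            exceed += 1
    return exceed / n_synthetic
```

Calling `bootstrap_p_value(data, x_min)` returns the fraction of 1000 synthetic data sets whose D-statistic exceeds the empirical one; more synthetic sets give a more precise estimate of p at the cost of run time.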
The KS test for all 11 tagged web pages, testing both the 'tag suggestion' and 'no
tag suggestion' conditions, is given in Fig. 5.9. The average D-statistic for the 'no
tag suggestion' condition is 0.0313 (S.D. 0.0118) with $p = 0.08$ ($p < 0.1$, power-law
found). For the 'tag suggestion' condition the average D-statistic is 0.0724
(S.D. 0.0256) with $p = 0.48$ ($p > 0.1$, no power-law found). These results show
that the power-law fit is significant only in the 'no tag suggestion' condition,
and that the fit is closer for the 'no tag suggestion' condition than for the 'tag
suggestion' condition. The D-statistic ranged from 0.0170 to 0.0552 in the 'no tag
suggestion' condition but from 0.0428 to 0.1318 in the 'tag suggestion' condition.
Thus, the power law only appears significantly without tag suggestions; with
tag suggestions a power law cannot be reliably found. This is surprising, as tag