in which c and α are the constants that characterize the power law, and b is some
constant or variable dependent on x that becomes constant asymptotically. The
exponent α is the scaling exponent that determines the slope of the distribution
before the long-tail behavior begins. A power-law function can be transformed to a
log-log scale, so (5.1) can also be written as:
\[ \log y = \alpha \log x + \log c \tag{5.2} \]
When written in this form, a fundamental property of power laws becomes
apparent: when plotted in log-log space, power laws are straight lines. Therefore,
the simplest and most widely used method to check whether a distribution follows
a power law and to deduce its parameters is to apply a logarithmic transformation
and then perform linear regression in the resulting log-log space, estimating the
slope of the function in logarithmic space to be α. The least-squares regression
method, as done previously, has been shown to produce systematic bias due to
fluctuations of the long tail (Clauset et al. 2007). Determining a power law
accurately requires minimizing the bias in the value of the scaling exponent and in
the beginning of the long tail via maximum likelihood estimation; see Newman (2005)
for the technical details. To determine the α of the observed distributions, we
therefore fitted the data using the maximum likelihood method recommended by
Newman (2005).
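
The two estimation routes discussed above can be contrasted in a minimal Python
sketch. It is an illustration under stated assumptions, not the fitting code used
for the results in this chapter. The function names are hypothetical. The maximum
likelihood estimator shown is the closed-form continuous estimator, α = 1 + n /
Σ ln(x_i / x_min), which assumes a density proportional to x^(-α) for x ≥ x_min
and assumes that x_min (the beginning of the long tail) is already known. Note
that under this convention a decaying power law has a negative slope in log-log
space.

```python
import math
import random


def loglog_least_squares(xs, ys):
    """Fit log y = slope * log x + log c by ordinary least squares."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)  # (slope, c)


def mle_exponent(samples, x_min):
    """Continuous MLE: alpha = 1 + n / sum(ln(x_i / x_min)) over x_i >= x_min."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def sample_power_law(alpha, x_min, n, rng):
    """Inverse-transform sampling from a density proportional to x**(-alpha)."""
    # 1 - rng.random() lies in (0, 1], so the base is never zero.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


# Noise-free points on y = 3 * x**-2: regression recovers the slope and c.
xs = [1, 2, 4, 8, 16]
ys = [3.0 * x ** -2.0 for x in xs]
slope, c = loglog_least_squares(xs, ys)
print("least-squares slope:", slope, "c:", c)

# Synthetic samples with a known exponent (illustrative values only).
rng = random.Random(0)
samples = sample_power_law(2.5, 1.0, 20000, rng)
print("maximum likelihood exponent:", mle_exponent(samples, x_min=1.0))
```

On noise-free data both routes agree with the generating parameters; the bias
attributed to least squares arises with real, noisy tails, where the sparse
high-x points fluctuate strongly.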
The intuitive explanation of the power-law parameters in the domain of tagging is
as follows: c represents the number of times the most common tag for that website
is used, while α gives the power-law decay parameter for the frequency of tags at
subsequent positions. Thus, the number of times the tag in position p is used (where
p = 1, ..., 25, since we considered the tags in the top 25 positions) can be
approximated by a function of the form:

\[ \mathrm{Frequency}(p) = \frac{c}{p^{\alpha}} \tag{5.3} \]

where α > 0 and c = Frequency(p = 1) is the frequency of the tag in the
first position in the tag distribution (thus, it is a constant that is specific for
each site/resource).
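
As a small worked illustration of the rank-frequency form (5.3), the Python sketch
below tabulates the predicted frequencies for the top 25 positions. The function
names and the values c = 400 and α = 1.0 are hypothetical and chosen only for the
example; they are not taken from the del.icio.us data. With α > 0 as in (5.3), the
straight line in log-log space has slope -α, which the second helper checks
numerically.

```python
import math


def predicted_frequency(c, alpha, p):
    """Frequency(p) = c / p**alpha, following equation (5.3), for rank p >= 1."""
    if p < 1:
        raise ValueError("rank p must be at least 1")
    return c / p ** alpha


def loglog_slope(c, alpha, p1, p2):
    """Slope between two ranks in log-log space; equals -alpha for (5.3)."""
    f1 = predicted_frequency(c, alpha, p1)
    f2 = predicted_frequency(c, alpha, p2)
    return (math.log(f2) - math.log(f1)) / (math.log(p2) - math.log(p1))


# Hypothetical resource: the most common tag is used 400 times, alpha = 1.0.
top_25 = [predicted_frequency(400, 1.0, p) for p in range(1, 26)]
for position, frequency in enumerate(top_25[:5], start=1):
    print(position, frequency)
```

Because c is fixed by the first position, α alone controls how quickly the
frequencies of the lower-ranked tags fall away.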
5.2.2 Empirical Results for Power Law Regression for Individual Sites
For this analysis, we used two different data sets. The first data set contained a
subset of 500 “Popular” sites from del.icio.us that were tagged at least 2,000 times
(i.e. where we would expect a “converged” power law distribution to appear). The
second data set considers a subset of another 500 sites selected randomly from
the “Recent” section of del.icio.us. Both sections are prominently displayed on