slopes of each (α parameters) were compared. The slopes indicate the fundamental
characteristic of the power laws, as vertical shifts can and do vary significantly
between different sites.
Our analysis shows that for the subset of heavily tagged sites, the slope
parameters are very similar to one another, with an average of α = 1.22 and
a standard deviation of ±0.03. Thus, it appears that the power law decay slope is
relatively consistent across all sites. This is quite remarkable, given that these sites
were chosen randomly, with the only criterion being that they were heavily tagged.
This pattern, where the top tags are considerably more popular than the rest of
the tags, seems to indicate a fundamental effect of the way tags are distributed in
individual websites, one that is independent of the content of those websites. The
specific content of the tags themselves can be very different from one website to
another, and this obviously depends on the content of the tagged site.
For the set of less-heavily tagged sites, we found the slopes differed from each
other to a much greater extent than with the heavily tagged data, with an average
α = 5.06 and standard deviation ±6.10. Clearly, the power law effect is much
less pronounced for the less-heavily tagged sites than for the heavily tagged
sites, as the standard deviation reveals a much poorer fit of the regression line to the
log-log plotted aggregate data. For sites with relatively few instances of tagging, the
results reveal mostly noise.
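The per-site comparison of slopes can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `loglog_slope`, the site names, and the tag counts are all hypothetical, and the use of the sample standard deviation is an assumption. The sketch reports the raw slope of the fitted line, which is negative for a decaying distribution; reading the α values in the text as the magnitude of that slope is also an assumption.

```python
import math
from statistics import mean, stdev


def loglog_slope(counts):
    """Least-squares slope of log2(count) against log2(rank), ranks starting at 1."""
    xs = [math.log2(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log2(c) for c in counts]
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical tag counts by rank for three sites (synthetic, not study data).
# The vertical scale differs per site, but the decay exponents are similar.
sites = {
    "site_a": [1000 * r ** -1.2 for r in range(1, 26)],
    "site_b": [400 * r ** -1.25 for r in range(1, 26)],
    "site_c": [50 * r ** -1.3 for r in range(1, 26)],
}

slopes = {name: loglog_slope(counts) for name, counts in sites.items()}
slope_values = list(slopes.values())

# Summary statistics across sites (sample standard deviation is an assumption).
mean_slope = mean(slope_values)
std_slope = stdev(slope_values)
print("mean slope:", round(mean_slope, 3))
print("std of slopes:", round(std_slope, 3))
```

Because the fit is done on logarithms, the different vertical scales of the three synthetic sites affect only the intercept and leave the slope untouched, which is why the slope is the quantity compared across sites.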
5.2.3 Empirical Results for Power-Law Regression Using Relative Frequencies
In the previous section, we applied power law regression techniques to individual
sites, using the number of hits for a tag in a given position in the distribution. In
this section, we examine the aggregate case where we no longer use the raw number
of tags (because these are not directly comparable across sites), and instead use the
relative frequencies of tags. The relative frequency is defined as the ratio between
the number of times a tag in a particular position is used for a resource and the total
number of times that resource is tagged.⁵ Thus, relative frequencies for a given site
always sum to one. These relative frequencies, based on data from all 500 sites of
the “Popular” data set, were then averaged. Results are presented in Fig. 5.3.
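The normalization and averaging step might look like the sketch below. The function names and the three example sites are hypothetical, and averaging position by position across sites is an assumption about how the aggregate was formed. The cutoff of 25 positions follows the footnote on the denominator.

```python
def relative_frequencies(counts, top_n=25):
    """Divide each of the top_n tag counts by their total, so the result sums to one."""
    top = sorted(counts, reverse=True)[:top_n]
    total = sum(top)
    return [c / total for c in top]


def average_by_rank(per_site):
    """Average the relative frequency at each rank position across all sites."""
    n_ranks = min(len(freqs) for freqs in per_site)
    n_sites = len(per_site)
    return [sum(freqs[i] for freqs in per_site) / n_sites for i in range(n_ranks)]


# Hypothetical raw tag counts for three sites (synthetic, not study data).
sites = [
    [120, 60, 30, 20, 10],
    [9, 5, 3, 2, 1],
    [400, 300, 200, 50, 50],
]

per_site = [relative_frequencies(counts) for counts in sites]
averaged = average_by_rank(per_site)
print([round(f, 3) for f in averaged])
```

Since every site's relative frequencies sum to one, the averaged distribution also sums to one when all sites contribute the same number of positions, which makes sites with very different tagging volumes directly comparable.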
As before, a power law was derived in the log-log space using least mean
squares (LMS) regression. This power law was found to have the slope α = 1.25.
The regression error, computed through the LMS method in the normal, not
logarithmic, space, was found to be 0.038. Note that the LMS regression error
computation only makes sense when converted back to the normal space, since in
the log-log space exponents are negative and, furthermore, deviations on the y-axis
denote actual error only after the exp2 (base-2 exponential) function is applied. This corresponds to an
5 To be more precise, the denominator is taken as the total number of times the resource is tagged
with a tag from the top 25 positions, given available data.
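The idea of fitting in log-log space but measuring the error in normal space can be illustrated with a short sketch. The text does not give the exact error formula, so the root-mean-square error used here is an assumption, as are the function names and the synthetic frequencies. Base-2 logarithms are used so that predictions are mapped back with a base-2 exponential.

```python
import math


def fit_power_law(freqs):
    """Least-squares line through (log2 rank, log2 frequency); returns (slope, intercept)."""
    xs = [math.log2(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log2(f) for f in freqs]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def normal_space_error(freqs, slope, intercept):
    """Root-mean-square error after mapping predictions back with 2 ** (...)."""
    preds = [2 ** (intercept + slope * math.log2(r)) for r in range(1, len(freqs) + 1)]
    return math.sqrt(sum((f - p) ** 2 for f, p in zip(freqs, preds)) / len(freqs))


# Synthetic relative frequencies following an exact power law (not study data).
raw = [r ** -1.25 for r in range(1, 26)]
exact = [v / sum(raw) for v in raw]

# The same distribution with alternating +10% / -10% perturbations.
noisy = [f * (1.1 if i % 2 == 0 else 0.9) for i, f in enumerate(exact)]

exact_slope, exact_intercept = fit_power_law(exact)
exact_error = normal_space_error(exact, exact_slope, exact_intercept)

noisy_slope, noisy_intercept = fit_power_law(noisy)
noisy_error = normal_space_error(noisy, noisy_slope, noisy_intercept)

print("exact fit slope:", round(exact_slope, 3), "error:", exact_error)
print("noisy fit slope:", round(noisy_slope, 3), "error:", noisy_error)
```

Residuals measured directly in log-log space are differences of logarithms, so they express ratios and not absolute deviations in frequency; converting predictions back first yields an error in the same units as the relative frequencies themselves.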