Semantic Aspects of Digg Story Popularity
number of Diggs occurring in the first 5-10% of
the users' lifetime). This is not particularly surprising,
since users are more enthusiastic and eager to
explore the service once they discover it. As time
goes by, their enthusiasm wears off, resulting in
more stable usage patterns.
A further noteworthy observation can be
made by comparing the aggregate user activity
time series of Figure 6(a) with the three sample
activity time series of Figure 6(b), which come
from three individual users. The individual
activity time series do not present any
distinct pattern. The aggregate activity, on the
other hand, is quite stable, although the high variance
indicated by the size of the shaded area in
Figure 6(a) reflects the instability of the individual time
series. Thus, it appears that a set of independent
behavior patterns of individuals leads to stable
mass behavior when aggregated.
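The stabilizing effect of aggregation can be illustrated with a small simulation. The sketch below does not use the Digg data: it generates purely synthetic, independent, bursty activity series (the burst probability, the action counts and the function names are all assumptions made for illustration) and compares the relative variability of a typical individual series with that of the cross-user average.

```python
import random
import statistics

def simulate_user(num_bins, rng):
    """Synthetic activity of one user: mostly idle, with occasional bursts."""
    # Assumption: a bin is active with probability 0.2 and then holds 1-10 actions.
    return [rng.randint(1, 10) if rng.random() < 0.2 else 0
            for _ in range(num_bins)]

def coefficient_of_variation(series):
    """Standard deviation divided by the mean (relative variability)."""
    return statistics.pstdev(series) / statistics.fmean(series)

rng = random.Random(0)          # fixed seed so the run is repeatable
num_users, num_bins = 1000, 50  # hypothetical sizes

users = [simulate_user(num_bins, rng) for _ in range(num_users)]

# Aggregate series: mean activity over all users in each time bin.
aggregate = [statistics.fmean(u[t] for u in users) for t in range(num_bins)]

# Average relative variability of individual users (skipping all-idle users,
# for which the coefficient of variation is undefined).
individual_cv = statistics.fmean(
    coefficient_of_variation(u) for u in users if sum(u) > 0
)
aggregate_cv = coefficient_of_variation(aggregate)

print(f"mean individual CV: {individual_cv:.2f}")
print(f"aggregate CV:       {aggregate_cv:.2f}")
```

Under these assumptions the aggregate series fluctuates far less, relative to its mean, than a typical individual series does. The simulation is stationary, so it does not reproduce the burst of early activity discussed above; it only illustrates why averaging many independent, irregular series yields a smooth curve.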
In this section, we employ the
previously discussed feature selection and text
classification techniques to investigate
the potential of popularity prediction based on
text features. To this end, we consider large
samples of stories from the dataset collected from
Digg. The majority of Digg stories are written
in English, with a few stories in German,
Spanish, Chinese and Arabic. We filtered out
such non-English text items by checking
for characters or symbols that are particular
to those languages (e.g. characters with umlauts
and other non-ASCII characters).
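A character-based filter of this kind could look roughly as follows. This is a minimal sketch and not the exact procedure used in the study: the function names, the `max_non_ascii_ratio` parameter and the sample titles are invented for illustration, and the rule simply rejects any text that contains non-ASCII characters.

```python
def non_ascii_ratio(text):
    """Fraction of characters in `text` that fall outside the ASCII range."""
    if not text:
        return 0.0
    return sum(1 for ch in text if ord(ch) > 127) / len(text)

def looks_english(text, max_non_ascii_ratio=0.0):
    """Heuristic language check based only on the characters present.

    `max_non_ascii_ratio` is a hypothetical tolerance; the default of 0.0
    rejects any text containing at least one non-ASCII character.
    """
    return non_ascii_ratio(text) <= max_non_ascii_ratio

# Made-up story titles, used only to exercise the filter.
stories = [
    "Scientists discover new species of frog",
    "Neue Studie über Klimawandel veröffentlicht",
    "El niño que cambió el mundo",
    "科学家发现新物种",
]

english_only = [s for s in stories if looks_english(s)]
print(english_only)
```

Such a heuristic is cheap but imperfect: a German or Spanish title that happens to contain no accented characters passes the filter, while an English title with typographic quotes is rejected unless the tolerance is raised.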
First, a random sample of N = 50,000 English
Digg stories was drawn from the extended dataset
and was processed to extract the text features. For
each story, the information on whether it became
popular or not was available, so, after extracting
Figure 6. Aggregate user Digging behavior. In subfigure 6(a), the embedded graph contains a zoomed
view of the [0.1, 1.0] interval