Many quantities measured in real-world
complex systems have been reported to exhibit
power-law behavior (Newman, 2005). Examples
of such quantities include the frequency of words
in the text of the novel Moby Dick, the number of
citations to scientific papers and the number of calls
received by AT&T telephone customers (Newman,
2005). Recent research on social web data has
confirmed the power-law nature of a series of
distributions originating in Web 2.0 applications,
e.g. tag usage in delicious (Hotho et al., 2006a;
Halpin et al., 2007), the number of votes to
questions/answers in the Yahoo! Answers system
(Agichtein et al., 2008), video popularity in
YouTube and Daum (Cha et al., 2007) and story
popularity in Digg (Papadopoulos et al., 2008).
Apart from the classic power-law distribution
of Equation (1) reported in the aforementioned
works and used for the subsequent analysis of
Digg popularity, several more elaborate models
have recently been proposed in the literature for
modeling skewed distributions. For instance, the
Discrete Gaussian Exponential is proposed by
Bi et al. (2001) as a generalization of the Zipf
distribution (i.e. power-law) to model a variety
of real-world distributions, e.g. user click-stream
data. Furthermore, statistical analysis of the distri-
bution of 29,684 Digg stories by Wu and Huberman
(2007) resulted in a log-normal distribution model
for the data. The truncated log-normal distribution
was also found by Gómez et al. (2008) to accu-
rately describe the in- and out-degree distributions
of the Slashdot user network formed on the basis
of their participation in the online discussion
threads. Finally, the recently formulated Double
Pareto log-normal distribution was presented by
Seshadri et al. (2008) as an accurate model for
a set of variables in a social network created by
mobile phone calls.
There are significant benefits in recognizing
and understanding the heavy-tail nature of skewed
distributions. As pointed out by Bi et al. (2001),
typical statistical measures such as mean, median
and standard deviation are not appropriate for
summarizing skewed distributions. In contrast,
parametric models, such as the power-law or the
log-normal distribution, convey a succinct and ac-
curate view of the observed variable. Furthermore,
comparison of the observed variable with the fit-
ted model may reveal deviant behavior (outliers).
Similar benefits of employing a parsimonious
model such as the power-law to summarize and
mine massive data streams that exhibit skewed
distributions were reported by Cormode and
Muthukrishnan (2005). Finally, Cha et al. (2007)
illustrated the utility of understanding
heavy-tail content consumption patterns by
showing a potential for 40% improvement
in video content consumption (in YouTube) through
alleviating information delivery inefficiencies of
the system (e.g. by recommending niche content
lying in the long tail of the distribution). Later
in the chapter, we will confirm the emergence of
heavy-tail distributions in Digg.
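
The point about summary statistics and parametric models can be illustrated with a small sketch. It is not taken from any of the cited works: the function names, the synthetic data, the seed and the parameter values (exponent 2.5, minimum value 1.0) are all assumptions chosen for illustration. The sketch draws samples from a continuous power law, shows that the mean and the median tell very different stories, and recovers the exponent with the standard maximum-likelihood estimator for a continuous power law with a known lower bound.

```python
import math
import random
import statistics


def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law p(x) ~ x^(-alpha), x >= x_min,
    using inverse-transform sampling (illustrative helper, not from the chapter)."""
    # rng.random() lies in [0, 1), so (1 - u) lies in (0, 1] and the power is finite.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_exponent(samples, x_min):
    """Maximum-likelihood estimate of alpha for a continuous power law,
    alpha = 1 + n / sum(ln(x_i / x_min)), using only values >= x_min."""
    tail = [x for x in samples if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


# Assumed parameters for the synthetic example.
rng = random.Random(42)
samples = sample_power_law(50_000, alpha=2.5, x_min=1.0, rng=rng)

mean_value = statistics.mean(samples)      # pulled upward by rare, very large values
median_value = statistics.median(samples)  # reflects the "typical" observation
alpha_hat = fit_power_law_exponent(samples, x_min=1.0)

print(f"mean={mean_value:.3f} median={median_value:.3f} alpha_hat={alpha_hat:.3f}")
```

In this sketch a single fitted parameter summarizes the whole sample, whereas the mean depends heavily on a few extreme values. With real data the lower bound would also have to be estimated, and a goodness-of-fit check against alternatives such as the log-normal would be needed before accepting the power-law model.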
Temporal Patterns of Content Consumption
The study of the temporal aspects of online content
consumption and rating in the context of a Web
2.0 application has been beneficial for a series of
tasks, e.g. planning an online campaign, antici-
pating voluminous requests for content items or
detecting malicious user activities. For instance,
by analyzing the temporal activity patterns of
Slashdot users (i.e. posting and commenting on
posts of others), Kaltenbrunner et al. (2007b)
could predict with sufficient accuracy the future
comment activity attracted by a
particular post. Similarly, the study by Cha et al.
(2007) presents an analysis of the temporal video
content popularity patterns observed in YouTube
and demonstrates the potential for short-term
popularity prediction. Furthermore, studies of the
temporal aspects of story popularity in Digg were
carried out by Lerman (2007) and Papadopoulos
et al. (2008). In both studies, it was confirmed
that Digg stories when moved to the front page