States. Fine-grained analysis of user demographics and geographical locations, combined
with prediction capabilities, will give public health authorities insight into current
seasonal flu activity.
This paper is organized as follows: Section 2 describes applications that harness the
collective intelligence of online social network (OSN) users to predict real-world
outcomes. In Section 3, we give a brief introduction to our data collection and modeling
framework. In Section 4, we introduce our data filtering technique for extracting relevant
information from the Twitter and Facebook datasets. Detailed data analysis is performed
to establish correlation with CDC reports on ILI rates. Then we go one step further and
introduce our influenza prediction model in Section 5. In Section 6, we perform
region-wise analysis of flu activities in the population based on the Twitter and
Facebook data. Finally, we conclude in Section 7 and provide acknowledgements in Section 8.
2 Related Work
A number of measurement-related studies have been conducted on different forms of
social networks such as Del.icio.us, Facebook and Wikipedia [8,22]. Sitaram et al.
demonstrated how social media content such as chatter from Twitter can be used to
predict real-world outcomes, in their case box-office revenues for movies [21]. Sakaki
et al. used a probabilistic spatio-temporal model to build an autonomous earthquake
reporting system in Japan, using Twitter users as sensors and applying Kalman
filtering and particle filtering for location estimation [19]. Meme tracking in news
cycles, as described by Leskovec et al., was an attempt to model information diffusion
in social media such as blogs and to track the handoff from professional news media to
social networks [13].
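To make the Kalman filtering idea mentioned above concrete, here is a minimal one-dimensional sketch in Python. It is not the implementation from [19]: that system estimates locations with a spatio-temporal model, whereas this example assumes a single scalar coordinate, Gaussian measurement noise, and a state that stays (nearly) fixed. The function name `kalman_1d` and all parameter names and defaults are hypothetical.

```python
def kalman_1d(measurements, meas_var, process_var=0.0,
              init_est=0.0, init_var=1e6):
    """Fuse noisy scalar reports of one (nearly) static coordinate.

    All names and defaults are illustrative assumptions, not taken from [19].
    Returns the list of running estimates and the final estimate variance.
    """
    est, var = init_est, init_var
    estimates = []
    for z in measurements:
        # Predict step: the state is assumed static, so only the
        # uncertainty grows, by the process noise variance.
        var += process_var
        # Update step: blend prediction and measurement using the Kalman gain.
        gain = var / (var + meas_var)
        est += gain * (z - est)
        var *= (1.0 - gain)
        estimates.append(est)
    return estimates, var


# Made-up reports of a single coordinate from several "sensors".
reports = [35.2, 35.9, 35.5, 35.7, 35.4]
estimates, final_var = kalman_1d(reports, meas_var=0.25)
```

With `process_var=0.0` and a very large `init_var`, the estimate behaves almost like a running mean of the reports. A nonzero `process_var` lets the estimate follow a slowly moving target instead.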
Ginsberg et al., in their approach for estimating flu trends, proposed that the relative
frequency of certain search terms is a good indicator of the percentage of physician
visits, and established a linear correlation with weekly published ILI percentages
between 2003 and 2007 for all nine regions identified by the CDC [9]. Culotta used a
document classification component to filter misleading messages out of Twitter and showed
that a small number of flu-related keywords can forecast future influenza rates [5].
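As a rough illustration of what a linear correlation between a keyword signal and ILI percentages means, the sketch below computes the Pearson correlation coefficient and a least-squares line in plain Python. It is not the model used in [9] or [5]. The function names are hypothetical, and the two series are made-up toy values, not real search, Twitter, or CDC data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares (slope, intercept) for ys ~ slope * xs + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("cannot fit a line to a constant predictor")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Toy weekly values (hypothetical): share of messages containing flu
# keywords, and the ILI percentage reported for the same weeks.
keyword_share = [0.10, 0.15, 0.22, 0.30, 0.26, 0.18]
ili_percent = [1.2, 1.6, 2.3, 3.1, 2.8, 1.9]

r = pearson_r(keyword_share, ili_percent)
slope, intercept = fit_line(keyword_share, ili_percent)
```

A value of `r` close to 1 indicates that the keyword signal rises and falls with the ILI series. The fitted line can then map a newly observed keyword share to an estimated ILI percentage, on the assumption that the relationship stays stable.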
OSN data has been used for real-time notifications such as large-scale fire emergencies,
downtime on services provided by content providers [17] and live traffic updates.
There have been efforts to use Twitter data for measuring public interest or concern
about health-related events [18,20], predicting national mood [16], currency tracing and
performing market and risk analysis [10]. Tweetminster, a media utility tool designed to
make UK politics open and social, analyses political tweets to establish correlations
between buzz on Twitter and election results. In June 2010, we introduced the SNEFT
architecture as a continuous data collection engine that combines detection and
prediction capabilities on social networks to discover real-world flu trends [1,2,4].
3 Data Collection
In this section we describe our data collection methodology by introducing the SNEFT
architecture, provide a description of our dataset, explore strategies for data cleaning,
and apply filtering techniques in order to perform quantitative spatio-temporal analysis.