Online Social Networks Flu Trend Tracker: A Novel Sensory Approach to Predict Flu Trends - Biomedical Engineering Systems and Technologies

using current and past OSN activity, and CDC data from previous weeks. The prediction

of current ILI activity using ILI activity from previous weeks forms the autoregressive

component of the model, while the OSN data from previous weeks serve as exogenous

inputs. By CDC data, we refer to the percentage of visits to a physician for Influenza-

Like Illness (also called ILI rate).

5.1 Influenza Model Structure

The percentage of physician visits lies between 0% and 100%, while the number of

OSN users is bounded below by 0. A simple linear ARX model neglects these bounds in its

structure. Therefore, we introduce a logit link function for the CDC data and a logarithmic

transformation of the OSN data as follows:

Logistic ARX Model

\[
\log\frac{y(t)}{1-y(t)} = \sum_{i=1}^{m} a_i \log\frac{y(t-i)}{1-y(t-i)} + \sum_{j=0}^{n-1} b_j \log\bigl(u(t-j)\bigr) + c + e(t) \qquad (1)
\]

where t indexes weeks, y(t) denotes the percentage of physician visits due to ILI in

week t, u(t) represents the number of unique Twitter/Facebook users with flu-related

posts in week t, and e(t) is a sequence of independent random variables. c is a constant

term to account for offset. In our tests, the number of unique OSN users u(t) counts

Twitter users, excluding retweets and repeated tweets from the same user within a

syndrome elapsed time of 0 weeks, or Facebook users, excluding repeated posts from the

same user within a syndrome elapsed time of 0 weeks. Flu-related messages are defined as posts

with keywords “flu”, “H1N1” and “swine flu”. The rationale for the model structure in

Eq. (1) is that OSN data provides real-time assessment of the flu epidemic. However,

the OSN data may be disturbed at times by events related to flu, such as news reports of

flu in other parts of the world, but not necessarily to local people actually getting sick

due to ILI. On the other hand, the CDC data provides a true, albeit delayed, assessment

of a flu epidemic. Hence, by using the CDC data along with the OSN data, we may be

able to take advantage of the timeliness of the OSN data while overcoming the

disturbance that may be present in the OSN data.
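
As an illustration of how the logit link and the log transform in Eq. (1) keep a prediction inside the valid range, the following minimal Python sketch evaluates the right-hand side of Eq. (1) with the noise term e(t) set to zero and maps the result back through the inverse logit. The function names (`logit`, `inv_logit`, `predict_ili`) are hypothetical, and the sketch assumes the ILI rate is expressed as a proportion strictly between 0 and 1 (the percentage divided by 100) and that every OSN count is positive.

```python
import math


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit: maps any real z back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_ili(past_ili, osn_counts, a, b, c):
    """Evaluate Eq. (1) with e(t) = 0 and return the ILI proportion.

    past_ili   -- [y(t-1), ..., y(t-m)] as proportions in (0, 1)
    osn_counts -- [u(t), u(t-1), ..., u(t-n+1)] as positive counts
    a, b, c    -- coefficients a_i, b_j and the constant c (assumed known)
    """
    if len(a) != len(past_ili) or len(b) != len(osn_counts):
        raise ValueError("coefficient and input lengths must match")
    z = c
    z += sum(a_i * logit(y) for a_i, y in zip(a, past_ili))
    z += sum(b_j * math.log(u) for b_j, u in zip(b, osn_counts))
    return inv_logit(z)
```

Whatever values the coefficients take, the value returned by `predict_ili` lies strictly between 0 and 1, which is the property a plain linear ARX model does not guarantee.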

The objective of the model is to provide timely updates of the percentage of

physician visits. To predict this percentage in week t, we assume that only the CDC data

with at least 2 weeks of lag is available for the prediction, if past CDC data is present

in a model. The 2-week lag is to simulate the typical delay in CDC data reporting and

aggregation. For the OSN data, we assume that the most recent data is always available,

if a model includes the OSN data terms. In other words, the most current CDC or OSN

data that can be used to predict the percentage of physician visits in week t is week t-2

for the CDC data and week t for the OSN data.
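
The availability rule above can be sketched as a routine that assembles, for each week t, only the inputs that would be in hand at prediction time. This is a hedged illustration, not the authors' code: the function name `build_rows` and the parameter `cdc_delay` are hypothetical, and the sketch assumes that the autoregressive terms start at the 2-week CDC delay (so the m CDC terms are y(t-2), ..., y(t-m-1)), while the n OSN terms are u(t), ..., u(t-n+1). ILI rates are again assumed to be proportions in (0, 1) and OSN counts to be positive.

```python
import math


def build_rows(y, u, m, n, cdc_delay=2):
    """Return (features, targets) for every week with complete inputs.

    y -- weekly ILI proportions in (0, 1), oldest first
    u -- weekly unique-user counts (> 0), aligned with y
    m -- number of CDC (autoregressive) terms
    n -- number of OSN (exogenous) terms
    """
    def logit(p):
        return math.log(p / (1.0 - p))

    # Earliest week for which every required lag exists.
    first_ar = cdc_delay + m - 1 if m > 0 else 0
    first_ex = n - 1 if n > 0 else 0
    first = max(first_ar, first_ex)

    features, targets = [], []
    for t in range(first, len(y)):
        # CDC terms: only data at least `cdc_delay` weeks old.
        ar = [logit(y[t - cdc_delay - i]) for i in range(m)]
        # OSN terms: current week and the n-1 weeks before it.
        ex = [math.log(u[t - j]) for j in range(n)]
        features.append(ar + ex + [1.0])  # trailing 1.0 carries the constant c
        targets.append(logit(y[t]))
    return features, targets
```

Each feature row pairs with one transformed target, so the rows can be handed to any linear regression routine to estimate the coefficients; the choice of estimator is left open here.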

In order to predict ILI rates in a particular week given current OSN data and the most

recent ILI data from the CDC, we must estimate the coefficients a_i, b_j, and c in Eq.

(1). Also, in practice, the model orders m and n are unknown and must be estimated. In

our experiment, we vary m from 0 to 2 and n from 0 to 3 in Eq. (1) in order to obtain

the best values of m and n to use for prediction. Intuitively, this answers the question
