Online Social Networks Flu Trend Tracker: A Novel Sensory Approach to Predict Flu Trends - Biomedical Engineering Systems and Technologies


evaluated for prediction performance. However, they were still used in the training set to estimate the values of the coefficients a_i and b_j in Eq. (1).

Considering the above constraints, our K-fold validation testing procedure is as follows:

1. For each (m, n) pair with m = 0, 1, 2 and n = 0, 1, 2, 3, repeat the following:

(a) Identify F, the index of the first data sample that can actually be predicted: F = max(m + 1, n).

(b) Represent the available data indices as t = 1, ..., T. Then divide the dataset into K approximately equally sized subsets {S_1, S_2, ..., S_K}, with each subset comprising members that have an approximately equal time interval between them. For example, the first set would be S_1 = {y(F), y(F + K), y(F + 2K), ...}, the second would be S_2 = {y(F + 1), y(F + K + 1), y(F + 2K + 1), ...}, and so on.

(c) For each subset S_k, k = 1, ..., K, estimate the model parameters using all the other subsets with the least squares estimation technique. Based on the estimated model parameter values and the associated prediction equations in Eq. (2), predict the value of each member of S_k.

2. For each (m, n) pair, we have now obtained a prediction ŷ(t) of the CDC time series y(t) for t = F_mn, ..., T. Note that F still represents the first time index that can be predicted; we use the subscript mn to emphasize that F varies with the values of m and n. By comparing the prediction with the true CDC data, we calculate the root mean-squared error (RMSE) as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{T - F_{\max} + 1} \sum_{t = F_{\max}}^{T} \bigl( y(t) - \hat{y}(t) \bigr)^{2}} \qquad (4)$$

The RMSE is computed over t = F_max, ..., T, regardless of technique and model order, to ensure a fair comparison.
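The validation procedure above can be sketched in Python with NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: Eq. (1) is not reproduced in this section, so the code assumes a plain linear model with an intercept, m past CDC values y(t-1), ..., y(t-m), and n OSN values u(t), ..., u(t-n+1), and it omits any data transformation the original model may apply. All function names are hypothetical, and F_max is assumed to be the largest F over the (m, n) pairs being compared.

```python
import numpy as np


def first_predictable_index(m, n):
    """Step 1(a): F = max(m + 1, n), using 1-based time indices."""
    return max(m + 1, n)


def interleaved_folds(F, T, K):
    """Step 1(b): split indices F..T into K subsets with stride K.

    Subset k (0-based) holds F + k, F + k + K, F + k + 2K, ...
    """
    return [list(range(F + k, T + 1, K)) for k in range(K)]


def design_row(y, u, t, m, n):
    """Regressors for 1-based time t (ASSUMED model form, see lead-in).

    y and u are 0-based sequences, so y(t) is stored at y[t - 1].
    """
    past_cdc = [y[t - 1 - i] for i in range(1, m + 1)]  # y(t-1)..y(t-m)
    osn = [u[t - 1 - j] for j in range(n)]              # u(t)..u(t-n+1)
    return [1.0] + past_cdc + osn                       # intercept first


def cross_validated_predictions(y, u, m, n, K):
    """Step 1(c): predict each subset from a least squares fit on the others.

    Returns an array of length T; entries before F stay NaN.
    """
    T = len(y)
    F = first_predictable_index(m, n)
    folds = interleaved_folds(F, T, K)
    y_hat = np.full(T, np.nan)
    for k, test_idx in enumerate(folds):
        train_idx = [t for j, fold in enumerate(folds) if j != k for t in fold]
        X_train = np.array([design_row(y, u, t, m, n) for t in train_idx])
        y_train = np.array([y[t - 1] for t in train_idx])
        coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
        X_test = np.array([design_row(y, u, t, m, n) for t in test_idx])
        y_hat[np.array(test_idx) - 1] = X_test @ coef
    return y_hat


def rmse(y, y_hat, F_max):
    """Eq. (4): root mean-squared error over t = F_max..T (1-based)."""
    err = np.asarray(y, dtype=float)[F_max - 1:] - y_hat[F_max - 1:]
    return float(np.sqrt(np.mean(err ** 2)))
```

Calling cross_validated_predictions once per (m, n) pair and scoring each result with rmse over the same range t = F_max, ..., T gives error values that are directly comparable across model orders, which is the point of fixing the evaluation window.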

5.3 Cross Validation Results

We fit our model with Twitter data, Facebook data, and the combination of Twitter and Facebook data. According to the cross validation results in Table 3 (see footnote 1), the models corresponding to m = 2 and n = 0 have the lowest RMSE for both Twitter and Facebook. This indicates that the two most recent data points are required to accurately predict influenza rates using Twitter or Facebook data. However, the model corresponding to m = 1 and n = 2 for the combination of Twitter and Facebook data has the lowest RMSE among all models. Thus, the model corresponding to m = 1 and n = 2 is used for accurate prediction of influenza rates; it uses the most recent CDC ILI data point in addition to the two most recent OSN data points. In general, the addition of OSN data improves on prediction with past CDC data alone. For the 10-fold cross validation results presented in Table 3, for example, the AR model (m = 1, n = 0)

Footnote 1: The cross validation results presented for the Twitter dataset differ from our previous work [2], as we disregard the scaling effect caused by the creation of new Twitter accounts over time.
