Biomedical Engineering Reference
In-Depth Information
[Figure 4 plots: two panels, each comparing "Without Text Classification" and "With Text
Classification"; vertical axis from 0 to 1; horizontal axis: CDC defined Regions and overall USA]
Fig. 4. The classified OSN datasets (Twitter on the left, Facebook on the right) achieve higher
correlation with CDC reports at the nationwide and regional levels
should not be counted in the analysis. Of the 4.5 million tweets we collected, 541K are
retweets, accounting for 12% of the total number of tweets.
- Syndrome Elapsed Time: An individual patient may have multiple encounters associated
with a single episode of illness (e.g., initial consultation, consultation 1-2
days later for laboratory results, and follow-up consultation a few weeks later).
To avoid double counting from this common pattern of ambulatory care, the first
encounter for each patient within any single syndrome group is reported to CDC, but
subsequent encounters with the same syndrome are not reported as new episodes
until more than six weeks have elapsed since the most recent encounter in the same
syndrome [12]. We call this period the syndrome elapsed time.
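The reporting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the CDC's or the authors' implementation: the function name `new_episodes`, the tuple layout, the syndrome label "ILI", and the sample dates are all assumptions made for the example.

```python
from datetime import date, timedelta

def new_episodes(encounters, elapsed=timedelta(weeks=6)):
    """Return the encounters that count as new episodes.

    encounters: iterable of (patient_id, syndrome, encounter_date) tuples.
    An encounter is a new episode if the patient has no earlier encounter
    in the same syndrome group, or if more than `elapsed` has passed since
    the most recent encounter in that group.
    """
    last_seen = {}  # (patient_id, syndrome) -> date of most recent encounter
    episodes = []
    for patient, syndrome, day in sorted(encounters, key=lambda e: e[2]):
        key = (patient, syndrome)
        previous = last_seen.get(key)
        if previous is None or day - previous > elapsed:
            episodes.append((patient, syndrome, day))
        # The window restarts at every encounter, reported or not.
        last_seen[key] = day
    return episodes

# Hypothetical encounters for illustration only.
visits = [
    ("p1", "ILI", date(2012, 1, 2)),   # initial consultation
    ("p1", "ILI", date(2012, 1, 4)),   # laboratory results, 2 days later
    ("p1", "ILI", date(2012, 1, 25)),  # follow-up, about 3 weeks later
    ("p1", "ILI", date(2012, 3, 20)),  # 8 weeks after the last encounter
    ("p2", "ILI", date(2012, 1, 4)),
]
episodes = new_episodes(visits)
```

Note that the window is measured from the most recent encounter, not from the last reported one, which matches the wording of the rule.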
Hence, we created the following datasets: a Twitter dataset without retweets (tweets
starting with RT), and a Twitter dataset without retweets and without repeated tweets from
the same user within a given syndrome elapsed time. For Facebook, we created a dataset
without repeated posts from the same user within a given syndrome elapsed time.
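As a rough sketch of the retweet filter, assuming each tweet is a dictionary with a "text" field (a hypothetical layout, not one given in the text) and that a retweet is any tweet whose first token is exactly "RT":

```python
def is_retweet(text):
    """Treat a tweet as a retweet if its first whitespace-separated token is 'RT'."""
    tokens = text.split()
    return bool(tokens) and tokens[0] == "RT"

def drop_retweets(tweets):
    """Keep only the tweets that are not retweets."""
    return [t for t in tweets if not is_retweet(t["text"])]

# Hypothetical sample tweets for illustration only.
sample = [
    {"user": "a", "text": "I think I have the flu"},
    {"user": "b", "text": "RT @a: I think I have the flu"},
    {"user": "c", "text": "Flu shot done. RT if you got yours"},
]
originals = drop_retweets(sample)
```

Matching on the first token only keeps tweets that merely mention "RT" later in the text, such as the third sample above.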
When we compared the datasets listed in Table 2 with CDC data, we found that the
Twitter dataset without retweets showed a high correlation (0.8907) with CDC data.
Similarly, the Facebook data with a syndrome elapsed time of zero showed a high
correlation of 0.8728. In contrast to the common practice in public health safety, where
medical examiners within the U.S. observe a syndrome elapsed time of six weeks
[12], user behaviour on Twitter and Facebook follows a trend in which successive posts
from the same user should not be ignored. Thus the Twitter dataset without retweets is our
choice of dataset for all subsequent experiments. Similarly, Facebook data within the same
week becomes our choice of dataset for all subsequent analysis.
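The text does not say which correlation measure was used. Assuming it is the Pearson correlation coefficient between two equally long weekly series, the comparison can be sketched as follows; the series values below are made up for illustration and are not the study's data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly counts of flu-related posts and a CDC weekly indicator.
weekly_posts = [120, 150, 210, 300, 280]
cdc_weekly = [1.1, 1.4, 2.0, 2.9, 2.6]
r = pearson(weekly_posts, cdc_weekly)
```

Running the same function on each candidate dataset against the same CDC series gives directly comparable values.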
From Figure 5, we observe that the Complementary Cumulative Distribution Function
(CCDF) of the number of tweets posted by the same individual on Twitter can be
fitted by a power-law function with exponent -2.6429, coefficient of determination
(R-square) 0.9978, and RMSE 0.1076, using maximum likelihood estimation. Most
people tweet very few times (e.g., 82.5% of people tweet only once and only 6% of
people tweet more than two times). However, we do not observe power-law behavior
in the CCDF of the number of posts per user on Facebook, as shown in the plot on the
right-hand side of Figure 5.
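To make the CCDF and the fit concrete, here is a minimal sketch. It computes the empirical CCDF, P(X >= x), of per-user post counts and fits a straight line in log-log space by ordinary least squares. This is a swapped-in technique, not the maximum likelihood procedure the text mentions; the function names and the sample counts are assumptions.

```python
from collections import Counter
from math import log

def ccdf(counts):
    """Empirical CCDF as a list of (x, P(X >= x)) for each distinct x, ascending."""
    n = len(counts)
    freq = Counter(counts)
    points = []
    remaining = n  # number of users with a count >= the current x
    for x in sorted(freq):
        points.append((x, remaining / n))
        remaining -= freq[x]
    return points

def fit_power_law(points):
    """Least-squares line through (log x, log p).

    Returns (slope, intercept, r_squared); the slope is the power-law exponent.
    Requires at least two distinct positive x values.
    """
    lx = [log(x) for x, _ in points]
    ly = [log(p) for _, p in points]
    n = len(points)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - mean_y) ** 2 for b in ly)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical per-user tweet counts, for illustration only.
tweets_per_user = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 5]
slope, intercept, r_squared = fit_power_law(ccdf(tweets_per_user))
```

A slope close to a straight line on the log-log plot, with R-square near 1, is what the Twitter panel of Figure 5 reports; a curve that bends away from the line, as described for Facebook, would show up as a noticeably lower R-square.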