Biomedical Engineering Reference
epidemic [7,14]. Therefore, it is important to be able to track and predict the emergence
and spread of flu in the population.
The Centers for Disease Control and Prevention (CDC) [3] monitors influenza-like
illness (ILI) cases by collecting data from sentinel medical practices, collating reports
and publishing them on a weekly basis. These reports are highly authoritative in the
medical field, but because diagnoses are made and reported by doctors, the system is
almost entirely manual, resulting in a 1-2 week delay between the time a patient is
diagnosed and the moment that data point becomes available in aggregate ILI reports.
Public health authorities need to be forewarned as early as possible to ensure effective
preventive intervention, and this leads to a critical need for more efficient and timely
methods of estimating influenza incidence.
Several innovative surveillance systems have been proposed to capture health-seeking
behaviour and transform it into estimates of influenza activity. These include
monitoring call volumes to telephone triage advice lines [6], over-the-counter drug
sales [15], and patient visit logs at physicians for flu shots. Google Flu Trends uses
aggregated historical logs of online web search queries pertaining to influenza to build
a comprehensive model that can estimate nationwide ILI activity [9].
In this paper, we investigate the use of a novel data source, OSN data, whose
timeliness supports early detection: it provides a snapshot of current epidemic
conditions and allows influenza-related predictions of what may lie ahead, on a
daily or even hourly basis. We sought to develop a model that estimates the number
of physician visits per week related to ILI as reported by the CDC.
Our approach treats OSN users within the United States as “sensors” and collective
message exchanges showing flu symptoms, such as “I have Flu” or “down with swine flu”,
as early indicators and robust predictors of influenza. We expect these posts on OSNs to
be highly correlated with the number of ILI cases in the population. We analyze messages,
build prediction models and discover trends within the data to study the characteristics
and dynamics of disease outbreaks. We validate our model by measuring how well it fits the
CDC ILI rates over the course of two years, from 2009 to 2011. We are interested in
how the seasonal flu spreads within the population across different regions
of the USA and among different age groups.
In this paper, we extend our preliminary analysis [1,2] and provide a continuing
study of using OSNs to track the emergence and spread of seasonal flu in the year
2010-2011. OSN data, which demonstrated high correlation with the CDC ILI rate for the
year 2009-2010, were affected by spurious messages, and so text mining techniques were
applied. We show that text mining can significantly enhance the correlation between the
OSN data and the ILI data from the CDC, providing a strong base for accurate prediction
of the ILI rate.
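
To make the notion of correlation between OSN data and ILI data concrete, the following is a minimal sketch of a Pearson correlation between two weekly series. It is only an illustration: the source does not specify which correlation measure was used, the names `pearson`, `weekly_flu_posts` and `weekly_ili_rate` are hypothetical, and the numbers are made up for the example rather than taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values, NOT data from the study.
# Assumption: one count of flu-related posts and one ILI rate (%) per week.
weekly_flu_posts = [120, 150, 210, 320, 410, 380, 290, 200]
weekly_ili_rate = [1.1, 1.3, 1.8, 2.6, 3.4, 3.1, 2.5, 1.7]

r = pearson(weekly_flu_posts, weekly_ili_rate)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that weeks with more flu-related posts tend to be weeks with higher reported ILI rates. Removing spurious messages before counting would change `weekly_flu_posts` and hence r.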
For prediction, we build an autoregressive with exogenous input (ARX) model
in which ILI rates of previous weeks from the CDC form the autoregressive component of
the model, and the OSN data serve as the exogenous input. Our results show that while
previous ILI data from the CDC offer a realistic (but delayed) measure of a flu epidemic,
OSN data provide a real-time assessment of the current epidemic condition and can
be used to compensate for the lack of current ILI data. We observe that the OSN data
are in fact highly correlated with the ILI data across the different regions within the United