Database Reference
In-Depth Information
use [15]. Capturing some of these aspects during the mining process is
extremely challenging.
2.2.4 Modeling Challenges. There are several challenges
that need to be overcome in the modeling stage of the data mining pro-
cess for medical sensor data. First of all, the time series nature of the
data often requires the application of sequential mining algorithms that
are often more complex than conventional machine learning techniques
(e.g., standard supervised and unsupervised learning approaches). Non-
stationarities in time series data necessitate the use of modeling tech-
niques that can capture the dynamic nature of the state of the underlying
processes that generate the data. Known techniques for such problems,
including discrete state estimation approaches (e.g. dynamic bayesian
networks and hidden Markov models) and continuous state estimation
approaches (e.g. Kalman Filters or recurrent neural networks) have been
used only in limited settings.
Another challenge arises due to the inherent distributed nature of
these applications. In many cases, communication and computational
costs, as well as sharing restrictions for patient privacy prevent the ag-
gregation of the data in a central repository. As a result, the modeling
stage needs to use complex distributed mining algorithms. In remote
settings, there is limited control on the data acquisition at the sensor.
Sensors may be disconnected for privacy reasons or for resource man-
agement reasons (e.g., power constraints), thereby affecting the data
available for analysis. Modeling in these conditions may also require the
distribution of analytic approaches between the central repository and
the sensors. Optimizing the modeling process becomes a challenging dis-
tributed data mining problem that has received only limited attention
in the data mining community.
Modeling in healthcare mining is also hindered by the ability to obtain
ground truth on the data. Labels are often imprecise and noisy in the
medical setting. For instance, a supervised learning approach for the
early detection of a chronic disease requires well-labeled training data.
However, domain experts do not always know exactly when a disease
has started to manifest itself in a body, and can only approximate this
time. Additionally, there are instances of misdiagnosis that can lead to
incorrect or noisy labels that can degrade the quality of any predictive
models.
In clinical settings, physicians do not have the luxury of being able
to try different treatment options on their patients for exploration pur-
poses. As a result, historical data sets used in the mining process tend
to be quite sparse and include natural biases driven by the way care was
Search WWH ::




Custom Search