with sophisticated data filtering and interpolation techniques to remove
and correct, when possible, data anomalies. The pre-processing stage
is also affected by the limited adoption of standards among medical sensor
manufacturers. Indeed, data generated in different formats needs to be
syntactically aligned before any analysis can take place. Furthermore, a
semantic normalization is often required to cope with differences in the
sensing process. As an illustration, a daily reported heart rate measure
may correspond to a daily average heart rate in some cases, while in
other cases it may represent a heart rate average measured every morning
when the subject wakes up. Comparing these values in a data mining
application can yield incorrect conclusions, especially if they are not
semantically distinguished.
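
The heart rate example above can be sketched in code. This is a minimal illustration, not a prescribed method: the `HeartRateReading` type, the `semantics` labels ("daily_mean", "morning_resting"), and the sample values are all hypothetical. The idea is simply to attach an explicit semantic label to each measure during pre-processing so that only like-for-like values are compared later.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartRateReading:
    patient_id: str
    bpm: float
    # Hypothetical label describing how the daily value was obtained.
    semantics: str


def group_by_semantics(readings):
    """Group readings so that only like-for-like values are compared."""
    groups = defaultdict(list)
    for reading in readings:
        groups[reading.semantics].append(reading)
    return dict(groups)


def mean_bpm(readings):
    """Average heart rate over a list of readings."""
    return sum(r.bpm for r in readings) / len(readings)


# Made-up sample data: two devices report a daily average, one reports
# a resting value taken on waking.
readings = [
    HeartRateReading("p1", 72.0, "daily_mean"),
    HeartRateReading("p2", 58.0, "morning_resting"),
    HeartRateReading("p3", 76.0, "daily_mean"),
]

groups = group_by_semantics(readings)
daily_mean_avg = mean_bpm(groups["daily_mean"])
```

Pooling all three readings would mix a resting value with two daily averages; grouping first keeps the comparison meaningful.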
Another key pre-processing challenge involves data synchronization.
Sensors report data with timestamps based on their internal clocks.
Given that clocks across sensors are often not synchronized, aligning
the data across sensors can be quite challenging. In addition, sensors
may report data at different rates. Hence, assumptions and alignment
strategies need to be carefully designed.
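
One possible alignment strategy is sketched below, under two loud assumptions that the text does not make: that each sensor's clock offset is already known, and that linear interpolation between samples is acceptable for the signal in question. The function names and sample values are hypothetical. Each stream is first shifted onto a common time base and then resampled onto a shared grid.

```python
from bisect import bisect_left


def correct_offset(samples, offset_s):
    """Shift (timestamp, value) pairs by a known clock offset in seconds."""
    return [(t - offset_s, v) for t, v in samples]


def interpolate_at(samples, t):
    """Linearly interpolate time-sorted samples at t; None outside their range."""
    times = [s[0] for s in samples]
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        return samples[i][1]
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def align(streams, grid):
    """Resample every stream onto the same time grid."""
    return [[interpolate_at(s, t) for s in streams] for t in grid]


# Made-up data: a 1 Hz heart rate sensor whose clock runs 2 s ahead,
# and a 0.5 Hz oxygen saturation sensor with a correct clock.
heart_rate = correct_offset([(2.0, 60.0), (3.0, 62.0), (4.0, 64.0)], offset_s=2.0)
spo2 = [(0.0, 98.0), (2.0, 96.0)]

aligned = align([heart_rate, spo2], grid=[0.0, 1.0, 2.0])
```

In practice the offset itself usually has to be estimated, and interpolation may be inappropriate for signals that change abruptly, which is why the assumptions need to be stated explicitly.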
2.2.3 Transformation Challenges. Feature extraction is
often the most complex stage of the data mining process. The transformation
of sensor data into spaces where good features can be extracted
requires a deep understanding of the problem at hand and needs to
be driven by domain experts. In medical informatics, this transformation
requires expertise in the physiology of the body. Despite immense
progress in medicine and in our understanding of the human body, there
is still much to learn about all the data that we can sense today. For
instance, in neurological intensive care environments, neuro-intensivists
collect and interpret electroencephalogram (EEG) signals that represent the
brain activity of their patients. These signals are extremely noisy and
not fully understood [14], yet they can be used to diagnose several conditions
(e.g., the onset of diverse forms of seizures). Extracting features
from EEG signals is often restricted to spectral analysis techniques defined
by domain experts.
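
As a rough illustration of spectral feature extraction, the sketch below computes the power in a few frequency bands from a signal. It rests on several assumptions: the band boundaries are conventional values that vary between sources, the input is a synthetic 10 Hz sine wave rather than real EEG, and the naive discrete Fourier transform is used only to keep the example self-contained. A real pipeline would rely on an FFT library and on features chosen by domain experts.

```python
import cmath
import math

# Conventional EEG bands in Hz (assumed here; exact boundaries vary).
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
}


def power_spectrum(signal, fs):
    """Return (frequency, power) pairs from a naive O(n^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(signal)
        )
        spectrum.append((k * fs / n, abs(coeff) ** 2 / n))
    return spectrum


def band_powers(signal, fs):
    """Sum spectral power inside each band to form a feature vector."""
    spectrum = power_spectrum(signal, fs)
    return {
        name: sum(p for f, p in spectrum if lo <= f < hi)
        for name, (lo, hi) in BANDS.items()
    }


# Synthetic stand-in for one second of EEG: a pure 10 Hz oscillation.
fs = 128
signal = [math.sin(2 * math.pi * 10 * i / fs) for i in range(fs)]

features = band_powers(signal, fs)
dominant = max(features, key=features.get)
```

For this synthetic input the power concentrates in the band containing 10 Hz. Real EEG is far noisier, so band powers are typically computed over windowed, artifact-filtered segments.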
In addition to signals that are not well understood, human sensing
adds different types of unstructured data that needs to be effectively integrated.
This includes textual reports from examinations (by physicians
or nurses) that also need to be transformed into relevant features, and
aligned with the rest of the physiological measurements. These inputs
are important to the data mining process as they provide expert data,
personalized to the patients. However, these inputs can be biased by
physician experiences, or other diagnosis and prognosis techniques they