3.3 General Mining of Clinical Sensor Data
In general clinical settings, data mining is often confined to the mining
of Electronic Health Records (EHRs), which do contain sensor data (e.g.,
patient vital signs). With patient medical histories increasingly captured
in EHRs and a strong push worldwide to introduce EHRs into healthcare
systems, systems capable of mining these data are receiving growing
attention. EHRs are data rich. They include structured and unstructured
data comprising all the key administrative and clinical data relevant
to patients: demographics, progress notes, problems, medications, vital
signs, past medical history, immunizations, laboratory data, diverse test
results and radiology reports [16]. Unfortunately, there are no widely
accepted standards for representing all of the data points stored in
EHR systems. Several code systems (e.g., ICD-9, ICD-10, CPT-4,
SNOMED CT [34]) and interoperability standards (e.g., HL7, HIE)
are used by many systems, but there is no overarching standard to which
EHR vendors adhere. Despite this lack of global standardization, which
hinders the realization of very large-scale data mining, many
researchers are investing considerable effort in analyzing these data sets
to improve healthcare in general.
In [16], EHR data are mined to derive relationships between diabetic
patients' usage of healthcare resources (e.g., medical facilities,
physicians) and the severity of their disease. In [17], Reconstructability
Analysis (RA) is applied to EHR data to find risk factors for various
complications of diabetes, including myocardial infarction and
microalbuminuria. RA is an information-theoretic technique for mining
high-dimensional data sets. In this setting, RA is used to induce
relationships and correlations between EHR variables by identifying
strongly related subsets of variables and representing this knowledge in
simplified models, while eliminating the connections among the remaining,
weakly correlated subsets of variables.
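The information-theoretic idea behind this step can be illustrated with a
minimal sketch. It is not RA itself: full RA searches a lattice of candidate
model structures and evaluates how well each reconstructs the joint
distribution. The sketch only scores pairs of discrete variables by mutual
information, I(A;B) = H(A) + H(B) - H(A,B), keeps the strongly related
pairs, and drops the weak ones. The variable names, the threshold, and the
toy records are hypothetical and synthetic, not data from [17].

```python
import math
from collections import Counter
from itertools import combinations


def entropy(counts, n):
    """Shannon entropy in bits of an empirical distribution given as counts over n samples."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(records, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two discrete variables across records."""
    n = len(records)
    h_a = entropy(Counter(r[a] for r in records), n)
    h_b = entropy(Counter(r[b] for r in records), n)
    h_ab = entropy(Counter((r[a], r[b]) for r in records), n)
    return h_a + h_b - h_ab


def strongly_related_pairs(records, variables, threshold):
    """Keep only variable pairs whose mutual information reaches the threshold."""
    scored = {
        (a, b): mutual_information(records, a, b)
        for a, b in combinations(variables, 2)
    }
    return {pair: mi for pair, mi in scored.items() if mi >= threshold}


# Hypothetical, synthetic EHR-like records (discretized variables).
records = [
    {"hba1c_high": 1, "complication": 1, "sex": "F"},
    {"hba1c_high": 1, "complication": 1, "sex": "M"},
    {"hba1c_high": 0, "complication": 0, "sex": "F"},
    {"hba1c_high": 0, "complication": 0, "sex": "M"},
]

if __name__ == "__main__":
    print(strongly_related_pairs(records, ["hba1c_high", "complication", "sex"], 0.5))
```

In this toy data set, the hypothetical "hba1c_high" and "complication"
variables move together and share one bit of information. "sex" is
independent of both, so its connections are eliminated from the simplified
model.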
In [18], data quality issues are reported while attempting to analyze
EHR data for a survival analysis study of pancreatic cancer patients.
Incomplete pathology reports for most of these patients forced the
authors to exclude them from the study. The authors conclude by
suggesting that EHR data be complemented with more generic
patient-related data, producing more complete patient representations
on which such data mining studies can be performed.
Batal et al. [19] present an approach for finding temporal patterns in
EHR data. At the core of their technique is the representation of
longitudinal patient records with temporal abstractions. These abstractions
are essentially summaries of intervals of time series data. For example,