From CPUs to Semantic Integration
Cerner has long been focused on applying technology to healthcare, with much of our history emphasizing electronic medical records. However, new problems required a broader approach, which led us to look into Hadoop.
In 2009, we needed to create better search indexes of medical records. This led to processing needs not easily solved with other architectures. The search indexes required expensive processing of clinical documentation: extracting terms from the documentation and resolving their relationships with other terms. For instance, if a user typed "heart disease," we wanted documents discussing a myocardial infarction to be returned. This processing was quite expensive, taking several seconds of CPU time for larger documents, and we wanted to apply it to many millions of documents. In short, we needed to throw a lot of CPUs at the problem, and to be cost-effective in the process.
Among other options, we considered a staged event-driven architecture (SEDA) approach to ingest documents at scale. But Hadoop stood out for one important need: we wanted to reprocess the many millions of documents frequently, in a small number of hours or faster. The logic for knowledge extraction from the clinical documents was rapidly improving, and we needed to roll improvements out to the world quickly. In Hadoop, this simply meant running a new version of a MapReduce job over data already in place. The processed documents were then loaded into a cluster of Apache Solr servers to support application queries.
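To make the pattern concrete, the sketch below shows roughly what the mapper of such a reprocessing job could look like. It is not Cerner's actual code: the whitespace-based extractTerms method is only a placeholder for the real term-extraction logic, and the Text key/value types are assumed for the example.

import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper for the kind of reprocessing job described above.
// The real knowledge-extraction logic is not shown here; the trivial
// whitespace-based extractTerms() below is only a stand-in for it.
public class TermExtractionMapper extends Mapper<Text, Text, Text, Text> {

  // Placeholder for the expensive clinical term-extraction step.
  private List<String> extractTerms(String document) {
    return Arrays.asList(document.toLowerCase().split("\\s+"));
  }

  @Override
  protected void map(Text docId, Text document, Context context)
      throws IOException, InterruptedException {
    // Extract terms from one clinical document and emit (document ID, term)
    // pairs; the job's output is later indexed into a Solr cluster.
    for (String term : extractTerms(document.toString())) {
      context.write(docId, new Text(term));
    }
  }
}

Rolling out improved extraction logic then amounts to rerunning an updated version of this job over the documents already stored in the cluster.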
These early successes set the stage for more involved projects. This type of system and its data can be used as an empirical basis to help control costs and improve care across entire populations. Since healthcare data is often fragmented across systems and institutions, we first needed to bring all of that data together and make sense of it.
With dozens of data sources and formats, and even standardized data models subject to interpretation, we were facing an enormous semantic integration problem. Our biggest challenge was not the size of the data (we knew Hadoop could scale to our needs) but the sheer complexity of cleaning, managing, and transforming it. We needed higher-level tools to manage that complexity.