Extreme scale clinical analytics with open source software - Open Source Software in Life Science Research

3. SPECIALIST Lexicon and lexical tools. The SPECIALIST Lexicon contributes over 200,000 terms from various sources, including commonly occurring English words. The lexical tools are used to assist in natural language processing.
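
To illustrate the kind of string normalization that lexical tools support, the Python sketch below reduces variant forms of a term to one canonical key. It lowercases the text, drops a possessive 's, turns punctuation into spaces, removes a small stop-word list, and sorts the remaining words. It is a simplified, hypothetical stand-in and not the NLM lexical tools: the function name `normalize` and the stop-word list are assumptions, and it does not handle inflectional variants.

```python
import re

# Hypothetical stop-word list; an assumption for illustration only.
STOP_WORDS = {"of", "the", "and"}


def normalize(term: str) -> str:
    """Simplified normalizer (NOT the NLM lexical tools; no uninflection).

    Lowercases, drops possessive 's, replaces punctuation with spaces,
    removes stop words, and sorts the remaining words alphabetically.
    """
    text = term.lower()
    text = re.sub(r"'s\b", "", text)      # drop possessive 's
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(sorted(words))


print(normalize("Infarction, myocardial, acute"))
print(normalize("Acute Myocardial Infarction"))
```

Both calls yield the same key, so a catalogue entry written in inverted form and a phrase typed by a user can be matched by a simple equality test.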

UMLS bridges the gap between the terminology that users employ when accessing the analytical data store and the codes contained in the documents. For example, suppose a user wants to find all documents related to 'Acute Myocardial Infarction' in a clinical data store whose documents are coded using SNOMED CT. With UMLS, the user can first map the English term to its SNOMED CT code, and then run a second query to find all SNOMED CT codes whose ancestor is 'Acute Myocardial Infarction'. The results of this second query can then be used as a filter in the analytical data store.
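
The two-step lookup described above can be sketched in Python with toy data. The dictionaries `TERM_TO_CODE` and `CHILDREN` are hypothetical stand-ins for a UMLS term lookup and the SNOMED CT is-a hierarchy, and codes such as `SCT-AMI` are placeholders, not real SNOMED CT identifiers. The descendant query is a breadth-first walk down the hierarchy; it also includes the starting concept, so documents coded with the general concept are found too.

```python
from collections import deque

# Hypothetical stand-in for a UMLS lookup: English term -> SNOMED CT code.
# These codes are placeholders, NOT real SNOMED CT identifiers.
TERM_TO_CODE = {"acute myocardial infarction": "SCT-AMI"}

# Hypothetical is-a hierarchy: parent code -> direct child codes.
CHILDREN = {
    "SCT-AMI": ["SCT-AMI-STEMI", "SCT-AMI-NSTEMI"],
    "SCT-AMI-STEMI": ["SCT-AMI-STEMI-ANT"],
}


def descendants_or_self(code: str) -> set[str]:
    """Return `code` plus every code that has it as an ancestor (BFS)."""
    found = {code}
    queue = deque([code])
    while queue:
        for child in CHILDREN.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def find_documents(term: str, documents: dict[str, set[str]]) -> list[str]:
    """Map term -> code, expand to descendants, then filter documents."""
    code = TERM_TO_CODE[term.lower()]        # first query: term to code
    codes = descendants_or_self(code)        # second query: descendants
    return sorted(doc_id for doc_id, doc_codes in documents.items()
                  if doc_codes & codes)      # use the code set as a filter


docs = {
    "doc-1": {"SCT-AMI-STEMI-ANT"},
    "doc-2": {"SCT-ASTHMA"},
    "doc-3": {"SCT-AMI-NSTEMI", "SCT-HTN"},
}
print(find_documents("Acute Myocardial Infarction", docs))  # ['doc-1', 'doc-3']
```

In a real deployment the term lookup and hierarchy would presumably come from the UMLS and SNOMED CT release files rather than hard-coded dictionaries, with the resulting code set handed to the analytical store as a filter.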

UMLS does not solve all text-matching and text-scrubbing problems. In our experience, the last mile of matching is a continuous process of refining and accumulating rules and samples that can be matched as time progresses. If these rules are made configurable, end-users can populate the queries that help with the mappings and data extraction.
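
One way to make these last-mile rules configurable is to keep them as data rather than code, so that users can append new patterns as they meet new variants. The sketch below assumes hypothetical regular-expression rules, a hypothetical `MatchRule` type, and placeholder codes; in practice the rule list might be loaded from a file or database table that end-users maintain.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRule:
    pattern: str  # regular expression searched for in free text
    code: str     # hypothetical target code (placeholder, not real SNOMED CT)


# Hypothetical user-maintained rules, e.g. loaded from a config file.
RULES = [
    MatchRule(r"\bacute\s+mi\b", "SCT-AMI"),
    MatchRule(r"\bheart\s+attack\b", "SCT-AMI"),
    MatchRule(r"\bnstemi\b", "SCT-AMI-NSTEMI"),
]


def match_codes(text: str, rules: list[MatchRule]) -> set[str]:
    """Return every code whose rule pattern occurs in `text` (case-insensitive)."""
    return {r.code for r in rules if re.search(r.pattern, text, re.IGNORECASE)}


print(sorted(match_codes("Pt admitted with NSTEMI after heart attack", RULES)))

# Refinement over time: a user adds a rule for a newly observed abbreviation.
extended = RULES + [MatchRule(r"\bami\b", "SCT-AMI")]
print(sorted(match_codes("r/o AMI", extended)))  # ['SCT-AMI']
```

Because the matching logic never changes, refining the mappings only means editing the rule list, which is the part end-users can own.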

20.6 Open source databases

The next step is data storage. Our use case poses several challenges for the choice of technologies. First, it is becoming increasingly difficult to build a single system that supports the myriad implementation details of even a small regional group of healthcare providers. Because CDA is flexible and extensible, similarly flexible mechanisms are required to store a complete set of disparate, raw data. Second, the volume of data is expected to be extremely large. Medical records, radiology images, and lab or research data are notorious for large file components of high-fidelity information that contain more detail than any immediate question can use. These requirements generally wreak havoc on traditional application development. Third, as our understanding of healthcare and the human body evolves, we need to support new questions being asked of old data. The goal, therefore, is to evaluate the ability of open source technology to meet the following requirements:

1. ability to store extreme amounts of data in a flexible schema;
2. ability to re-process this data on demand with new business rules;
3. ability to re-process data to create marts or data cubes that allow ad hoc analysis of new questions.
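
The three requirements can be illustrated with a small in-memory Python sketch. Raw records with differing fields are kept untouched (requirement 1), a business rule is re-applied to all of them on demand (requirement 2), and the derived rows are aggregated into a tiny data cube of counts (requirement 3). The record fields, the rule `ami_by_age_rule`, and the placeholder codes are hypothetical; at the data volumes described above, the same pattern would need a scalable store rather than a Python list.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

# Requirement 1 (sketch): raw records kept as-is with no fixed schema;
# note that some records lack an "age" field. In-memory stand-in only.
RAW_STORE: list[dict] = [
    {"patient": "p1", "region": "north", "codes": ["SCT-AMI-STEMI"], "age": 67},
    {"patient": "p2", "region": "south", "codes": ["SCT-HTN"]},
    {"patient": "p3", "region": "north", "codes": ["SCT-AMI-NSTEMI"], "age": 54},
    {"patient": "p4", "region": "south", "codes": ["SCT-AMI-STEMI"]},
]

# A business rule maps one raw record to a derived row, or None to skip it.
Rule = Callable[[dict], Optional[dict]]


def reprocess(raw: Iterable[dict], rule: Rule) -> list[dict]:
    """Requirement 2: re-run a (possibly new) rule over all raw records."""
    rows = []
    for record in raw:
        row = rule(record)
        if row is not None:
            rows.append(row)
    return rows


def build_cube(rows: Iterable[dict], dims: tuple[str, ...]) -> Counter:
    """Requirement 3: count derived rows over the chosen dimensions."""
    return Counter(tuple(row.get(d) for d in dims) for row in rows)


def ami_by_age_rule(record: dict) -> Optional[dict]:
    """Hypothetical new rule: keep AMI cases and bucket age when present."""
    if not any(c.startswith("SCT-AMI") for c in record.get("codes", [])):
        return None
    age = record.get("age")
    band = "unknown" if age is None else ("65+" if age >= 65 else "<65")
    return {"region": record.get("region"), "age_band": band}


cube = build_cube(reprocess(RAW_STORE, ami_by_age_rule), ("region", "age_band"))
for key, count in sorted(cube.items()):
    print(key, count)
```

Because the raw records are never overwritten, answering a new question only requires writing a new rule and re-running `reprocess` and `build_cube` over the same stored data.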
