HTNSystem: Hypertension Information Extraction System for Unstructured Clinical Notes* - Technologies and Applications of Artificial Intelligence


(EMR) systems [3]. Researchers often spend a lot of time and resources extracting patient information from unstructured clinical notes. Extracting HTN information manually is particularly challenging and tedious, because it is usually mentioned in multiple records for a single patient. Coding this HTN information to standard ontologies such as SNOMED-CT adds a further burden to manual extraction.

Simple clinical text mining techniques can be employed to extract HTN information easily from unstructured clinical notes. Various tools exist for extracting HTN information from unstructured clinical notes or biomedical text, but their capabilities are limited. For example, MetaMap [4] is a popular biomedical information extraction system that can identify HTN mentions but cannot infer HTN information from medications or lab values. Conversely, there are rule-based tools that can recognize blood pressure (BP) values or medications but cannot infer whether those values or medications are relevant to HTN [5-7]. In other words, these systems cannot differentiate between high BP and low BP. In addition, the range of BP values considered to indicate HTN varies from country to country. In this study, we present a simple HTN information extraction system called HTNSystem, which is capable of extracting mentions of hypertension and inferring HTN information from BP lab values in unstructured clinical notes. HTNSystem is a rule-based information system that uses MetaMap as a core component together with a custom-built BP value extractor and rule-based post-processing components. The BP value extractor was originally built as part of the TMUNSW system developed for the 2014 i2b2/UTHealth Shared-Task 2 and 4 [8, 9]. For HTNSystem, the original BP value extractor was significantly improved to increase performance (more details are given in the results section). Overall, HTNSystem is generic and highly configurable, allowing end users and developers to customize it according to their preferences or suggested clinical guidelines.
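To make the idea of inferring HTN from BP values concrete, the following is a minimal sketch, not the HTNSystem implementation. The regular expression, the names `Thresholds`, `extract_bp_values`, `indicates_htn` and `infer_htn`, and the default cut-offs of 140/90 mmHg are all assumptions chosen for illustration; the configurable thresholds stand in for the country-specific guidelines mentioned above.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern (an assumption, not the pattern used by HTNSystem).
# It matches forms such as "BP 150/95" or "blood pressure: 118 / 76 mmHg".
BP_PATTERN = re.compile(
    r"\b(?:bp|blood\s+pressure)\b[^\d\n]{0,20}?(\d{2,3})\s*/\s*(\d{2,3})",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Thresholds:
    """Configurable cut-offs in mmHg; the defaults are illustrative only."""
    systolic: int = 140
    diastolic: int = 90


def extract_bp_values(text):
    """Return every (systolic, diastolic) pair mentioned in the text."""
    return [(int(s), int(d)) for s, d in BP_PATTERN.findall(text)]


def indicates_htn(systolic, diastolic, thresholds=Thresholds()):
    """True if either reading meets or exceeds its configured cut-off."""
    return systolic >= thresholds.systolic or diastolic >= thresholds.diastolic


def infer_htn(text, thresholds=Thresholds()):
    """True if any BP reading in the text indicates hypertension."""
    return any(
        indicates_htn(s, d, thresholds) for s, d in extract_bp_values(text)
    )
```

Under these assumptions, a caller following a guideline with lower cut-offs could pass `Thresholds(systolic=130, diastolic=80)` to `infer_htn` without touching the extraction step, which mirrors the separation between BP value extraction and rule-based post-processing described above.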

2 Materials and Methods

2.1 2014 i2b2/UTHealth Shared-Task 2 Corpus

The 2014 i2b2/UTHealth Shared-Task 2¹ corpus is a clinical data set distributed by the task organizers [10]. The corpus represents longitudinal data of diabetic patients collected for the purpose of identifying CVD risk factors. It was distributed as part of the shared task in three sets. Table 1 presents summary-level statistics of the corpus. The two training sets consist of 521 and 269 unstructured clinical notes (hereafter referred to as records), respectively, and the test set consists of 514 records. The records in the training data set were distributed in XML (Extensible Markup Language) format and included annotations of CVD risk factors. Each record in the corpus was manually annotated by three different annotators. The risk factors identified in the corpus were Hypertension, Diabetes, Obesity, Medication, Coronary artery disease and Smoking history. Three

1 https://www.i2b2.org/NLP/HeartDisease/
