Section Heading Recognition in Electronic Health Records Using Conditional Random Fields - Technologies and Applications of Artificial Intelligence

EHRs. Recognition of clinical entities such as drugs and diseases in clinical narratives is one of the fundamental tasks for clinical NLP systems. Several clinical NLP systems, such as MedLEE (Friedman, Shagina et al. 2004), MetaMap (Aronson 2001), and cTAKES (Savova, Masanz et al. 2010), have been developed to support the clinical entity recognition task. However, the significance of clinical data cannot be judged at the named-entity level alone. For instance, “coronary heart disease” has different clinical significance depending on whether it appears in the past medical history section or the family medical history section. The frequent use of author- and domain-specific idiosyncrasies, acronyms, and abbreviations in different parts of an EHR also makes it harder for NLP systems to understand the semantics. For example, the acronym “BS” means “blood sugar” in the laboratory section but indicates “bowel sounds” in the abdominal exam section (Denny, Miller et al. 2008).
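
The “BS” example can be made concrete with a minimal sketch of a section-aware lookup. This is only an illustration and not part of the systems cited above: the `SENSES` table, the section keys, and the `expand_acronym` helper are hypothetical names, and only the two “BS” senses come from the text.

```python
# Hypothetical sense inventory keyed by (acronym, section name).
# Only the two "BS" senses are taken from the text; the section keys are assumed.
SENSES = {
    ("BS", "laboratory"): "blood sugar",
    ("BS", "abdominal exam"): "bowel sounds",
}


def expand_acronym(acronym, section):
    """Return the expansion recorded for this section, or None if there is none."""
    return SENSES.get((acronym.upper(), section.lower()))


print(expand_acronym("BS", "Laboratory"))
print(expand_acronym("BS", "Abdominal exam"))
print(expand_acronym("BS", "Family history"))
```

The lookup only works once the enclosing section is known, which is why the section heading has to be recognized before the acronym can be resolved.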

Recognizing section headings can improve the understanding of clinical records and help disambiguate the meaning of the information they contain (Denny, Miller et al. 2008). Unfortunately, section heading recognition is a challenging task. First, the names of section headings do not follow a universal system. The section for a chief problem may be named “chief complaint”, “presenting complaint”, “presenting problems”, “reason for encounter”, or even the abbreviation “CC”. Furthermore, the hierarchy of sections varies from record to record. For instance, the “laboratory section” and the “radiology section” may be two separate parts, or both may be placed under the “data section”. “Impression” and “assessment” may appear as separate sections or be merged into one. Occasionally, the same section name can carry different meanings. “CC” can refer to “chief complaint” in a discharge summary or to “carbon copy” in a clinical narrative written as an email. “Impression” might mean the overall diagnosis of a patient or a subsection of imaging studies. Therefore, section recognition based entirely on dictionaries or patterns may not always work. In light of this issue, this work compiled a section recognition corpus on top of the dataset released by the i2b2 2014 shared task (Stubbs, Kotfila et al. 2014) and presented a machine learning approach based on the linear-chain conditional random fields (CRF) model (Lafferty, McCallum et al. 2001) to deal with the section heading recognition task for EHRs. A set of features was proposed, and their effectiveness was studied in this work.
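
To illustrate how heading recognition can be cast as sequence labeling for a linear-chain CRF, the sketch below turns one line of a note into per-token feature dictionaries and a label sequence. Everything in it is an assumption made for illustration: the BIO label scheme, the label name `HEADING`, and the specific features (casing, trailing colon, neighbouring tokens) are not the feature set or encoding proposed in this work.

```python
def token_features(tokens, i):
    """Build an illustrative feature dict for token i of one line (assumed features)."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),
        "is_title": tok.istitle(),
        "ends_with_colon": tok.endswith(":"),
        "line_start": i == 0,
    }
    # Context features: the neighbouring tokens, when they exist
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    return feats


def bio_labels(n_tokens, heading_span):
    """Label tokens in the half-open span (start, end) as B/I, all others as O."""
    start, end = heading_span
    labels = []
    for i in range(n_tokens):
        if i == start:
            labels.append("B-HEADING")
        elif start < i < end:
            labels.append("I-HEADING")
        else:
            labels.append("O")
    return labels


tokens = "Chief Complaint: chest pain".split()
X = [token_features(tokens, i) for i in range(len(tokens))]
y = bio_labels(len(tokens), (0, 2))
print(y)
```

A CRF trained on many such (features, labels) sequences scores whole label sequences rather than isolated tokens, so context such as a trailing colon or the following token can help separate “CC” as a heading from “CC” as “carbon copy”.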

2 Methodology

2.1 Section Heading Corpus Generation

To the best of the authors' knowledge, there is currently no openly available corpus annotated with medical section heading information. Therefore, this work built a section heading corpus from the dataset of Track 2 of the i2b2 2014 shared task (Stubbs, Kotfila et al. 2014). The section heading strings in the clinical note section header terminology (SecTag) (Denny, Miller et al. 2008) were used to tag all plausible candidate headings mentioned in the EHRs. Afterwards, the machine-generated annotations were manually corrected by the first author of this paper, who is a doctor of medicine.
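
The dictionary-based pre-annotation step could look roughly like the following sketch. It is not the authors' tool: `HEADING_STRINGS` is a hypothetical stand-in for the SecTag heading strings (the entries are taken from the examples earlier in the text), the sample note is invented, and case-insensitive whole-word matching is an assumption.

```python
import re

# Hypothetical stand-in for the SecTag heading strings; not the real terminology.
HEADING_STRINGS = [
    "chief complaint",
    "presenting complaint",
    "reason for encounter",
    "CC",
]


def build_pattern(strings):
    """Compile one alternation, longest strings first so longer headings win."""
    ordered = sorted(strings, key=len, reverse=True)
    alternation = "|".join(re.escape(s) for s in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def tag_candidates(text, pattern):
    """Return (start, end, matched_text) for every candidate heading mention."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


# Invented sample note: the second "CC" is a carbon-copy line, not a heading.
note = "Chief Complaint: chest pain.\nCC: Dr. Smith"
pattern = build_pattern(HEADING_STRINGS)
candidates = tag_candidates(note, pattern)
for span in candidates:
    print(span)
```

Both mentions are tagged, including the “CC” that may mean “carbon copy”, which shows why machine-generated candidates of this kind need the manual correction pass described above.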
