EHRs. Recognition of clinical entities such as drugs and diseases in
clinical narratives is one of the fundamental tasks for clinical NLP systems. Several
clinical NLP systems, such as MedLEE (Friedman, Shagina et al. 2004), MetaMap
(Aronson 2001), and cTAKES (Savova, Masanz et al. 2010) have been developed to
support the clinical entity recognition task. However, the clinical significance of
data cannot be determined at the named entity level alone. For instance, “coronary
heart disease” has different clinical significance depending on whether it appears in
the past medical history section or the family medical history section. The frequent
use of author- and domain-specific idiosyncrasies, acronyms, and abbreviations in
different parts of an EHR also makes it more difficult for NLP systems to
understand the semantics. For example, the acronym “BS”
means “blood sugar” in the laboratory section, but indicates “bowel sounds” in the
section of abdominal exams (Denny, Miller et al. 2008).
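The “BS” example can be illustrated with a minimal sketch of section-aware abbreviation expansion. The sense table and the function name expand are hypothetical and cover only the two senses mentioned above; a real system would draw its senses from a curated terminology rather than a hand-written dictionary.

```python
# Hypothetical sense inventory keyed by (abbreviation, section name).
# Only the two senses mentioned in the text are included.
SENSES = {
    ("BS", "laboratory"): "blood sugar",
    ("BS", "abdominal exam"): "bowel sounds",
}


def expand(abbreviation, section):
    """Return the section-specific expansion, or None if no sense is known."""
    return SENSES.get((abbreviation.upper(), section.lower()))


print(expand("BS", "Laboratory"))
print(expand("BS", "Abdominal exam"))
```

The lookup only works once the enclosing section is known, which is why section heading recognition has to come first.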
Recognizing section headings can improve the understanding of clinical
records and help disambiguate the meaning of information (Denny, Miller
et al. 2008). Unfortunately, recognition of section headings appears to be a
challenging task. First of all, the names of section headings do not follow a universal system.
For example, the section describing a patient's chief problem may be named “chief
complaint”, “presenting complaint”, “presenting problems”, “reason for encounter”,
or even abbreviated as “CC”. Furthermore, the hierarchies of sections vary from
record to record. For instance, the “laboratory section” and the “radiology section”
may be two separate parts, or both may be placed under the “data section”.
“Impression” and “assessment” may appear as separate sections or be merged into one.
Occasionally, the same section name can have different meanings. “CC” can refer to
“chief complaint” in a discharge summary, or “carbon copy” in a clinical narrative
written in email. “Impression” might mean the overall diagnosis of a patient, or the
subsection of imaging studies. Therefore, section recognition based entirely on
dictionaries or patterns may not always work. In light of this issue, this work compiled
a section recognition corpus on top of the dataset released by the i2b2 2014 shared task
(Stubbs, Kotfila et al. 2014) and presented a machine learning approach based on the
linear-chain conditional random fields (CRF) model (Lafferty, McCallum et al.
2001) to deal with the section heading recognition task for EHRs. A set of features
was proposed and its effectiveness was studied in this work.
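To make the sequence labeling formulation concrete, the following sketch shows how a tokenized line could be turned into per-token feature dictionaries and BIO labels, the usual input format for linear-chain CRF toolkits. The specific features (casing, trailing colon, neighboring tokens) and the label name SECTION are illustrative assumptions, not the feature set proposed in this work.

```python
def token_features(tokens, i):
    """Build an illustrative feature dictionary for the token at position i."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),          # e.g. "CC", "HPI"
        "is_title": tok.istitle(),          # e.g. "Chief"
        "ends_with_colon": tok.endswith(":"),
        "is_first": i == 0,                 # headings often start a line
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def sentence_features(tokens):
    """Feature dictionaries for every token in one line of text."""
    return [token_features(tokens, i) for i in range(len(tokens))]


def bio_labels(n_tokens, spans):
    """Convert heading spans (start, end), end exclusive, into BIO labels."""
    labels = ["O"] * n_tokens
    for start, end in spans:
        labels[start] = "B-SECTION"
        for j in range(start + 1, end):
            labels[j] = "I-SECTION"
    return labels


tokens = ["Chief", "Complaint", ":", "chest", "pain"]
X = sentence_features(tokens)
y = bio_labels(len(tokens), [(0, 2)])
```

Pairs of X and y in this shape could then be passed to a CRF implementation of the reader's choice. Because the model scores whole label sequences, contextual features such as the neighboring tokens can help separate “CC” as a heading from “CC” as “carbon copy”.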
2 Methodology
2.1 Section Heading Corpus Generation
To the best of the authors' knowledge, there is currently no openly available corpus
annotated with medical section heading information. Therefore, this work built a
section heading corpus from the dataset of Track 2 of the i2b2 2014 shared task (Stubbs,
Kotfila et al. 2014). The section heading strings in the clinical note section header
terminology (SecTag) (Denny, Miller et al. 2008) were used to tag all plausible
candidate headings mentioned in the EHRs. Afterwards, the machine-generated annotations
were manually corrected by the first author of this paper, who is a doctor of medicine.
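The dictionary-based pre-annotation step can be sketched as follows. The short HEADINGS list is a hypothetical stand-in for the SecTag heading strings, and the function names are illustrative; the matching strategy used in this work is not specified beyond tagging plausible candidates, so case-insensitive whole-phrase matching is an assumption.

```python
import re

# Hypothetical stand-in for the SecTag heading strings.
HEADINGS = [
    "chief complaint",
    "presenting complaint",
    "reason for encounter",
    "past medical history",
    "family medical history",
]


def build_pattern(headings):
    """Compile a case-insensitive pattern that prefers longer headings."""
    ordered = sorted(headings, key=len, reverse=True)
    alternation = "|".join(re.escape(h) for h in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def tag_candidates(text, pattern):
    """Return (start, end, matched_text) for every candidate heading."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


pattern = build_pattern(HEADINGS)
note = "Chief Complaint: chest pain. Family medical history: CAD."
for start, end, heading in tag_candidates(note, pattern):
    print(start, end, heading)
```

Such matching deliberately over-generates: it tags every occurrence of a heading string, including mentions in running text, which is why the manual correction pass described above is needed.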