For our manual annotation of EHRs, only the topmost section headings are annotated;
section headings whose contents also belong to a superior section, or which can
themselves be viewed as subsections, are removed from the annotations, regardless of
their section level in other EHRs. For example, if both “laboratory” and “radiology”
appear in an EHR and can be considered subsections of “data”, then only the superior
section “data” is annotated. But if “laboratory” and “radiology” are two separate
sections without a superior section covering them, both sections are annotated. In
another case, “impression” is annotated if it is the topmost section; however, if the
content of the “impression” section clearly contains the data of certain reports, such
as X-ray or echography, and its description is followed by section headings like
“cardiac echography” or “chest X-ray”, then the annotation of the “impression” section
is removed. In addition, if the name of a topmost section consists of two merged
concepts, it is still annotated as one section. For instance, some EHRs join
“impression” and “plan” into one section, “impression/plan”, while others separate
“plan” from “impression”.
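The topmost-heading rule above can be sketched as a simple filter over a heading hierarchy. This is a minimal illustration only, assuming the annotator has already decided the hierarchy of one EHR and that it is supplied as a hypothetical `parent` map (heading to superior heading, or `None` for a topmost section); judgment calls such as the “impression” case are assumed to be encoded in that map rather than computed.

```python
from typing import Dict, List, Optional


def topmost_headings(parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the headings that would be annotated for one EHR.

    `parent` (a hypothetical input format) maps every heading found in the
    EHR to the heading of its superior section, or to None when the heading
    is itself a topmost section. Merged headings such as "impression/plan"
    are single keys, so they are kept as one section.
    """
    # Only headings without a superior section are annotated; subsections
    # are dropped regardless of the level they have in other EHRs.
    return [heading for heading, sup in parent.items() if sup is None]


# "laboratory" and "radiology" sit under "data", so only "data" is kept.
print(topmost_headings({"data": None, "laboratory": "data",
                        "radiology": "data", "impression/plan": None}))
# → ['data', 'impression/plan']
```

If the same two headings have no superior section in another EHR (both mapped to `None`), both are returned, matching the second case in the text.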
2.2 Section Heading Recognition
For a given EHR, the raw text was extracted with the original line breaks retained.
Each text segment delimited by line breaks was processed by the MedPost tagger
(Smith, Rindflesch et al. 2004), which further split it into lines of text consisting of
tokens. Each line was then aligned with the experts' annotations to generate the
training instances for the CRF. This work employed the IOB tag scheme to represent
the section heading annotations: the B tag marks the beginning of a section heading,
the I tag marks tokens inside a section heading, and the O tag marks tokens outside
any section heading. Figure 1 illustrates an example, in which each token is followed
by its assigned tag.
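The alignment step can be pictured as turning an annotated heading span on a tokenized line into IOB tags. The sketch below is illustrative and not the paper's implementation: it assumes tokenization has already happened (here, tokens are given as a list rather than produced by MedPost) and that the annotation is available as a hypothetical `(start, end)` token range. The hypothetical `render` helper prints a line in the same `token /TAG` layout as Figure 1.

```python
from typing import List, Optional, Tuple


def iob_tags(tokens: List[str], span: Optional[Tuple[int, int]]) -> List[str]:
    """Assign IOB tags to the tokens of one line.

    `span` is a hypothetical (start, end) token range, end exclusive,
    marking the annotated section heading on this line, or None if the
    line contains no heading.
    """
    tags = ["O"] * len(tokens)            # default: outside any heading
    if span is not None:
        start, end = span
        if not 0 <= start < end <= len(tokens):
            raise ValueError("heading span out of range")
        tags[start] = "B"                 # first token of the heading
        for i in range(start + 1, end):
            tags[i] = "I"                 # remaining tokens of the heading
    return tags


def render(tokens: List[str], tags: List[str]) -> str:
    """Format a tagged line as 'token /TAG token /TAG ...'."""
    return " ".join(f"{tok} /{tag}" for tok, tag in zip(tokens, tags))


line = ["Record", "date", ":", "2149-03-19"]
print(render(line, iob_tags(line, (0, 3))))
# → Record /B date /I : /I 2149-03-19 /O
```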
Record /B date /I : /I 2149-03-19 /O
Reason /B for /I visit /I 67 /O yo /O man /O with /O DM /O , /O
Patient /O was /O in /O his /O
Selected /B recent /I labs /I chem /O 7 /O : /O 140 /O / /O 4.0 /O / /O
Fig. 1. A sample EHR annotated with section heading information
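Going the other way, a recognizer's IOB output can be decoded back into heading strings by collecting each B token together with the I tokens that directly follow it. The sketch below assumes the `token /TAG` text layout of Figure 1 as input (tokens and tags strictly alternating, each tag prefixed with a slash); `parse_line` and `extract_headings` are hypothetical helpers, and treating an I tag that does not follow a heading as outside is a design choice of this sketch.

```python
from typing import List, Optional, Tuple


def parse_line(line: str) -> List[Tuple[str, str]]:
    """Split a 'token /TAG token /TAG ...' line into (token, tag) pairs."""
    parts = line.split()
    # Tokens and tags alternate; [1:] drops the leading '/' of each tag,
    # so a literal '/' token (as in "4.0 /O / /O") is still handled.
    return [(parts[i], parts[i + 1][1:]) for i in range(0, len(parts) - 1, 2)]


def extract_headings(pairs: List[Tuple[str, str]]) -> List[str]:
    """Collect each B token and its following I tokens into a heading."""
    headings: List[List[str]] = []
    current: Optional[List[str]] = None
    for token, tag in pairs:
        if tag == "B":
            current = [token]             # start a new heading
            headings.append(current)
        elif tag == "I" and current is not None:
            current.append(token)         # continue the open heading
        else:
            current = None                # O (or a stray I) closes it
    return [" ".join(h) for h in headings]


tagged = "Selected /B recent /I labs /I chem /O 7 /O : /O 140 /O / /O 4.0 /O / /O"
print(extract_headings(parse_line(tagged)))
# → ['Selected recent labs']
```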
For each token, a set of features was extracted, and a CRF model was used to build
the section heading recognizer. The following subsections describe the features
developed for this work.
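As a rough illustration of per-token feature extraction, the sketch below builds word features of the kind discussed under Word Features: the target word plus the words in a small window around it. The window size of 2, the feature names, and the `<BOL>`/`<EOL>` padding markers are assumptions, not the paper's settings. Each line becomes a list of string-valued feature dictionaries, which is the per-sequence input format accepted by CRF toolkits such as sklearn-crfsuite; the paper does not state which CRF implementation it used.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Word features for tokens[i]: the word itself and its neighbours.

    The +/- `window` context size and the <BOL>/<EOL> padding markers are
    illustrative assumptions, not the paper's settings.
    """
    feats = {"w[0]": tokens[i]}
    for off in range(1, window + 1):
        # Words before the target, padded at the beginning of the line.
        feats[f"w[-{off}]"] = tokens[i - off] if i - off >= 0 else "<BOL>"
        # Words after the target, padded at the end of the line.
        feats[f"w[+{off}]"] = tokens[i + off] if i + off < len(tokens) else "<EOL>"
    return feats


def line_features(tokens: List[str]) -> List[Dict[str, str]]:
    """One feature dictionary per token of a line."""
    return [token_features(tokens, i) for i in range(len(tokens))]


print(line_features(["Reason", "for", "visit", "67"])[0])
```

The original casing is kept deliberately here, since capitalization may itself help distinguish headings; any lowercased or shape-based variants would be added as separate features.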
Word Features
Intuitively, the target token's word and the words preceding or following it can be
useful for determining the tag assigned to the target token. This work used