<sentences>
<sentence id="1">
<word>When</word>
<word>the</word>
<word>day</word>
<word>is</word>
<word>done</word>
<word>we</word>
<word>can</word>
<word>sleep</word>
<word>.</word>
</sentence>
<sentence id="2">
<word>When</word>
<word>the</word>
<word>morning</word>
<word>comes</word>
<word>we</word>
<word>can</word>
<word>wake</word>
<word>.</word>
</sentence>
<sentence id="3">
<word>After</word>
<word>that</word>
<word>who</word>
<word>knows</word>
<word>.</word>
</sentence>
</sentences>
</document>
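To make the structure of this file concrete, the sketch below reads the same kind of markup with the JDK's built-in DOM parser and collects the words found inside each sentence element. It is only an illustration of what "the content of a sentence element" means here. It does not use the Stanford DocumentPreprocessor API, the class and method names (SentenceElementDemo, extractSentences) are hypothetical, and the inline XML is an abbreviated copy of the listing with a document root element assumed from the closing tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SentenceElementDemo {

    // Hypothetical helper: returns one list of word strings
    // per <sentence> element found in the XML text.
    static List<List<String>> extractSentences(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<List<String>> sentences = new ArrayList<>();
        NodeList sentenceNodes = doc.getElementsByTagName("sentence");
        for (int i = 0; i < sentenceNodes.getLength(); i++) {
            Element sentence = (Element) sentenceNodes.item(i);
            // Collect the text of every <word> nested in this sentence.
            NodeList wordNodes = sentence.getElementsByTagName("word");
            List<String> words = new ArrayList<>();
            for (int j = 0; j < wordNodes.getLength(); j++) {
                words.add(wordNodes.item(j).getTextContent().trim());
            }
            sentences.add(words);
        }
        return sentences;
    }

    public static void main(String[] args) throws Exception {
        // Abbreviated copy of the listing: only the third sentence.
        String xml =
            "<document><sentences>"
            + "<sentence id=\"3\">"
            + "<word>After</word><word>that</word>"
            + "<word>who</word><word>knows</word><word>.</word>"
            + "</sentence>"
            + "</sentences></document>";

        for (List<String> sentence : extractSentences(xml)) {
            System.out.println(sentence); // [After, that, who, knows, .]
        }
    }
}
```

Each sentence element yields one list of tokens, which is the unit a sentence splitter restricted to that element would be expected to work with.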
We will reuse the code from the previous example. However, we will open the XMLText.xml file instead, and pass DocumentPreprocessor.DocType.XML as the second argument to the constructor of the DocumentPreprocessor class, as shown next. This tells the processor to treat the text as XML. In addition, we will specify that only the XML elements within the <sentence> tag should be processed: