Although POI's capabilities are considerable, the process of extracting text is simple.
As shown here, a FileInputStream object representing the file TestDocument.docx is
passed to the ExtractorFactory class's createExtractor method, which selects the
appropriate POITextExtractor instance. POITextExtractor is the base class for several
different extractors. Calling the getText method on the extractor returns the text:
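// try-with-resources closes the stream and the extractor when the block exits
try (FileInputStream fis = new FileInputStream("TestDocument.docx");
     POITextExtractor textExtractor = ExtractorFactory.createExtractor(fis)) {
    // getText returns the document's text as a single string
    System.out.println(textExtractor.getText());
} catch (IOException ex) {
    // Handle I/O errors, such as a missing or unreadable file
} catch (OpenXML4JException | XmlException ex) {
    // Handle errors raised while opening or parsing the document
}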
Part of the output is as follows:
Pirates
Pirates are people who use ships to rob other ships. At
least this is a common definition. They have also been
known as buccaneers, corsairs, and privateers. In
...
Our list includes:
Gan Ning
Awilda
...
Get caught
Walk the plank
This is not a recommended occupation.
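To give a feel for how a factory can select an extractor, the following sketch classifies a file by its leading bytes using only the standard library. It is an illustration under stated assumptions, not POI's implementation: FormatSniffer and Kind are hypothetical names introduced for this example. The sketch relies on two well-known signatures: legacy Office files such as .doc are OLE2 compound files, while .docx files are ZIP packages.

```java
// Illustrative only: FormatSniffer and Kind are hypothetical names, not POI classes.
final class FormatSniffer {
    enum Kind { OLE2, ZIP_BASED, UNKNOWN }

    // Signature at the start of OLE2 compound files (legacy .doc, .xls, .ppt).
    private static final int[] OLE2_MAGIC =
            {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Signature of a ZIP local file header; .docx files are ZIP packages.
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file by the first few bytes of its content.
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP_BASED;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data == null || data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            // Mask to compare the signed Java byte as an unsigned value
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] docxLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00, 0x06, 0x00};
        byte[] docLike = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                          (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
        System.out.println(sniff(docxLike));
        System.out.println(sniff(docLike));
        System.out.println(sniff("plain text".getBytes()));
    }
}
```

A ZIP signature alone does not prove that a file is a Word document, since any ZIP archive starts the same way. A real factory would need to inspect the package contents as well, which is one reason to leave the selection to ExtractorFactory.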
It can be useful to know more about a Word document. POI provides a
POIXMLPropertiesTextExtractor class that gives us access to the core, extended, and
custom properties of a document. There are two ways to readily get a string containing
many of these properties.