Advanced Text Capabilities - The Definitive Guide to Java Swing

// Finally load the reader into it
// The final true argument says to ignore the character set
parser.parse(reader, callback, true);
// Examine contents

Iterating Through HTML Documents

After you have loaded the HTML document, you may need to do more than display its content inside a JEditorPane: you may want to walk through the content yourself to pull out specific pieces. HTMLDocument supports at least two ways of iterating through its content, via the HTMLDocument.Iterator and ElementIterator classes.

HTMLDocument.Iterator Class

To use HTMLDocument.Iterator, you ask an HTMLDocument for an iterator over a specific HTML.Tag. Then, for each instance of that tag in the document, you can examine the tag's attributes.
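As a minimal sketch of that pattern, the following class collects the HREF attribute of every <A> tag in a document. The class name AnchorLister, the helper listHrefs(), and the sample markup are illustrative assumptions, and for brevity the sketch loads the document with HTMLEditorKit.read() rather than with the parser and callback shown earlier.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class AnchorLister {

    // Hypothetical helper: parse an HTML string and collect the HREF of each <A> tag
    static List<String> listHrefs(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // The input is already decoded text, so ignore any charset directive
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }

        List<String> hrefs = new ArrayList<>();
        // Ask the document for an iterator over one specific tag
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        if (it == null) {
            return hrefs; // no iterator available for this tag
        }
        // Visit each instance of the tag and look at its attributes
        for (; it.isValid(); it.next()) {
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                hrefs.add(href.toString());
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<html><body><p><a href=\"https://example.com/a\">A</a>"
            + " and <a href=\"/b\">B</a></p></body></html>";
        for (String href : listHrefs(html)) {
            System.out.println(href);
        }
    }
}
```

The iterator also exposes getStartOffset() and getEndOffset() if you need the text each tag covers. Note that getIterator() can return null; in the JDK sources this appears to happen for block-level tags such as HTML.Tag.H1, which is why the sketch checks for it.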

The HTML.Tag class includes 76 class constants, one for each standard HTML tag that the HTMLEditorKit understands, such as HTML.Tag.H1 for the <H1> tag. These constants are listed in Table 16-7.
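A short sketch of working with these constants follows. The class name TagConstants and the describe() helper are illustrative assumptions; HTML.getTag(), HTML.getAllTags(), and HTML.Tag.isBlock() are part of javax.swing.text.html.

```java
import javax.swing.text.html.HTML;

public class TagConstants {

    // Hypothetical helper: show a tag's name and whether it is a block-level tag
    static String describe(HTML.Tag tag) {
        return tag + (tag.isBlock() ? " (block)" : " (non-block)");
    }

    public static void main(String[] args) {
        // Look up a constant by its lowercase tag name
        HTML.Tag h1 = HTML.getTag("h1");
        System.out.println(h1 == HTML.Tag.H1);

        System.out.println(describe(HTML.Tag.H1));
        System.out.println(describe(HTML.Tag.A));

        // Walk every tag constant the HTML class knows about
        for (HTML.Tag tag : HTML.getAllTags()) {
            System.out.println(describe(tag));
        }
    }
}
```

Looking a tag up by name this way is handy when the tag name arrives as a string at runtime, for instance from user input.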

Table 16-7. HTML Tag Constants

A           DIR       IMG      SCRIPT
ADDRESS     DIV       IMPLIED  SELECT
APPLET      DL        INPUT    SMALL
AREA        DT        ISINDEX  SPAN
B           EM        KBD      STRIKE
BASE        FONT      LI       STRONG
BASEFONT    FORM      LINK     STYLE
BIG         FRAME     MAP      SUB
BLOCKQUOTE  FRAMESET  MENU     SUP
BODY        H1