// Finally load the reader into it
// The final true argument says to ignore the character set
parser.parse(reader, callback, true);
// Examine contents
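The fragment above is only the final step of loading a document by hand. Below is a minimal self-contained sketch of the whole sequence. It makes a few assumptions: the HTML arrives through a StringReader, the class and method names (LoadHtmlDocument, load(), plainText()) are hypothetical, and checked exceptions are wrapped for brevity. The explicit flush() call after parse() pushes any content the document's callback is still buffering into the document. HTMLEditorKit.read() makes the same call after it parses.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LoadHtmlDocument {

    // Hypothetical helper: parses HTML from the reader into a fresh HTMLDocument
    public static HTMLDocument load(Reader reader) {
        HTMLEditorKit htmlKit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) htmlKit.createDefaultDocument();
        // The parser drives the document's own callback, inserting at offset 0
        HTMLEditorKit.Parser parser = new ParserDelegator();
        HTMLEditorKit.ParserCallback callback = htmlDoc.getReader(0);
        try {
            // The final true argument says to ignore the character set
            parser.parse(reader, callback, true);
            // Push any content the callback is still buffering into the document
            callback.flush();
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not load HTML", e);
        }
        return htmlDoc;
    }

    // Hypothetical helper: returns the document's text with the markup removed
    public static String plainText(HTMLDocument doc) {
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        HTMLDocument doc = load(new StringReader(
            "<html><body><h1>Title</h1><p>Hello</p></body></html>"));
        // Examine contents
        System.out.println(plainText(doc));
    }
}
```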
Iterating Through HTML Documents
After you have the HTML document loaded, in addition to just displaying the content inside a
JEditorPane, you may need to parse through the content yourself to pull out specific
pieces. The HTMLDocument supports at least two ways to iterate through its content: the
HTMLDocument.Iterator and ElementIterator classes.
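For the ElementIterator approach, a minimal sketch might walk every Element depth-first and read each element's tag from its StyleConstants.NameAttribute attribute. The class and method names (ElementWalk, tagsInOrder()) are hypothetical, and the document is loaded with HTMLEditorKit.read() for brevity instead of by calling the parser directly.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class ElementWalk {

    // Hypothetical helper: loads the HTML and returns every element's tag in document order
    public static List<HTML.Tag> tagsInOrder(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        List<HTML.Tag> tags = new ArrayList<>();
        ElementIterator it = new ElementIterator(doc);
        // first() returns the root element; next() continues depth-first and returns null at the end
        for (Element elem = it.first(); elem != null; elem = it.next()) {
            AttributeSet attrs = elem.getAttributes();
            Object name = attrs.getAttribute(StyleConstants.NameAttribute);
            if (name instanceof HTML.Tag) {
                tags.add((HTML.Tag) name);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Hello</p></body></html>";
        for (HTML.Tag tag : tagsInOrder(html)) {
            System.out.println(tag);
        }
    }
}
```

Runs of text show up as HTML.Tag.CONTENT leaves, so a practical walker would usually filter on just the tags it cares about.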
HTMLDocument.Iterator Class
To use the HTMLDocument.Iterator, you ask an HTMLDocument for the iterator for a specific
HTML.Tag by calling its getIterator() method. Then, for each instance of that tag in the document,
you can look at the tag's attributes.
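As a sketch of the HTMLDocument.Iterator pattern, the following collects the HREF attribute of every <A> tag. The names LinkLister and listLinks() are hypothetical, and the document is loaded with HTMLEditorKit.read() for brevity. The null check is defensive: in the JDK sources, getIterator() returns null for block-level tags, while character-level tags such as A get a working iterator.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class LinkLister {

    // Hypothetical helper: loads the HTML and returns the HREF of every <A> tag, in order
    public static List<String> listLinks(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e);
        }
        List<String> links = new ArrayList<>();
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        // Guard against a null iterator (returned for block-level tags)
        if (it == null) {
            return links;
        }
        for (; it.isValid(); it.next()) {
            // The attributes of this particular <A> tag
            AttributeSet attrs = it.getAttributes();
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>See <a href=\"http://a.example/\">one</a>"
            + " and <a href=\"http://b.example/\">two</a>.</p></body></html>";
        for (String link : listLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Besides getAttributes(), each iterator position also reports getStartOffset() and getEndOffset(), which is handy for pulling the link text out with getText().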
The HTML.Tag class includes 76 class constants for the standard HTML tags that the
HTMLEditorKit understands, such as HTML.Tag.H1 for the <H1> tag. These constants are listed in
Table 16-7.
Table 16-7. HTML Tag Constants
A           DIR       IMG       SCRIPT
ADDRESS     DIV       IMPLIED   SELECT
APPLET      DL        INPUT     SMALL
AREA        DT        ISINDEX   SPAN
B           EM        KBD       STRIKE
BASE        FONT      LI        STRONG
BASEFONT    FORM      LINK      STYLE
BIG         FRAME     MAP       SUB
BLOCKQUOTE  FRAMESET  MENU      SUP
BODY        H1        META      TABLE
BR          H2        NOFRAMES  TD
CAPTION     H3        OBJECT    TEXTAREA
CENTER      H4        OL        TH
CITE        H5        OPTION    TITLE
CODE        H6        P         TR
COMMENT     HEAD      PARAM     TT
CONTENT     HR        PRE       U
DD          HTML      S         UL
DFN         I         SAMP      VAR
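Because each entry in Table 16-7 is a shared constant, tags can be compared with ==, and the static HTML.getTag() method maps a lowercase tag name back to its constant. The sketch below also prints two layout properties of a tag via isBlock() and breaksFlow(); the TagConstants class and describe() method are hypothetical.

```java
import javax.swing.text.html.HTML;

public class TagConstants {

    // Hypothetical helper: a tag's name plus two of its layout properties
    public static String describe(HTML.Tag tag) {
        return tag + " block=" + tag.isBlock() + " breaksFlow=" + tag.breaksFlow();
    }

    public static void main(String[] args) {
        // toString() yields the lowercase tag name used in markup
        System.out.println(describe(HTML.Tag.H1)); // h1 block=true breaksFlow=true
        System.out.println(describe(HTML.Tag.B));  // b block=false breaksFlow=false
        // getTag() returns the same shared constant, so == works
        System.out.println(HTML.getTag("h1") == HTML.Tag.H1); // true
        // Names outside the table come back as null
        System.out.println(HTML.getTag("blink")); // null
    }
}
```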