Note: Only those HTML tag constants that are not flagged as block tags (where the isBlock() method for the tag returns false) will work with HTMLDocument.Iterator; for a block tag, HTMLDocument.getIterator() returns null. For instance, STRONG is not a block tag and can be iterated over, while H1 is a block tag and cannot.
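To see how the tag constants are classified, the following sketch parses a small in-memory HTML string and reports, for STRONG, A, and H1, what isBlock() returns and whether getIterator() hands back an iterator. The sample markup and the class and method names (BlockTagCheck, parse(), hasIterator()) are illustrative assumptions, not part of the chapter's listings. In the JDK's HTMLDocument implementation, getIterator() returns null when the tag's isBlock() returns true.

```java
import java.io.StringReader;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical helper class: checks which tags HTMLDocument can iterate over
public class BlockTagCheck {
    // Parse an HTML string into an HTMLDocument (in-memory markup stands in for a URL)
    static HTMLDocument parse(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    // getIterator() returns null for tags it cannot iterate over (block tags)
    static boolean hasIterator(HTMLDocument doc, HTML.Tag tag) {
        return doc.getIterator(tag) != null;
    }

    public static void main(String[] args) throws Exception {
        HTMLDocument doc = parse("<html><body><h1>Title</h1>"
            + "<p>Some <strong>bold</strong> text</p></body></html>");
        HTML.Tag[] tags = { HTML.Tag.STRONG, HTML.Tag.A, HTML.Tag.H1 };
        for (HTML.Tag tag : tags) {
            // Report the block flag alongside whether an iterator is available
            System.out.println(tag + ": isBlock() = " + tag.isBlock()
                + ", iterator " + (hasIterator(doc, tag) ? "available" : "is null"));
        }
    }
}
```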
Once you have an iterator for a specific tag, you can examine the attributes and content of each instance of that tag using the properties shown in Table 16-8; the startOffset and endOffset properties locate the instance's content within the document.
Table 16-8. HTMLDocument.Iterator Properties

Property Name    Data Type       Access
attributes       AttributeSet    Read-only
endOffset        int             Read-only
startOffset      int             Read-only
tag              HTML.Tag        Read-only
valid            boolean         Read-only
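A minimal sketch of reading these properties, assuming an in-memory HTML string instead of a loaded URL (the class IteratorProperties and its helper methods are hypothetical). For each <A> occurrence it prints the tag, startOffset, endOffset, and attributes properties, and the coveredText() helper uses the two offsets with Document.getText() to recover the text each occurrence spans.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical example class showing the read-only iterator properties
public class IteratorProperties {
    // Parse an HTML string into an HTMLDocument (assumption: in-memory source)
    static HTMLDocument parse(String html) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        return doc;
    }

    // Collect the text covered by each occurrence of the tag via startOffset/endOffset
    static List<String> coveredText(HTMLDocument doc, HTML.Tag tag)
            throws BadLocationException {
        List<String> result = new ArrayList<>();
        HTMLDocument.Iterator it = doc.getIterator(tag);
        if (it == null) {
            return result; // block tags yield no iterator
        }
        for (; it.isValid(); it.next()) {
            int start = it.getStartOffset();
            int end = it.getEndOffset();
            result.add(doc.getText(start, end - start));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        HTMLDocument doc = parse("<html><body><p>Visit <a href=\"a.html\">page A</a>"
            + " or <a href=\"b.html\" title=\"B\">page B</a>.</p></body></html>");
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
             it.isValid();
             it.next()) {
            // tag, startOffset, endOffset, and attributes are all read-only properties
            System.out.println("tag=" + it.getTag()
                + " start=" + it.getStartOffset()
                + " end=" + it.getEndOffset()
                + " attributes=" + it.getAttributes());
        }
        System.out.println(coveredText(doc, HTML.Tag.A));
    }
}
```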
The other piece of the iteration process is the next() method, which advances the iterator to the next instance of the tag in the document. The basic structure for using this iterator is as follows:
// Get the iterator
HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A);
// For each valid one
while (iterator.isValid()) {
  // Process element
  // Get the next one
  iterator.next();
}
This can also be expressed in a basic for loop construct:
for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A);
     iterator.isValid();
     iterator.next()) {
  // Process element
}
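Filling in the "Process element" step turns the loop into a small runnable sketch. It assumes an in-memory HTML string rather than a URL, and the LinkCollector class and collectLinks() method are hypothetical names. The HREF value is read from the iterator's attributes property, and anchors without an HREF (such as named anchors) are skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical example class: collects HREF targets of <A> tags
public class LinkCollector {
    static List<String> collectLinks(String html) throws Exception {
        // Parse the markup into an HTMLDocument
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);

        List<String> links = new ArrayList<>();
        for (HTMLDocument.Iterator iterator = doc.getIterator(HTML.Tag.A);
             iterator.isValid();
             iterator.next()) {
            // Process element: the <A> attribute set holds the HREF value, if any
            AttributeSet attributes = iterator.getAttributes();
            Object href = attributes.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body><p><a href=\"one.html\">One</a>, "
            + "<a name=\"here\">anchor only</a>, <a href=\"two.html\">Two</a></p></body></html>";
        for (String link : collectLinks(html)) {
            System.out.println(link);
        }
    }
}
```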
Listing 16-6 demonstrates the use of HTMLDocument.Iterator. This program reads a URL from the command line, loads the file synchronously, looks for all the <A> tags, and then displays the link targets given in their HREF attributes. Think of this as a simple “spidering” application in which you can build up a database of URL links between documents. The start