Note: HTMLDocument.Iterator does not work with every HTML tag constant. In the standard JDK implementation, getIterator() returns an iterator only for tags that are not block tags, where the isBlock() method for the tag returns false; for a block tag it returns null. For instance, STRONG and A are not block tags, while H1 is.
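Whether a given tag constant is flagged as a block tag can be checked directly with isBlock(). The following is a minimal sketch that prints the flag for the two tags named in the note, plus the A tag used in the iteration examples. The class name BlockTagCheck and the describe() helper are hypothetical names chosen for illustration.

```java
import javax.swing.text.html.HTML;

public class BlockTagCheck {

    // Hypothetical helper: formats a tag name together with its block flag.
    static String describe(HTML.Tag tag) {
        return tag + " isBlock=" + tag.isBlock();
    }

    public static void main(String[] args) {
        HTML.Tag[] tags = { HTML.Tag.STRONG, HTML.Tag.H1, HTML.Tag.A };
        for (HTML.Tag tag : tags) {
            System.out.println(describe(tag));
        }
    }
}
```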
After you have the iterator for a specific tag, you can examine the attributes and
position of each instance of that tag through the read-only properties shown in Table 16-8.
Table 16-8. HTMLDocument.Iterator Properties

Property Name    Data Type       Access
attributes       AttributeSet    Read-only
endOffset        int             Read-only
startOffset      int             Read-only
tag              HTML.Tag        Read-only
valid            boolean         Read-only
The other piece of the iteration process is the next() method, which advances the iterator
to the next instance of the tag in the document. The basic structure of using this iterator is as follows:
    // Get the iterator
    HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A);
    // For each valid one
    while (iterator.isValid()) {
        // Process element
        // Get the next one
        iterator.next();
    }
This can also be expressed in a basic for loop construct:
    for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A);
            iterator.isValid();
            iterator.next()) {
        // Process element
    }
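To make the "Process element" step concrete, here is a minimal sketch that parses a small HTML string held in memory and reads the attributes, startOffset, and endOffset properties for each <A> tag. It is an illustration under stated assumptions rather than the book's listing: the class name AnchorLister, the listAnchors() helper, and the example.com URLs are all hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class AnchorLister {

    // Hypothetical helper: returns one "href -> link text" entry per <A HREF=...> tag.
    static List<String> listAnchors(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) kit.createDefaultDocument();
        List<String> results = new ArrayList<>();
        try {
            // Reading through the kit parses the whole string before returning
            kit.read(new StringReader(html), htmlDoc, 0);

            for (HTMLDocument.Iterator iterator = htmlDoc.getIterator(HTML.Tag.A);
                    iterator.isValid();
                    iterator.next()) {
                AttributeSet attributes = iterator.getAttributes();
                Object href = attributes.getAttribute(HTML.Attribute.HREF);
                if (href == null) {
                    continue; // an anchor without an HREF, such as a named anchor
                }
                // The offsets delimit the text covered by this tag instance
                int start = iterator.getStartOffset();
                int end = iterator.getEndOffset();
                String text = htmlDoc.getText(start, end - start);
                results.add(href + " -> " + text);
            }
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not process HTML", e);
        }
        return results;
    }

    public static void main(String[] args) {
        // Illustrative input; the URLs are placeholders
        String html = "<html><body><p>"
            + "<a href=\"http://example.com/a\">First</a> and "
            + "<a href=\"http://example.com/b\">Second</a>"
            + "</p></body></html>";
        for (String line : listAnchors(html)) {
            System.out.println(line);
        }
    }
}
```

Reading from a StringReader keeps the sketch self-contained; a document loaded from a URL would be iterated the same way once loading has finished.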
Listing 16-6 demonstrates the use of HTMLDocument.Iterator . This program prompts you
for a URL from the command line, loads the file synchronously, looks for all the <A> tags, and
then displays all the anchors listed as HREF attributes. Think of this as a simple “spidering”
application in which you can build up a database of URL links between documents. The start
 