ElementIterator Class
Another way of examining the contents of an HTMLDocument is through the ElementIterator
class (which is not specific to HTML documents). When working with an ElementIterator, you
visit all the Element objects of the document and ask each one what it is. If the element is
something you are interested in working with, you can take a closer look.
To get an iterator for a document, construct one like this:
ElementIterator iterator = new ElementIterator(htmlDoc);
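The line above assumes an htmlDoc variable already exists. As a minimal sketch of where one might come from, the following parses an HTML string with HTMLEditorKit and then creates the iterator. The class name IteratorSetup and the helpers load() and rootName() are hypothetical names used only for illustration; they are not part of the original text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class IteratorSetup {
    // Hypothetical helper: parse an HTML string into an HTMLDocument.
    static HTMLDocument load(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) kit.createDefaultDocument();
        try {
            // Read the markup into the empty document, starting at offset 0
            kit.read(new StringReader(html), htmlDoc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return htmlDoc;
    }

    // Hypothetical helper: name of the element the iterator starts from.
    static String rootName(HTMLDocument htmlDoc) {
        ElementIterator iterator = new ElementIterator(htmlDoc);
        return iterator.first().getName();
    }

    public static void main(String[] args) {
        HTMLDocument htmlDoc =
            load("<html><body><h1>Title</h1><p>Text</p></body></html>");
        System.out.println("Iteration starts at: " + rootName(htmlDoc));
    }
}
```

The iterator starts at the document's default root element, so everything else in the document is reachable from there.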
The ElementIterator is not meant to be a simple sequential iterator. It offers both
next() and previous() methods and supports going back to the beginning with first(). The
next() method walks the element tree depth-first, starting at the root element, and returns
null once no elements remain. Although next() and previous() return the next or previous
element to work with, you can also get the element at the current position by using
current(). Here is the basic loop through a document:
Element element;
ElementIterator iterator = new ElementIterator(htmlDoc);
while ((element = iterator.next()) != null) {
  // Process element
}
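To make the navigation methods concrete, here is a small sketch that counts the elements visited by next() and checks that current() and first() behave as described. The class IteratorNavigation and its methods are hypothetical names chosen for this example, and the sample HTML is an assumption. The sketch sticks to next(), current(), and first().

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class IteratorNavigation {
    // Hypothetical helper: parse an HTML string into an HTMLDocument.
    static HTMLDocument parse(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not parse HTML", e);
        }
        return doc;
    }

    // Counts every element that next() visits, including the root.
    static int countElements(HTMLDocument doc) {
        ElementIterator iterator = new ElementIterator(doc);
        int count = 0;
        while (iterator.next() != null) {
            count++;
        }
        return count;
    }

    // Checks that current() returns the element just handed out by next(),
    // and that first() rewinds the iterator to the root element.
    static boolean navigationIsConsistent(HTMLDocument doc) {
        ElementIterator iterator = new ElementIterator(doc);
        Element root = iterator.next();    // first call yields the root
        Element second = iterator.next();  // then its first child
        boolean currentMatches = (iterator.current() == second);
        boolean rewound = (iterator.first() == root);
        return currentMatches && rewound;
    }

    public static void main(String[] args) {
        HTMLDocument doc =
            parse("<html><body><h1>Title</h1><p>Text</p></body></html>");
        System.out.println("Elements visited: " + countElements(doc));
        System.out.println("Navigation consistent: " + navigationIsConsistent(doc));
    }
}
```

Because the first call to next() returns the root element, the loop also processes the root, not just its descendants.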
How do you find out which element you have, so that you can ignore it if it isn't
interesting? You need to get its name from its attribute set and check the name's type.
AttributeSet attributes = element.getAttributes();
Object name = attributes.getAttribute(StyleConstants.NameAttribute);
if (name instanceof HTML.Tag) {
  // The element represents an HTML tag; compare name to the HTML.Tag constants
}
Now you can look for specific tag types, such as HTML.Tag.H1, HTML.Tag.H2, and so on. The
actual text content of a tag is held in child elements of the element. To demonstrate, the
following shows how to search for H1, H2, and H3 tags in a document, while collecting the
heading text associated with each tag.
if ((name instanceof HTML.Tag) && ((name == HTML.Tag.H1) ||
    (name == HTML.Tag.H2) || (name == HTML.Tag.H3))) {
  // Build up content text as it may be within multiple elements
  StringBuilder text = new StringBuilder();
  int count = element.getElementCount();
  for (int i = 0; i < count; i++) {
    Element child = element.getElement(i);
    AttributeSet childAttributes = child.getAttributes();
    if (childAttributes.getAttribute(StyleConstants.NameAttribute) ==
        HTML.Tag.CONTENT) {
      int startOffset = child.getStartOffset();
      int endOffset = child.getEndOffset();
      int length = endOffset - startOffset;
      try {
        text.append(htmlDoc.getText(startOffset, length));
      } catch (BadLocationException e) {
        // Offsets come from the document itself, so this is not expected
        throw new IllegalStateException(e);
      }
    }
  }
  // Display the heading tag along with its text
  System.out.println(name + ": " + text.toString().trim());
}
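Putting the pieces together, the following is a self-contained sketch of the heading search, under stated assumptions: the class HeadingFinder and its methods headings() and textOf() are hypothetical names, the document is parsed from a string with HTMLEditorKit, and the collected text is trimmed because a heading's last content element may hold only a line break.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class HeadingFinder {
    // Returns the text of every H1, H2, and H3 element, in document order.
    static List<String> headings(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument htmlDoc = (HTMLDocument) kit.createDefaultDocument();
        List<String> result = new ArrayList<>();
        try {
            kit.read(new StringReader(html), htmlDoc, 0);
            ElementIterator iterator = new ElementIterator(htmlDoc);
            Element element;
            while ((element = iterator.next()) != null) {
                Object name = element.getAttributes()
                        .getAttribute(StyleConstants.NameAttribute);
                if (name == HTML.Tag.H1 || name == HTML.Tag.H2
                        || name == HTML.Tag.H3) {
                    result.add(textOf(htmlDoc, element));
                }
            }
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not process HTML", e);
        }
        return result;
    }

    // Concatenates the text of the CONTENT children of the given element.
    static String textOf(HTMLDocument htmlDoc, Element element)
            throws BadLocationException {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < element.getElementCount(); i++) {
            Element child = element.getElement(i);
            Object childName = child.getAttributes()
                    .getAttribute(StyleConstants.NameAttribute);
            if (childName == HTML.Tag.CONTENT) {
                int start = child.getStartOffset();
                text.append(htmlDoc.getText(start, child.getEndOffset() - start));
            }
        }
        // Assumption: surrounding whitespace is not wanted in the result
        return text.toString().trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>One</h1><p>Body text</p>"
                + "<h2>Two</h2><h4>Skipped</h4></body></html>";
        for (String heading : headings(html)) {
            System.out.println(heading);
        }
    }
}
```

Like the snippet above, this sketch looks only at the direct CONTENT children of a heading, so text nested inside other inline elements may need additional handling.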