Parsing HTML
The ParseHTML class handles HTML parsing. Every recipe in this chapter uses it, as do
many recipes in the remainder of the book. I will begin by showing you how to use the
ParseHTML class. A later section will show you how the ParseHTML class is implemented.
Using ParseHTML
The ParseHTML class is easy to use. The following code fragment demonstrates how to
read a stream of HTML with it.
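// url is a java.net.URL created earlier; openStream() returns a java.io.InputStream
InputStream is = url.openStream();
ParseHTML parse = new ParseHTML(is);
int ch;
// read() returns -1 once there is nothing more to parse
while ((ch = parse.read()) != -1)
{
    if (ch == 0)
    {
        // A return value of zero means an HTML tag was encountered
        HTMLTag tag = parse.getTag();
        System.out.println("Read HTML tag: " + tag);
    }
    else
    {
        // Any other value is a regular text character
        System.out.println("Read HTML text character: "
            + ((char)ch) );
    }
}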
As the code above shows, an InputStream is acquired from a URL. This InputStream is
then used to construct a ParseHTML object. The ParseHTML class can parse HTML from
any InputStream object, not only one that comes from a URL.
Next, the code enters a loop that calls parse.read(). Once parse.read() returns -1,
there is nothing more to parse, and the loop ends. If parse.read() returns zero, an
HTML tag was encountered. You can call parse.getTag() to determine which tag it was.
If neither -1 nor zero is returned, the value is a regular character from the HTML
text. This process continues until there is nothing else to read from the HTML file.
This is only a basic example of using ParseHTML. The recipes in this chapter expand
on it considerably.
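To make the three return values of read() concrete without the book's library on the classpath, the following sketch runs the same loop against a tiny stand-in class. TinyTagReader, summarize, and ReadLoopSketch are hypothetical names invented for this illustration. They are not part of ParseHTML, and the stand-in only mimics the read()/getTag() contract described above. It assumes ASCII input and does no real HTML parsing (no attributes, comments, or entities).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadLoopSketch {

    /** Hypothetical stand-in for ParseHTML: same read()/getTag() contract only. */
    static class TinyTagReader {
        private final InputStream in;
        private String lastTag = "";

        TinyTagReader(InputStream in) {
            this.in = in;
        }

        /** Returns -1 at end of input, 0 after consuming a tag, else the character. */
        int read() throws IOException {
            int ch = in.read();
            if (ch != '<') {
                return ch; // either -1 (end of input) or an ordinary text character
            }
            // Collect everything up to the closing '>' as the tag's raw content.
            StringBuilder name = new StringBuilder();
            while ((ch = in.read()) != -1 && ch != '>') {
                name.append((char) ch);
            }
            lastTag = name.toString();
            return 0; // signal "a tag was read"; the caller fetches it with getTag()
        }

        /** Returns the raw content of the most recently read tag. */
        String getTag() {
            return lastTag;
        }
    }

    /** Runs the read loop and returns the tags seen, a space, then the text seen. */
    public static String summarize(String html) throws IOException {
        // Any InputStream works here; an in-memory stream keeps the sketch offline.
        InputStream is = new ByteArrayInputStream(
                html.getBytes(StandardCharsets.US_ASCII));
        TinyTagReader parse = new TinyTagReader(is);
        StringBuilder tags = new StringBuilder();
        StringBuilder text = new StringBuilder();
        int ch;
        while ((ch = parse.read()) != -1) {
            if (ch == 0) {
                tags.append('[').append(parse.getTag()).append(']');
            } else {
                text.append((char) ch);
            }
        }
        return tags + " " + text;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(summarize("<p>Hi <b>there</b></p>"));
    }
}
```

Running main prints `[p][b][/b][/p] Hi there`. Each tag arrives as a single zero from read(), while the text arrives one character at a time. This is the same division of labor the ParseHTML loop relies on.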