Parsing the Choice List
We are going to extract the state abbreviation, as well as the state name. The process method is used to process the list. This method begins by defining several variables that will be needed to parse the choice list. An InputStream is opened to the URL that is being parsed, and a new ParseHTML object is constructed.
String value = "";
InputStream is = url.openStream();
ParseHTML parse = new ParseHTML(is);
StringBuilder buffer = new StringBuilder();
There may be more than one choice list on the page that we are parsing. Each choice list is surrounded by a beginning <select> tag and an ending </select> tag. If there is more than one <select> list, then we must advance to the correct one. This is what the advance method does.
The advance method takes three parameters. The first is the parse object that is being used to parse the HTML. This object will be advanced to the correct location. The second parameter is the name of the tag that we are advancing to. In this case we are advancing to a “select” tag. Finally, the third parameter tells the advance method which instance of that tag to look for. Zero specifies the first instance, one specifies the second instance, and so on.
advance(parse, "select", optionList);
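The body of advance is not shown in this section, so the following is only a minimal sketch of how such a method could work. It does not use the real ParseHTML class. TagReader is a hypothetical stand-in that follows the same convention described here: read returns 0 when a tag has just been consumed and -1 at the end of the input. The class names, the getTagName method, and the boolean return value are all assumptions made for illustration.

```java
public class AdvanceSketch {

    /**
     * Hypothetical stand-in for ParseHTML (not the real class).
     * It works on an in-memory string instead of an InputStream.
     */
    public static class TagReader {
        private final String html;
        private int pos = 0;
        private String tagName = "";

        public TagReader(String html) {
            this.html = html;
        }

        /**
         * Returns the next text character, 0 if a tag was just consumed,
         * or -1 at the end of the input.
         */
        public int read() {
            if (pos >= html.length()) {
                return -1;
            }
            char ch = html.charAt(pos++);
            if (ch != '<') {
                return ch;
            }
            // Naive tag scan: everything up to the next '>' is the tag body.
            // A '>' inside a quoted attribute value would confuse this sketch.
            int end = html.indexOf('>', pos);
            if (end < 0) {
                end = html.length(); // unterminated tag: consume the rest
            }
            String body = html.substring(pos, end).trim();
            tagName = body.split("\\s+")[0]; // "select", "/select", "option", ...
            pos = Math.min(end + 1, html.length());
            return 0;
        }

        /** Name of the tag most recently consumed by read(). */
        public String getTagName() {
            return tagName;
        }
    }

    /**
     * Advances the reader just past the tag with the given name.
     * index is zero-based: 0 is the first matching tag, 1 the second.
     * Returns false if the input ends before that tag is found.
     */
    public static boolean advance(TagReader parse, String tag, int index) {
        int seen = 0;
        int ch;
        while ((ch = parse.read()) != -1) {
            if (ch == 0 && parse.getTagName().equalsIgnoreCase(tag)) {
                if (seen == index) {
                    return true;
                }
                seen++;
            }
        }
        return false;
    }
}
```

With two choice lists on a page, calling advance(reader, "select", 1) in this sketch leaves the reader positioned just past the second <select> tag, so the next tag read is the first <option> of that list.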
Once we have advanced to the correct location, it is time to begin parsing for <option> tags. We start with a while loop that reads data from the parse object. As soon as the read method returns a zero, we know that we have found an HTML tag.
int ch;
while ((ch = parse.read()) != -1)
{
  if (ch == 0)
  {
    HTMLTag tag = parse.getTag();
First, we check to see if it is an opening <option> tag. If it is, then we read the value attribute. This attribute will hold the abbreviation for that state.
    if (tag.getName().equalsIgnoreCase("option"))
    {
      value = tag.getAttributeValue("value");
      buffer.setLength(0);
Next, we check to see if the tag encountered is an ending </option> tag. If it is, then we have found one state. The processOption method is called to display that state as part of the comma-separated list, which is the output from this recipe.
    } else if (tag.getName().equalsIgnoreCase("/option"))