{
HTMLTag tag = parse.getTag();
If we find an <li> tag, then we begin capturing a new item. If the buffer already holds data, we first record that data as an item, since it will be one of the fifty states, and then clear the buffer.
if (tag.getName().equalsIgnoreCase("li"))
{
if (buffer.length() > 0)
processItem(buffer.toString());
buffer.setLength(0);
capture = true;
If we find an ending </li> tag, then we record the item held in the buffer, clear the buffer, and stop capturing until the next <li> tag. Many pages omit the ending </li> tag, so this recipe does not require it to be present. That is why, when we reach an <li> tag, we first check whether the buffer already holds an item from the previous <li>.
} else if (tag.getName().equalsIgnoreCase("/li"))
{
processItem(buffer.toString());
buffer.setLength(0);
capture = false;
If we find the tag that ends the list (the tag name held in listTypeEnd), then we are done and leave the loop.
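} else if (tag.getName().equalsIgnoreCase(listTypeEnd))
{
// Record a final item whose ending </li> tag was omitted
if (buffer.length() > 0)
processItem(buffer.toString());
break;
}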
If we read a regular character rather than an HTML tag, we append it to the buffer, provided we are currently capturing characters.
} else
{
if (capture)
buffer.append((char) ch);
}
When the loop completes, we will have parsed all fifty states from the HTML list.
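To see the <li> state machine in isolation, here is a minimal, self-contained sketch that applies the same capture-buffer logic to an HTML string. It is not the recipe's code: a hand-rolled scanner stands in for the book's parser, and the class ListItemExtractor and method extractItems are hypothetical names. It also trims each item, skips empty items, and records a final item whose </li> was omitted.

```java
import java.util.ArrayList;
import java.util.List;

class ListItemExtractor {

    // Hypothetical helper: returns the text of each <li> item in the first
    // list whose tag name is listType (for example "ul" or "ol").
    static List<String> extractItems(String html, String listType) {
        List<String> items = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        boolean capture = false;      // true while inside an <li> item
        boolean inList = false;       // true once the opening list tag is seen
        String listTypeEnd = "/" + listType;

        int i = 0;
        while (i < html.length()) {
            char ch = html.charAt(i);
            if (ch != '<') {
                // Regular character: keep it only while capturing an item
                if (capture) buffer.append(ch);
                i++;
                continue;
            }
            int close = html.indexOf('>', i);
            if (close == -1) break;   // unterminated tag: stop scanning
            // Tag name is the first word inside the angle brackets
            String name = html.substring(i + 1, close).trim().split("\\s+")[0];
            i = close + 1;

            if (!inList) {
                if (name.equalsIgnoreCase(listType)) inList = true;
                continue;             // skip everything before the list starts
            }
            if (name.equalsIgnoreCase("li")) {
                // New item: record a previous item whose </li> was omitted
                if (buffer.length() > 0) items.add(buffer.toString().trim());
                buffer.setLength(0);
                capture = true;
            } else if (name.equalsIgnoreCase("/li")) {
                if (buffer.length() > 0) items.add(buffer.toString().trim());
                buffer.setLength(0);
                capture = false;
            } else if (name.equalsIgnoreCase(listTypeEnd)) {
                // End of list: flush a final item that had no </li>
                if (buffer.length() > 0) items.add(buffer.toString().trim());
                break;
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String html = "<ul><li>Alabama<li>Alaska</li><li>Arizona</ul>";
        System.out.println(extractItems(html, "ul"));
    }
}
```

Running main prints [Alabama, Alaska, Arizona]: "Alabama" is recorded when the second <li> arrives and "Arizona" when </ul> is reached, even though neither had an ending </li> tag. Comments, scripts, and entities such as &amp; are not handled, which a real HTML parser would take care of.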
Recipe #6.3: Extracting Data from a Table
Many websites contain tables, which let a website arrange data in rows and columns. This recipe will extract data from the table at the following URL:
http://www.httprecipes.com/1/6/table.php
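The capture-buffer idea from the list recipe carries over to tables: a <tr> tag starts a new row, a <td> or <th> tag starts capturing a cell, and the cell is recorded when its end tag, the next cell, or the end of the row is reached (HTML allows the </td>, </th>, and </tr> end tags to be omitted). The following is a minimal sketch under those assumptions. It uses a hand-rolled scanner rather than the book's parser, TableExtractor and extractRows are hypothetical names, and it makes no assumption about the actual layout of the table at the URL above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TableExtractor {

    // Hypothetical helper: returns the cell text of the first <table> in html,
    // one inner list per row. A hand-rolled scanner stands in for a real parser.
    static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = null;              // row currently being filled
        StringBuilder cell = new StringBuilder();
        boolean capture = false;              // true while inside a <td> or <th>
        boolean inTable = false;

        int i = 0;
        while (i < html.length()) {
            char ch = html.charAt(i);
            if (ch != '<') {
                // Plain character: keep it only while capturing a cell
                if (capture) cell.append(ch);
                i++;
                continue;
            }
            int close = html.indexOf('>', i);
            if (close == -1) break;           // unterminated tag: stop scanning
            String name = html.substring(i + 1, close).trim()
                    .split("\\s+")[0].toLowerCase(Locale.ROOT);
            i = close + 1;

            if (!inTable) {
                if (name.equals("table")) inTable = true;
                continue;                     // ignore everything before <table>
            }
            // End tags are optional, so any of these tags closes the open cell
            boolean endsCell = name.equals("td") || name.equals("th")
                    || name.equals("/td") || name.equals("/th")
                    || name.equals("tr") || name.equals("/tr")
                    || name.equals("/table");
            if (capture && endsCell) {
                row.add(cell.toString().trim());
                cell.setLength(0);
                capture = false;
            }
            switch (name) {
                case "tr":
                    row = new ArrayList<>();
                    rows.add(row);
                    break;
                case "td":
                case "th":
                    if (row == null) {        // cell with no preceding <tr>
                        row = new ArrayList<>();
                        rows.add(row);
                    }
                    capture = true;
                    break;
                case "/table":
                    return rows;
                default:
                    break;                    // other tags such as <b> are ignored
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>State<th>Capital</tr>"
                + "<tr><td>Ohio</td><td><b>Columbus</b></td></tr></table>";
        System.out.println(extractRows(html));
    }
}
```

Running main prints [[State, Capital], [Ohio, Columbus]]. Nested tables, colspan and rowspan, and HTML entities are not handled; a cell's text is simply the characters between its start tag and the tag that ends it, with inline tags such as <b> dropped.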