EXTRACTING DATA - HTTP Programming Recipes for Java Bots


{
  HTMLTag tag = parse.getTag();

If we find an <li> tag, we first check whether the buffer already holds data. If it does, we record that item, since it will be one of the fifty states. We then clear the buffer and begin capturing.

  if (tag.getName().equalsIgnoreCase("li"))
  {
    if (buffer.length() > 0)
      processItem(buffer.toString());
    buffer.setLength(0);
    capture = true;

If we find an ending </li> tag, we record the item, clear the buffer, and prepare for the next tag. The ending </li> tag is often omitted, so this recipe does not require it. To handle a missing </li> tag, when we reach the next <li> tag we first check whether the buffer already contains text.

  } else if (tag.getName().equalsIgnoreCase("/li"))
  {
    processItem(buffer.toString());
    buffer.setLength(0);
    capture = false;

If we find the closing tag of the list itself (the tag named by listTypeEnd), we are done.

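  } else if (tag.getName().equalsIgnoreCase(listTypeEnd))
  {
    // Record the last item in case its </li> was omitted
    if (buffer.length() > 0)
      processItem(buffer.toString());
    buffer.setLength(0);
    break;
  }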

If we read a regular character rather than an HTML tag, we append it to the buffer, provided we are currently capturing characters.

} else
{
  if (capture)
    buffer.append((char) ch);
}

When the loop completes we will have parsed all fifty states from the HTML list.
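The loop above relies on the book's ParseHTML and HTMLTag classes. For experimenting without them, here is a minimal, self-contained sketch of the same buffer-and-capture technique. The class name ListItemExtractor, its extract method, and the tiny tag scanner standing in for ParseHTML are all hypothetical; the scanner only reads a tag name up to the next '>' and is not a full HTML parser. The sketch also records the final item when the closing list tag arrives, so an omitted final </li> does not lose the last entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the list-extraction loop. A tiny hand-written tag
// scanner stands in for the book's ParseHTML/HTMLTag classes (assumption).
public class ListItemExtractor {

  // Collects the trimmed text of each <li> item until the closing list tag.
  // listTypeEnd is the closing tag name without brackets, e.g. "/ul" or "/ol".
  public static List<String> extract(String html, String listTypeEnd) {
    List<String> items = new ArrayList<>();
    StringBuilder buffer = new StringBuilder();
    boolean capture = false;
    int i = 0;
    while (i < html.length()) {
      char ch = html.charAt(i);
      if (ch == '<') {
        // Read the tag name up to '>'; anything after the first space is ignored.
        int end = html.indexOf('>', i);
        if (end == -1) break; // unterminated tag: stop scanning
        String name = html.substring(i + 1, end).trim().split("\\s+")[0];
        i = end + 1;
        if (name.equalsIgnoreCase("li")) {
          // A new item starts; record any item left open by a missing </li>.
          if (buffer.length() > 0) items.add(buffer.toString().trim());
          buffer.setLength(0);
          capture = true;
        } else if (name.equalsIgnoreCase("/li")) {
          if (buffer.length() > 0) items.add(buffer.toString().trim());
          buffer.setLength(0);
          capture = false;
        } else if (name.equalsIgnoreCase(listTypeEnd)) {
          // End of the list: record the last item in case its </li> was omitted.
          if (buffer.length() > 0) items.add(buffer.toString().trim());
          break;
        }
      } else {
        // Ordinary character: keep it only while inside an item.
        if (capture) buffer.append(ch);
        i++;
      }
    }
    return items;
  }

  public static void main(String[] args) {
    String html = "<ul><li>Alabama<li>Alaska</li><li><a href=\"#\">Arizona</a></ul>";
    System.out.println(extract(html, "/ul"));
  }
}
```

The sample in main omits two </li> tags and wraps the last item in an <a> tag; the program prints [Alabama, Alaska, Arizona]. Because tags other than <li>, </li>, and the list's closing tag are skipped, text inside links is still captured.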

Recipe #6.3: Extracting Data from a Table

Many websites contain tables. Tables let a website arrange data in rows and columns. This recipe extracts data from the table at the following URL:

http://www.httprecipes.com/1/6/table.php
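As a rough illustration, and not the code this recipe uses, the buffer-and-capture idea from the list recipe carries over to tables: a <tr> tag starts a new row, a <td> or <th> tag starts a cell, and any following cell or row boundary closes the current cell, so a missing </td> is tolerated. The TableExtractor class, its extract method, and the sample markup are hypothetical, and nothing here assumes the actual layout of the page at the URL above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the book's Recipe 6.3 code: collects table cells
// row by row with a tiny hand-written tag scanner standing in for ParseHTML.
public class TableExtractor {

  public static List<List<String>> extract(String html) {
    List<List<String>> rows = new ArrayList<>();
    List<String> row = null;                  // current row, null outside <tr>
    StringBuilder cell = new StringBuilder();
    boolean capture = false;                  // true while inside a cell
    int i = 0;
    while (i < html.length()) {
      char ch = html.charAt(i);
      if (ch != '<') {
        if (capture) cell.append(ch);         // collect cell text
        i++;
        continue;
      }
      int end = html.indexOf('>', i);
      if (end == -1) break;                   // unterminated tag: stop
      String name = html.substring(i + 1, end).trim().split("\\s+")[0]
          .toLowerCase(Locale.ROOT);
      i = end + 1;
      boolean startCell = name.equals("td") || name.equals("th");
      boolean endCell = name.equals("/td") || name.equals("/th");
      boolean endRow = name.equals("/tr") || name.equals("/table");
      // Any cell or row boundary closes an open cell, so </td> may be omitted.
      if (capture && (startCell || endCell || endRow || name.equals("tr"))) {
        row.add(cell.toString().trim());
        cell.setLength(0);
        capture = false;
      }
      if (name.equals("tr")) {
        if (row != null && !row.isEmpty()) rows.add(row); // previous </tr> omitted
        row = new ArrayList<>();
      } else if (startCell && row != null) {
        capture = true;
      } else if (endRow) {
        if (row != null && !row.isEmpty()) rows.add(row);
        row = null;
        if (name.equals("/table")) break;
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    String html = "<table><tr><th>State</th><th>Capital</th></tr>"
        + "<tr><td>Alabama<td>Montgomery</tr></table>";
    System.out.println(extract(html));
  }
}
```

With the invented sample, main prints [[State, Capital], [Alabama, Montgomery]]; the second row omits its first </td>, which the boundary check handles.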
