HANDLING SESSIONS AND COOKIES - HTTP Programming Recipes for Java Bots

The output from the FormUtility object, held in the bos buffer, is written to the connection to perform the post. A ParseHTML object is then set up to parse the search results.

// Perform the post.
os.write(bos.toByteArray());

// Read the results.
InputStream is = http.getInputStream();
ParseHTML parse = new ParseHTML(is);
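FormUtility is the book's own class, so the sketch below does not use it. Assuming the form is posted as application/x-www-form-urlencoded, it shows the kind of body that ends up in a buffer like bos and how that body can be written to an HttpURLConnection using only the standard URLEncoder. The class name FormPostSketch and the field names "search" and "type" are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormPostSketch {
    // Encode name/value pairs as an application/x-www-form-urlencoded body.
    // Hypothetical helper, not the book's FormUtility.
    static byte[] encodeForm(Map<String, String> fields) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                bos.write('&'); // separate pairs with '&'
            }
            first = false;
            String pair = URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8);
            byte[] bytes = pair.getBytes(StandardCharsets.UTF_8);
            bos.write(bytes, 0, bytes.length);
        }
        return bos.toByteArray();
    }

    // Post the encoded body and return the response stream for parsing.
    static InputStream post(URL url, byte[] body) throws IOException {
        HttpURLConnection http = (HttpURLConnection) url.openConnection();
        http.setDoOutput(true); // we are sending a request body
        http.setRequestMethod("POST");
        http.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream os = http.getOutputStream()) {
            os.write(body); // perform the post
        }
        return http.getInputStream(); // read the results
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("search", "New York"); // hypothetical field names
        fields.put("type", "s");
        System.out.println(new String(encodeForm(fields), StandardCharsets.UTF_8));
        // prints "search=New+York&type=s"
    }
}
```

A LinkedHashMap keeps the fields in insertion order, which makes the encoded body predictable. The server does not depend on that order.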

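Now the HTML can be parsed. The call to advance moves the parser to the start of the result list (the listType tag). The loop then reads the page one character at a time. When read returns 0, an HTML tag has been found, so retrieve it with getTag and examine it.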
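ParseHTML comes from the book's own library, so its internals are not reproduced here. The hypothetical MiniReader below is a minimal sketch of the convention the loop relies on. Its read method returns each text character, returns 0 after consuming a whole tag, and returns -1 at the end of input. getTagName then reports which tag was found. It is a toy: it does not handle comments, scripts, or a '>' inside quoted attributes, and it is not the real ParseHTML.

```java
import java.util.ArrayList;
import java.util.List;

public class TagReaderSketch {
    // Hypothetical stand-in for a tag-aware reader: read() returns a text
    // character, 0 when a complete tag was consumed, or -1 at end of input.
    static final class MiniReader {
        private final String html;
        private int pos = 0;
        private String lastTag = "";

        MiniReader(String html) {
            this.html = html;
        }

        int read() {
            if (pos >= html.length()) {
                return -1;
            }
            char c = html.charAt(pos);
            if (c == '<') {
                int end = html.indexOf('>', pos);
                if (end < 0) {
                    end = html.length(); // unterminated tag: consume the rest
                }
                lastTag = html.substring(pos + 1, end).trim();
                pos = end + 1;
                return 0; // signal that a tag was found
            }
            pos++;
            return c;
        }

        // Name of the most recent tag, e.g. "li" or "/li" (attributes dropped).
        String getTagName() {
            int space = lastTag.indexOf(' ');
            return space < 0 ? lastTag : lastTag.substring(0, space);
        }
    }

    // Same loop shape as the recipe: text characters versus tags.
    static List<String> describe(String html) {
        List<String> events = new ArrayList<>();
        MiniReader parse = new MiniReader(html);
        StringBuilder text = new StringBuilder();
        int ch;
        while ((ch = parse.read()) != -1) {
            if (ch == 0) {
                if (text.length() > 0) {
                    events.add("text:" + text);
                    text.setLength(0);
                }
                events.add("tag:" + parse.getTagName());
            } else {
                text.append((char) ch);
            }
        }
        if (text.length() > 0) {
            events.add("text:" + text);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(describe("<ul><li>Ohio<li>Utah</ul>"));
        // prints [tag:ul, tag:li, text:Ohio, tag:li, text:Utah, tag:/ul]
    }
}
```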

advance(parse, listType, 0);
int ch;
while ((ch = parse.read()) != -1)
{
  if (ch == 0)
  {
    HTMLTag tag = parse.getTag();

If the tag is an <li> tag, then we have found the start of a result item. Any text already in the buffer belongs to the previous item, so add it to the results as a valid state or capital, then clear the buffer and begin capturing.

    if (tag.getName().equalsIgnoreCase("li"))
    {
      if (buffer.length() > 0)
        result.add(buffer.toString());
      buffer.setLength(0);
      capture = true;

Many web sites omit the closing </li> tag. If it is present, however, stop capturing text and add any text already captured to the results as a valid state or capital.

    } else if (tag.getName().equalsIgnoreCase("/li"))
    {
      // Skip empty items such as <li></li>.
      if (buffer.length() > 0)
        result.add(buffer.toString());
      buffer.setLength(0);
      capture = false;

When the closing tag of the list (listTypeEnd) is reached, there is no more data to parse. Add any remaining buffered text to the results and exit the loop.

    } else if (tag.getName().equalsIgnoreCase(listTypeEnd))
    {
      // If the last item was closed with </li>, the buffer is already empty.
      if (buffer.length() > 0)
        result.add(buffer.toString());
      break;
    }
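Taken together, the three tag rules form a small state machine. The sketch below replays them over a pre-tokenized page, so both list styles (with and without closing </li> tags) can be checked without a network connection. The Token class and the extract method are hypothetical. Text is assumed to be appended to the buffer only while capture is true, because that part of the loop is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListExtractSketch {
    // Hypothetical token: either an HTML tag name or a run of text.
    static final class Token {
        final boolean isTag;
        final String value;

        Token(boolean isTag, String value) {
            this.isTag = isTag;
            this.value = value;
        }

        static Token tag(String name) { return new Token(true, name); }
        static Token text(String s) { return new Token(false, s); }
    }

    // Collect <li> items until listTypeEnd (e.g. "/ul"); </li> is optional.
    static List<String> extract(List<Token> tokens, String listTypeEnd) {
        List<String> result = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        boolean capture = false;
        for (Token t : tokens) {
            if (!t.isTag) {
                if (capture) {
                    buffer.append(t.value); // assumed: keep text only inside an item
                }
                continue;
            }
            String name = t.value;
            if (name.equalsIgnoreCase("li")) {
                // A new item starts: flush the previous one if it never saw </li>.
                if (buffer.length() > 0) result.add(buffer.toString());
                buffer.setLength(0);
                capture = true;
            } else if (name.equalsIgnoreCase("/li")) {
                if (buffer.length() > 0) result.add(buffer.toString());
                buffer.setLength(0);
                capture = false;
            } else if (name.equalsIgnoreCase(listTypeEnd)) {
                if (buffer.length() > 0) result.add(buffer.toString());
                break; // end of the list: nothing more to parse
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> withoutClose = Arrays.asList(
            Token.tag("ul"), Token.tag("li"), Token.text("Ohio"),
            Token.tag("li"), Token.text("Utah"), Token.tag("/ul"));
        System.out.println(extract(withoutClose, "/ul")); // prints [Ohio, Utah]
    }
}
```

Both list styles yield the same items. The buffer-length checks keep the </li> variant from adding an empty string when the list closes, because the buffer has already been flushed and cleared by then.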
