EXTRACTING DATA - HTTP Programming Recipes for Java Bots


HTMLTag tag = parse.getTag();
if (tag.getName().equalsIgnoreCase("a"))
{
  value = tag.getAttributeValue("href");
  URL u = new URL(url, value.toString());
  value = u.toString();
  buffer.setLength(0);

When the closing </a> tag is found, the link's text, which has accumulated in the buffer, and its href value are both displayed.

} else if (tag.getName().equalsIgnoreCase("/a"))
{
  processOption(buffer.toString(), value);
}

If a regular character is found rather than an HTML tag, it is appended to the buffer.

} else
{
  buffer.append((char) ch);
}

This loop continues until all links in the file have been processed.
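The fragments above depend on the book's own parsing classes, so they cannot be run on their own. As a rough, self-contained sketch of the same idea, the following uses only the JDK: it scans the HTML one character at a time, clears a buffer when an opening <a> tag is seen, and reports the buffered text with the resolved href when the closing </a> tag is seen. The class name LinkSketch, the method extractLinks, and the HREF pattern are hypothetical names chosen for illustration, and java.net.URI.resolve stands in for the new URL(url, value) call used above. The tag handling is deliberately simplified and is not a full HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the book's recipe code.
class LinkSketch {
    // Simplified pattern for a quoted href attribute inside a tag body.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns one "text -> absolute URL" entry per <a href=...>...</a> pair.
    // Assumes well-formed URLs; URI.create throws IllegalArgumentException otherwise.
    static List<String> extractLinks(String html, String baseUrl) {
        List<String> result = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        String value = null; // resolved href of the link currently open, if any
        int i = 0;
        while (i < html.length()) {
            char ch = html.charAt(i);
            if (ch == '<') {
                int end = html.indexOf('>', i);
                if (end < 0) {
                    break; // unterminated tag: stop scanning
                }
                String tag = html.substring(i + 1, end).trim();
                String name = tag.split("\\s+", 2)[0];
                if (name.equalsIgnoreCase("a")) {
                    // Opening anchor: resolve the href and start collecting text.
                    Matcher m = HREF.matcher(tag);
                    value = m.find()
                        ? URI.create(baseUrl).resolve(m.group(1)).toString()
                        : null;
                    buffer.setLength(0);
                } else if (name.equalsIgnoreCase("/a")) {
                    // Closing anchor: report the collected text and the URL.
                    if (value != null) {
                        result.add(buffer.toString().trim() + " -> " + value);
                    }
                    value = null;
                }
                i = end + 1;
            } else {
                // Regular character: add it to the buffer.
                buffer.append(ch);
                i++;
            }
        }
        return result;
    }
}
```

Calling LinkSketch.extractLinks(html, pageUrl) with the page's own address as the base turns relative href values into absolute ones, which is the role the URL constructor plays in the fragment above.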

Recipe #6.5: Extracting Images from HTML

Images are very common on web sites. We have already seen how an image can be downloaded as a binary file. We can also create a bot that examines the <img> tags on a site and then downloads the images that it finds. This recipe will extract all of the images from the following URL:

http://www.httprecipes.com/1/6/image.php

You can see this image list in Figure 6.5.
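Before turning to the recipe itself, the general approach can be sketched with the JDK alone: find each <img> tag, read its src attribute, resolve it against the page URL, and save the bytes under the image's file name. This is not the recipe's listing. The class ImageSketch and its methods extractImageUrls, fileNameOf, and download are hypothetical names, the regular expression is a simplification that only handles quoted src values, and the download step is an assumption about how the binary-file technique mentioned above might be combined with the tag scan.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the book's recipe code.
class ImageSketch {
    // Simplified pattern: an <img ...> tag with a quoted src attribute.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Collects the absolute URL of every image referenced in the HTML.
    static List<String> extractImageUrls(String html, String baseUrl) {
        List<String> urls = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(base.resolve(m.group(1)).toString());
        }
        return urls;
    }

    // Takes the last path segment of the URL as the local file name.
    static String fileNameOf(String imageUrl) {
        String path = URI.create(imageUrl).getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Downloads one image as binary data into the given directory.
    static void download(String imageUrl, Path directory) throws IOException {
        try (InputStream in = URI.create(imageUrl).toURL().openStream()) {
            Files.copy(in, directory.resolve(fileNameOf(imageUrl)),
                       StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A bot built this way would fetch the page text, pass it to extractImageUrls together with the page URL, and then call download for each entry in the returned list.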
