HTMLTag tag = parse.getTag();
if (tag.getName().equalsIgnoreCase("a")) {
  value = tag.getAttributeValue("href");
  // Resolve a possibly relative href against the page's URL.
  URL u = new URL(url, value.toString());
  value = u.toString();
When the </a> tag is found, the tag's text and href value are both displayed.
} else if (tag.getName().equalsIgnoreCase("/a"))
  processOption(buffer.toString(), value);
If we read a regular character rather than an HTML tag, we add it to the buffer.
} else
  buffer.append((char) ch);
This loop continues until all links in the file have been processed.
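The loop described above can be approximated without the recipe's parser class. The following is only a minimal sketch under stated assumptions: LinkScanner, extractLinks and attribute are hypothetical names, the tag handling is a naive string scan rather than the HTMLTag-based parsing shown in the fragments, java.net.URI.resolve stands in for the URL constructor used in the recipe, and the example.com addresses are placeholders.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a naive scanner, not the parser class used by the recipe.
class LinkScanner {
    /** Returns one "text -> absolute URL" entry for each <a href="...">...</a> pair. */
    static List<String> extractLinks(String html, String baseUrl) {
        List<String> links = new ArrayList<>();
        StringBuilder buffer = new StringBuilder(); // text read since the last <a> tag
        String value = null;                        // resolved href of the open <a> tag
        int i = 0;
        while (i < html.length()) {
            char ch = html.charAt(i);
            int close = (ch == '<') ? html.indexOf('>', i) : -1;
            if (close != -1) {
                // We found a tag: inspect its name instead of buffering it.
                String tag = html.substring(i + 1, close).trim();
                String name = tag.split("\\s+")[0].toLowerCase();
                if (name.equals("a")) {
                    String href = attribute(tag, "href");
                    // Resolve a possibly relative href against the page's URL.
                    value = (href == null) ? null
                            : URI.create(baseUrl).resolve(href).toString();
                    buffer.setLength(0);
                } else if (name.equals("/a") && value != null) {
                    links.add(buffer.toString().trim() + " -> " + value);
                    value = null;
                }
                i = close + 1;
            } else {
                // A regular character: add it to the buffer.
                buffer.append(ch);
                i++;
            }
        }
        return links;
    }

    /** Naive lookup of a double-quoted attribute; real-world HTML needs a real parser. */
    static String attribute(String tag, String attr) {
        String key = attr + "=\"";
        int start = tag.toLowerCase().indexOf(key);
        if (start == -1) return null;
        start += key.length();
        int end = tag.indexOf('"', start);
        return (end == -1) ? null : tag.substring(start, end);
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"page2.html\">the next page</a> or "
                    + "<a href=\"http://example.org/\">an external site</a>.</p>";
        for (String link : extractLinks(html, "http://example.com/dir/index.html")) {
            System.out.println(link);
        }
    }
}
```

The buffer is cleared whenever an opening <a> tag is seen, so only the text between the opening and closing tags is reported with the link.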
Recipe #6.5: Extracting Images from HTML
Images are very common on web sites. We have already seen how an image can be downloaded
as a binary file. We can also create a bot that examines the <img> tags on a site and
then downloads the images that it finds. This recipe will extract all of the images from the
following URL.
You can see this image list in Figure 6.5.
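Before looking at the recipe itself, the idea of examining <img> tags can be sketched in a few lines. This is an illustration under stated assumptions, not the recipe's code: ImageScanner and extractImageUrls are hypothetical names, a simple regular expression stands in for a real HTML parser, and the example.com addresses are placeholders. The sketch only collects absolute image URLs; each one could then be fetched with the binary download technique seen earlier.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collects image URLs only; it does not download anything.
class ImageScanner {
    // Naive pattern for <img ... src="..."> or src='...'; assumed for illustration.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])([^\"'>]*)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Returns the absolute URL of every quoted src attribute found on an <img> tag. */
    static List<String> extractImageUrls(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            // group(2) is the raw src value; resolve it against the page's URL.
            urls.add(base.resolve(m.group(2)).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<img src=\"images/a.png\" alt=\"A\"><IMG alt='B' SRC='/b.jpg'>";
        for (String url : extractImageUrls(html, "http://example.com/dir/page.html")) {
            System.out.println(url);
        }
    }
}
```

Resolving each src value against the page's URL matters because image paths are often relative to the page on which they appear.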