USING A SPIDER - HTTP Programming Recipes for Java Bots


  e.printStackTrace();
}

/**
 * Called when the spider is ready to process an HTML
 * URL. Download the contents of the URL to a local file.
 *
 * @param url
 *          The URL that the spider is about to process.
 * @param parse
 *          An object that will allow you to parse the
 *          HTML on this page.
 * @throws IOException
 *           Thrown if an IO error occurs while processing
 *           the page.
 */
public void spiderProcessURL(URL url, SpiderParseHTML parse)
    throws IOException {
  String filename =
      URLUtility.convertFilename(this.path, url, true);
  OutputStream os = new FileOutputStream(filename);
  try {
    // Route the data read by the parser to the local file,
    // then read the whole page.
    parse.getStream().setOutputStream(os);
    parse.readAll();
  } finally {
    // Close the file even if reading fails.
    os.close();
  }
}

/**
 * Called when the spider tries to process a URL but gets
 * an error. This method is not used in this recipe.
 *
 * @param url
 *          The URL that generated an error.
 */
public void spiderURLError(URL url) {
}

Much of Recipe 13.2's SpiderReportable implementation is similar to that of Recipe 13.1. Unlike Recipe 13.1, however, Recipe 13.2 actually downloads what it finds. This downloading functionality is implemented in the two overloads of the spiderProcessURL method. The first spiderProcessURL overload is designed to take an InputStream:

public void spiderProcessURL(URL url, InputStream stream)
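
One plausible body for an overload with this signature copies the stream to a local file in fixed-size chunks. The sketch below illustrates that idea under stated assumptions and is not the book's implementation: StreamDownloadSketch and copyToFile are hypothetical names that do not belong to the spider classes, and the target filename is passed in directly rather than derived with URLUtility.convertFilename.

```java
import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the book's spider classes.
final class StreamDownloadSketch {

    // Copies everything from the stream into the named file and
    // returns the number of bytes written.
    static long copyToFile(InputStream stream, String filename)
            throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        // try-with-resources closes the file even if a read fails.
        try (OutputStream os = new FileOutputStream(filename)) {
            int length;
            while ((length = stream.read(buffer)) != -1) {
                os.write(buffer, 0, length);
                total += length;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the data the spider would supply.
        Path target = Files.createTempFile("spider-download", ".bin");
        byte[] body = "example response body".getBytes(StandardCharsets.UTF_8);
        long written =
            copyToFile(new ByteArrayInputStream(body), target.toString());
        System.out.println(written + " bytes written to " + target);
        Files.delete(target);
    }
}
```

Copying through a byte buffer rather than a character reader keeps binary content such as images intact, which matters when the stream does not carry HTML.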
