Spider HTML Parsing
The Heaton Research Spider passes a SpiderParseHTML object to the
spiderProcessURL method of a SpiderReportable object. This object lets
the handler parse the HTML that the spider found, and it also lets the spider
extract links from that HTML. The SpiderParseHTML class is shown in Listing 14.4.
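Before reading the listing, it may help to see the general idea of link extraction in isolation. The following is a minimal sketch, not the spider's actual implementation: it assumes a simple regular expression in place of the ParseHTML parser, and the class name LinkExtractionSketch and the method extractLinks are hypothetical names chosen for illustration. It shows how each href value found in a page can be resolved against the page's base URL, which is the role the base field plays in the class below.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this class is not part of the Heaton Research Spider.
public class LinkExtractionSketch {

    // Simplified assumption: only quoted href attributes are matched.
    // A real HTML parser handles many cases that this pattern misses.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Returns every href found in the HTML, resolved against the base address.
    // URI.create throws IllegalArgumentException for malformed values
    // (for example, an href containing spaces).
    public static List<String> extractLinks(String base, String html) {
        URI baseUri = URI.create(base);
        List<String> result = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            // Relative links are resolved against the page's own address.
            result.add(baseUri.resolve(matcher.group(1)).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"page2.html\">next</a> <a href='/about/'>about</a>";
        for (String link : extractLinks("http://www.example.com/docs/index.html", html)) {
            System.out.println(link);
        }
    }
}
```

A relative link such as page2.html is resolved against the directory of the base page, while a link that begins with a slash is resolved against the host root. This is why a parser that extracts links needs to know the URL of the page it is reading.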
Listing 14.4: HTML Parsing (SpiderParseHTML.java)
package com.heatonresearch.httprecipes.spider;
import java.io.*;
import java.net.*;
import java.util.logging.*;
import com.heatonresearch.httprecipes.html.*;
import com.heatonresearch.httprecipes.spider.workload.*;
public class SpiderParseHTML extends ParseHTML {
/**
* The logger.
*/
private static Logger logger = Logger
.getLogger("com.heatonresearch.httprecipes.spider.SpiderParseHTML");
/**
* The Spider that this page is being parsed for.
*/
private Spider spider;
/**
* The URL that is being parsed.
*/
private URL base;
/**
* The depth of the page being parsed.
*/
private int depth;
/**
* The InputStream that is being parsed.
*/
private SpiderInputStream stream;