    String href = tag.getAttributeValue("href");
    handleA(href);
  } else if (tag.getName().equalsIgnoreCase("img")) {
    String src = tag.getAttributeValue("src");
    addURL(src, SpiderReportable.URLType.IMAGE);
  } else if (tag.getName().equalsIgnoreCase("style")) {
    String src = tag.getAttributeValue("src");
    addURL(src, SpiderReportable.URLType.STYLE);
  } else if (tag.getName().equalsIgnoreCase("link")) {
    String href = tag.getAttributeValue("href");
    addURL(href, SpiderReportable.URLType.SCRIPT);
  } else if (tag.getName().equalsIgnoreCase("base")) {
    String href = tag.getAttributeValue("href");
    this.base = new URL(this.base, href);
  }
}
return result;
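For most tag types, the addURL method is called with the URL type that matches
the tag. The anchor tag is handled differently, through a call to the handleA
method, and the base tag only updates the base URL used to resolve later links.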
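The branching above can be summarized as a lookup from tag name to the attribute that carries the URL. The following is only a sketch of that idea, not the spider's own code: the class TagDispatchSketch and the method urlAttribute are hypothetical names, and the "a" case is an assumption, since the condition for the anchor branch is not shown in the excerpt.

```java
import java.util.Locale;

public class TagDispatchSketch {

    /**
     * Returns the name of the attribute that holds the URL for the given
     * tag, or null if the tag is not one of those handled in the excerpt.
     * The comparison is case-insensitive, like equalsIgnoreCase above.
     */
    public static String urlAttribute(String tagName) {
        switch (tagName.toLowerCase(Locale.ROOT)) {
            case "a":     // assumed tag name for the anchor branch
            case "link":
            case "base":
                return "href";
            case "img":
            case "style":
                return "src";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(urlAttribute("IMG"));  // prints "src"
        System.out.println(urlAttribute("base")); // prints "href"
        System.out.println(urlAttribute("p"));    // prints "null"
    }
}
```

A table like this only captures which attribute is read. What happens to the value (addURL, handleA, or updating the base) still needs the per-tag branches shown in the listing.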
Adding a URL
The addURL method is called to add a URL. It begins by rejecting any null URLs.
if (u == null) {
  return;
}
First, the URL is converted to its fully qualified form. For example, if an href
of “images/me.gif” were found on a page under http://www.httprecipes.com/1/, the
fully qualified URL would be http://www.httprecipes.com/1/images/me.gif.
try {
  URL url = URLUtility.constructURL(this.base, u, true);
  url = this.spider.getWorkloadManager().convertURL(url.toString());
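To see the resolution step in isolation, here is a minimal sketch that uses the standard java.net.URI class rather than the book's URLUtility.constructURL, whose internals are not shown here. The class ResolveSketch and the method resolve are hypothetical names, and the base address is assumed to be the directory http://www.httprecipes.com/1/ from the example above.

```java
import java.net.URI;

public class ResolveSketch {

    /**
     * Resolves a possibly relative reference against a base address and
     * returns the fully qualified result as a string.
     */
    public static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // Relative reference: resolved against the base directory.
        System.out.println(resolve("http://www.httprecipes.com/1/", "images/me.gif"));
        // prints "http://www.httprecipes.com/1/images/me.gif"

        // Absolute reference: returned unchanged.
        System.out.println(resolve("http://www.httprecipes.com/1/", "https://www.example.com/a.png"));
        // prints "https://www.example.com/a.png"
    }
}
```

The trailing slash on the base matters: without it, the last path segment is treated as a file name and is replaced by the relative reference.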
Next, the protocol is checked. If the URL's protocol is anything other than http or
https, the URL is ignored.
  if (url.getProtocol().equalsIgnoreCase("http")
      || url.getProtocol().equalsIgnoreCase("https")) {
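The same protocol test can be tried on its own with a few sample addresses. This is a sketch under assumptions, not the spider's code: ProtocolFilterSketch and isHttpOrHttps are hypothetical names, and java.net.URI is used so that schemes such as mailto can be parsed without a registered protocol handler.

```java
import java.net.URI;

public class ProtocolFilterSketch {

    /**
     * Returns true only when the address uses the http or https scheme.
     * The comparison ignores case, matching the check in the listing.
     */
    public static boolean isHttpOrHttps(String address) {
        String scheme = URI.create(address).getScheme();
        return scheme != null
                && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"));
    }

    public static void main(String[] args) {
        System.out.println(isHttpOrHttps("http://www.httprecipes.com/1/")); // prints "true"
        System.out.println(isHttpOrHttps("HTTPS://www.example.com/"));      // prints "true"
        System.out.println(isHttpOrHttps("mailto:someone@example.com"));    // prints "false"
        System.out.println(isHttpOrHttps("ftp://www.example.com/file"));    // prints "false"
    }
}
```

Filtering at this point keeps links such as mailto and ftp out of the workload, since the spider could not download them over HTTP anyway.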
The spiderFoundURL method is then called to determine whether the URL should be
added. If it returns true, the spider's addURL method is called.
    if (this.spider.getReport().spiderFoundURL(url, this.base, type)) {
      try {