Java Reference
In-Depth Information
      {
        this.active = true;
      } else
      {
        // Otherwise, activate only if the line names this spider's user agent.
        if ((this.userAgent != null) &&
            rest.equalsIgnoreCase(this.userAgent))
        {
          this.active = true;
        }
      }
    }
    // While active, add each non-empty Disallow path.
    if (this.active)
    {
      if (command.equalsIgnoreCase("disallow"))
      {
        if (rest.trim().length() > 0)
        {
          URL url = new URL(this.robotURL, rest);
          add(url.getFile());
        }
      }
    }
  }
}
The RobotsFilter class defines four instance variables, which are listed here:
• robotURL
• exclude
• active
• userAgent
The robotURL variable holds the URL of the robots.txt file that was most
recently retrieved. Each time the newHost method is called, a new robotURL value is
constructed by appending the string “robots.txt” to the host name.
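As a rough illustration of how such a URL might be built and then used to resolve a Disallow path, consider the sketch below. The class RobotsUrlSketch and its two helper methods are hypothetical names introduced here, and the use of plain HTTP is an assumption; the book's newHost method may construct the URL differently.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical sketch; not the RobotsFilter implementation itself.
class RobotsUrlSketch {

    // Builds the robots.txt URL for a host (assumes plain HTTP on the default port).
    static URL robotsUrlFor(String host) {
        try {
            return new URL("http", host, "/robots.txt");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad host: " + host, e);
        }
    }

    // Resolves the text of a Disallow line against the robots.txt URL and
    // returns only the path portion, as the listing does with url.getFile().
    static String resolveDisallow(URL robotURL, String rest) {
        try {
            return new URL(robotURL, rest).getFile();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad path: " + rest, e);
        }
    }
}
```

Resolving against robotURL rather than concatenating strings means a Disallow value such as /private/ is interpreted relative to the host that served the robots.txt file.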
The exclude variable contains the list of URLs that are to be excluded. Because this
list is rebuilt each time a new host is encountered, it must be cleared for each new
host.
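The clear-then-rebuild behavior of the exclude list can be sketched as follows. This is a minimal stand-in, not the book's class: ExcludeListSketch and isExcluded are hypothetical names, and matching by path prefix is an assumption based on the usual robots.txt convention.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-host exclusion list.
class ExcludeListSketch {
    private final List<String> exclude = new ArrayList<>();

    // Called when the spider moves to a new host: old exclusions no longer apply.
    void newHost() {
        exclude.clear();
    }

    // Records one disallowed path, skipping duplicates.
    void add(String path) {
        if (!exclude.contains(path)) {
            exclude.add(path);
        }
    }

    // Assumption: a path is excluded if it starts with any disallowed prefix.
    boolean isExcluded(String path) {
        for (String prefix : exclude) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Without the call to clear, exclusions from one host would leak into decisions about the next host.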
The active variable keeps track of whether the loading process is currently
tracking Disallow lines. The loader becomes active when a User-agent line matches the
user agent string being used by the spider.
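The role of the active flag can be seen in a small self-contained sketch that scans robots.txt lines and collects only the Disallow values that apply to a given spider. ActiveFlagSketch and disallowedFor are hypothetical names. The sketch assumes that each User-agent line resets the flag and that a value of * matches every agent; both follow common robots.txt practice but are assumptions about the parts of the listing not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of active-flag tracking while reading robots.txt lines.
class ActiveFlagSketch {

    static List<String> disallowedFor(String userAgent, List<String> lines) {
        List<String> result = new ArrayList<>();
        boolean active = false;

        for (String line : lines) {
            String str = line.trim();
            int i = str.indexOf(':');
            // Skip blank lines, comments, and lines without a "command: value" shape.
            if (str.isEmpty() || str.charAt(0) == '#' || i == -1) {
                continue;
            }
            String command = str.substring(0, i).trim();
            String rest = str.substring(i + 1).trim();

            if (command.equalsIgnoreCase("user-agent")) {
                // Assumption: "*" applies to all agents; otherwise compare names.
                active = rest.equals("*") || rest.equalsIgnoreCase(userAgent);
            } else if (active && command.equalsIgnoreCase("disallow") && !rest.isEmpty()) {
                result.add(rest);
            }
        }
        return result;
    }
}
```

Given a file with one section for OtherBot and one for MySpider, only the Disallow lines in the MySpider section are collected when the spider identifies itself as MySpider.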
The userAgent variable holds the user agent string that the spider is using. This
variable is passed into the newHost method.