loadLine(str);
}
Once complete, the streams are closed. The close commands are placed inside of a
finally block to ensure that they are executed, even if an exception is thrown while
reading from the robots.txt file.
} finally {
r.close();
isr.close();
}
Each line read by the newHost method is passed on to the loadLine method. This
method will be discussed in the next section.
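The read loop and its cleanup can be tried in isolation with a small standalone program. This is only a minimal sketch, not the book's listing: the class name RobotsReaderSketch, the readLines helper, and the use of a StringReader in place of a network stream are assumptions made for illustration, and the lines are collected into a list where the book's code would call loadLine.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class RobotsReaderSketch {
    // Hypothetical helper: reads every line from the source and
    // closes the reader in a finally block, as the text describes.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader r = new BufferedReader(source);
        try {
            String str;
            while ((str = r.readLine()) != null) {
                lines.add(str); // the book's code passes str to loadLine here
            }
        } finally {
            r.close(); // runs even if readLine throws; also closes the wrapped reader
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample content standing in for a downloaded robots.txt file.
        String text = "User-agent: *\nDisallow: /private/\n";
        for (String line : readLines(new StringReader(text))) {
            System.out.println(line);
        }
    }
}
```

On Java 7 and later, a try-with-resources statement would give the same guarantee with less code; the explicit finally block is kept here to match the text.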
Loading a Line from the Robots.txt File
The loadLine method interprets each of the lines contained in the robots.txt
file. The loadLine method begins by trimming the line passed in and searching for the
first occurrence of a colon (:). As you will recall from earlier in the chapter, lines in the
robots.txt file consist of a command and a value, separated by a colon.
str = str.trim();
int i = str.indexOf(':');
If the line is empty, starts with a pound sign (#), or contains no colon, the method
returns without doing anything. As you will recall from earlier in the chapter, a pound
sign signifies a comment line.
if ((str.length() == 0) || (str.charAt(0) == '#') || (i == -1)) {
return;
}
Next, the line is split into two variables: command, which holds the text to the left
of the colon, and rest, which holds whatever text occurs to the right of the colon.
String command = str.substring(0, i);
String rest = str.substring(i + 1).trim();
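The trimming, filtering, and splitting steps can be combined into one standalone example. This is a sketch under stated assumptions rather than the book's code: the class name LineSplitSketch and the split helper are hypothetical, and returning null for a skipped line is a choice made here for illustration (the loadLine method shown above simply returns).

```java
public class LineSplitSketch {
    // Hypothetical helper: returns {command, rest}, or null when the
    // line is empty, is a comment, or contains no colon.
    static String[] split(String str) {
        str = str.trim();
        int i = str.indexOf(':'); // first colon separates command from value
        if (str.length() == 0 || str.charAt(0) == '#' || i == -1) {
            return null;
        }
        String command = str.substring(0, i);
        String rest = str.substring(i + 1).trim();
        return new String[] { command, rest };
    }

    public static void main(String[] args) {
        String[] parts = split("  Disallow: /cgi-bin/  ");
        System.out.println(parts[0] + " -> " + parts[1]); // prints "Disallow -> /cgi-bin/"
        System.out.println(split("# a comment") == null); // prints "true"
    }
}
```

Because only the first colon is used as the separator, a value that itself contains a colon, such as a full URL, ends up intact in rest.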
First, we check to see if this is a User-agent command.
if (command.equalsIgnoreCase("User-agent")) {
this.active = false;
If an asterisk (*) is specified as the user agent, then the program becomes “active”
immediately, since this applies to all bots. Being “active” implies that the program will begin
tracking Disallow commands.
if (rest.equals("*")) {
this.active = true;
} else {