loadLine(str);
}
Once reading is complete, the streams are closed. The close calls are placed inside a finally block to ensure that they are executed, even if an exception is thrown while reading from the robots.txt file.
} finally {
r.close();
isr.close();
}
Each line read by the newHost method is passed on to the loadLine method. This method will be discussed in the next section.
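As an aside, on Java 7 and later the same guarantee can be obtained with a try-with-resources statement instead of an explicit finally block. The following is only a minimal sketch, not the book's code: the class name RobotsReaderSketch and the method readLines are hypothetical, and the input is taken from an arbitrary Reader rather than the HTTP stream that newHost opens.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class RobotsReaderSketch {
    // Collects every line from the source. The BufferedReader (and the
    // Reader it wraps) is closed automatically when the try block exits,
    // whether normally or because an exception was thrown.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(source)) {
            String str;
            while ((str = r.readLine()) != null) {
                lines.add(str);
            }
        }
        return lines;
    }
}
```

Closing the BufferedReader also closes the reader it wraps, so only one resource needs to be declared in this form.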
Loading a Line from the Robots.txt File
The loadLine method interprets each of the lines contained in the robots.txt file. The loadLine method begins by trimming the line passed in and searching for the first occurrence of a colon (:). As you will recall from earlier in the chapter, lines in the robots.txt file consist of a command and a value, separated by a colon.
str = str.trim();
int i = str.indexOf(':');
If the line is empty, starts with a pound sign (#), or contains no colon, the method returns. As you will recall from earlier in the chapter, a pound sign signifies a comment line.
if ((str.length() == 0) || (str.charAt(0) == '#') || (i == -1)) {
return;
}
Next, the line is split into two variables, command and rest. The command variable holds the text to the left of the colon. The rest variable holds whatever text occurred to the right of the colon, with surrounding whitespace trimmed.
String command = str.substring(0, i);
String rest = str.substring(i + 1).trim();
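The trimming, filtering, and splitting steps described so far can be gathered into one small standalone sketch, which makes them easy to try on sample lines. The class name RobotsLineSketch and the method splitLine are hypothetical, and returning null for ignored lines is an assumption made for illustration; the loadLine method in the text simply returns.

```java
// Hypothetical helper, for illustration only.
final class RobotsLineSketch {
    // Returns {command, rest}, or null for blank lines, comment lines,
    // and lines that contain no colon.
    static String[] splitLine(String str) {
        str = str.trim();
        int i = str.indexOf(':'); // index of the first colon only
        if (str.isEmpty() || str.charAt(0) == '#' || i == -1) {
            return null;
        }
        String command = str.substring(0, i);
        String rest = str.substring(i + 1).trim();
        return new String[] { command, rest };
    }
}
```

Because indexOf finds only the first colon, a value that itself contains colons, such as a URL, stays intact in rest. Note that the command is not trimmed after the split, so a space before the colon would remain part of it.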
First, we check to see if this is a User-agent command.
if (command.equalsIgnoreCase("User-agent")) {
this.active = false;
If an asterisk (*) is specified as the user agent, then the program becomes “active” immediately, since this applies to all bots. Being “active” implies that the program will begin tracking Disallow commands.
if (rest.equals("*")) {
this.active = true;
} else {