This method is called to download images and other binary objects. Anything that is not
HTML is downloaded by this method. HTML is handled differently because HTML contains
links to other pages. This method begins by creating a buffer to read the binary data.
byte[] buffer = new byte[1024];
int length;
Next, a filename is created. The convertFilename function converts the URL into a filename that can be saved to the local computer. The convertFilename function also creates the directory structure to hold the specified file.
String filename = URLUtility.convertFilename(this.path, url, true);
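The source of convertFilename is not shown here, so the following is only a minimal sketch of how such a URL-to-path mapping might work. The class FilenameSketch, the method toLocalFilename, the use of index.html for directory-style URLs, and the character-replacement rule are all assumptions made for illustration, not the library's actual behavior.

```java
import java.io.IOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FilenameSketch {
    // Hypothetical helper: maps a URL to a path under baseDir and,
    // if mkdir is true, creates the parent directories for that path.
    public static String toLocalFilename(String baseDir, URL url, boolean mkdir)
            throws IOException {
        String urlPath = url.getPath();
        // Assumption of this sketch: directory-style URLs are saved as index.html.
        if (urlPath.isEmpty() || urlPath.endsWith("/")) {
            urlPath = urlPath + "index.html";
        }
        Path target = Paths.get(baseDir, url.getHost());
        for (String part : urlPath.split("/")) {
            // Skip empty segments and segments that could escape baseDir.
            if (part.isEmpty() || part.equals(".") || part.equals("..")) {
                continue;
            }
            // Replace characters that are unsafe in local filenames.
            target = target.resolve(part.replaceAll("[^A-Za-z0-9._-]", "_"));
        }
        if (mkdir && target.getParent() != null) {
            Files.createDirectories(target.getParent());
        }
        return target.toString();
    }
}
```

Under these assumptions, a URL such as http://www.example.com/images/logo.png would be stored beneath a www.example.com/images directory inside the base path.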
Next, the data is read in. It is read using the buffer variable that was created earlier.
try {
    OutputStream os = new FileOutputStream(filename);
    do {
        length = stream.read(buffer);
        if (length != -1) {
            os.write(buffer, 0, length);
        }
    } while (length != -1);
Once the data has been read, the output stream can be closed.
    os.close();
If the output file cannot be opened, the resulting FileNotFoundException is caught and its stack trace is displayed to the user.
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
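The fragments above can be assembled into one self-contained method. The sketch below is a variation on the recipe, not the recipe's own code: it uses try-with-resources so the output stream is closed even if a read or write fails partway through. The class name BinaryCopySketch and the method copyToFile are hypothetical names chosen for this example.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BinaryCopySketch {
    // Copies every byte from stream into the named file and
    // returns the number of bytes written.
    public static long copyToFile(InputStream stream, String filename)
            throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int length;
        // try-with-resources closes os on both success and failure.
        try (OutputStream os = new FileOutputStream(filename)) {
            // read() returns -1 once the end of the stream is reached.
            while ((length = stream.read(buffer)) != -1) {
                os.write(buffer, 0, length);
                total += length;
            }
        }
        return total;
    }
}
```

Unlike the fragment above, this version lets exceptions propagate to the caller rather than printing a stack trace, which is a design choice of the sketch.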
This recipe also has to handle HTML data. If a URL has HTML data, then the second form of the spiderProcessURL method is used.
public void spiderProcessURL(URL url, SpiderParseHTML parse)
First, a filename is generated, just as was done for the binary URL. An OutputStream is then opened to write the file.
String filename =
URLUtility.convertFilename(this.path, url, true);
OutputStream os = new FileOutputStream(filename);
The OutputStream is then attached to the ParseHTML object, so that any data read from the HTML stream is also written to the OutputStream. This saves the HTML file to the local computer.
parse.getStream().setOutputStream(os);
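The stream class returned by getStream() is not shown here. As an illustration of the general idea, the sketch below shows one way an input stream could copy everything it reads to an attached OutputStream. TeeInputStreamSketch is a hypothetical class written for this example and is not the library's implementation; only the method name setOutputStream is taken from the call above.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in: an input stream that mirrors every byte
// it reads to an optional output stream.
public class TeeInputStreamSketch extends FilterInputStream {
    private OutputStream copy;

    public TeeInputStreamSketch(InputStream in) {
        super(in);
    }

    // Attach the stream that should receive a copy of the data.
    public void setOutputStream(OutputStream os) {
        this.copy = os;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1 && copy != null) {
            copy.write(b);
        }
        return b;
    }

    // read(byte[]) in FilterInputStream delegates to this method,
    // so overriding it covers both array-based reads.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0 && copy != null) {
            copy.write(buf, off, n);
        }
        return n;
    }
}
```

With a wrapper like this, a parser that consumes the stream to find links would save the page to disk as a side effect, without reading the data twice. Bytes passed over with skip() are not copied in this sketch.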