USING JAVASCRIPT - HTTP Programming Recipes for Java Bots

... article continues here ...

</p>

To extract the article text, we must find the HTML tags that completely enclose the article. The article begins with a <center> tag. It also ends with a <center> tag, not a </center> tag, because that second <center> tag is actually used to center the automatic choice list, which appears at the end of the article text.

If you are extracting data from other web sites, you will need to find the bounding tags for that data. The boundary may even be a series of tags; for example, </p></center> could be the ending tag for a page. It all depends on the data you are reading.

To do this, the extractNoCase function is used, with <center> passed as both the beginning and the ending token.

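// The same tag marks both the start and the end of the article.
final String token = "<center>";
String contents = downloadPage(url);
// Extract the text between the two tokens, ignoring case.
String result = extractNoCase(contents, token, token, 0);
// Put the opening tag back in front of the extracted text.
return token + result;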

Once the page has been read, the article text is returned.
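
To make the extraction step concrete, here is a minimal sketch of how a case-insensitive extract between two tokens might work. It is not the book's extractNoCase implementation; the class name ExtractSketch, the sample page text, and the meaning of the last parameter (how many occurrences of the first token to skip) are assumptions made for illustration only.

```java
import java.util.Locale;

final class ExtractSketch {
    /**
     * Returns the text between token1 and the next token2, ignoring case.
     * "skip" is an assumed parameter: the number of token1 matches to pass
     * over before extracting (0 means use the first match).
     * Returns null when either token cannot be found.
     * Assumes lowercasing does not change the string length (true for ASCII HTML).
     */
    static String extractNoCase(String str, String token1, String token2, int skip) {
        // Search a lowercased copy, but cut the result out of the original.
        String lower = str.toLowerCase(Locale.ROOT);
        String t1 = token1.toLowerCase(Locale.ROOT);
        String t2 = token2.toLowerCase(Locale.ROOT);

        int start = lower.indexOf(t1);
        for (int i = 0; i < skip && start != -1; i++) {
            start = lower.indexOf(t1, start + t1.length());
        }
        if (start == -1) {
            return null;
        }

        int bodyStart = start + t1.length();
        int end = lower.indexOf(t2, bodyStart);
        if (end == -1) {
            return null;
        }
        return str.substring(bodyStart, end);
    }

    public static void main(String[] args) {
        // Hypothetical page: the article sits between two <center> tags.
        String page = "<p>menu</p><CENTER><h1>Title</h1><p>Body text</p><center>choices";
        String token = "<center>";
        // prints <center><h1>Title</h1><p>Body text</p>
        System.out.println(token + extractNoCase(page, token, token, 0));
    }
}
```

Because the search runs on a lowercased copy while the substring is taken from the original, the article keeps its original capitalization even when the page writes the tag as <CENTER>.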

Recipe #9.2: JavaScript Includes

“JavaScript includes” are another common use of JavaScript. They allow an HTML document to include JavaScript from another document. This recipe demonstrates how to read an HTML document that uses “JavaScript includes”. It outputs a “compound document” in which each JavaScript include statement is replaced with the text contained in the included JavaScript document.
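
The idea of a compound document can be sketched in a few lines before looking at the full recipe. The following is only an illustration, not the code from Listing 9.2: it uses a regular expression rather than an HTML parser, and the class name IncludeSketch, the expandIncludes method, and the loader function (which stands in for downloading the included file) are all assumptions.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncludeSketch {
    // Matches a <script ... src="..."> ... </script> element, ignoring case.
    private static final Pattern INCLUDE = Pattern.compile(
        "<script[^>]*\\bsrc\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>.*?</script>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Replaces every script element that has a src attribute with the text
     * returned by the loader for that src value. The loader is a hypothetical
     * stand-in for fetching the included file over HTTP.
     */
    static String expandIncludes(String html, Function<String, String> loader) {
        Matcher m = INCLUDE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String included = loader.apply(m.group(1));
            // quoteReplacement keeps '$' and '\' in the script text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(included));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<html><SCRIPT type=\"text/javascript\" src=\"a.js\"></SCRIPT><p>x</p></html>";
        // A stub loader; a real one would download the file named by src.
        // prints <html>/*a.js*/<p>x</p></html>
        System.out.println(expandIncludes(html, src -> "/*" + src + "*/"));
    }
}
```

Passing the loader as a parameter keeps the sketch runnable without a network connection. A real loader would also need to resolve a relative src value against the URL of the page being read, for example with java.net.URI.resolve, before downloading it.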

This recipe will read the HTML text located at the following URL:

http://www.httprecipes.com/1/9/includes.php

This recipe is shown in Listing 9.2.

Listing 9.2: JavaScript Includes (Includes.java)

package com.heatonresearch.httprecipes.ch9.recipe2;

import java.io.*;

import java.net.*;

import com.heatonresearch.httprecipes.html.*;
