... article continues here ...
</p>
<center><select onchange="menuLink(this)">
To extract the article text, we must find the HTML tags that completely enclose the
article. The article begins with a <center> tag. The article also ends with a <center>
tag (not a </center> tag). This is because the ending <center> tag is actually used to
center the automatic choice list, which occurs at the end of the article text.
If you are extracting data from other web sites, you will need to find the bounding tags
for the article on those sites. The boundary may even be a series of tags; for example,
</p></center> could be the ending tag for a page. It all depends on the data you are reading.
To extract the text between the bounding tags, the extractNoCase function is used.
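// The same tag marks both the start and the end of the article
final String token = "<center>";
// Download the page, then extract the text between the two tokens
String contents = downloadPage(url);
String result = extractNoCase(contents, token, token, 0);
// Prepend the opening tag to the extracted text
return token + result;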
Once the page has been read, the article text is returned.
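The extractNoCase function itself is not shown in this section. As a rough illustration of the idea only, the sketch below extracts the text between two tokens while ignoring case. The class name ExtractSketch, the method names extractBetween and indexOfIgnoreCase, and the meaning given to the last parameter (the number of start-token occurrences to skip) are all assumptions made for this example, not the recipe's actual code.

```java
final class ExtractSketch {
    /** Case-insensitive indexOf; returns -1 when the token is not found. (Hypothetical helper.) */
    static int indexOfIgnoreCase(String text, String token, int from) {
        for (int i = Math.max(from, 0); i + token.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, token, 0, token.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Returns the text between startToken and endToken, ignoring case.
     * Assumption: skip is the number of startToken occurrences to pass over
     * (0 means use the first one). Returns null if either token is missing.
     */
    static String extractBetween(String text, String startToken, String endToken, int skip) {
        int from = indexOfIgnoreCase(text, startToken, 0);
        for (int i = 0; i < skip && from != -1; i++) {
            from = indexOfIgnoreCase(text, startToken, from + startToken.length());
        }
        if (from == -1) {
            return null;
        }
        int contentStart = from + startToken.length();
        int to = indexOfIgnoreCase(text, endToken, contentStart);
        if (to == -1) {
            return null;
        }
        return text.substring(contentStart, to);
    }

    public static void main(String[] args) {
        // Made-up page: the article sits between two <center> tags of differing case.
        String page = "<p>menu</p><CENTER><h1>Title</h1><p>Body</p><center><select>";
        String token = "<center>";
        System.out.println(token + extractBetween(page, token, token, 0));
    }
}
```

Matching with regionMatches avoids lowercasing the whole page, so the returned substring keeps the original capitalization of the article text.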
Recipe #9.2: JavaScript Includes
“JavaScript includes” are another common use of JavaScript. They allow an HTML document
to include JavaScript from another HTML document. This recipe will demonstrate how
to read an HTML document that uses “JavaScript includes”. This recipe will output a
“compound document” that will replace the JavaScript include statements with the text that is
contained in these included JavaScript documents.
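Before the full listing, the idea of building a compound document can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the recipe's implementation: the class name IncludeSketch, the method inlineIncludes, and the loader function (which stands in for downloading the included file) are hypothetical, the regular expression only handles quoted src attributes, and wrapping the included text in a plain <script> element is a choice made for this example.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncludeSketch {
    // Matches <script ... src="..."> ... </script>, ignoring case.
    // Assumption: the src value is quoted with " or '.
    private static final Pattern SCRIPT_INCLUDE = Pattern.compile(
        "<script\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']*)[\"'][^>]*>.*?</script\\s*>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Replaces each script include with an inline script whose body is
     * whatever the loader returns for that include's src value.
     */
    static String inlineIncludes(String html, Function<String, String> loader) {
        Matcher m = SCRIPT_INCLUDE.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(html, last, m.start());           // text before the include
            out.append("<script>")
               .append(loader.apply(m.group(1)))         // included JavaScript text
               .append("</script>");
            last = m.end();
        }
        out.append(html, last, html.length());           // text after the last include
        return out.toString();
    }

    public static void main(String[] args) {
        // A map stands in for the network: src value -> script text (made-up data).
        Map<String, String> files = Map.of("menu.js", "document.write('menu');");
        String html = "<html><SCRIPT src=\"menu.js\"></SCRIPT><p>Hi</p></html>";
        System.out.println(inlineIncludes(html, files::get));
    }
}
```

Passing the loader as a function keeps the sketch testable without a network connection; a real bot would resolve each src value against the page URL and download it.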
This recipe will read the HTML text located at the following URL:
http://www.httprecipes.com/1/9/includes.php
This recipe is shown in Listing 9.2.
Listing 9.2: JavaScript Includes (Includes.java)
package com.heatonresearch.httprecipes.ch9.recipe2;
import java.io.*;
import java.net.*;
import com.heatonresearch.httprecipes.html.*;