    {
      if (ch == 0)
      {
        HTMLTag tag = parse.getTag();
        if (tag.getName().equalsIgnoreCase("a"))
        {
          value = tag.getAttributeValue("href");
          URL u = new URL(url, value.toString());
          value = u.toString();
          processSubPage(u);
        }
      }
    }
  }
  /**
   * The main method creates a new instance of the object and calls
   * process.
   * @param args not used.
   */
  public static void main(String[] args)
  {
    try
    {
      URL u = new URL(
          "http://www.httprecipes.com/1/6/subpage.php");
      ExtractSubPage parse = new ExtractSubPage();
      parse.process(u);
    } catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}
This recipe performs two tasks. First, a list of the sub-pages must be obtained from
the main page. Second, each sub-page must be loaded and its data extracted.
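The first of these tasks can be sketched with nothing but the JDK. The example below is not the recipe's own code: it swaps the book's ParseHTML class for a simple regular expression, and the class name SubPageLinkSketch, the method listSubPages, and the sample link targets are hypothetical names chosen for illustration. It does use the same two-argument URL constructor as the listing to turn relative links into absolute ones.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; a regex is a simplified stand-in for the book's ParseHTML.
class SubPageLinkSketch {
    // Matches href="..." inside an <a> tag (double-quoted values only).
    private static final Pattern HREF = Pattern.compile(
        "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Task 1: collect every link on the main page as an absolute URL string. */
    static List<String> listSubPages(String pageUrl, String html) {
        List<String> result = new ArrayList<>();
        try {
            URL base = new URL(pageUrl);
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                // Resolve the (possibly relative) href against the page URL,
                // as the listing does with new URL(url, value).
                result.add(new URL(base, m.group(1)).toString());
            }
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + e.getMessage(), e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup with made-up link targets; no network access is needed.
        String html = "<a href=\"page1.php\">One</a> <A HREF=\"/1/6/page2.php\">Two</A>";
        for (String link : listSubPages(
                "http://www.httprecipes.com/1/6/subpage.php", html)) {
            // Task 2 would open each link here and extract the sub-page's data.
            System.out.println(link);
        }
    }
}
```

A regular expression is used here only to keep the sketch self-contained. It will miss single-quoted or unquoted attribute values, which is one reason the recipe relies on a real HTML parser instead.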
Obtaining the List of Sub-Pages
The process method of the ExtractSubPage class obtains a list of all sub-pages
and passes each sub-page to the processSubPage method. The process method begins by
opening an InputStream to the URL that contains the list of hyperlinks. A ParseHTML
object is created to parse this InputStream.
String value = "";
InputStream is = url.openStream();