EXTRACTING DATA - HTTP Programming Recipes for Java Bots


} else
{
  buffer.append((char) ch);
}

/**
 * The main method; creates a new instance of the object and calls
 * process.
 *
 * @param args not used.
 */
public static void main(String[] args)
{
  try
  {
    URL u = new URL("http://www.httprecipes.com/1/6/link.php");
    ExtractLinks parse = new ExtractLinks();
    parse.process(u, 1);
  } catch (Exception e)
  {
    e.printStackTrace();
  }
}

The process method of ExtractLinks is called to process the hyperlinks. It begins by
setting up what it needs: a value variable to hold the href of the current link, an
InputStream opened on the URL that contains the links, a ParseHTML object to parse
that InputStream, and a StringBuilder named buffer to hold the data for each link.

String value = "";
InputStream is = url.openStream();
ParseHTML parse = new ParseHTML(is);
StringBuilder buffer = new StringBuilder();
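The setup above never closes the stream. As a minimal sketch, and not the book's listing, the same kind of read can be wrapped in try-with-resources so that the InputStream is closed even if reading fails. The class name StreamSetupSketch and the method readAndClose are hypothetical, and a ByteArrayInputStream stands in for url.openStream() so that the example runs without network access.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class StreamSetupSketch {
  // Hypothetical helper: reads the stream one byte at a time until read()
  // returns -1, then closes it via try-with-resources.
  static String readAndClose(InputStream source) {
    try (InputStream is = source) {
      StringBuilder buffer = new StringBuilder();
      int ch;
      while ((ch = is.read()) != -1) {
        // Casting a byte to char is only safe for ASCII/Latin-1 content.
        buffer.append((char) ch);
      }
      return buffer.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] page = "<a href=\"a.html\">A</a>".getBytes(StandardCharsets.US_ASCII);
    // In the recipe, the stream would come from url.openStream() instead.
    System.out.println(readAndClose(new ByteArrayInputStream(page)));
  }
}
```

Against a real URL, the argument would be the stream returned by url.openStream(); the closing behavior is the same.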

The method loops across every tag and text character in the HTML file.

int ch;
while ((ch = parse.read()) != -1)
{

When an HTML tag is found (read returns 0), it is checked to see whether it is an <a> tag.
If the tag is an anchor, its href attribute is saved to the value variable and the buffer
variable is cleared.

if (ch == 0)
{
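As a rough illustration of how this loop fits together, the following is a self-contained sketch rather than the book's listing. ParseHTML is the book's own class and is not reproduced here. The TagReader class below is a hypothetical stand-in that only imitates the convention visible above: read() returns 0 after a tag, -1 at end of input, and otherwise the next text character. The getTag method, the regular expression used to pull out href, and the handling of the closing </a> tag are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkLoopSketch {
  // Hypothetical stand-in for ParseHTML (not the book's implementation).
  static class TagReader {
    private final String html;
    private int pos = 0;
    private String tag = "";

    TagReader(String html) { this.html = html; }

    // Returns -1 at end of input, 0 after consuming a whole tag,
    // otherwise the next text character.
    int read() {
      if (pos >= html.length()) {
        return -1;
      }
      char ch = html.charAt(pos++);
      if (ch != '<') {
        return ch;
      }
      int end = html.indexOf('>', pos);
      if (end < 0) {
        end = html.length(); // unterminated tag: consume the rest
      }
      tag = html.substring(pos, end).trim();
      pos = Math.min(end + 1, html.length());
      return 0;
    }

    // Text between '<' and '>' of the most recently read tag.
    String getTag() { return tag; }
  }

  // Assumption: href values are double-quoted.
  private static final Pattern HREF =
      Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

  static List<String> extractLinks(String html) {
    TagReader parse = new TagReader(html);
    List<String> links = new ArrayList<>();
    StringBuilder buffer = new StringBuilder();
    String value = "";
    int ch;
    while ((ch = parse.read()) != -1) {
      if (ch == 0) {
        String tag = parse.getTag();
        String lower = tag.toLowerCase();
        if (lower.equals("a") || lower.startsWith("a ")) {
          // Anchor found: remember its href and start collecting link text.
          Matcher m = HREF.matcher(tag);
          value = m.find() ? m.group(1) : "";
          buffer.setLength(0);
        } else if (lower.equals("/a")) {
          // Assumed behavior: record the link when the anchor closes.
          links.add(buffer + " -> " + value);
        }
      } else {
        buffer.append((char) ch);
      }
    }
    return links;
  }

  public static void main(String[] args) {
    String html = "<p><a href=\"one.html\">First</a> and "
        + "<a href=\"two.html\">Second</a></p>";
    for (String link : extractLinks(html)) {
      System.out.println(link);
    }
  }
}
```

Because the buffer is cleared only when an anchor opens, text between links (such as " and " in the sample) accumulates harmlessly and is discarded at the next <a> tag.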
