}
}
  /**
   * The main method: creates a new instance of the object and calls
   * process.
   * @param args not used.
   */
  public static void main(String[] args)
  {
    try
    {
      URL u = new URL("http://www.httprecipes.com/1/6/list.php");
      ParseList parse = new ParseList();
      parse.process(u, "ul", 1);
    } catch (Exception e)
    {
      e.printStackTrace();
    }
  }
}
The process method of the ParseList class extracts the data from the list. It
begins by creating several variables that are needed to parse the list. The list type
must be passed in because HTML has several list types, such as <ul> and <ol>.
For this reason, the variable listTypeEnd is created to hold the name of the ending tag.
For example, an <ol> list ends with an </ol> tag.
The capture variable tracks whether we are currently capturing the “non-tag” text. It
is set to true when we reach an <li> tag, which means we need to start capturing
the text of the current item.
String listTypeEnd = "/" + listType; // name of the ending tag: "/ul" for a <ul> list
InputStream is = url.openStream();
ParseHTML parse = new ParseHTML(is);
StringBuilder buffer = new StringBuilder();
boolean capture = false;
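The role of the capture flag and the ending-tag name can be sketched in isolation. The example below is a minimal, self-contained illustration and not the recipe's code: it scans a plain String instead of using ParseHTML, and the class name CaptureSketch and the method extractItems are hypothetical names chosen for this sketch. It also skips the advance step, so it starts capturing at the first <li> it meets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the capture logic; it does not use ParseHTML.
class CaptureSketch {
    /** Collects the text of each <li> item until the list's ending tag is seen. */
    static List<String> extractItems(String html, String listType) {
        String listTypeEnd = "/" + listType;   // "/ul" for a <ul> list
        List<String> items = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        boolean capture = false;               // true only while inside an <li> item

        int i = 0;
        while (i < html.length()) {
            char ch = html.charAt(i);
            if (ch == '<') {
                int close = html.indexOf('>', i);
                if (close < 0) {
                    break;                     // unterminated tag: stop scanning
                }
                // Tag name is the first token inside the angle brackets.
                String name = html.substring(i + 1, close).trim().split("\\s+")[0];
                if (name.equalsIgnoreCase("li")) {
                    buffer.setLength(0);       // start a fresh item
                    capture = true;
                } else if (name.equalsIgnoreCase("/li")) {
                    if (capture) {
                        items.add(buffer.toString().trim());
                    }
                    capture = false;
                } else if (name.equalsIgnoreCase(listTypeEnd)) {
                    break;                     // end of the list
                }
                i = close + 1;
            } else {
                if (capture) {
                    buffer.append(ch);         // "non-tag" text of the current item
                }
                i++;
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String html = "<p>intro</p><ul><li>Alpha</li> <li>Beta</li></ul><p>after</p>";
        System.out.println(extractItems(html, "ul")); // prints [Alpha, Beta]
    }
}
```

Text outside an <li> item, such as the "intro" paragraph, is ignored because capture is false at that point, and scanning stops at the ending tag so the trailing paragraph is never read.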
The advance method moves the parser to the correct list in the HTML page. It
is discussed in Recipe 6.1.
advance(parse, listType, optionList);
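The real advance method is covered in Recipe 6.1 and is not reproduced here. As a rough illustration of the idea only, the sketch below skips forward to the n-th opening tag of a given name in a String. It assumes that the third argument counts occurrences starting from 1, as the call process(u, "ul", 1) suggests. The class name AdvanceSketch and its String-based signature are hypothetical and differ from the recipe's method.

```java
// Hypothetical illustration of "advance to the n-th tag"; not the book's method.
class AdvanceSketch {
    /**
     * Returns the index just past the count-th opening tag named tagName,
     * or -1 if the document contains fewer than count such tags.
     */
    static int advance(String html, String tagName, int count) {
        int i = 0;
        while ((i = html.indexOf('<', i)) >= 0) {
            int close = html.indexOf('>', i);
            if (close < 0) {
                return -1;                     // unterminated tag
            }
            // Tag name is the first token inside the angle brackets.
            String name = html.substring(i + 1, close).trim().split("\\s+")[0];
            if (name.equalsIgnoreCase(tagName)) {
                count--;
                if (count <= 0) {
                    return close + 1;          // positioned just inside the wanted list
                }
            }
            i = close + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        String html = "<ul><li>skip</li></ul><ul><li>keep</li></ul>";
        int pos = advance(html, "ul", 2);
        System.out.println(html.substring(pos)); // prints <li>keep</li></ul>
    }
}
```

Once the position is just past the opening tag of the wanted list, the item-reading loop can start from there without seeing the items of earlier lists.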
Next we begin reading the HTML tags. We continue until the end of the file is reached.
int ch;
while ((ch = parse.read()) != -1)
{
if (ch == 0)