</tr>
<tr>
<td>Alabama</td>
<td>AL</td>
<td>Montgomery</td>
<td>
<a href="http://www.alabama.gov/">http://www.alabama.gov/
</a></td>
</tr>
<tr>
<td>Alaska</td>
<td>AK</td>
<td>Juneau</td>
<td>
<a href="http://www.state.ak.us/">http://www.state.ak.us/
</a></td>
</tr>
...
<tr>
<td>Wyoming</td>
<td>WY</td>
<td>Cheyenne</td>
<td><a href="http://wyoming.gov/">http://wyoming.gov/</a></td>
</tr>
</table>
The data that we will parse is located between the <td> and </td> tags. The other
tags still matter: each <tr> and </tr> pair encloses one row, so these tags tell us which
row the data belongs to.
Parsing the Table
The table is parsed by the process method of the ParseTable class. This method
begins by opening an InputStream to the URL that contains the table. A ParseHTML
object is created to parse this InputStream. A variable named buffer is created to
hold the text of the current table cell. A variable named list is created to hold the
cell values collected for the current row. A variable named capture tracks whether we
are currently capturing HTML text into the buffer variable. Capturing occurs while we
are between <td> and </td> tags.
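// Open the page and hand the stream to the HTML parser.
InputStream is = url.openStream();
ParseHTML parse = new ParseHTML(is);
// buffer: text of the current cell; list: cells of the current row.
StringBuilder buffer = new StringBuilder();
List<String> list = new ArrayList<String>();
// capture: true only while we are between <td> and </td>.
boolean capture = false;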
The advance method moves the parser forward to the correct table in the HTML page. The
advance method is discussed in Recipe 6.1.
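The way buffer, list, and capture work together can be sketched without the ParseHTML class. The following is a minimal, self-contained illustration, not the recipe's actual process method: it swaps in a simple regular-expression tokenizer in place of ParseHTML, and the class name TableCaptureSketch and the method name parseRows are hypothetical names chosen for this example. It assumes simple, well-formed markup like the listing above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a regex tokenizer stands in for the ParseHTML class.
class TableCaptureSketch {
    // Matches a tag (group 1 is "/" for a closing tag, group 2 is the tag name)
    // or a run of text between tags (group 3).
    private static final Pattern TOKEN =
            Pattern.compile("<(/?)(\\w+)[^>]*>|([^<]+)");

    static List<List<String>> parseRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        StringBuilder buffer = new StringBuilder(); // text of the current cell
        List<String> list = new ArrayList<>();      // cells of the current row
        boolean capture = false;                    // true between <td> and </td>

        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group(2) == null) {
                // Plain text: keep it only while inside a cell.
                if (capture) {
                    buffer.append(m.group(3));
                }
                continue;
            }
            String tag = m.group(2).toLowerCase();
            boolean closing = !m.group(1).isEmpty();
            if (tag.equals("td") && !closing) {
                // <td>: start a fresh cell.
                buffer.setLength(0);
                capture = true;
            } else if (tag.equals("td") && capture) {
                // </td>: the cell is complete, store it in the row.
                list.add(buffer.toString().trim());
                capture = false;
            } else if (tag.equals("tr") && closing) {
                // </tr>: the row is complete, store a copy and start over.
                if (!list.isEmpty()) {
                    rows.add(new ArrayList<>(list));
                }
                list.clear();
            }
            // Other tags, such as <a>, are skipped; their text is still captured.
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>Alabama</td><td>AL</td><td>Montgomery</td>"
                + "<td>\n<a href=\"http://www.alabama.gov/\">http://www.alabama.gov/\n</a></td>"
                + "</tr></table>";
        System.out.println(parseRows(html));
    }
}
```

Because tags other than <td> and <tr> are ignored while their text is still appended to buffer, the link text inside each <a> element ends up as the fourth cell of its row, just as the listing suggests it should.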