Java Reference
In-Depth Information
This simple script does the following: it connects to the Wikipedia page using jsoup. Apart
from being an excellent HTML parser, jsoup also provides user-friendly methods to perform
HTTP requests, so you don't have to build your own makeRequest method around
HttpURLConnection. Next, it fetches the HTML table elements with the wikitable class,
searches until it finds the right table, and then loops through the tr and td elements to extract
the contents of that table.
If you're wondering how to know which element to fetch from the HTML structure, most browsers
provide a way to inspect the source of each web page you view. See Figure 10-29.
Figure 10-29
To retrieve elements from the HTML tree, jsoup provides a select method that accepts CSS-style selectors. Basic selectors include:
tagname: Finds elements by tag name, e.g., table
#id: Finds elements by ID, e.g., #main-table
.class: Finds elements by class name, e.g., .wikitable
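
The three basic selectors can be tried without a network connection by parsing a small HTML string. The following is a minimal sketch, not the script discussed above: the class name SelectorSketch, the helper rowsOf, and the sample markup are all invented for illustration, and it assumes the jsoup library is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class SelectorSketch {
    // Hypothetical sample markup, invented for this illustration only.
    static final String HTML =
        "<html><body>"
        + "<table id=\"main-table\" class=\"wikitable\">"
        + "<tr><td>Java</td><td>1995</td></tr>"
        + "<tr><td>Kotlin</td><td>2011</td></tr>"
        + "</table>"
        + "<table class=\"plain\"><tr><td>ignored</td></tr></table>"
        + "</body></html>";

    // Collects the text of every td cell, grouped by tr row (hypothetical helper).
    static List<List<String>> rowsOf(Element table) {
        List<List<String>> rows = new ArrayList<>();
        for (Element tr : table.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element td : tr.select("td")) {
                cells.add(td.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);

        Elements byTag = doc.select("table");         // tagname selector
        Elements byId = doc.select("#main-table");    // #id selector
        Elements byClass = doc.select(".wikitable");  // .class selector

        System.out.println("tables: " + byTag.size()
            + ", by id: " + byId.size()
            + ", by class: " + byClass.size());

        // Loop through the rows and cells of the first wikitable.
        for (List<String> row : rowsOf(byClass.first())) {
            System.out.println(row);
        }
    }
}
```

Selectors can also be combined, e.g., table.wikitable matches only table elements that carry the wikitable class, which helps when a page contains several tables.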