This simple script does the following: it connects to the Wikipedia page using jsoup. Apart from being an excellent HTML parser, jsoup also provides user-friendly methods to perform HTTP requests, so you don't have to build your own makeRequest method around HttpURLConnection. Next, it fetches the HTML table elements with the wikitable class, searches until it finds the right table, and then loops through the tr and td elements to extract the contents of the tables.
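The table-walking step described above can be sketched as follows. This is a minimal illustration, not the script from the text: the class name WikiTableSketch, the method extractRows, and the inline HTML are all hypothetical, and it assumes the jsoup library is on the classpath. It parses a string so that it runs offline; for a live page, Jsoup.connect(url).get() would supply the Document instead. It also collects rows from every wikitable rather than searching for one particular table.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class WikiTableSketch {

    // Collects the text of each td cell, row by row, from every table
    // that carries the wikitable class. (Hypothetical helper.)
    static List<List<String>> extractRows(Document doc) {
        List<List<String>> rows = new ArrayList<>();
        for (Element table : doc.select("table.wikitable")) {
            for (Element tr : table.select("tr")) {
                List<String> cells = new ArrayList<>();
                for (Element td : tr.select("td")) {
                    cells.add(td.text());
                }
                // Header rows contain only th cells, so they yield no td text.
                if (!cells.isEmpty()) {
                    rows.add(cells);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a downloaded page.
        String html = "<table class=\"wikitable\">"
                + "<tr><th>Name</th><th>Year</th></tr>"
                + "<tr><td>Java</td><td>1995</td></tr>"
                + "<tr><td>Kotlin</td><td>2011</td></tr>"
                + "</table>"
                + "<table class=\"other\"><tr><td>ignored</td></tr></table>";
        Document doc = Jsoup.parse(html);
        System.out.println(extractRows(doc)); // prints [[Java, 1995], [Kotlin, 2011]]
    }
}
```

Keeping the extraction in a method that takes a Document separates the HTTP request from the parsing, so the same loop works on a fetched page or a saved file.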
If you're wondering how to know which element to fetch from the HTML structure, most browsers
provide a way to inspect the source of each web page you view. See Figure 10-29.
Figure 10-29
To retrieve elements from the HTML tree, jsoup applies a selector method. Basic selectors include:
➤ tagname: Finds elements by the tag name, e.g., table
➤ #id: Finds elements based on ID, e.g., #main-table
➤ .class: Finds elements based on class name, e.g., .wikitable
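The three basic selectors listed above can be tried out with a short sketch like the one below. It assumes jsoup is on the classpath; the class name SelectorSketch, the helper count, and the sample markup are hypothetical and exist only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SelectorSketch {

    // Made-up markup: one table with an id and a class, one plain table.
    static final String HTML =
            "<table id=\"main-table\" class=\"wikitable\"><tr><td>a</td></tr></table>"
            + "<table><tr><td>b</td></tr></table>";

    // Returns how many elements match the given selector. (Hypothetical helper.)
    static int count(Document doc, String selector) {
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(count(doc, "table"));       // tag name: prints 2
        System.out.println(count(doc, "#main-table")); // ID: prints 1
        System.out.println(count(doc, ".wikitable"));  // class name: prints 1
    }
}
```

Selectors can also be combined, as in table.wikitable, which matches only table elements that carry the wikitable class.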