Database Reference
In-Depth Information
<link rel="edit" title="Edit this page" href="/w/index.php?title=List_of_countr
<link rel="apple-touch-icon" href="//bits.wikimedia.org/apple-touch/wikipedia.p
<link rel="shortcut icon" href="//bits.wikimedia.org/favicon/wikipedia.ico" />
<link rel="search" type="application/opensearchdescription+xml" href="/w/opense
That seems to be in order. (Note that we're only showing the first 79 characters of
each line so that output fits on the page.)
Using the developer tools of our browser, we were able to determine that the root
HTML element that we're interested in is a <table> with the class wikitable . This
allows us to look at the part that we're interested in using grep (the -A option below
specifies the number of lines we want to see after the matching line):
$ < wiki.html grep wikitable -A 21
<table class="wikitable sortable">
<tr>
<th>Rank</th>
<th>Country or territory</th>
<th>Total length of land borders (km)</th>
<th>Total surface area (km²)</th>
<th>Border/area ratio (km/km²)</th>
</tr>
<tr>
<td>1</td>
<td>Vatican City</td>
<td>3.2</td>
<td>0.44</td>
<td>7.2727273</td>
</tr>
<tr>
<td>2</td>
<td>Monaco</td>
<td>4.4</td>
<td>2</td>
<td>2.2000000</td>
</tr>
The next step is to extract the necessary elements from the HTML file. For this we use
the scrape tool:
$ < wiki.html scrape -b -e 'table.wikitable > tr:not(:first-child)' \
> > table.html
$ head -n 21 data/table.html
<!DOCTYPE html>
<html>
<body>
<tr><td>1</td>
<td>Vatican City</td>
<td>3.2</td>
<td>0.44</td>
<td>7.2727273</td>
</tr>
Search WWH ::




Custom Search