How it works…
An examination of the web page shows that each family is wrapped in an article tag that contains a header with an h2 tag. get-family pulls that tag out and returns its text.
get-person processes each person. The people in each family are in an unordered list (ul), and each person is in an li tag. The person's name itself is in an em tag. The let form gets the contents of the li tag and destructures them in order to pull out the name and relationship strings. get-person then puts both pieces of information into a map and returns it.
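To make the destructuring step concrete, here is a minimal sketch of what get-person might look like. It assumes the parser returns each element as a map with :tag, :attrs, and :content keys (the shape Enlive uses), and it runs on a hand-built sample node rather than a parsed page. The sample data is hypothetical, and the function body is an illustration rather than the recipe's exact code.

```clojure
(require '[clojure.string :as string])

;; Hypothetical sample: the node map a parser might produce for
;;   <li><em>Gomez Addams</em> father</li>
(def sample-li
  {:tag :li
   :attrs nil
   :content [{:tag :em, :attrs nil, :content ["Gomez Addams"]}
             " father"]})

(defn get-person
  "Sketch: split an li node into a name and a relationship."
  [li]
  ;; The first child is the em node, of which only :content is kept;
  ;; the second child is the trailing text holding the relationship.
  (let [[{name-parts :content} rel] (:content li)]
    {:name (apply str name-parts)
     :relationship (string/trim rel)}))

(get-person sample-li)
;; => {:name "Gomez Addams", :relationship "father"}
```

The vector pattern in let picks the two children of the li node by position, and the nested map pattern reaches into the em node in the same step, so no separate accessor calls are needed.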
get-rows processes each article tag. It calls get-family to get the family name from the header, gets the list item for each person, calls get-person on each list item, and adds the family to each person's map.
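The way get-rows combines the two helpers can be sketched on plain data. The example below is self-contained and makes several assumptions: the sample article is hand-built and hypothetical, select-tag is an illustrative stand-in for a real selector call, and get-family and get-person are simplified versions that only handle this sample's shape.

```clojure
(require '[clojure.string :as string])

;; Hypothetical sample: node map for one <article> element.
(def sample-article
  {:tag :article
   :content
   [{:tag :header
     :content [{:tag :h2 :content ["Addams Family"]}]}
    {:tag :ul
     :content [{:tag :li
                :content [{:tag :em :content ["Gomez Addams"]} " father"]}
               {:tag :li
                :content [{:tag :em :content ["Morticia Addams"]} " mother"]}]}]})

(defn select-tag
  "Illustrative stand-in for a selector: every node at or under
  `node` whose :tag equals `tag`, in document order."
  [node tag]
  (filter #(= tag (:tag %))
          (tree-seq map? #(filter map? (:content %)) node)))

(defn get-family
  "Simplified: join the text children of the article's h2 tag."
  [article]
  (apply str (mapcat :content (select-tag article :h2))))

(defn get-person
  "Simplified: name from the em child, relationship from the text after it."
  [li]
  (let [[{name-parts :content} rel] (:content li)]
    {:name (apply str name-parts)
     :relationship (string/trim rel)}))

(defn get-rows
  "One map per person in the article, each tagged with the family name."
  [article]
  (let [family (get-family article)]
    (map #(assoc (get-person %) :family family)
         (select-tag article :li))))

(get-rows sample-article)
```

Looking up the family once per article and attaching it with assoc keeps every row flat, which is the shape a dataset constructor generally expects.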
Here's how the HTML structures correspond to the functions that process them. Each function
name is mentioned beside the elements it parses:
Finally, load-data ties the process together by downloading and parsing the HTML file and pulling the article tags from it. It then calls get-rows to create the data mappings and converts the output to a dataset.