that as an estimator for shade. So the next step is to use a regular expression to parse
the tree properties, such as address and species, from the misc field:
(defn re-seq-chunks [pattern s]
  (rest (first (re-seq pattern s))))

(def parse-tree
  "parses the special fields in the tree format"
  (partial re-seq-chunks
    #"^\s+Private\:\s+(\S+)\s+Tree ID\:\s+(\d+)\s+.*Situs Number\:\s+(\d+)\s+Tree Site\:\s+(\d+)\s+Species\:\s+(\S.*\S)\s+Source.*"))
Great, now we begin to have some structured data about trees:
Identifier: 474
Tree ID: 412
Tree: 412 site 1 at 115 HAWTHORNE AV
Tree Site: 1
Street_Name: HAWTHORNE AV
Situs Number: 115
Private: -1
Species: Liquidambar styraciflua
Source: davey tree
Hardscape: None
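As a rough illustration of how parse-tree extracts these fields, the sketch below applies it to a made-up misc string. The sample string is an assumption: it was assembled from the field names in the listing above, and the real layout of the misc field may differ.

```clojure
(defn re-seq-chunks [pattern s]
  ;; take the first regex match and drop the whole-match element,
  ;; leaving only the capture groups
  (rest (first (re-seq pattern s))))

(def parse-tree
  (partial re-seq-chunks
    #"^\s+Private\:\s+(\S+)\s+Tree ID\:\s+(\d+)\s+.*Situs Number\:\s+(\d+)\s+Tree Site\:\s+(\d+)\s+Species\:\s+(\S.*\S)\s+Source.*"))

;; Hypothetical raw value for the misc field (not copied from the dataset)
(def sample-misc
  (str " Private: -1 Tree ID: 412 Street_Name: HAWTHORNE AV"
       " Situs Number: 115 Tree Site: 1"
       " Species: Liquidambar styraciflua Source: davey tree Hardscape: None"))

(parse-tree sample-misc)
;; => ("-1" "412" "115" "1" "Liquidambar styraciflua")
```

The capture groups come back in pattern order: Private, Tree ID, Situs Number, Tree Site, and Species. In this sample the street name is swallowed by the .* in the pattern rather than captured.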
We can use the species name to join with a table of tree species metadata and look up
average height, along with inferring other valuable data. Take a look in the
data/meta_tree.tsv file to see the metadata about trees, which was derived from
Wikipedia.org, Calflora.org, USDA.gov, etc. The species liquidambar styraciflua, commonly
known as an American sweetgum, grows to a height that ranges between 20 and 35 meters.
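The idea of joining on species name can be sketched in plain Clojure, outside of any query engine. Everything here is illustrative: the tree-meta map is a hypothetical stand-in for one row of data/meta_tree.tsv (whose actual columns are not shown in this section), and taking the midpoint of the height range as the average is an assumption.

```clojure
(require '[clojure.string :as s])

;; Hypothetical stand-in for one metadata row, keyed by lowercase species name.
;; Heights are in meters, using the 20-35 range quoted in the text.
(def tree-meta
  {"liquidambar styraciflua" {:common-name "American sweetgum"
                              :min-height 20.0
                              :max-height 35.0}})

(defn avg-height
  "Normalizes the species name, looks it up in the metadata table, and
  returns the midpoint of its height range, or nil if the species is unknown."
  [meta-table species]
  (when-let [{:keys [min-height max-height]}
             (get meta-table (s/lower-case (s/trim species)))]
    (/ (+ min-height max-height) 2)))

(avg-height tree-meta "Liquidambar styraciflua")
;; => 27.5
```

Normalizing case and whitespace before the lookup keeps a raw value such as "Liquidambar styraciflua" from missing a lowercase key in the metadata.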
The next section of code completes our definition of a data product about trees. The
geo-tree function parses the geo coordinates: latitude, longitude, and altitude. The
trees-fields vector defines the fields used to describe trees throughout the app;
other fields get discarded. The get-trees function is the subquery used to filter, merge,
and refine the estimators about trees.
(def geo-tree
  "parses geolocation for tree format"
  (partial re-seq-chunks #"^(\S+),(\S+),(\S+)\s*$"))

(def trees-fields ["?blurb" "?tree_id" "?situs" "?tree_site"
                   "?species" "?wikipedia" "?calflora" "?avg_height"
                   "?tree_lat" "?tree_lng" "?tree_alt" "?geohash"])
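To make the behavior of geo-tree concrete, here is a small sketch that runs it on a made-up coordinate string and converts the captured strings to numbers. The sample coordinates and the parse-coords helper are hypothetical and not part of the app.

```clojure
(defn re-seq-chunks [pattern s]
  (rest (first (re-seq pattern s))))

(def geo-tree
  (partial re-seq-chunks #"^(\S+),(\S+),(\S+)\s*$"))

;; Hypothetical "lat,lng,alt" string with trailing whitespace
(def sample-geo "37.4457,-122.1560,0.0 ")

(geo-tree sample-geo)
;; => ("37.4457" "-122.1560" "0.0")

(defn parse-coords
  "Hypothetical helper: converts the three captured strings to doubles."
  [geo]
  (let [[lat lng alt] (map #(Double/parseDouble %) (geo-tree geo))]
    {:lat lat :lng lng :alt alt}))

(parse-coords sample-geo)
;; => {:lat 37.4457, :lng -122.156, :alt 0.0}
```

When the input does not match the pattern, geo-tree returns an empty sequence rather than throwing, so a caller would need to check for that case before converting.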
(defn get-trees
  "subquery to parse/filter the tree data"
  [src tree-meta trap]
  (<- trees-fields
      (src ?blurb ?misc ?geo ?kind)