We can do this in Clojure because of its macro system. ->> simply rewrites the calls into
Clojure's native, nested format as the form is read. The first parameter of the macro is
inserted into the next expression as the last parameter. That structure is then inserted into
the third expression as the last parameter, and so on, until the end of the form. Let's trace
this through a few steps. Say we start off with the expression (->> x first (map length)
(apply +)). As Clojure builds the final expression, here's each intermediate step (at each
stage, the form just built becomes the last argument of the next expression):
1. (->> x first (map length) (apply +))
2. (->> (first x) (map length) (apply +))
3. (->> (map length (first x)) (apply +))
4. (apply + (map length (first x)))
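We can check this rewrite at the REPL, because ->> produces the fully nested form in a
single macro expansion. Here's a minimal sketch; the sample value of x and the definition
of length (assumed here to be an alias for clojure.core/count, since length isn't a core
function) are illustrative:

;; Assumed for illustration: length as an alias for count,
;; and x as a sequence of sequences.
(def length count)
(def x [["a" "bb" "ccc"] ["dd" "e"]])

;; The threaded form and its hand-nested equivalent agree.
(->> x first (map length) (apply +))
;; => 6
(apply + (map length (first x)))
;; => 6

;; macroexpand shows the step 1 form rewritten straight to step 4.
(macroexpand '(->> x first (map length) (apply +)))
;; => (apply + (map length (first x)))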
Comparing XML and JSON
XML and JSON (from the Reading JSON data into Incanter datasets recipe) are very similar.
Arguably, much of the popularity of JSON is driven by disillusionment with XML's verbosity.
When we're dealing with these formats in Clojure, the biggest difference is that JSON is
converted directly to native Clojure data structures that mirror the data, such as maps and
vectors. Meanwhile, XML is read into record types that reflect the structure of XML, not the
structure of the data.
In other words, the keys of the maps for JSON will come from the domain, first_name or
age, for instance. However, the keys of the maps for XML will come from the data format, such
as tag, attribute, or children, and the tag and attribute names will come from the domain.
This extra level of abstraction makes XML more unwieldy.
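The contrast is easy to see at the REPL. The following is a minimal sketch, assuming the
clojure.data.json library is on the classpath and using the built-in clojure.xml; the
sample document is made up for illustration:

(require '[clojure.data.json :as json]
         '[clojure.xml :as xml])

;; JSON: the map keys come directly from the domain.
(json/read-str "{\"first_name\": \"Ada\", \"age\": 36}")
;; => {"first_name" "Ada", "age" 36}

;; XML: the map keys come from the format (:tag, :attrs, :content),
;; and the domain names only appear as tag values inside the tree.
(xml/parse (java.io.ByteArrayInputStream.
            (.getBytes "<person><first_name>Ada</first_name></person>")))
;; => {:tag :person, :attrs nil,
;;     :content [{:tag :first_name, :attrs nil, :content ["Ada"]}]}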
Scraping data from tables in web pages
There's data everywhere on the Internet. Unfortunately, a lot of it is difficult to reach. It's
buried in tables, articles, or deeply nested div tags. Web scraping (writing a program that
walks over a web page and extracts data from it) is brittle and laborious, but it's often the only
way to free this data so it can be used in our analyses. This recipe describes how to load a
web page and dig down into its contents so that you can pull the data out.
To do this, we're going to use the Enlive ( https://github.com/cgrand/enlive/wiki )
library. This uses a domain-specific language (DSL, a set of commands that make a small set
of tasks very easy and natural) based on CSS selectors to locate elements within a web page.
This library can also be used for templating. In this case, we'll just use it to get data back out
of a web page.
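As a taste of what the selectors look like, here's a minimal sketch of pulling the cell text
out of a page's table rows; the dependency, URL, and table layout are all assumptions for
illustration:

(require '[net.cgrand.enlive-html :as html])

(defn load-page
  "Fetch a web page and parse it into Enlive's node tree."
  [url]
  (html/html-resource (java.net.URL. url)))

(defn table-cells
  "Select every row of every table, then get the text of its cells."
  [page]
  (for [row (html/select page [:table :tr])]
    (map html/text (html/select row [:td]))))

;; Usage, against a hypothetical page:
;; (table-cells (load-page "http://example.com/data.html"))
;; => (("42" "Clojure") ("7" "Enlive") ...)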
 