How to do it…
Now, let's see how to load this file into an Incanter dataset:
1. The solution for this recipe is a little more complicated, so we'll wrap it in a function:
;; Assumes the namespace requires clojure.xml as xml, clojure.zip as
;; zip, and incanter.core as i.
(defn load-xml-data [xml-file first-data next-data]
  (let [data-map (fn [node]
                   [(:tag node) (first (:content node))])]
    (->> (xml/parse xml-file)
         zip/xml-zip
         first-data
         (iterate next-data)
         (take-while #(not (nil? %)))
         (map zip/children)
         (map #(mapcat data-map %))
         (map #(apply array-map %))
         i/to-dataset)))
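To see what this threading pipeline does, here is a minimal sketch run on an inline XML string instead of a file; the `<rows>`/`<row>` markup and values are invented purely for illustration. It uses the same zipper calls we will pass in as `first-data` and `next-data`:

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])

(def sample-xml
  "<rows><row><a>1</a><b>2</b></row><row><a>3</a><b>4</b></row></rows>")

(defn parse-str [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(->> (parse-str sample-xml)
     zip/xml-zip
     zip/down                       ; first-data: move to the first <row>
     (iterate zip/right)            ; next-data: step across sibling rows
     (take-while #(not (nil? %)))   ; stop when zip/right runs off the end
     (map zip/children)             ; each row's child elements (<a>, <b>)
     (map (fn [kids]
            (into {} (map (fn [node]
                            [(:tag node) (first (:content node))])
                          kids)))))
;; → ({:a "1", :b "2"} {:a "3", :b "4"})
```

Each row becomes one map from tag keyword to text content, which is exactly the shape that `i/to-dataset` turns into rows and columns.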
2. We can call the function like this. Because there are so many columns, we'll just verify the data that is loaded by looking at the column names and the row count:
user=> (def d
(load-xml-data "data/crime_incidents_2013_plain.xml"
zip/down zip/right))
user=> (i/col-names d)
[:dcst:ccn :dcst:reportdatetime :dcst:shift :dcst:offense
:dcst:method :dcst:lastmodifieddate :dcst:blocksiteaddress
:dcst:blockxcoord :dcst:blockycoord :dcst:ward :dcst:anc
:dcst:district :dcst:psa :dcst:neighborhoodcluster
:dcst:businessimprovementdistrict :dcst:block_group :dcst:census_tract
:dcst:voting_precinct :dcst:start_date :dcst:end_date]
user=> (i/nrow d)
35826
This looks good; the row count gives us the number of crimes reported in the dataset.
How it works…
This recipe follows a typical pipeline for working with XML:
1. Parsing an XML data file
2. Extracting the data nodes
3. Converting the data nodes into a sequence of maps representing the data
4. Converting the data into an Incanter dataset
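The last step can be sketched in isolation: a sequence of maps goes into `i/to-dataset`, after which the usual Incanter accessors work. This is a minimal sketch, assuming Incanter is on the classpath with the alias used in the recipe; the toy columns stand in for the `:dcst:*` columns of the real data:

```clojure
(require '[incanter.core :as i])

;; A tiny stand-in for the seq of maps produced by the XML pipeline.
(def ds (i/to-dataset [{:offense "theft"    :shift "day"}
                       {:offense "burglary" :shift "night"}]))

(i/col-names ds)   ; the two column keywords
(i/nrow ds)        ; → 2
```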
 