Importing Data for Analysis - Clojure Data Analysis

Finally, convert everything to a dataset. incanter.core/dataset is a lower-level

constructor than incanter.core/to-dataset. It requires you to pass in the column

names and data matrix as separate sequences:

(i/dataset headers rows)))
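
To make the difference between the two constructors concrete, here is a minimal sketch that builds the same small dataset both ways. The header names and row values are invented for illustration, and the sketch assumes Incanter is on the classpath with incanter.core aliased as i, as in the line above.

```clojure
(require '[incanter.core :as i])

;; Hypothetical sample data, standing in for the scraped headers and rows.
(def headers [:name :count])
(def rows [["alpha" 1] ["beta" 2]])

;; dataset takes the column names and the rows as two separate sequences.
(def from-dataset (i/dataset headers rows))

;; to-dataset works out the column names itself, here from the map keys.
(def from-to-dataset
  (i/to-dataset [{:name "alpha" :count 1} {:name "beta" :count 2}]))

(i/col-names from-dataset) ; the column names that were passed in
(i/nrow from-dataset)      ; the number of rows
```

Because dataset does no inference of its own, it suits scraped data, where the headers and the rows have already been extracted separately.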

It's important to realize that the code, as presented here, is the result of a lot of trial and error.

Screen scraping usually is. Generally, I download the page and save it, so I don't have to keep

requesting it from the web server. Next, I start the REPL and parse the web page there. Then,

I can take a look at the web page and HTML with the browser's view source function, and I can

examine the data from the web page interactively in the REPL. While working, I copy and paste

the code back and forth between the REPL and my text editor, as it's convenient. This workflow

and environment (sometimes called REPL-driven development) make screen scraping

(a fiddly, difficult task at the best of times) almost enjoyable.
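
The download-once habit described above can be sketched with nothing but core Clojure. The cached-page function below is a hypothetical helper, not part of the recipe: it fetches a URL only when the local copy doesn't exist yet, and reads from disk on every later call.

```clojure
(require '[clojure.java.io :as io])

(defn cached-page
  "Returns the contents of cache-file as a string. If the file doesn't
  exist yet, downloads url into it first. Hypothetical helper for
  illustration only."
  [url cache-file]
  (let [f (io/file cache-file)]
    (when-not (.exists f)
      ;; First call: fetch the page once and save it locally.
      (spit f (slurp url)))
    ;; Every call: read the saved copy, not the web server.
    (slurp f)))
```

At the REPL, you would call it with the page's address and a local filename of your choosing, then hand the saved file to whatever HTML parser you're using. Delete the file when you want a fresh copy.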

See also

• The next recipe, Scraping textual data from web pages, has a more involved example

of data scraping on an HTML page

• The Aggregating data from different formats recipe has a practical, real-life example

of data scraping in a table

Scraping textual data from web pages

Not all of the data on the Web is in tables, as in our last recipe. In general, the process

to access this nontabular data might be more complicated, depending on how the page

is structured.

Getting ready

First, we'll use the same dependencies and require statements as we did in the last

recipe, Scraping data from tables in web pages.

Next, we'll identify the file to scrape the data from. I've put up a file at http://www.

This is a much more modern example of a web page. Instead of using tables, it marks up the

text with the section and article tags and other features from HTML5, which help convey

what the text means, not just how it should look.
