In this example, you had to find two patterns. The first was in the URL,
and the second was in the loaded web page to get the actual temperature
value. To load the page for a different day in 2009, you changed the month
and day portions of the URL. The temperature value was enclosed in the
sixth occurrence of the nobr class in the HTML page. If there is no obvious
pattern to the URL, try to figure out how you can get the URLs of all the
pages you want to scrape. Maybe the site has a site map, or maybe you can
work through a search engine's index of the site. In the end, you need to
know the URLs of all the pages that hold your data.
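To make the URL pattern concrete, here is a minimal Python sketch that builds one URL per day of 2009. The URL_TEMPLATE below is a hypothetical placeholder, not the address of the site used in the example; the only assumption is that the year, month, and day appear as separate parts of the URL, so changing them loads a different day.

```python
from datetime import date, timedelta

# Hypothetical URL template: only the date parts change from page to page.
URL_TEMPLATE = "https://www.example.com/history/{year}/{month}/{day}/DailyHistory.html"


def daily_urls(year=2009):
    """Yield (date, url) for every day of the given year."""
    day = date(year, 1, 1)
    while day.year == year:
        yield day, URL_TEMPLATE.format(year=day.year, month=day.month, day=day.day)
        day += timedelta(days=1)


if __name__ == "__main__":
    for d, url in list(daily_urls())[:3]:
        print(d, url)
```

Generating dates with datetime rather than nested month/day loops means you never build a URL for a day that doesn't exist, such as February 30.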
After you find the patterns, you iterate. That is, you visit all the pages
programmatically, load them, and parse them. Here you did it with Beautiful
Soup, which makes parsing XML and HTML easy in Python. There's probably a
similar library if you choose a different programming language.
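The iterate step might look like the sketch below. It swaps Beautiful Soup for Python's built-in html.parser module so that it runs without installing anything. The nobr class and the sixth-occurrence rule come from the example described earlier; ClassTextCollector, nth_class_text, fetch_page, and scrape are hypothetical helper names, and the real page's markup may differ.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self.target_class = target_class
        self.results = []   # text of each matching element, in document order
        self._depth = 0     # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # a tag nested inside the current match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:  # the matching element itself just closed
            self.results.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def nth_class_text(html, target_class, n):
    """Return the text of the n-th (1-based) element carrying target_class."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.results[n - 1]


def fetch_page(url):
    """Download one page as text (requires network access)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape(urls, fetch=fetch_page):
    """Visit each URL, parse it, and yield (url, text of the sixth 'nobr' element)."""
    for url in urls:
        yield url, nth_class_text(fetch(url), "nobr", 6)
```

With Beautiful Soup installed, the lookup is roughly `soup.find_all(class_="nobr")[5].get_text()`. Passing `fetch` in as a parameter lets you test the parsing on saved HTML before sending requests to the live site.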
Lastly, you need to store the data somewhere. The easiest solution is to
store it as a plain text file with comma-delimited values, but if you have a
database set up, you can also store the values there.
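Storing the results can be as simple as the sketch below, which uses Python's standard csv and sqlite3 modules. The column names date and temperature and the table name temps are assumptions for illustration, and SQLite stands in for whatever database you happen to have set up.

```python
import csv
import io
import sqlite3


def rows_to_csv_text(rows, header=("date", "temperature")):
    """Render (date, temperature) rows as comma-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def save_csv(rows, path):
    """Write the rows to a plain text file; newline="" avoids extra blank lines on Windows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(rows_to_csv_text(rows))


def save_sqlite(rows, conn: sqlite3.Connection):
    """Insert the rows into a (hypothetical) temps table, creating it if needed."""
    conn.execute("CREATE TABLE IF NOT EXISTS temps (date TEXT, temperature REAL)")
    conn.executemany("INSERT INTO temps VALUES (?, ?)", rows)
    conn.commit()
```

The csv module handles quoting for you, so a value that happens to contain a comma won't break the file.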
Things can get trickier as you run into web pages that use JavaScript to
load all their data into view, but the process is still the same.
Formatting Data
Different visualization tools use different data formats, and the structure
you use varies with the story you want to tell. So the more flexible you are
with the structure of your data, the more possibilities open up. Make use of
data formatting applications, couple that with a little programming
know-how, and you can get your data into whatever format fits your specific
needs.
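As one small example of reshaping, the sketch below reads comma-delimited text into row-oriented records and then pivots them into column-oriented lists, two structures that different tools tend to expect. The function names and the numeric-conversion rule (anything float() accepts becomes a number) are assumptions for illustration.

```python
import csv
import io
import json


def parse_number(value):
    """Turn numeric strings into floats; leave everything else unchanged."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return value


def csv_to_records(text):
    """Row-oriented: one dict per CSV row, keyed by the header line."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: parse_number(val) for key, val in row.items()} for row in reader]


def records_to_columns(records):
    """Column-oriented: one list of values per field name."""
    columns = {}
    for record in records:
        for key, val in record.items():
            columns.setdefault(key, []).append(val)
    return columns


def to_json(data):
    """Serialize either structure as JSON text."""
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    text = "date,temperature\n2009-01-01,26\n2009-01-02,31.5\n"
    print(to_json(csv_to_records(text)))
    print(to_json(records_to_columns(csv_to_records(text))))
```

Keeping the conversion in small functions like these makes it cheap to try another structure when the story changes.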
The easy way, of course, is to find a programmer who can format and parse
all of your data, but then you'll always be waiting on someone. This is
especially evident during the early stages of a project, where iteration and
data exploration are key to designing a useful visualization. Honestly, if I
were in a hiring position, I'd likely pick the person who knows how to work
with data over the one who needs help at the beginning of every project.