Handling Data - Data Points: Visualization That Means Something


The first thing you need to do is load the page that shows historical weather information. The URL for historical weather in Buffalo on October 1, 2010, follows:

www.wunderground.com/history/airport/KBUF/2010/10/1/DailyHistory.html?req_city=NA&req_state=NA&req_statename=NA

If you remove everything after .html in the preceding URL, the same page still loads, so get rid of that part. You don't need it right now.

www.wunderground.com/history/airport/KBUF/2010/10/1/DailyHistory.html

The date is indicated in the URL with /2010/10/1. Using the drop-down menu, change the date to January 1, 2009, because you're going to scrape temperature for all of 2009. The URL is now this:

www.wunderground.com/history/airport/KBUF/2009/1/1/DailyHistory.html

Everything is the same as the URL for October 1, 2010, except the portion that indicates the date. It's /2009/1/1 now. Interesting. Without using the drop-down menu, how can you load the page for January 2, 2009? Simply change the date portion so that the URL looks like this:

www.wunderground.com/history/airport/KBUF/2009/1/2/DailyHistory.html

Load the preceding URL in your browser and you get the historical summary for January 2, 2009. So all you have to do to get the weather for a specific date is modify the date in the Weather Underground URL. Keep this in mind for later.
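The URL pattern described above can be sketched in Python as follows. This is only an illustration: the names history_url and urls_for_year are hypothetical, and the http:// prefix is an assumption added so that the strings are complete URLs. The date is written without zero padding, matching the /2009/1/2 form shown in the text.

```python
from datetime import date, timedelta

# Assumed prefix: the scheme (http://) is not shown in the text's URLs.
BASE = "http://www.wunderground.com/history/airport"


def history_url(day, airport="KBUF"):
    """Build the daily-history URL for one date, e.g. /2009/1/2 (no zero padding)."""
    return "%s/%s/%d/%d/%d/DailyHistory.html" % (
        BASE, airport, day.year, day.month, day.day)


def urls_for_year(year, airport="KBUF"):
    """Yield one daily-history URL for every day of the given year."""
    day = date(year, 1, 1)
    while day.year == year:
        yield history_url(day, airport)
        day += timedelta(days=1)
```

Calling history_url(date(2009, 1, 2)) produces the January 2, 2009, URL from the text with the scheme in front, and urls_for_year(2009) walks through every day of the year in order.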

Now load a single page with Python, using the urllib2 library (part of the Python 2 standard library) by importing it with the following line of code:

import urllib2

To load the January 1 page with Python, use the urlopen function.

page = urllib2.urlopen("http://www.wunderground.com/history/airport/KBUF/2009/1/1/DailyHistory.html")
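The urllib2 module exists only in Python 2; in Python 3 the same urlopen function lives in urllib.request. For readers on Python 3, a rough equivalent might look like the sketch below. The function name fetch_html and its opener parameter are hypothetical, the opener argument is there only so the function can be tried without network access, and decoding as UTF-8 is an assumption about the page's encoding.

```python
from urllib.request import urlopen  # Python 3 location of urllib2.urlopen


def fetch_html(url, opener=urlopen):
    """Return the HTML at url as text.

    fetch_html and opener are hypothetical names used for illustration.
    """
    if not url.startswith(("http://", "https://")):
        # urlopen rejects URLs without a scheme, so add one (assumed http).
        url = "http://" + url
    with opener(url) as response:
        # Assumes UTF-8; undecodable bytes are replaced rather than raising.
        return response.read().decode("utf-8", errors="replace")
```

With this in place, page = fetch_html("www.wunderground.com/history/airport/KBUF/2009/1/1/DailyHistory.html") would request the January 1 page, assuming the site still serves that address.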

This loads all the HTML that the URL points to into the page variable. The next step is to extract the maximum temperature value you're interested
