Handling Data - Data Points: Visualization That Means Something

move on to the next month. If you want to scrape multiple years, you need

to use an additional if statement to handle leap years.

Similarly, if it's not February, but instead April, June, September, or

November, move on to the next month if the current day is greater than 30.

# Check if already gone through month
if (m == 2 and d > 28):
    break
elif (m in [4, 6, 9, 11] and d > 30):
    break
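
The leap-year case mentioned above can also be handled without a hand-written if statement, by asking the standard library how many days a month has. This is a minimal sketch, not the script from the text: the helper names days_in_month and days_to_scrape are hypothetical, and it assumes Python 3.

```python
import calendar


def days_in_month(year, month):
    """Return the number of days in a month (hypothetical helper)."""
    # calendar.monthrange returns (weekday of the 1st, number of days)
    return calendar.monthrange(year, month)[1]


def days_to_scrape(year):
    """Yield every valid (month, day) pair of the year, in order."""
    for m in range(1, 13):
        for d in range(1, days_in_month(year, m) + 1):
            yield m, d
```

Looping over days_to_scrape(2009) would visit only dates that exist, so the break checks for February and the 30-day months would no longer be needed.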

Again, the next few lines of code should look familiar. You used them to
scrape a single page from Weather Underground. The difference is in
the month and day variables in the URL: change them for each day instead
of leaving them static; the rest is the same. Load the page with the urllib2
library, parse the contents with Beautiful Soup, and then extract the
maximum temperature, this time looking for the sixth appearance of the
nobr class (index 5, because Python lists are zero-based).

# Open wunderground.com url
url = ("http://www.wunderground.com/history/airport/KBUF/2009/" +
       str(m) + "/" + str(d) + "/DailyHistory.html")
page = urllib2.urlopen(url)

# Get temperature from page
soup = BeautifulSoup(page)
# dayTemp = soup.body.nobr.b.string
dayTemp = soup.findAll(attrs={"class": "nobr"})[5].span.string
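
Because only the month and day change from one request to the next, the URL construction can be pulled into a small function. The sketch below assumes the URL pattern shown in the text; the name build_url and its station and year parameters are hypothetical additions for illustration.

```python
def build_url(year, month, day, station="KBUF"):
    """Build a daily-history URL in the pattern used in the text.

    The function name and the station/year parameters are assumptions;
    the original script hard-codes KBUF and 2009.
    """
    return ("http://www.wunderground.com/history/airport/" + station + "/" +
            str(year) + "/" + str(month) + "/" + str(day) +
            "/DailyHistory.html")
```

Calling build_url(2009, m, d) inside the loop would produce the same string as the inline concatenation. In Python 3, the urlopen function used here is found in urllib.request rather than urllib2.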

The next-to-last chunk of code puts together a timestamp based on the
year, month, and day. Timestamps use the format yyyymmdd, so single-digit
days are padded with a leading zero. You can construct any format here,
but keep it simple for now.

# Format day for timestamp
if len(str(d)) < 2:
    dStamp = '0' + str(d)
else:
    dStamp = str(d)

# Build timestamp
# (mStamp is the zero-padded month, assumed to be built the same way as dStamp)
timestamp = '2009' + mStamp + dStamp
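
The zero-padding above can be written more compactly. The following sketch shows two equivalent ways to produce a yyyymmdd string, assuming Python 3; the function names are hypothetical and not part of the original script.

```python
from datetime import date


def make_timestamp(year, month, day):
    """Return a yyyymmdd string, zero-padding month and day with zfill."""
    return str(year) + str(month).zfill(2) + str(day).zfill(2)


def make_timestamp_strftime(year, month, day):
    """Same result via datetime; raises ValueError for invalid dates."""
    return date(year, month, day).strftime("%Y%m%d")
```

The datetime version has the side benefit of rejecting dates that do not exist, such as February 30.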

Finally, the temperature and timestamp are written to 'wunder-data.txt'

using the write() method.
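
As a rough illustration of that last step, the sketch below writes one line per day with write(). The comma-separated "timestamp,temperature" layout and the function name write_records are assumptions, since the exact line format is not shown in this excerpt.

```python
def write_records(path, records):
    """Write (timestamp, temperature) pairs to path, one per line.

    Assumed layout: timestamp and temperature separated by a comma.
    """
    with open(path, "w") as f:
        for timestamp, temperature in records:
            f.write(timestamp + "," + str(temperature) + "\n")
```

For example, write_records("wunder-data.txt", [("20090101", 26)]) would produce a file containing the single line 20090101,26 (the value 26 is a placeholder, not scraped data).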
