Graphics Programs Reference
move on to the next month. If you want to scrape multiple years, you need
to use an additional if statement to handle leap years.
Similarly, if it's not February, but instead April, June, September, or
November, move on to the next month if the current day is greater than 30.
# Check if already gone through month
if (m == 2 and d > 28):
    break
elif (m in [4, 6, 9, 11] and d > 30):
    break
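For a scrape that spans several years, the hard-coded 28 and 30 could give way to a lookup. The following is a minimal sketch, not part of the original script, that uses the standard library's calendar.monthrange to get the length of each month, leap years included. The helper name days_in_month and the example year 2008 are assumptions made for illustration.

```python
import calendar

def days_in_month(year, month):
    """Return the number of days in the given month, leap years included."""
    # monthrange returns (weekday of the first day, number of days in the month)
    return calendar.monthrange(year, month)[1]

# Enumerate every valid (month, day) pair of a year without hard-coded lengths
year = 2008  # a leap year, chosen only as an example
valid_days = [(m, d)
              for m in range(1, 13)
              for d in range(1, days_in_month(year, m) + 1)]
```

With a list like valid_days, the loop never reaches an invalid date, so the break checks are no longer needed.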
Again, the next few lines of code should look familiar. You used them to
scrape a single page from Weather Underground. The difference is in
the month and day variables in the URL. Change them for each day instead
of leaving them static; the rest is the same. Load the page with the urllib2
library, parse the contents with Beautiful Soup, and then extract the
maximum temperature, but look for the sixth appearance of the nobr class.
# Open wunderground.com url
url = ("http://www.wunderground.com/history/airport/KBUF/2009/" +
       str(m) + "/" + str(d) + "/DailyHistory.html")
page = urllib2.urlopen(url)
# Get temperature from page
soup = BeautifulSoup(page)
# dayTemp = soup.body.nobr.b.string
# Index 5 is the sixth element with class "nobr"
dayTemp = soup.findAll(attrs={"class": "nobr"})[5].span.string
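The URL construction can also be pulled into a small function, which makes it easy to check the result before any page is requested. This is a sketch only: the function name build_history_url and its airport parameter are hypothetical, and the URL pattern is simply the one used in the code above.

```python
def build_history_url(year, month, day, airport="KBUF"):
    """Build a daily-history URL for one date, following the pattern above."""
    # Month and day are inserted without zero padding, as in the original code
    return ("http://www.wunderground.com/history/airport/"
            "{0}/{1}/{2}/{3}/DailyHistory.html").format(airport, year, month, day)

url = build_history_url(2009, 1, 1)
```

Printing a few of these URLs for different dates is a quick way to confirm that the month and day change as expected.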
The next-to-last chunk of code puts together a timestamp based on the
year, month, and day. Timestamps are put into this format: yyyymmdd. You
can construct any format here, but keep it simple for now.
# Format day for timestamp
if len(str(d)) < 2:
    dStamp = '0' + str(d)
else:
    dStamp = str(d)
# Build timestamp
timestamp = '2009' + mStamp + dStamp
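The same zero padding can be done in one step with string formatting. The sketch below is an alternative to the if/else approach, not the code from the script; the function name make_timestamp is an assumption.

```python
def make_timestamp(year, month, day):
    """Return a yyyymmdd string, zero-padding month and day to two digits."""
    return "{0:04d}{1:02d}{2:02d}".format(year, month, day)

timestamp = make_timestamp(2009, 3, 7)
```

Because the padding lives in the format string, there is no need for separate mStamp and dStamp variables.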
Finally, the temperature and timestamp are written to 'wunder-data.txt'
using the write() method.
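As a rough sketch of that last step, the snippet below writes one line per day with write(). The comma-separated layout, the helper name write_record, and the two sample records are assumptions for illustration (the values are placeholders, not scraped data), and the file goes to a temporary directory so the demo does not overwrite a real wunder-data.txt.

```python
import os
import tempfile

def write_record(f, timestamp, day_temp):
    """Write one 'timestamp,temperature' line to an open file."""
    f.write(timestamp + "," + str(day_temp) + "\n")

# Demo with placeholder values written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "wunder-data.txt")
with open(path, "w") as f:
    write_record(f, "20090101", "26")
    write_record(f, "20090102", "34")
```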