in from that HTML, and for that, Beautiful Soup makes your task much
easier. After urllib2, import Beautiful Soup like so:
from BeautifulSoup import BeautifulSoup
At the end of your file, use Beautiful Soup to read (that is, parse) the page.
soup = BeautifulSoup(page)
Without getting into nitty-gritty details, this line of code reads the HTML,
which is essentially one long string, and then stores elements of the page,
such as the header or images, in a way that is easier to work with.
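To make "parsing" concrete, here is a minimal sketch that turns an HTML string into a list of elements. It uses Python 3's standard-library html.parser as a stand-in, so it illustrates the idea rather than how Beautiful Soup works internally. The sample page string and the ElementCollector class are made up for this example.

```python
from html.parser import HTMLParser

# Hypothetical page content; in the walkthrough the string comes from urllib2.
page = (
    "<html><head><title>Weather</title></head>"
    '<body><h1>History</h1><img src="a.png" /></body></html>'
)


class ElementCollector(HTMLParser):
    """Collects every opening tag and its attributes as (name, dict) pairs."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples
        self.elements.append((tag, dict(attrs)))


collector = ElementCollector()
collector.feed(page)   # read the one long string
collector.close()

# Each element is now a separate item that can be inspected on its own
for name, attributes in collector.elements:
    print(name, attributes)
```

The point is only that the long string becomes separate, addressable pieces. Beautiful Soup goes further and keeps the nesting of those pieces as well.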
Beautiful Soup provides good documentation and straightforward examples, so if any of this is confusing, I strongly encourage you to check those out on the same Beautiful Soup site you used to download the library.
For example, if you want to find all the images in the page, you can
use this:
images = soup.findAll('img')
This gives you a list of all the images on the Weather Underground page
displayed with the <img /> HTML tag. Want the first image on the page? Do
this:
first_image = images[0]
Want the second image? Change the zero to a one. If you want the src value
in the first <img /> tag, you would use this:
src = first_image['src']
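The three steps above (collect the images, pick one by index, read its src) can be sketched without a network connection. This version again substitutes Python 3's standard-library html.parser for Beautiful Soup, and the HTML fragment, file names, and ImageFinder class are invented for illustration. They are not the real Weather Underground markup.

```python
from html.parser import HTMLParser

# Made-up HTML for illustration only
page = """
<html><body>
  <img src="logo.png" alt="logo" />
  <p>Some text</p>
  <img src="radar.gif" alt="radar" />
</body></html>
"""


class ImageFinder(HTMLParser):
    """Gathers the attributes of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


finder = ImageFinder()
finder.feed(page)
finder.close()

images = finder.images
first_image = images[0]     # index 0 is the first image
second_image = images[1]    # index 1 is the second image
src = first_image["src"]    # look up an attribute by name

print(len(images), src, second_image["src"])
```

Because each image is stored as a dictionary of attributes, the indexing and the ['src'] lookup read the same way as they do on the Beautiful Soup result.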
Okay, you don't want images. You just want that one value: maximum
temperature on January 1, 2009, in Buffalo, New York. It was 26 degrees
Fahrenheit. It's a little trickier finding that value in your soup than it was
finding images, but you still use the same method. You just need to figure
out what to put in findAll(), so look at the HTML source.
You can easily do this in all the major browsers. In Firefox, go to the View
menu, and select Page Source. A window with the HTML for your current
page appears, as shown in Figure 2-5.
Scroll down to where it shows Mean Temperature, or just search for it,
which is faster. Spot the 26. That's what you want to extract.
The row is enclosed by a <span> tag with a nobr class. That's your key. You
can find all the elements in the page with the nobr class.
nobrs = soup.findAll(attrs={"class":"nobr"})
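As a rough illustration of what a class-based search returns, the sketch below collects the text of every <span> whose class is nobr. It relies on Python 3's standard-library html.parser instead of Beautiful Soup, and the table fragment is an assumption: the row labels and the second value are placeholders, and the real page's markup may differ. ClassTextFinder is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Made-up fragment; only the 26 comes from the walkthrough, the rest is placeholder
page = """
<table>
  <tr><td>Max Temperature</td><td><span class="nobr">26 F</span></td></tr>
  <tr><td>Min Temperature</td><td><span class="nobr">14 F</span></td></tr>
  <tr><td>Events</td><td><span>Snow</span></td></tr>
</table>
"""


class ClassTextFinder(HTMLParser):
    """Collects the text inside every <span> whose class list has a given name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # greater than 0 while inside a matching <span>
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1             # a span nested inside a match
        elif self.class_name in classes:
            self.depth = 1
            self.texts.append("")       # start a new text buffer

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts[-1] += data


finder = ClassTextFinder("nobr")
finder.feed(page)
finder.close()

nobrs = [text.strip() for text in finder.texts]
print(nobrs)
```

The span without a nobr class is skipped, so the result holds one entry per matching element, in page order. Picking out the value you want is then a matter of choosing the right index.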