lynx and lynx --dump
Good if you pine for the 1970s. Oh wait, 1992. Whatever.
Beautiful Soup
Robust but kind of slow.
Mechanize
Super cool as well, but it doesn't parse JavaScript.
PostScript
Image classification.
Thought Experiment: Image Recognition
How do you determine if an image is a landscape or a headshot?
Start by collecting data. You either need to get someone to label
the images, which is a lot of work, or you can grab lots of pictures
from Flickr, searching for photos that have already been tagged.
Represent each image with a binned RGB (red, green, blue) intensity
histogram. In other words, for each pixel, and for each of red, green,
and blue, which are the basic colors in pixels, you measure the
intensity, which is a number between 0 and 255.
Then draw three histograms, one for each basic color, showing how
many pixels had which intensity. It's better to do a binned histogram,
so have counts of the number of pixels of intensity 0-51, etc. In the
end, for each picture, you have 15 numbers, corresponding to 3 colors
and 5 bins per color. We are assuming here that every picture has the
same number of pixels.
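The binning step can be sketched in a few lines of Python. This is only
an illustration: the function name binned_rgb_histogram, the
representation of an image as a list of (r, g, b) tuples, and the three
sample pixels are all assumptions made for the example, not something
the text specifies.

```python
def binned_rgb_histogram(pixels, bins=5):
    """Return 3 * bins counts: red bins first, then green, then blue.

    `pixels` is assumed to be an iterable of (r, g, b) tuples with
    integer intensities between 0 and 255.
    """
    counts = [0] * (3 * bins)
    for pixel in pixels:
        for channel, intensity in enumerate(pixel):
            # Map an intensity in 0..255 to a bin index in 0..bins-1.
            # With bins=5, the first bin covers intensities 0-51.
            bin_index = intensity * bins // 256
            counts[channel * bins + bin_index] += 1
    return counts


# A made-up three-pixel "image" with a lot of blue in it.
sample_pixels = [(0, 0, 255), (10, 200, 255), (60, 128, 250)]
features = binned_rgb_histogram(sample_pixels)
print(len(features))  # 15 numbers: 3 colors x 5 bins
print(features)
```

Because the counts are raw pixel totals, two images are only comparable
if they have the same number of pixels, which is the assumption stated
above. Dividing each count by the number of pixels would be one way to
relax it.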
Finally, use k-NN to decide how much “blue” makes a landscape versus
a headshot. You can tune the hyperparameters, which in this case are
the number of bins as well as k.
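The k-NN step might look like the following minimal sketch. The
function knn_predict, the use of Euclidean distance, and the tiny
two-number feature vectors are all hypothetical, chosen to keep the
example short; real inputs would be the 15-number histograms described
above, and the labels would come from your tagged photos.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Rank training examples by distance to the query and keep the k closest.
    nearest = sorted(
        range(len(train_features)),
        key=lambda i: dist(train_features[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: each vector is [pixels in the top blue bin,
# pixels in a middle red bin]. These numbers are illustrative only.
train_features = [[90, 10], [80, 20], [85, 15], [20, 70], [30, 60], [25, 75]]
train_labels = ["landscape"] * 3 + ["headshot"] * 3

print(knn_predict(train_features, train_labels, [75, 25], k=3))
print(knn_predict(train_features, train_labels, [28, 65], k=3))
```

To tune k and the number of bins, one common approach is to hold out
some labeled images, try several combinations, and keep the one that
classifies the held-out images most accurately.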
 