Graphics Programs Reference
In-Depth Information
XML is relatively easy to parse with libraries such as Beautiful Soup in
Python. You can get a better feel for XML, along with CSV and JSON, in the
sections that follow.
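
As a rough preview of how the three formats compare, here is a minimal sketch that reads the same made-up record from XML, CSV, and JSON. It uses Python's standard library (xml.etree.ElementTree, csv, json) rather than Beautiful Soup so that it runs without installing anything; the sample data and the function names are assumptions for illustration only.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in three formats.
xml_text = "<items><item><name>fork</name><count>4</count></item></items>"
csv_text = "name,count\nfork,4\n"
json_text = '[{"name": "fork", "count": 4}]'


def from_xml(text):
    """Walk every <item> element and pull out its child values."""
    root = ET.fromstring(text)
    return [
        {"name": item.findtext("name"), "count": int(item.findtext("count"))}
        for item in root.iter("item")
    ]


def from_csv(text):
    """Read rows keyed by the header line; CSV values arrive as strings."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"name": row["name"], "count": int(row["count"])} for row in reader]


def from_json(text):
    """JSON already carries types, so no conversion is needed."""
    return json.loads(text)


for parse, text in ((from_xml, xml_text), (from_csv, csv_text), (from_json, json_text)):
    print(parse.__name__, parse(text))
```

All three functions return the same list of dictionaries, which is usually the point of a formatting script: get the data into one shape regardless of how it arrived.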
Formatting Tools
Just a couple of years ago, you had to write a quick script whenever you
needed to handle and format data. After you've written a few scripts, you
start to notice patterns in the logic, so it's not too hard to write new
scripts for specific datasets, but it does take time. Luckily, with growing
volumes of data, some tools have been developed to handle the boilerplate
routines.
Google Refine
Google Refine is the evolution of Freebase Gridworks. Gridworks was first
developed as an in-house tool for an open data platform, Freebase; however,
Freebase was acquired by Google, hence the new name. Google Refine is
essentially Gridworks 2.0 with an easier-to-use interface (Figure 2-8) and
more features.
It runs on your desktop (but still through your browser), which is great,
because you don't need to worry about uploading private data to Google's
servers. All the processing happens on your computer. Refine is also open
source, so if you feel ambitious, you can tailor the tool to your own needs
with extensions.
When you open Refine, you see a familiar spreadsheet interface with your
rows and columns. You can easily sort by field and search for values. You
can also find inconsistencies in your data and consolidate them with
relatively little effort.
For example, say for some reason you have an inventory list for your
kitchen. You can load the data in Refine and quickly find inconsistencies
such as typos or differing classifications. Maybe a fork was misspelled as
“frk,” or you want to reclassify all the forks, spoons, and knives as
utensils. You can easily find these things with Refine and make changes. If
you don't like the changes you made or make a mistake, you can revert to
the old dataset with a simple undo.
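
To make the kitchen example concrete, here is a minimal sketch of the same three ideas (spotting a typo, reclassifying values, and undoing a change) in plain Python. This is not Refine's code or API; the inventory rows, the KNOWN_TYPES list, and the History class are all assumptions made up for illustration. Typos are matched with difflib.get_close_matches from the standard library.

```python
from difflib import get_close_matches

# Hypothetical kitchen inventory; "frk" is the misspelled fork.
inventory = [
    {"type": "fork", "qty": 4},
    {"type": "frk", "qty": 2},
    {"type": "spoon", "qty": 6},
    {"type": "knife", "qty": 4},
    {"type": "plate", "qty": 8},
]

KNOWN_TYPES = ["fork", "spoon", "knife", "plate"]  # assumed list of valid values
UTENSILS = {"fork", "spoon", "knife"}


def fix_typos(rows):
    """Replace each type with its closest known spelling, if one is close enough."""
    fixed = []
    for row in rows:
        match = get_close_matches(row["type"], KNOWN_TYPES, n=1)
        fixed.append({**row, "type": match[0] if match else row["type"]})
    return fixed


def reclassify(rows):
    """Relabel forks, spoons, and knives as utensils; leave other rows alone."""
    return [
        {**row, "type": "utensil"} if row["type"] in UTENSILS else dict(row)
        for row in rows
    ]


class History:
    """Keep every version of the dataset so any step can be undone."""

    def __init__(self, rows):
        self._versions = [rows]

    @property
    def current(self):
        return self._versions[-1]

    def apply(self, step):
        # Each step returns a new list, so earlier versions stay untouched.
        self._versions.append(step(self.current))

    def undo(self):
        # Never discard the original dataset.
        if len(self._versions) > 1:
            self._versions.pop()


history = History(inventory)
history.apply(fix_typos)
history.apply(reclassify)
print([row["type"] for row in history.current])
history.undo()  # back to the typo-free but unclassified version
print([row["type"] for row in history.current])
```

Because each step builds a new list instead of editing rows in place, undo is just a matter of dropping the latest version, which is one simple way to get the revert behavior described above.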