{"kind": "customsearch#search",
"url": {
"type": "application/json",
...
"items": [
{
"kind": "customsearch#result",
"title": "mana cross pang confidante surplus fine formic beach metallurgy ...",
"htmlTitle": "mana cross pang confidante surplus fine formic beach metallurgy
\u003cb\u003e...\u003c/b\u003e",
"link":
"http://www.cs.caltech.edu/courses/cs11/material/advjava/lab4/unsorted_words.txt",
"displayLink": "www.cs.caltech.edu",
"snippet": "... phonic phenotype exchangeable Pete pesticide exegete exercise
persuasion .... lopsided judiciary Lear proverbial warden Sumatra Hempstead
confiscate ...",
},
...
Wikipedia
Wikipedia doesn't offer an API, but it does offer bulk data downloads of almost everything on
the site. One of my favorite uses for this information is extracting the titles of all the articles to
create a list of the names of people, places, and concepts to match text against. The hardest part
about this is the pollution of the data set with many obscure or foreign titles, so I usually use the
traffic statistics that are available as a separate bulk download to restrict my matching to only the
most popular topics. Once you've got this shortlist, you can use it to extract interesting words or
phrases from free text, without needing to do any more complex semantic analysis.
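The shortlist approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact process used here: it assumes you have already loaded the article titles into a list and the traffic statistics into a dictionary of view counts, and the function names, the threshold, and the sample titles and counts are all hypothetical.

```python
import re

def build_shortlist(titles, page_views, min_views):
    """Keep only the titles whose view count meets the threshold.

    `titles` and `page_views` are assumed to have been loaded from the
    bulk downloads already; their exact file formats are not covered here.
    """
    return {t.lower() for t in titles if page_views.get(t, 0) >= min_views}

def find_known_phrases(text, shortlist, max_words=3):
    """Scan free text for the longest word runs that appear in the shortlist."""
    words = re.findall(r"[A-Za-z0-9']+", text)
    matches = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate.lower() in shortlist:
                matches.append(candidate)
                i += n
                break
        else:
            # No phrase starting at this word is on the shortlist.
            i += 1
    return matches

# Hypothetical sample data, made up purely for illustration.
titles = ["San Francisco", "Caltech", "Obscure Thing"]
page_views = {"San Francisco": 50000, "Caltech": 12000, "Obscure Thing": 3}

shortlist = build_shortlist(titles, page_views, min_views=1000)
found = find_known_phrases(
    "I moved from Caltech to San Francisco last year.", shortlist
)
print(found)
```

Matching the longest phrase first keeps multiword titles intact rather than splitting them into shorter matches, and the traffic threshold is what keeps obscure titles from producing spurious hits.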
Google Suggest
Though it's not an official API, the autocomplete feature that's used in Google's toolbars is a
fascinating source of user-generated data. It returns the top ten search terms that begin with the
phrase you pass in, along with rough counts for the popularity of each search. The data is
accessed through a simple web URL, and it is returned as XML. Unfortunately, since it's not a
documented interface, you're probably technically violating Google's terms of service by using it
outside of a toolbar, and it would be unwise to call the API too frequently:
curl "http://google.com/complete/search?output=toolbar&q=%22San+Francisco+is+"
<?xml version="1.0"?><toplevel>
<CompleteSuggestion><suggestion data="san francisco is in what county"/>
<num_queries int="77100000"/></CompleteSuggestion>
<CompleteSuggestion><suggestion data="san francisco is full of characters"/>
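As a minimal sketch of how a response like this could be read in Python, the snippet below parses the XML with the standard library and pulls out each suggestion and its rough query count. It makes no network request. The sample string is the fragment shown above, closed by hand so that it parses, which is an assumption about the shape of the rest of the response; the function name is hypothetical, and since this is an undocumented interface the element names may not match what the service returns today.

```python
import xml.etree.ElementTree as ET

# The fragment shown above, closed by hand so that it is well-formed XML.
# The second suggestion's count is not shown in the text, so it is left out.
SAMPLE = (
    '<?xml version="1.0"?><toplevel>'
    '<CompleteSuggestion><suggestion data="san francisco is in what county"/>'
    '<num_queries int="77100000"/></CompleteSuggestion>'
    '<CompleteSuggestion><suggestion data="san francisco is full of characters"/>'
    '</CompleteSuggestion>'
    '</toplevel>'
)

def parse_suggestions(xml_text):
    """Return a list of (suggestion, query_count) pairs.

    query_count is None when a suggestion carries no num_queries element.
    """
    root = ET.fromstring(xml_text)
    results = []
    for item in root.findall("CompleteSuggestion"):
        suggestion = item.find("suggestion")
        if suggestion is None:
            continue
        count = item.find("num_queries")
        queries = int(count.get("int")) if count is not None else None
        results.append((suggestion.get("data"), queries))
    return results

suggestions = parse_suggestions(SAMPLE)
for phrase, queries in suggestions:
    print(phrase, queries)
```

Keeping the parsing separate from the fetching means you can save responses to disk and work from those, which helps avoid calling the service more often than necessary.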