Getting ready
We'll also use tokenize, get-sentences, normalize, load-stopwords, and is-stopword from the earlier recipes; a sketch of these helpers appears after the next code block. We'll also reuse the value of tokens that we saw in the Focusing on content words with stoplists recipe. Here it is again:
(def tokens
  (map #(remove is-stopword (normalize (tokenize %)))
       (get-sentences
        "I never saw a Purple Cow.
         I never hope to see one.
         But I can tell you, anyhow.
         I'd rather see than be one.")))
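For reference, the helpers from those recipes look roughly like the following. This is a minimal sketch that assumes the clojure-opennlp library; the model and stopword file paths are placeholders, not the recipes' exact values:

(require '[opennlp.nlp :as nlp]
         '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Sentence splitting and tokenization come from OpenNLP models.
;; These model paths are assumptions.
(def get-sentences (nlp/make-sentence-detector "models/en-sent.bin"))
(def tokenize (nlp/make-tokenizer "models/en-token.bin"))

;; Normalizing here just lower-cases each token.
(defn normalize [token-seq]
  (map str/lower-case token-seq))

;; Read a stopword file (one word per line) into a set.
(defn load-stopwords [filename]
  (with-open [r (io/reader filename)]
    (set (doall (line-seq r)))))

;; A set doubles as a membership predicate for remove.
;; The stopword file path is also an assumption.
(def is-stopword (load-stopwords "stopwords/english"))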
How to do it…
Of course, the standard function to count items in a sequence is frequencies. We can use this to get the token counts for each sentence, but then we'll also want to fold those into a frequency table using merge-with:
(def token-freqs
  (apply merge-with + (map frequencies tokens)))
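To see what the merge-with + step does, consider folding two small, hypothetical per-sentence frequency maps:

user=> (merge-with + {"see" 1, "." 1} {"see" 1, "one" 1})
{"see" 2, "." 1, "one" 1}

Counts for keys that appear in both maps are added together, while keys unique to either map pass through unchanged.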
We can print or query this table to get the count for any token or piece of punctuation,
as follows:
user=> (pprint token-freqs)
{"see" 2,
"purple" 1,
"tell" 1,
"cow" 1,
"anyhow" 1,
"hope" 1,
"never" 2,
"saw" 1,
"'d" 1,
"." 4,
"one" 2,
"," 1,
"rather" 1}
 