2. Now, let's use this function to define a function that rescales by group:
(defn rescale-by-group [src group dest coll]
  (->> coll
       (sort-by group)
       (group-by group)
       vals
       (mapcat #(rescale-by-total src dest %))))
3. We can easily make up some data to test this:
(def word-counts
  [{:word 'the, :freq 92, :doc 'a}
   {:word 'a, :freq 76, :doc 'a}
   {:word 'jack, :freq 4, :doc 'a}
   {:word 'the, :freq 3, :doc 'b}
   {:word 'a, :freq 2, :doc 'b}
   {:word 'mary, :freq 1, :doc 'b}])
Now, we can see how it works:
user=> (pp/pprint (rescale-by-group :freq :doc :scaled
                                    word-counts))
({:freq 92, :word the, :scaled 23/43, :doc a}
 {:freq 76, :word a, :scaled 19/43, :doc a}
 {:freq 4, :word jack, :scaled 1/43, :doc a}
 {:freq 3, :word the, :scaled 1/2, :doc b}
 {:freq 2, :word a, :scaled 1/3, :doc b}
 {:freq 1, :word mary, :scaled 1/6, :doc b})
We can immediately see that the scaled values are more easily comparable. The scaled
frequencies for the word "the", for example, are roughly in line with each other in a way that
the raw frequencies aren't (about 0.53 and 0.5 versus 92 and 3). Of course, since this isn't
a real dataset, the frequencies are meaningless, but this still illustrates the method and
how it improves the dataset.
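To check the decimal approximations quoted above, we can convert the exact ratios to doubles. This is only a small sketch: the-scaled and round2 are names made up for this illustration, and the ratios are copied by hand from the output shown earlier.

```clojure
;; Ratios copied from the earlier output for the word "the" in documents a and b.
;; (the-scaled and round2 are hypothetical helper names, not part of the recipe.)
(def the-scaled {'a 23/43, 'b 1/2})

;; Convert an exact ratio to a double rounded to two decimal places.
(defn round2 [x]
  (/ (Math/round (* 100.0 (double x))) 100.0))

(into {} (map (fn [[doc r]] [doc (round2 r)]) the-scaled))
;; => {a 0.53, b 0.5}
```

Keeping the values as exact ratios until the last moment avoids accumulating floating-point error, so converting only for display is usually the safer choice.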
How it works…
For each function, we pass in a couple of keys: a source key and a destination key.
The first function, rescale-by-total, totals the values for the source key and then
sets the destination key on each item to the ratio of that item's source value to the
total of the source values across all of the items in the collection.
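The definition of rescale-by-total comes from an earlier step and isn't repeated here. As a rough sketch only, a function matching the description above could look like the following. This is an assumed reconstruction, not necessarily the original definition, and with integer values it will throw a divide-by-zero error if the source values sum to zero.

```clojure
;; Assumed reconstruction of rescale-by-total, based only on the prose description.
(defn rescale-by-total [src dest coll]
  (let [total (reduce + (map src coll))]
    ;; Attach (source value / total) to each item under the dest key.
    (map #(assoc % dest (/ (src %) total)) coll)))

(rescale-by-total :freq :scaled [{:freq 3} {:freq 2} {:freq 1}])
;; => ({:freq 3, :scaled 1/2} {:freq 2, :scaled 1/3} {:freq 1, :scaled 1/6})
```

Because Clojure divides integers into exact ratios, the scaled values come back as fractions such as 1/2 rather than rounded decimals.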
The second function, rescale-by-group, takes one more key: the group key. It sorts and
groups the items by the group key and then passes each group to rescale-by-total.
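To see what the sorting and grouping step produces before each group is rescaled, here is a small sketch. The sample and grouped names and the three-item dataset are invented for this illustration. Note that sort-by is stable, so items within a group keep their original relative order, while Clojure maps in general make no promise about the order in which vals returns the groups.

```clojure
;; Hypothetical three-item dataset, deliberately out of order by :doc.
(def sample
  [{:word 'the, :freq 3, :doc 'b}
   {:word 'the, :freq 92, :doc 'a}
   {:word 'a, :freq 2, :doc 'b}])

;; Sort by the group key, then build a map of group key -> vector of items.
(def grouped (group-by :doc (sort-by :doc sample)))

;; Returns a vector of the two :doc 'b items, in their original relative order.
(get grouped 'b)

;; Per-group totals, which are the denominators used when each group is rescaled.
(into {} (map (fn [[k items]] [k (reduce + (map :freq items))]) grouped))
;; => {a 92, b 5}
```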
 