user=> [@finished @running-report]
[false {:frequency 79, :ratio 6.933839E-4, :term :committee}]
user=> [@finished @running-report]
[false {:frequency 105, :ratio 2.5916903E-4, :term :committee}]
user=> [@finished @running-report]
[false {:frequency 164, :ratio 1.845714E-4, :term :committee}]
user=> [@finished @running-report]
[true {:frequency 168, :ratio 1.4468178E-4, :term :committee}]
We can see from the ratio of the committee frequency to the total frequency that initially
the word committee occurred relatively often (0.07 percent, which is approximately the
frequency of other common words in the overall corpus). However, by the end of processing,
its frequency had settled down to about 0.014 percent of the total number of words, which is
closer to what we would expect.
How it works…
In this recipe, compute-frequencies triggers everything. It creates a new agent
that processes the input files one by one and updates most of the references in the
compute-file function.
The compute-report function updates the running report. It bases that report
on the frequency map and the total word count, but it doesn't change either of them. To
keep everything synchronized, it calls ensure on both. Otherwise, there's a chance that the
count of total words comes from one set of documents and the term frequency from another
set. This isn't likely given that only one agent is updating those values, but if we decided to have
more than one agent processing the files, it would be a possibility. To generate a report for
a new term without reading all of the files again, we can define this function:
(defn get-report [term]
  (send running-report #(assoc % :term term))
  (send running-report compute-report)
  (await running-report)
  @running-report)
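The use of ensure described above can be sketched in isolation. This is a minimal illustration, not the recipe's own compute-report: the names total-words, term-freqs, and snapshot-ratio are hypothetical stand-ins, and the values loaded into the refs are made up for the example.

```clojure
;; Hypothetical stand-ins for the recipe's shared state (assumed names).
(def total-words (ref 0))
(def term-freqs (ref {}))

(defn snapshot-ratio
  "Reads both refs inside one transaction without writing to either.
  ensure protects each ref from modification by other transactions
  until this transaction finishes."
  [term]
  (dosync
    (let [total (ensure total-words)
          freq  (get (ensure term-freqs) term 0)]
      {:term term
       :frequency freq
       :ratio (if (zero? total) 0.0 (double (/ freq total)))})))

;; Load some made-up example values, then take a snapshot.
(dosync
  (ref-set total-words 1000)
  (ref-set term-freqs {:committee 7}))

(snapshot-ratio :committee)
```

Because the function only reads, alter or ref-set would be the wrong tools here; ensure lets the transaction declare that it depends on both values without changing them.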
Introducing safe side effects into the STM
The STM isn't safe as far as side effects are concerned. Since a dosync block may get retried,
possibly more than once, any side effects will be executed again and again, whether they
should be or not. Values may get written to the screen or log file multiple times. Worse, values
may be written to the database more than once.
However, all programs must produce side effects. The trick is adding them while getting a
handle on complexity. The easiest way to do that is to keep side effects out of transactions.
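One way to keep side effects out of transactions can be sketched as follows. This is only an illustration under assumed names: counter, audit-log, log-value!, and increment-and-log! are hypothetical, and the atom stands in for a real log file or database. The transaction returns the new value, and the side effect happens afterward, outside the dosync block. The io! macro from clojure.core marks the side-effecting code so that calling it inside a transaction throws an exception instead of silently repeating on retry.

```clojure
(def counter (ref 0))
(def audit-log (atom []))   ; hypothetical stand-in for a log file or database

(defn log-value!
  "Performs the side effect. io! throws an IllegalStateException if this
  is ever called while a transaction is running."
  [n]
  (io! (swap! audit-log conj n)))

(defn increment-and-log!
  "Changes state inside the transaction, then logs outside of it, so the
  log entry is written once even if the transaction was retried."
  []
  (let [n (dosync (alter counter inc))]
    (log-value! n)
    n))
```

Calling increment-and-log! twice leaves two entries in audit-log, one per committed transaction, while wrapping log-value! in a dosync fails immediately rather than risking duplicate writes.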
 