["|N00030973|" 9900]
["|N00005656|" 11598514])
This solution uses agents to handle the work, and it uses the STM to manage shared data
structures. The main function first assigns each input file to an agent. Each agent then reads
its input file and totals the amount of contributions for each candidate. It takes those totals
and uses the STM to update the shared counts.
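The pattern just described can be sketched in miniature. This is only an illustration under stated assumptions, not the recipe's own code: the names totals, sum-contributions, merge-into-totals!, and file-data are hypothetical, and small in-memory vectors stand in for the parsed input files.

```clojure
;; Hypothetical names throughout; a sketch of the agents-plus-STM pattern.
(def totals (ref {}))   ; shared candidate -> total map, managed by the STM

(defn sum-contributions
  "Totals a sequence of [candidate amount] pairs into a local map."
  [pairs]
  (reduce (fn [m [candidate amount]]
            (assoc m candidate (+ (get m candidate 0) amount)))
          {}
          pairs))

(defn merge-into-totals!
  "Agent action: totals this agent's pairs locally, then merges the
  result into the shared ref inside a transaction."
  [_ pairs]
  (let [local (sum-contributions pairs)]
    ;; Addition is commutative, so commute is safe here.
    (dosync (commute totals #(merge-with + % local)))
    local))

;; Stand-ins for the parsed contents of two input files.
(def file-data
  [[["A" 100] ["B" 50] ["A" 25]]
   [["B" 10] ["C" 5]]])

;; One agent per "file"; send queues the action and returns the agent.
(def agents
  (mapv (fn [pairs] (send (agent nil) merge-into-totals! pairs))
        file-data))

;; Block until every agent has finished its action.
(apply await agents)
```

Once await returns, dereferencing totals gives the combined per-candidate sums. Each agent does its own counting without coordination and only touches the shared ref once, which keeps the transactions short.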
Maintaining consistency with ensure
When we use the STM, we are trying to coordinate and maintain consistency between several
values, all of which keep changing. However, we'll sometimes want to maintain consistency
with those references that won't change and therefore won't be included in the transaction.
We can signal that the STM should include these other references in the transaction by
using the ensure function.
This helps simplify the data processing system by ensuring that the data structures stay
synchronized and consistent. The ensure function allows us to have more control over
what gets managed by the STM.
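To make the idea concrete, here is a minimal sketch of ensure guarding a reference that a transaction reads but never writes. The names accepting?, word-count, and add-words! are hypothetical and chosen only for this illustration.

```clojure
;; Hypothetical names; a sketch of ensure, not the recipe's code.
(def accepting? (ref true))  ; read inside the transaction, never written by it
(def word-count (ref 0))     ; written by the transaction

(defn add-words!
  "Adds n to word-count, but only while accepting? is true."
  [n]
  (dosync
    ;; ensure returns the ref's in-transaction value and protects the ref
    ;; from modification by other transactions until this one finishes.
    (when (ensure accepting?)
      (alter word-count + n))))

(add-words! 10)                        ; accepted
(dosync (ref-set accepting? false))    ; close the gate
(add-words! 5)                         ; ignored, because accepting? is false
```

With a plain deref in place of ensure, another transaction could flip accepting? after we have read it but before we commit, and our update would still go through. With ensure, the flag cannot change underneath the transaction that depends on it.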
For this recipe, we'll use a slightly contrived example. We'll process a set of text files
and compute the frequency of a term as well as the total number of words. We'll do this
concurrently, and we'll be able to watch the results get updated as we progress.
For the set of text files, we'll use the Brown corpus. Constructed in the 1960s, this was
one of the first digital collections of texts (or corpora) assembled for linguists to use to
study language. At that time, its size (one million words) was huge. Today, similar corpora
contain 100 million words or more.
Getting ready
We'll need to include the clojure.string library and have easy access to the File class:
(require '[clojure.string :as string])
(import '[java.io File])
We'll also need to download the Brown corpus. We can download it at
http://www.nltk.org/nltk_data/. Actually, you can use any large collection of texts, but
the Brown corpus has each word's part of speech listed in the file, so we'll need to parse it
specially. If you use a different corpus, you can just change the tokenize-brown function, as
explained in the next section, to work with your texts.
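To see why the Brown corpus needs special parsing, consider a rough sketch of stripping the part-of-speech tags. It assumes each token has the form word/tag and that tokens are separated by whitespace, as in the NLTK distribution of the corpus. The strip-tags function is a hypothetical helper for illustration only; it is not the tokenize-brown function used in this recipe.

```clojure
(require '[clojure.string :as string])

(defn strip-tags
  "Hypothetical helper: splits a line of word/tag tokens on whitespace,
  drops the trailing /tag from each token, and lower-cases the word."
  [line]
  (->> (string/split (string/trim line) #"\s+")
       (remove empty?)
       ;; Remove only the last slash-delimited part, so a word that
       ;; itself contains a slash keeps everything before its tag.
       (map #(string/replace % #"/[^/]*$" ""))
       (map string/lower-case)))

(strip-tags "The/at jury/nn said/vbd")
```

For a corpus of plain, untagged text, the replace step could simply be dropped.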