["|N00030973|" 9900]
["|N00005656|" 11598514])
This solution uses agents to handle the work, and it uses the STM to manage shared data
structures. The main function first assigns each input file to an agent. Each agent then reads
the input file and totals the amount of contributions for each candidate. It takes those totals
and uses the STM to update the shared counts.
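As a rough illustration of that pattern, here is a minimal sketch. It is not the recipe's own code: every name in it (totals, sum-contributions, process-chunk, chunks) is hypothetical, and small in-memory vectors stand in for the input files. Each agent totals its own chunk, then merges the result into a shared ref inside a transaction.

```clojure
;; Hypothetical names throughout; in-memory rows stand in for real input files.
(def totals (ref {}))   ; shared candidate -> total map, managed by the STM

(defn sum-contributions
  "Totals a sequence of [candidate amount] pairs into a map."
  [rows]
  (reduce (fn [m [candidate amount]]
            (assoc m candidate (+ (get m candidate 0) amount)))
          {}
          rows))

(defn process-chunk
  "Agent action: total one chunk locally, then merge it into the shared ref."
  [_ rows]
  (let [local (sum-contributions rows)]
    ;; Addition is commutative, so commute lets transactions merge in any order.
    (dosync (commute totals #(merge-with + % local)))
    local))

(def chunks
  [[["A" 100] ["B" 50]]
   [["A" 25]  ["C" 10]]])

;; One agent per chunk, mirroring "one agent per input file".
(def agents (mapv (fn [_] (agent nil)) chunks))

(doseq [[a rows] (map vector agents chunks)]
  (send a process-chunk rows))

(apply await agents)   ; block until every agent has finished
@totals
```

The sketch uses send because nothing here blocks. An action that actually reads files would more typically be dispatched with send-off.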
Maintaining consistency with ensure
When we use the STM, we are trying to coordinate and maintain consistency between several
values, all of which keep changing. However, we'll sometimes also want to maintain consistency
with references that the transaction only reads. Because the transaction doesn't change them,
the STM won't otherwise protect them from being modified by other transactions.
We can signal that the STM should include these other references in the transaction by
using the ensure function.
This helps simplify the data processing system by ensuring that the data structures stay
synchronized and consistent. The ensure function allows us to have more control over
what gets managed by the STM.
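To make this concrete, here is a minimal sketch of ensure in use. The ref and function names (term-count, total-words, latest-ratio, record-words!, update-ratio!) are assumptions made up for this illustration. One transaction writes two counters. Another only reads them to derive a ratio, and it calls ensure so that other transactions cannot modify the counters before it commits.

```clojure
;; All names here are hypothetical, chosen only for this sketch.
(def term-count   (ref 0))    ; occurrences of the term seen so far
(def total-words  (ref 0))    ; total words seen so far
(def latest-ratio (ref 0.0))  ; derived value, written by update-ratio!

(defn record-words!
  "Writer: updates both counters in a single transaction."
  [term-hits word-count]
  (dosync
    (alter term-count + term-hits)
    (alter total-words + word-count)))

(defn update-ratio!
  "Reader: only reads the counters, so it uses ensure to protect them from
  modification by other transactions until this transaction commits."
  []
  (dosync
    (let [hits  (ensure term-count)    ; ensure returns the in-transaction value
          words (ensure total-words)]
      (ref-set latest-ratio
               (if (zero? words)
                 0.0
                 (double (/ hits words)))))))

(record-words! 3 100)
(update-ratio!)
```

If update-ratio! read the counters with plain deref instead, it would still see a consistent snapshot. However, another transaction could commit new counts before the ratio was written, leaving latest-ratio out of step with the counters.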
For this recipe, we'll use a slightly contrived example. We'll process a set of text files
and compute the frequency of a term as well as the total number of words. We'll do this
concurrently, and we'll be able to watch the results get updated as we progress.
For the set of text files, we'll use the Brown corpus. Constructed in the 1960s, this was
one of the first digital collections of texts (or corpora) assembled for linguists to use to
study language. At that time, its size (one million words) was huge. Today, similar corpora
contain 100 million words or more.
Getting ready
We'll need to include the clojure.string library and have easy access to the File class:
(require '[clojure.string :as string])
(import '[java.io File])
We'll also need to download the Brown corpus. We can download it at
http://www.nltk.org/nltk_data/. Actually, you can use any large collection of texts, but the
Brown corpus has each word's part of speech listed in the file, so we'll need to parse it
specially. If you use a different corpus, you can just change the tokenize-brown function, as
explained in the next section, to work with your texts.
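To give a sense of what parsing it specially involves, here is a small sketch. It is not the tokenize-brown function itself. It assumes that each token in the corpus files is written as word/tag (for example, The/at), and strip-pos-tags is a hypothetical helper named only for this illustration.

```clojure
(require '[clojure.string :as string])

(defn strip-pos-tags
  "Hypothetical helper. Splits text on whitespace, then drops everything from
  the last slash onward in each token, assuming a word/tag layout."
  [text]
  (->> (string/split (string/trim text) #"\s+")
       (remove empty?)                           ; blank input yields no tokens
       (map #(string/replace % #"/[^/]*$" ""))   ; "jury/nn" -> "jury"
       (map string/lower-case)))

(strip-pos-tags "The/at jury/nn said/vbd")
```

For a corpus without tags, the string/replace step could simply be dropped.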
 