Now, we can take these three functions and use them to train a topic model. While training,
it will output some information about the process, and finally, it will list the top terms for
each topic:
user=> (def pipe-list (make-pipe-list))
user=> (add-directory-files pipe-list "sotu/")
user=> (def tm (train-model 10 4 50 pipe-list))
INFO:
0 0.1 government federal year national congress war
1 0.1 world nation great power nations people
2 0.1 world security years programs congress program
3 0.1 law business men work people good
4 0.1 america people americans american work year
5 0.1 states government congress public people united
6 0.1 states public made commerce present session
7 0.1 government year department made service legislation
8 0.1 united states congress act government war
9 0.1 war peace nation great men people
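If we want to work with this summary as data rather than as printed text, we can parse each line into a map. The following is only a minimal sketch: `parse-topic-line` is a hypothetical helper, not part of the recipe, and it assumes each line holds a topic number, a numeric column (likely the topic's Dirichlet alpha parameter, which the sketch simply calls `:weight`), and then the top terms, all separated by whitespace.

```clojure
(require '[clojure.string :as str])

(defn parse-topic-line
  "Parses one summary line into a map.
  Assumes whitespace-separated fields: topic number, a numeric
  column (called :weight here), and then the top terms."
  [line]
  (let [[id weight & terms] (str/split (str/trim line) #"\s+")]
    {:topic  (Long/parseLong id)
     :weight (Double/parseDouble weight)
     :terms  (vec terms)}))

;; Two lines copied from the output shown above.
(def sample-lines
  ["3 0.1 law business men work people good"
   "9 0.1 war peace nation great men people"])

(def topics (mapv parse-topic-line sample-lines))

;; Look up the top terms of the first parsed topic.
(:terms (first topics))
```

Keeping the terms in a vector preserves their order, which matters because they are listed from most to least prominent.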
How it works…
It's difficult to explain succinctly and clearly how topic modeling works. Conceptually, it assigns
words from the documents to buckets (topics). This is done in such a way that randomly
drawing words from the buckets is likely to recreate the documents.
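The bucket idea can be made concrete by running it in the generative direction. The following sketch is a toy illustration and not how the model is trained: the topics, their word probabilities, the document's topic mixture, and the names `toy-topics`, `sample-weighted`, and `generate-document` are all invented for the example. For every word, it first picks a bucket according to the mixture and then draws a word from that bucket.

```clojure
;; Hypothetical toy topics: each bucket maps words to probabilities.
(def toy-topics
  {:labor {"law" 0.5, "business" 0.3, "work" 0.2}
   :war   {"war" 0.6, "peace" 0.4}})

(defn sample-weighted
  "Draws one key from a map of key -> weight, in proportion to the weights."
  [^java.util.Random rng weights]
  (let [ks         (keys weights)
        cumulative (reductions + (vals weights))
        r          (* (last cumulative) (.nextDouble rng))]
    ;; Return the first key whose cumulative weight exceeds r; fall back
    ;; to the last key to guard against floating-point rounding.
    (or (some (fn [[k c]] (when (< r c) k))
              (map vector ks cumulative))
        (last ks))))

(defn generate-document
  "Generates n words. For each word, picks a topic from the mixture
  and then picks a word from that topic's bucket."
  [rng topics mixture n]
  (vec (repeatedly n
                   (fn []
                     (let [topic (sample-weighted rng mixture)]
                       (sample-weighted rng (get topics topic)))))))

;; A document that is mostly about labor, with some war vocabulary.
(def doc
  (generate-document (java.util.Random. 42) toy-topics {:labor 0.7, :war 0.3} 8))
```

Training a topic model amounts to working backward from the real documents to buckets and mixtures under which documents like them would be likely to come out of a process such as this one.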
Interpreting the topics is always interesting. Generally, it involves taking a look at the top
words for each topic and cross-referencing them with the documents that scored most highly
for this topic.
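The cross-referencing step can be sketched as follows, assuming we already have each document's topic distribution as a vector with one weight per topic, however those distributions are obtained from the trained model. The document names, the weights, and the `top-documents` helper are all hypothetical and serve only to illustrate the lookup.

```clojure
;; Hypothetical per-document topic distributions (one weight per topic).
;; All names and numbers here are invented for illustration.
(def doc-topics
  {"doc-a" [0.10 0.20 0.30 0.40]
   "doc-b" [0.40 0.30 0.20 0.10]
   "doc-c" [0.25 0.25 0.30 0.20]})

(defn top-documents
  "Returns the n [doc-id weight] pairs with the highest weight for topic."
  [doc-topics topic n]
  (->> doc-topics
       (map (fn [[doc-id dist]] [doc-id (nth dist topic)]))
       (sort-by second >)
       (take n)))

;; The two documents that score highest for topic 3.
(top-documents doc-topics 3 2)
;; => (["doc-a" 0.4] ["doc-c" 0.2])
```

Reading the documents this returns alongside the topic's top words is usually what makes a topic interpretable.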
For example, take the fourth topic (topic 3 in the preceding output), with the top words law, business, men, and work.
The top-scoring document for this topic was the 1908 SOTU, with a distribution of 0.378.
This was given by Theodore Roosevelt, and in his speech, he talked a lot about labor issues
and legislation to rein in corrupt corporations. All of the words mentioned were used a lot,
but exactly what the topic is about isn't evident without actually taking a look
at the document itself.
See also…
There are a number of good papers and tutorials on topic modeling. There's a good
tutorial written by Shawn Graham, Scott Weingart, and Ian Milligan at
http://programminghistorian.org/lessons/topic-modeling-and-mallet
 