Figure 9.4 The intuitions behind LDA
The reader can refer to the original paper [29] for the mathematical details of LDA. In essence, LDA is a hierarchical Bayesian model: it estimates the posterior distribution of hidden topics given the observed words, and uses that distribution to group data, such as documents, by the topics they share.
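For orientation, the generative model behind that posterior can be written compactly. The notation below is an assumption of this sketch and may differ from the symbols used in [29]: K topics, D documents, N_d words in document d, Dirichlet hyperparameters \alpha and \eta, per-document topic proportions \theta_d, per-topic word distributions \beta_k, topic assignments z_{d,n}, and observed words w_{d,n}.

```latex
\begin{align*}
\beta_k &\sim \mathrm{Dirichlet}(\eta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \beta &\sim \mathrm{Categorical}(\beta_{z_{d,n}})
\end{align*}
% Inference targets the posterior over the hidden structure given the words:
\[
p(\beta, \theta, z \mid w) = \frac{p(\beta, \theta, z, w)}{p(w)}
\]
```

The denominator p(w) is intractable to compute exactly, which is why practical implementations rely on approximations such as Gibbs sampling or variational inference.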
Many programming tools provide software packages that can perform LDA over datasets. R has an lda package [31] that provides built-in functions and sample datasets. The lda package was developed by David M. Blei's research group [32].
Figure 9.5 shows the distributions of ten topics over nine scientific documents randomly drawn from the cora dataset of the lda package. The cora dataset is a collection of 2,410 scientific documents extracted from the Cora search engine [33].
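The per-document topic distributions described above can be illustrated with a small, self-contained sketch. The Python code below is not the lda package and does not use the cora dataset. It is a minimal collapsed Gibbs sampler written for illustration, run on a made-up four-document corpus. The function name lda_gibbs, the hyperparameter values, and the toy documents are all assumptions of this example.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA on tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * num_topics for _ in docs]       # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # times word v is assigned to topic k
    topic_total = [0] * num_topics                     # tokens assigned to topic k overall

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k_old = assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k_old] -= 1
                topic_word[k_old][v] -= 1
                topic_total[k_old] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + eta) / (topic_total[k] + V * eta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][v] += 1
                topic_total[k_new] += 1

    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, vocab

# Hypothetical toy corpus: two documents on genetics, two on neuroscience.
docs = [
    "gene dna genetic gene cell".split(),
    "dna gene sequence genome".split(),
    "neuron brain nerve brain".split(),
    "brain neuron cell nerve".split(),
]
theta, vocab = lda_gibbs(docs, num_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])
```

Each printed row is one document's distribution over the topics, the same kind of quantity that a plot of topic distributions per document visualizes. Topic indices are arbitrary labels, so which index ends up associated with which theme depends on the random seed.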