This is a C implementation of variational EM for latent Dirichlet allocation (LDA), a topic model for text or other discrete data. LDA allows you to analyze a corpus and extract the topics that combined to form its documents. For example, click here to see the topics estimated from a small corpus of Associated Press documents. LDA is fully described in Blei et al. (2003).
This code contains:
Download the readme.txt.
Download the code: lda-c.tgz.
2246 documents from the Associated Press [download].
Top 20 words from 100 topics estimated from the AP corpus [pdf].
To learn about bug fixes and updates, and to discuss LDA and related techniques, please join the topic-models mailing list, topic-models [at] lists.cs.princeton.edu.
To join, click here.