Topic modeling
Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms help us develop new ways to search, browse and summarize large archives of texts.
Below, you will find links to introductory materials and open source software (from my research group) for topic modeling.
Introductory materials
- I wrote a general introduction to topic modeling.
- John Lafferty and I wrote a more technical review paper about this field.
- Here are slides from some of my talks about topic modeling:
  - "Probabilistic Topic Models" (2012 ICML Tutorial)
  - "Probabilistic Topic Models" (2012 Machine Learning Summer School, 211 slides)
  - "Probabilistic Topic Models: Origins and Challenges" (2013 Topic Modeling Workshop at NIPS)
- The topic models mailing list is a good forum for discussing topic modeling.
Topic modeling software
There are many open-source packages available for topic modeling.
Our research group regularly releases the code associated with our papers through a GitHub organization. (If you are not familiar with GitHub, see this table for some of the code.)