Probabilistic Models of Discrete Data
Spring 2016, Columbia University
David M. Blei
Day/Time: Fridays, 12:10PM-2:30PM
Location: Mudd 633
Piazza site
Course Description
We will study probabilistic models of discrete data, especially
focusing on large-scale data sets that are high-dimensional and
sparse. Discrete data sets are found in diverse applications of
statistical machine learning, such as natural language processing,
recommendation systems, computational neuroscience, and statistical
genetics. Topics will include embeddings, mixed-membership models
(topic models), scalable computation, Bayesian nonparametrics, and
model diagnosis. During the course of the semester, each student will
be expected to complete an ambitious project around a real-world
problem.
Prerequisites. The prerequisite course is Foundations of
Graphical Models, and you should be comfortable with its material.
Specifically, you should be able to write down a new model in which
each complete conditional is in the exponential family, derive and
implement an approximate inference algorithm for the model, and
interpret its results. You should also be fluent in the semantics of
graphical models. Finally, note that this is a seminar: it is open
only to PhD students, and auditors are not permitted.
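To make the prerequisite concrete: the sketch below is a hypothetical illustration (not course material) of the kind of derivation expected. It Gibbs-samples a two-component Gaussian mixture with known variances, where the complete conditional of each assignment is categorical and, by conjugacy, the complete conditional of each component mean is Gaussian. The data, hyperparameters, and function names are all assumptions for the example.

```python
import math
import random

def mean_conditional(xs, sigma2, tau2):
    """Gaussian complete conditional for a component mean, under a
    N(0, tau2) prior and N(mu, sigma2) likelihood (standard conjugacy)."""
    post_var = 1.0 / (len(xs) / sigma2 + 1.0 / tau2)
    post_mean = post_var * sum(xs) / sigma2
    return post_mean, post_var

def gibbs(data, iters=200, K=2, sigma2=0.5, tau2=10.0, seed=0):
    rng = random.Random(seed)
    mu = [0.0] * K                          # component means
    z = [rng.randrange(K) for _ in data]    # cluster assignments
    for _ in range(iters):
        # The complete conditional of each assignment z_i is categorical,
        # with weights proportional to the component likelihoods.
        for i, x in enumerate(data):
            w = [math.exp(-(x - mu[k]) ** 2 / (2 * sigma2)) for k in range(K)]
            r, acc = rng.random() * sum(w), 0.0
            for k in range(K):
                acc += w[k]
                if r <= acc:
                    z[i] = k
                    break
        # The complete conditional of each mean mu_k is Gaussian (conjugacy).
        for k in range(K):
            xs = [x for x, zi in zip(data, z) if zi == k]
            m, v = mean_conditional(xs, sigma2, tau2)
            mu[k] = rng.gauss(m, math.sqrt(v))
    return sorted(mu)

# Toy data with two well-separated clusters (hypothetical example).
print(gibbs([-2.3, -1.9, -2.1, -1.7, 2.2, 1.8, 2.4, 2.0]))
```

Because every complete conditional has a known exponential-family form, each Gibbs update is a draw from a standard distribution; the same structure is what makes mean-field variational inference tractable for such models.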
Reading assignments and notes
- Introduction and logistics
  - Notes (class logistics, requirements, etc.)
- Word embeddings I
- Word embeddings II
- Word embeddings III
- Factorization I
- Aside: Stochastic optimization and variational inference
- Inverse regression for text
- Bayesian nonparametrics
  - Reading (introductory): Blei and Gershman, 2013 (Sections 1, 2, 3)
  - Reading (introductory): Neal, 2000 (Sections 1, 2, 3)
  - Reading (advanced): Jordan, 2013 (Sections 1, 2)
  - Reading (advanced): Teh and Jordan, 2010 (Sections 1, 2, 5)
  - Lecture notes from FOGM
  - Scribe notes (I) (Kui Tang)
  - Scribe notes (II) (Benjamin Bloem-Reddy)
- More Bayesian nonparametrics
- Discrete choice models
- Statistical analysis of networks