Y. Bengio, A. Courville, and P. Vincent. Representation learning: A
review and new perspectives. IEEE Transactions on Pattern Analysis and
Machine Intelligence,
35(8):1798-1828, 2013. [PDF]
Background
L. Brown. Fundamentals of Statistical Exponential
Families. Institute of Mathematical Statistics, Hayward, CA, 1986.
P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman
and Hall, London, 1989.
Probabilistic PCA
M. Tipping and C. Bishop. Probabilistic principal component
analysis. Journal of the Royal Statistical Society: Series B
(Statistical Methodology),
61(3):611-622, 1999. [PDF]
M. Collins, S. Dasgupta, and R. Schapire. A generalization of
principal component analysis to the exponential family. In Neural
Information Processing
Systems, 2002. [PS]
Statistics and deep learning
D. MacKay. A practical Bayesian framework for backpropagation
networks. Neural Computation,
4(3):448-472, 1992. [PDF]
R. Neal. Bayesian Learning for Neural
Networks. Springer, 1996. [PDF]
S. Mohamed. A statistical view of deep
learning. 2015. [PDF]
Deep generative models
D. Kingma and M. Welling. Auto-encoding variational
Bayes. arXiv:1312.6114, 2013. [PDF]
D. Rezende, S. Mohamed, and D. Wierstra. Stochastic backpropagation
and approximate inference in deep generative models. In
International Conference on Machine
Learning, 2014. [PDF]
J. Tomczak and M. Welling. VAE with a VampPrior. In Artificial
Intelligence and Statistics, 2018. [PDF]
A. Dieng, Y. Kim, A. Rush, and D. Blei. Avoiding latent variable
collapse with generative skip models. In Artificial Intelligence and
Statistics, 2019. [PDF]
D. Kingma and M. Welling. An introduction to variational
autoencoders. Foundations and Trends in Machine
Learning, 2019. [PDF]
Posterior inference
R. Ranganath, S. Gerrish, and D. Blei. Black box variational
inference. In Artificial Intelligence and Statistics, 2014.
D. Blei, A. Kucukelbir, and J. McAuliffe. Variational inference: A
review for statisticians. Journal of the American Statistical
Association, 112(518):859-877, 2017.
G. Papamakarios, E. Nalisnick, D. Rezende, S. Mohamed, and
B. Lakshminarayanan. Normalizing flows for probabilistic modeling
and inference. arXiv:1912.02762, 2019.
Information theory and representation learning
A. Achille and S. Soatto. Emergence of invariance and disentangling
in deep representations. Journal of Machine Learning Research,
19:1-34, 2018. [PDF]
A. Alemi, I. Fischer, J. Dillon, and K. Murphy. Deep variational
information bottleneck. In International Conference on Learning
Representations, 2017. [PDF]
A. Alemi, B. Poole, I. Fischer, J. Dillon, R. Saurous, and
K. Murphy. Fixing a broken
ELBO. arXiv:1711.00464, 2017. [PDF]
Disentanglement and invariance
J. Peters, P. Bühlmann, and N. Meinshausen. Causal inference by
using invariant prediction: Identification and confidence
intervals. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 78(5):947-1012, 2016.
R. Suter, D. Miladinovic, S. Bauer, and
B. Schölkopf. Interventional robustness of deep latent variable
models. arXiv:1811.00007, 2018.
M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz. Invariant
risk minimization. arXiv:1907.02893, 2019.
A. Buja, L. Brown, A. K. Kuchibhotla, R. Berk, E. George, and
L. Zhao. Models as approximations II: A model-free theory of
parametric regression. Statistical Science, 34(4):545-565, 2019.
Embeddings
D. Rumelhart and A. Abrahamson. A model for analogical
reasoning. Cognitive Psychology, 5(1):1-28, 1973.
Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural
probabilistic language model. Journal of Machine Learning Research,
3:1137-1155, 2003.
M. Rudolph, F. Ruiz, S. Mandt, and D. Blei. Exponential family
embeddings. In Neural Information Processing Systems, 2016.
F. Ruiz, S. Athey, and D. Blei. SHOPPER: A probabilistic model of
consumer choice with substitutes and complements. Annals of Applied
Statistics, to appear.