Motivation

Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798-1828, 2013. [PDF]

Background

L. Brown. Fundamentals of Statistical Exponential Families. Institute of Mathematical Statistics, Hayward, CA, 1986.
P. McCullagh and J. A. Nelder. Generalized Linear Models. London: Chapman and Hall, 1989.

Probabilistic PCA

M. Tipping and C. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3):611-622, 1999. [PDF]
M. Collins, S. Dasgupta, and R. Schapire. A generalization of principal component analysis to the exponential family. In Neural Information Processing Systems, 2002. [PS]

Statistics and deep learning

D. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448-472, 1992. [PDF]
R. Neal. Bayesian Learning for Neural Networks. Springer, 1996. [PDF]
S. Mohamed. A statistical view of deep learning. 2015. [PDF]

Deep generative models

D. Kingma and M. Welling. Auto-encoding variational Bayes. arXiv:1312.6114, 2013. [PDF]
D. Rezende, S. Mohamed, and D. Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In International Conference on Machine Learning, 2014. [PDF]
J. Tomczak and M. Welling. VAE with a VampPrior. In Artificial Intelligence and Statistics, 2018. [PDF]
A. Dieng, Y. Kim, A. Rush, and D. Blei. Avoiding latent variable collapse with generative skip models. In Artificial Intelligence and Statistics, 2019. [PDF]
D. Kingma and M. Welling. An introduction to variational autoencoders. Foundations and Trends in Machine Learning, 2019. [PDF]

Posterior inference

R. Ranganath, S. Gerrish, and D. Blei. Black box variational inference. In Artificial Intelligence and Statistics, 2014.
D. Blei, A. Kucukelbir, and J. McAuliffe. Variational inference: A review for statisticians. Journal of the American Statistical Association, 112(518):859-877, 2017.
G. Papamakarios, E. Nalisnick, D. Rezende, S. Mohamed, and B. Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. arXiv:1912.02762, 2019.

Information theory and representation learning

A. Achille and S. Soatto. Emergence of invariance and disentangling in deep representations. Journal of Machine Learning Research, 19(50):1-34, 2018. [PDF]
A. Alemi, I. Fischer, J. Dillon, and K. Murphy. Deep variational information bottleneck. In International Conference on Learning Representations, 2017. [PDF]
A. Alemi, B. Poole, I. Fischer, J. Dillon, R. Saurous, and K. Murphy. Fixing a broken ELBO. arXiv:1711.00464, 2017. [PDF]

Disentanglement and invariance

J. Peters, P. Bühlmann, and N. Meinshausen. Causal inference by using invariant prediction: Identification and confidence intervals. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(5):947-1012, 2016.
R. Suter, D. Miladinovic, S. Bauer, and B. Schölkopf. Interventional robustness of deep latent variable models. arXiv:1811.00007, 2018.
M. Arjovsky, L. Bottou, I. Gulrajani, and D. Lopez-Paz. Invariant risk minimization. arXiv:1907.02893, 2019.
A. Buja, L. Brown, A. K. Kuchibhotla, R. Berk, E. George, and L. Zhao. Models as approximations II: A model-free theory of parametric regression. Statistical Science, 34(4):545-565, 2019.

Embeddings

D. Rumelhart and A. Abrahamson. A model for analogical reasoning. Cognitive Psychology, 5(1):1-28, 1973.
Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155, 2003.
M. Rudolph, F. Ruiz, S. Mandt, and D. Blei. Exponential family embeddings. In Neural Information Processing Systems, 2016.
F. Ruiz, S. Athey, and D. Blei. SHOPPER: A probabilistic model of consumer choice with substitutes and complements. Annals of Applied Statistics, to appear.