Unsupervised Learning -- List of papers

Clustering

Gonzalez. Clustering to minimize the maximum intercluster distance.
Arthur and Vassilvitskii. k-means++: the advantages of careful seeding.
Telgarsky and Vattani. Hartigan’s Method: k-means Clustering without Voronoi.
Balcan and Blum. Clustering with Interactive Feedback.
Meila. Comparing Clusterings - An Axiomatic View.
Bachem, Lucic, Hassani, Krause. Fast and Provably Good Seedings for k-Means.
Kanungo, Mount, Netanyahu, Piatko, Silverman, and Wu. A local search approximation algorithm for k-means clustering.
Hartigan. Clustering algorithms.
Lin and Vitter. Approximation algorithms for geometric median problems.
Dasgupta. A cost function for similarity-based hierarchical clustering.
Dasgupta and Long. Performance guarantees for hierarchical clustering.
Ackerman and Dasgupta. Incremental clustering: the case for extra clusters.
Jain, Meka and Dhillon. Simultaneous Unsupervised Learning of Disparate Clusterings.
Jardine and Sibson. Mathematical taxonomy.
Kleinberg. An impossibility theorem for clustering.
Balcan, Blum, and Vempala. A discriminative framework for clustering via similarity functions.
Drineas, Kannan, Frieze, Vempala, and Vinay. Clustering large graphs via the singular value decomposition.
Bshouty and Long. Finding Planted Partitions in Nearly Linear Time using Arrested Spectral Clustering.
Chaudhuri, Chung, Tsiatas. Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model.

Dimensionality Reduction

Fakcharoenphol, Rao and Talwar. A tight bound on approximating arbitrary metrics by tree metrics.
Brinkman and Charikar. On the Impossibility of Dimension Reduction in L1.
Dasgupta and Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss.
Larsen and Nelson. Optimality of the Johnson-Lindenstrauss lemma.
Verma. A note on random projections for preserving paths on a manifold.
Verma. Distance preserving embeddings for general n-dimensional manifolds.
Schoenberg. Metric spaces and positive definite functions.
Kruskal and Wish. Multidimensional scaling.
Matoušek. Lectures on discrete geometry, Chapter 15: Embedding finite metric spaces into normed spaces.
Tenenbaum, de Silva, and Langford. A global geometric framework for nonlinear dimensionality reduction.
Roweis and Saul. Nonlinear dimensionality reduction by locally linear embedding.
Cayton. Algorithms for manifold learning. (survey)
Verma. Mathematical advances in manifold learning. (survey)
Clarkson. Tighter bounds for random projections of manifolds.
Indyk and Naor. Nearest neighbor preserving embeddings.
van der Maaten and Hinton. Visualizing High-Dimensional Data Using t-SNE.
Arora, Hu, Kothari. An Analysis of the t-SNE Algorithm for Data Visualization.
Cai and Ma. Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data.
Im, Verma and Branson. Stochastic neighbor embedding under f-divergences.
Arias-Castro. Some theory for ordinal embedding.
Nickel and Kiela. Poincaré Embeddings for Learning Hierarchical Representations.
De Sa, Gu, Ré and Sala. Representation Tradeoffs for Hyperbolic Embeddings.

Density Estimation

Dasgupta and Schulman. A two-round variant of EM for Gaussian mixtures.
Dasgupta. Learning mixtures of Gaussians.
Dasgupta and Kpotufe. Optimal rates for k-NN density and mode estimation.
Blei, Ng, Jordan. Latent Dirichlet Allocation.
Anandkumar, Ge, Hsu, Kakade, Telgarsky. Tensor decompositions for learning latent variable models.
Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, Zhu. A Practical Algorithm for Topic Modeling with Provable Guarantees.
Arora, Ge, Liang, Ma, Zhang. Generalization and Equilibrium in Generative Adversarial Nets.
Arora, Risteski, Zhang. Do GANs learn the distribution? Some theory and empirics.

Structure Discovery

Diaconis, Goel, Holmes. Horseshoes in multidimensional scaling and local kernel methods.
Freund, Dasgupta, Kabra, Verma. Learning the structure of manifolds using random projections.
Fefferman, Mitter, Narayanan. Testing the Manifold Hypothesis.
Chazal and Michel. An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists.

Misc. Topics

Dasgupta and Sinha. Randomized partition trees for nearest neighbor search.
Chaudhuri and Dasgupta. Rates of convergence for the cluster tree.
Dasgupta and Freund. Random projection trees and low dimensional manifolds.
Dasgupta, Hsu, Verma. A concentration theorem for projections.
Clarkson. Nearest-neighbor searching and metric space dimensions.
Niyogi, Smale, Weinberger. Finding the Homology of Submanifolds with High Confidence from Random Samples.
Niyogi, Smale, Weinberger. A Topological View of Unsupervised Learning and Clustering.
Gionis, Indyk, Motwani. Similarity search in high dimensions via hashing.
Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality.
Donoho. Compressed Sensing.