Unsupervised Learning -- List of papers
Clustering
- Gonzalez. Clustering to minimize the maximum intercluster distance.
- Arthur and Vassilvitskii. k-means++: the advantages of careful seeding.
- Telgarsky and Vattani. Hartigan’s Method: k-means Clustering without Voronoi.
- Balcan and Blum. Clustering with Interactive Feedback.
- Meila. Comparing Clusterings - An Axiomatic View.
- Bachem, Lucic, Hassani, Krause. Fast and Provably Good Seedings for k-Means.
- Kanungo, Mount, Netanyahu, Piatko, Silverman, and Wu. A local search approximation algorithm for k-means clustering.
- Hartigan. Clustering algorithms.
- Lin and Vitter. Approximation algorithms for geometric median problems.
- Dasgupta. A cost function for similarity-based hierarchical clustering.
- Dasgupta and Long. Performance guarantees for hierarchical clustering.
- Ackerman and Dasgupta. Incremental clustering: the case for extra clusters.
- Jain, Meka and Dhillon. Simultaneous Unsupervised Learning of Disparate Clusterings.
- Jardine and Sibson. Mathematical taxonomy.
- Kleinberg. An impossibility theorem for clustering.
- Balcan, Blum, and Vempala. A discriminative framework for clustering via similarity functions.
- Drineas, Kannan, Frieze, Vempala, and Vinay. Clustering large graphs via the singular value decomposition.
- Bshouty and Long. Finding Planted Partitions in Nearly Linear Time using Arrested Spectral Clustering.
- Chaudhuri, Chung, Tsiatas. Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model.
Dimensionality Reduction
- Fakcharoenphol, Rao and Talwar. A tight bound on approximating arbitrary metrics by tree metrics.
- Brinkman and Charikar. On the Impossibility of Dimension Reduction in L1.
- Dasgupta and Gupta. An elementary proof of a theorem of Johnson and Lindenstrauss.
- Larsen and Nelson. Optimality of the Johnson-Lindenstrauss lemma.
- Verma. A note on random projections for preserving paths on a manifold.
- Verma. Distance preserving embeddings for general n-dimensional manifolds.
- Schoenberg. Metric spaces and positive definite functions.
- Kruskal and Wish. Multidimensional scaling.
- Matousek. Lectures on discrete geometry, Chapter 15: Embedding finite metric spaces into normed spaces.
- Tenenbaum, de Silva, and Langford. A global geometric framework for nonlinear dimensionality reduction.
- Roweis and Saul. Nonlinear dimensionality reduction by locally linear embedding.
- Cayton. Algorithms for manifold learning. (survey)
- Verma. Mathematical advances in manifold learning. (survey)
- Clarkson. Tighter bounds for random projections of manifolds.
- Indyk and Naor. Nearest neighbor preserving embeddings.
- van der Maaten and Hinton. Visualizing Data using t-SNE.
- Arora, Hu, Kothari. An Analysis of the t-SNE Algorithm for Data Visualization.
- Cai and Ma. Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data.
- Im, Verma and Branson. Stochastic neighbor embedding under f-divergences.
- Arias-Castro. Some theory for ordinal embedding.
- Nickel and Kiela. Poincaré Embeddings for Learning Hierarchical Representations.
- De Sa, Gu, Ré and Sala. Representation Tradeoffs for Hyperbolic Embeddings.
Density Estimation
- Dasgupta and Schulman. A two-round variant of EM for Gaussian mixtures.
- Dasgupta. Learning mixtures of Gaussians.
- Dasgupta and Kpotufe. Optimal rates for k-NN density and mode estimation.
- Blei, Ng, Jordan. Latent Dirichlet Allocation.
- Anandkumar, Ge, Hsu, Kakade, Telgarsky. Tensor decompositions for learning latent variable models.
- Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, Zhu. A Practical Algorithm for Topic Modeling with Provable Guarantees.
- Arora, Ge, Liang, Ma, Zhang. Generalization and Equilibrium in Generative Adversarial Nets.
- Arora, Risteski, Zhang. Do GANs learn the distribution? Some theory and empirics.
Structure Discovery
- Diaconis, Goel, Holmes. Horseshoes in multidimensional scaling and local kernel methods.
- Freund, Dasgupta, Kabra, Verma. Learning the structure of manifolds using random projections.
- Fefferman, Mitter, Narayanan. Testing the Manifold Hypothesis.
- Chazal and Michel. An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists.
Misc. Topics
- Dasgupta and Sinha. Randomized partition trees for nearest neighbor search.
- Chaudhuri and Dasgupta. Rates of convergence for the cluster tree.
- Dasgupta and Freund. Random projection trees and low dimensional manifolds.
- Dasgupta, Hsu, Verma. A concentration theorem for projections.
- Clarkson. Nearest-neighbor searching and metric space dimensions.
- Niyogi, Smale, Weinberger. Finding the Homology of Submanifolds with High Confidence from Random Samples.
- Niyogi, Smale, Weinberger. A Topological View of Unsupervised Learning and Clustering.
- Gionis, Indyk, Motwani. Similarity search in high dimensions via hashing.
- Indyk and Motwani. Approximate nearest neighbors: towards removing the curse of dimensionality.
- Donoho. Compressed Sensing.