Introduction to Data Science
New York University - Spring 2019
Class is held in 60FA 110, Wed 3:00-4:40pm
Office hours: CDS 618
Friday 10-11am: Lecturer, Iddo Drori
Wednesday 11am-12pm: Section Leader, Andrea Wang
Monday 1-2pm: Grader, Sai Anirudh Kondaveeti
Thursday 11am-12pm: Grader, Divyanch Khanna
Spring 2019 classes begin (Monday, January 28)
Lecture 1 (Wednesday, January 30): Introduction
Data collection, cleaning, storage, retrieval, learning, visualization.
Readings: CASI Epilogue, DSB Ch 1-3.
Science and data science, David M. Blei and Padhraic Smyth, PNAS 2017.
Lab 1: Tableau
Homework 1: in Tableau, due February 11.
Lecture 2 (Wednesday, February 6): Supervised learning, fitting
Readings: DSB Ch 4-5.
Optional: CASI Ch 8.
Lab 2: Python, NumPy, Sklearn.
Lecture 3 (Wednesday, February 13): Unsupervised learning, clustering, dimensionality reduction
Readings: DSB Ch 6.
Lab 3: In-class Kaggle competition
Homework 2: due March 6.
Presidents' Day, no classes scheduled (Monday, February 18)
Lecture 4 (Wednesday, February 20): Performance measures, ROC curve
Readings: DSB Ch 7-9.
Lab 4: Sklearn confusion matrix, ROC
Lecture 5 (Wednesday, February 27): SGD, feature extraction and selection, random forests, gradient boosting
Readings: CASI Ch 16.
Lab 5: Sklearn SGD, feature extraction and selection
Lecture 6 (Wednesday, March 6): Neural networks, text mining, sequence models
Readings: DSB Ch 10.
Optional: CASI Ch 18.
Lab 6: Spelling correction, NLTK
Lecture 7 (Wednesday, March 13): Features, combining models, example midterm
Readings: DSB Ch 11-12.
Lab 7: Sklearn random forest, boosting
Spring recess (Monday, March 18 to Sunday, March 24)
Lecture 8 (Wednesday, March 27): Midterm in class
Lecture 9 (Wednesday, April 3): Data visualization
Readings: Visualizations that really work, Scott Berinato, HBR, 2016.
Optional: The visual display of quantitative information, Edward Tufte, 2001.
Lab 9: Tableau with Python
Lecture 10 (Wednesday, April 10): Automatic data science
Lab 10: auto-sklearn, TPOT
Lecture 11 (Wednesday, April 17): Bayesian and variational inference
Readings: CASI Ch 3.
Optional: Variational inference: a review for statisticians, Blei et al., 2018.
Lab 11
Lecture 12 (Wednesday, April 24): Bias and fairness
Lecture 13 (Wednesday, May 1): Recommender systems
Lecture 14 (Wednesday, May 8): Quantum information science
Last day of Spring 2019 classes (Monday, May 13)
Final exam (Friday, May 17): in class 2:00-3:50pm